Ticket #602 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

RSS problems

Reported by: ed
Owned by: jim
Priority: major
Milestone: Maintenance
Component: Drupal modules & settings
Keywords:
Cc:
Estimated Number of Hours: 0.0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 4.85

Description

  1. Blogs feed not working:

https://www.transitionnetwork.org/blogs/feed/

  2. Rob's blog not working:

https://www.transitionnetwork.org/blogs/feed/rob-hopkins

  3. Getting complaints from users who are getting multiple copies of each blog post. Can't check this as the feed is down, but I've got 2 or 3 complaints in my inbox.

Attachments

tc-newsblur.png (433.5 KB) - added by ed 3 years ago.
screengrab of newsblur rss reader with multiple copies of Transition Culture blog
Screen Shot 2013-10-10 at 15.26.57.png (285.7 KB) - added by ed 3 years ago.
screengrab of feedly reader
Screen Shot 2013-10-10 at 15.36.51.png (257.9 KB) - added by ed 3 years ago.
another screengrab of the TN feed being picked up multiple times by a reader
Liferea_005.png (132.2 KB) - added by jim 3 years ago.
Liferea RSS reader showing no dupes
Screen Shot 2013-10-11 at 08.08.30.png (141.0 KB) - added by ed 3 years ago.
Feedly screengrab showing multiple RSS items for one blog post 11/10
Selection_006.png (53.1 KB) - added by jim 3 years ago.
Aegir -- Site aliases settings

Change History

comment:1 Changed 3 years ago by chris

I realise this isn't my ticket, and that the problem was caused by not allowing people to access the RSS feeds using HTTPS (ticket:588), but I can't see why there is a problem. If you request the feed using HTTPS you are told to use HTTP; see the Location line:

lynx -useragent "User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" -head -dump https://www.transitionnetwork.org/blogs/feed                

HTTP/1.1 301 Moved Permanently
Server: nginx
Date: Thu, 03 Oct 2013 13:21:19 GMT
Content-Type: text/html; charset=utf-8
Connection: close
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
X-Local-Proto: https
X-Cookie-Domain: .transitionnetwork.org
X-Redis-Prefix: www.transitionnetwork.org_
Location: http://www.transitionnetwork.org/blogs/feed
Etag: "1380806478-0"
Cache-Control: public, max-age=0
Last-Modified: Thu, 03 Oct 2013 13:21:18 +0000
Expires: Tue, 24 Jan 1984 08:00:00 GMT
Vary: Cookie
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: EXPIRED
X-Speed-Cache-Key: /blogs/feed
X-NoCache: Cache
X-This-Proto: https
X-Server-Name: www.transitionnetwork.org
Vary: Accept-Encoding

And when you request the feed using HTTP it is OK:

lynx -useragent "User-Agent: Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" -head -dump http://www.transitionnetwork.org/blogs/feed 

HTTP/1.1 200 OK
Server: nginx
Date: Thu, 03 Oct 2013 13:22:25 GMT
Content-Type: text/html; charset=utf-8
Connection: close
Vary: Accept-Encoding
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
X-Cookie-Domain: .transitionnetwork.org
X-Redis-Prefix: www.transitionnetwork.org_
Etag: "1380806544-0"
Cache-Control: public, max-age=0
Last-Modified: Thu, 03 Oct 2013 13:22:24 +0000
Expires: Tue, 24 Jan 1984 08:00:00 GMT
Vary: Cookie
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: EXPIRED
X-Speed-Cache-Key: /blogs/feed
X-NoCache: Cache
X-GeoIP-Country-Code: GB
X-GeoIP-Country-Name: United Kingdom
X-Server-Name: www.transitionnetwork.org

(I had to fake the user agent, as BOA only serves 403 Forbidden responses to requests from the lynx browser.)

As far as I can see there should only be a problem for people with the EFF HTTPS Everywhere plugin installed.
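
For illustration only, the same check can be sketched in Python against a saved header dump. This is a minimal sketch, not something that was run on the server: the function names and the shortened sample are assumptions, and only the status line and the Location header are inspected.

```python
def parse_head_dump(text):
    """Split a raw HTTP header dump into (status_code, headers)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # "HTTP/1.1 301 Moved Permanently" -> 301
    status_code = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Repeated headers (e.g. Vary) are collected into a list.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status_code, headers


def is_https_to_http_redirect(requested_url, dump):
    """True if an https:// request was answered with a redirect to http://."""
    status, headers = parse_head_dump(dump)
    location = headers.get("location", [""])[0]
    return (
        requested_url.startswith("https://")
        and status in (301, 302, 303, 307, 308)
        and location.startswith("http://")
    )


# Shortened copy of the first response above (most X-* headers omitted).
SAMPLE = """\
HTTP/1.1 301 Moved Permanently
Server: nginx
Location: http://www.transitionnetwork.org/blogs/feed
Vary: Cookie
Vary: Accept-Encoding
"""

print(is_https_to_http_redirect(
    "https://www.transitionnetwork.org/blogs/feed", SAMPLE))
```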

comment:2 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.0 to 0.15

Thanks Chris.

I notice that when the loop happened, it was 301 -> HTTP then 302 -> HTTPS. My tweak on #588 was using 301, so I guessed the 443Session module must be redirecting people...

Strange that it would attempt a redirect to HTTPS for non-logged in users on a blog feed! So I've added a line to 'Ignore pages' on the module config page to stop this: blog/*/feed... And now it works.

Odd. 443Session module is the cause, though why it's redirecting cookieless/sessionless users to HTTPS on blog feeds probably indicates a bug. I'll check the issues list now.
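
The loop described above (301 to HTTP, then 302 back to HTTPS) can be reproduced offline with a small sketch. The `responses` table is a hypothetical stand-in for the live site, and `trace_redirects` is an illustrative name, not part of any module mentioned here.

```python
def trace_redirects(start_url, responses, max_hops=10):
    """Follow redirects through a table of canned responses.

    `responses` maps URL -> (status, location). URLs missing from the
    table are treated as a plain 200. Returns (chain, looped).
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        status, location = responses.get(url, (200, None))
        if status not in (301, 302) or location is None:
            return chain, False
        chain.append(location)
        if location in seen:
            # We have been sent back to a URL we already visited.
            return chain, True
        seen.add(location)
        url = location
    return chain, False


FEED = "www.transitionnetwork.org/blogs/feed"
responses = {
    "https://" + FEED: (301, "http://" + FEED),   # the #588 tweak
    "http://" + FEED: (302, "https://" + FEED),   # the unexplained 302
}

chain, looped = trace_redirects("https://" + FEED, responses)
print(looped, chain)
```

Removing the second entry from the table (which is roughly what an 'Ignore pages' exception aims for) makes the chain end at the HTTP URL.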

comment:3 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Status changed from new to closed
  • Resolution set to fixed
  • Total Hours changed from 0.15 to 0.3

Ok, no issue for this in the 443 Session issue tracker, so I'm chalking this up to 'things that will get better on D7' and not spending any more time on it. Will add a note to #588 that this situation (and workaround) exists.

Done.

comment:4 Changed 3 years ago by jim

  • Status changed from closed to reopened
  • Resolution fixed deleted

OK maybe not...

comment:5 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 0.3 to 0.4

OK I had the path wrong for the exception.. Now works.
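
A quick way to see why a pattern such as blog/*/feed never matches the feed paths is to test it offline. The sketch below assumes shell-style wildcards via Python's `fnmatch`; the module's own matcher may behave differently, and the "corrected" patterns are hypothetical since the exact exception entered is not recorded here.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """True if the path matches any of the wildcard patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


feed_paths = ["blogs/feed", "blogs/feed/rob-hopkins"]

first_attempt = ["blog/*/feed"]                 # pattern from comment:2
hypothetical_fix = ["blogs/feed", "blogs/feed/*"]  # assumed, not confirmed

for path in feed_paths:
    print(path, is_ignored(path, first_attempt), is_ignored(path, hypothetical_fix))
```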

I've got a suspicion this might reoccur, in which case we have some caching of responses that are being served to people. Or something else.

Will keep an eye on it.

comment:6 Changed 3 years ago by ed

does this also cover the users getting multiple copies problem?

comment:7 follow-up: ↓ 9 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.4 to 0.9

Re Dupes:
I'm yet to see the dupes... They shouldn't exist in the actual view that builds the feed -- though if people have been subscribed to the HTTPS (slow, server-caning) version, when they are forced to get the HTTP version their reader might see the entries as separate items.

---

So the responses for HTTPS are being microcached, hence why it works one minute and not the next. I've commented out my code for now (meaning blog posts are over HTTPS and HTTP). What exactly is causing the redirect via a 302 (NOT my code, it's 301) is still unclear.

I don't get why this happens though...

I've tried adding code to the detection to break out of a redirect loop. This allows us to make *most* HTTPS blog requests go to HTTP, but those causing a loop will be allowed to use HTTPS for that shot. It kinda works.

Leaving this disabled for now, Ed, thoughts?

comment:8 Changed 3 years ago by ed

I have absolutely no idea what you are talking about! We will have to reconvene around this on voice so you can spell it out for me, maybe with diagrams. The thing I'm wondering is - is this related to the change to force it through http? I'm going to soothe my brain in a block of custard for now.

comment:9 in reply to: ↑ 7 ; follow-up: ↓ 10 Changed 3 years ago by chris

Replying to jim:

HTTPS (slow, server-caning) version

The tests I did on ticket:588#comment:10 showed that Nginx serving a static RSS file via HTTP was twice as fast as the BOA stack serving an RSS feed. In other words, if Drupal was able to write out static files for the RSS feeds and these were served directly by Nginx and not via php-fpm / redis / mysql etc. then they would be served in half the time they take at present.

I also didn't look at the load generated doing HTTPS compared to HTTP -- what evidence do you have that it is "server-caning"? All I found was that it was slower.

comment:10 in reply to: ↑ 9 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.9 to 1.4

Replying to chris:

The tests I did on ticket:588#comment:10 showed that Nginx serving a static RSS file via HTTP was twice as fast as the BOA stack serving an RSS feed. In other words, if Drupal was able to write out static files for the RSS feeds and these were served directly by Nginx and not via php-fpm / redis / mysql etc. then they would be served in half the time they take at present.

This is precisely what the NginX speedcache does -- serving a static file will always be quick over HTTP and nearly as quick over HTTPS and we want it!

But it's not going to work past the cache expiry interval for HTTPS requests, and since the page cache (Redis) doesn't work for more than a few seconds for HTTPS either, it appears the feed pages were dropping out of the static speedcache and the redis cache too quickly.

I think misbehaving/misconfigured in-browser RSS readers could be compounding the issue.

I also didn't look at the load generated doing HTTPS compared to HTTP -- what evidence do you have that it is "server-caning"? All I found was that it was slower.

Because if it's not statically served or pulled from the cache, Drupal will be executed instead, which is an order of magnitude (or two) slower.

However, it appears for some reason something was either caching or generating a 302 from somewhere from HTTP back to HTTPS which caused the loop. I can't work out where that's coming from, nor will I at this time...


I've changed tack and have solved the problem in a way that means there's no redirect, which covers Chris' concerns regarding HTTPS Everywhere users. See #588 for details.

Closing this.

comment:11 Changed 3 years ago by jim

  • Status changed from reopened to closed
  • Resolution set to fixed

comment:12 Changed 3 years ago by ed

  • Status changed from closed to reopened
  • Resolution fixed deleted

Here is a comment and screengrab from Simon, who is looking after Transition Culture and syndicating the blogs:

"Things were fine for a while, but recently the feed seems to contain repeat items, usually three or four times.

At first I thought it was an issue with the fetcher on TC, but I subscribed to the feed in my RSS reader (NewsBlur) and the same issue is showing there (see attached screenshot)."

Changed 3 years ago by ed

screengrab of newsblur rss reader with multiple copies of Transition Culture blog

comment:13 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.4 to 1.5

I can't see any duplicates:

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" https://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "<title>"
    <title>Rob Hopkins&#039;s blog</title>
    <title>Letter from America #2: Local action can change the world</title>
    <title>Letter from America #1: New Orleans and reflections on “awesome”</title>
    <title>New report: Climate After Growth</title>
    <title>Tina Clarke on the Joys of Discovering Effective Collaboration</title>
    <title>The Big Debate: Is there a &#039;Transition position&#039; on fracking?</title>
    <title>Viv Chamberlin-Kidd on the Permaculture Design Course that changed her life</title>
    <title>A lovely story from the Bristol Pound</title>
    <title>Marie Lefebvre on the Power of Getting Organised</title>
    <title>Naresh Giangrande introduces the Transition Launch Online training</title>
    <title>Sophy Banks on bringing that ‘being on holiday’ feeling into busy working life</title>
    <title>Jen Gale on the power of learning to sew </title>
    <title>The best course I ever did, and 11 Top Tips for creative teaching</title>
    <title>Hide Enomoto on the spread of Transition in Japan</title>
    <title>An interview with Nafeez Ahmed: &quot;This is an unprecedented opportunity&quot;</title>
    <title>What Van Gogh can teach us about education and learning</title>
    <title>Sophy Banks on the Power of Not Doing Stuff</title>
    <title>It&#039;s your summer reading! The Transition Infographic</title>
    <title>Joanna Blythman on the Power of &quot;lots and lots of little projects&quot;</title>
    <title>One Year in Transition and the Power of Alternatives to University</title>
    <title>Bill Mollison and one of my key ‘Doing Stuff’ moments</title>
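
The same check can be done without eyeballing the grep output. This is a minimal sketch that counts repeated item titles within a single copy of the feed; `SAMPLE_FEED` is a tiny stand-in for the real body fetched by wget, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_item_titles(rss_xml):
    """Return {title: count} for <item> titles appearing more than once."""
    root = ET.fromstring(rss_xml)
    titles = [
        (item.findtext("title") or "").strip()
        for item in root.iter("item")
    ]
    return {title: n for title, n in Counter(titles).items() if n > 1}


# Tiny stand-in feed with two of the titles listed above.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Rob Hopkins's blog</title>
<item><title>New report: Climate After Growth</title></item>
<item><title>A lovely story from the Bristol Pound</title></item>
</channel></rss>"""

print(duplicate_item_titles(SAMPLE_FEED))
```

An empty result only shows that one fetch of the feed is clean; it says nothing about what a reader has accumulated across several fetches.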

comment:14 Changed 3 years ago by ed

and yet duplicates are appearing or did appear - hopefully Simon will reply to me and let me know they've gone

comment:15 Changed 3 years ago by ed

I registered with feedly, added Rob's blog feed. I got two copies of letter from america #2 and 4 copies of letter from america #1. Screengrab attached.

I have single copies of other blogs, and the stories blog posts.

The news feed seems OK but there are 4 copies of one news item.

There is definitely a problem.

Is this something to do with updating?

Changed 3 years ago by ed

screengrab of feedly reader

Changed 3 years ago by ed

another screengrab of the TN feed being picked up multiple times by a reader

comment:16 Changed 3 years ago by ed

This from Simon at Lumpy Lemon who host Transition Culture:

"letter from america #1 and #2 have both come in twice (in newsblur and on TC)... :-("

so his newsreader and the archive we are keeping of Rob's blogs are getting multiples as well.

Changed 3 years ago by jim

Liferea RSS reader showing no dupes

comment:17 follow-up: ↓ 18 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Status changed from reopened to closed
  • Resolution set to fixed
  • Total Hours changed from 1.5 to 2.0

Per our chat: https://www.transitionnetwork.org/blogs/feed/rob-hopkins clearly shows only one of each item in my browser... And indeed in Liferea reader:

https://tech.transitionnetwork.org/trac/raw-attachment/ticket/602/Liferea_005.png

So I think this is a hangover from the redirect times... Please open each dupe in the browser and report the URLs -- I'm guessing one of each for HTTP and HTTPS.

Newsblur and other global aggregation services will cache items as they come in, so I doubt they'd take out items from the feed just because they're not there...

This is a won't/can't fix, sorry -- and the problem is already fixed.

comment:18 in reply to: ↑ 17 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 2.0 to 2.15

Replying to jim:

I'm guessing one of each for HTTP and HTTPS.

Whether you request the RSS feed using HTTP or HTTPS, all the <link> elements have HTTP URLs, so I don't think this is the cause. HTTPS request:

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" https://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "<link>"
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-3-something-powerful-stirs-texas</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-2-local-action-can-change-world</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-1-new-orleans-and-reflections-awesome</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/tina-clarke-joys-discovering-effective-collaboration</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/big-debate-there-transition-position-fracking</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/viv-chamberlin-kidd-permaculture-design-course-changed-her-life</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/lovely-story-bristol-pound</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/marie-lefebvre-power-getting-organised</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/naresh-giangrande-introduces-transition-launch-online-training</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/sophy-banks-bringing-being-holiday-feeling-busy-working-life</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/jen-gale-power-learning-sew</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/best-course-i-ever-did-and-11-top-tips-creative-teaching</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/hide-enomoto-spread-transition-japan</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/interview-nafeez-ahmed-unprecedented-opportunity</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/what-van-gogh-can-teach-us-about-education-and-learning</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/sophy-banks-power-not-doing-stuff</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/its-your-summer-reading-transition-infographic</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/joanna-blythman-power-lots-and-lots-little-projects</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/one-year-transition-and-power-alternatives-university</link>

HTTP request:

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" http://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "<link>"
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-3-something-powerful-stirs-texas</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-2-local-action-can-change-world</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-1-new-orleans-and-reflections-awesome</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/tina-clarke-joys-discovering-effective-collaboration</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/big-debate-there-transition-position-fracking</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/viv-chamberlin-kidd-permaculture-design-course-changed-her-life</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/lovely-story-bristol-pound</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/marie-lefebvre-power-getting-organised</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/naresh-giangrande-introduces-transition-launch-online-training</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/sophy-banks-bringing-being-holiday-feeling-busy-working-life</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/jen-gale-power-learning-sew</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/best-course-i-ever-did-and-11-top-tips-creative-teaching</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/hide-enomoto-spread-transition-japan</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/interview-nafeez-ahmed-unprecedented-opportunity</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/what-van-gogh-can-teach-us-about-education-and-learning</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/sophy-banks-power-not-doing-stuff</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/its-your-summer-reading-transition-infographic</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/joanna-blythman-power-lots-and-lots-little-projects</link>
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/one-year-transition-and-power-alternatives-university</link>

I still can't see what is causing the dupes.

comment:19 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 2.15 to 2.2

Thanks Chris, then I too don't have an explanation -- unless the content itself was published twice or under different names etc.

In any case this is a "can't fix" since the feeds come straight out of the blog module (which has not been altered).

Suggest this is re-opened if the issue continues with new content.

comment:20 Changed 3 years ago by ed

  • Status changed from closed to reopened
  • Resolution fixed deleted

I know you don't like this, but different users are getting multiples, period. Today's feedly screengrab attached.

is this something to do with the CMS updating items perhaps?

this ticket is staying open while I still get complaints.

Changed 3 years ago by ed

Feedly screengrab showing multiple RSS items for one blog post 11/10

comment:21 Changed 3 years ago by jim

As requested a couple of comments back, please open each dupe in a new browser window and post the URLs opened here.

I note the letter from America #3 is not duplicated...

comment:22 Changed 3 years ago by jim

SORRY -- IS duplicated.

Need those URLS!!!

comment:23 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 2.2 to 2.45

Replying to ed:

I know you don't like this, but different users are getting multiples, period. Today's feedly screengrab attached.

is this something to do with the CMS updating items perhaps?

Are the URLs exactly the same for the repeat items? If the URL is changed then I expect RSS readers will see it as a new item. Are some HTTP and some HTTPS?
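
To make that expectation concrete, here is a minimal sketch of a reader that identifies items purely by their link. It is an assumption about reader behaviour for illustration: real readers such as Feedly or NewsBlur may also use the guid, title or content, and `NaiveReader` is a made-up name.

```python
class NaiveReader:
    """Toy feed reader that treats every unseen link as a new item."""

    def __init__(self):
        self.seen_links = set()
        self.items = []

    def poll(self, feed_items):
        """feed_items is a list of (title, link) pairs from one fetch."""
        for title, link in feed_items:
            if link not in self.seen_links:
                self.seen_links.add(link)
                self.items.append((title, link))


path = "www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth"
title = "New report: Climate After Growth"

reader = NaiveReader()
reader.poll([(title, "http://" + path)])   # first fetch: HTTP links
reader.poll([(title, "http://" + path)])   # same links again: nothing new
reader.poll([(title, "https://" + path)])  # later fetch: HTTPS links

print(len(reader.items))
```

The post now appears twice in the reader, although each individual fetch contained it only once.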

Today the links are HTTPS ones, yesterday they were HTTP, for both HTTP and HTTPS requests:

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" http://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "<link>"
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-3-something-powerful-stirs-texas</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-2-local-action-can-change-world</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-1-new-orleans-and-reflections-awesome</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/tina-clarke-joys-discovering-effective-collaboration</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/big-debate-there-transition-position-fracking</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/viv-chamberlin-kidd-permaculture-design-course-changed-her-life</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/lovely-story-bristol-pound</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/marie-lefebvre-power-getting-organised</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/naresh-giangrande-introduces-transition-launch-online-training</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/sophy-banks-bringing-being-holiday-feeling-busy-working-life</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/jen-gale-power-learning-sew</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/best-course-i-ever-did-and-11-top-tips-creative-teaching</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/hide-enomoto-spread-transition-japan</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/interview-nafeez-ahmed-unprecedented-opportunity</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/what-van-gogh-can-teach-us-about-education-and-learning</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/sophy-banks-power-not-doing-stuff</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/its-your-summer-reading-transition-infographic</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/joanna-blythman-power-lots-and-lots-little-projects</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/one-year-transition-and-power-alternatives-university</link>
wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" https://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "<link>"
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-3-something-powerful-stirs-texas</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-2-local-action-can-change-world</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/letter-america-1-new-orleans-and-reflections-awesome</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/tina-clarke-joys-discovering-effective-collaboration</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/big-debate-there-transition-position-fracking</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/viv-chamberlin-kidd-permaculture-design-course-changed-her-life</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/lovely-story-bristol-pound</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/marie-lefebvre-power-getting-organised</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/naresh-giangrande-introduces-transition-launch-online-training</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/sophy-banks-bringing-being-holiday-feeling-busy-working-life</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/jen-gale-power-learning-sew</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/best-course-i-ever-did-and-11-top-tips-creative-teaching</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/hide-enomoto-spread-transition-japan</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/interview-nafeez-ahmed-unprecedented-opportunity</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-09/what-van-gogh-can-teach-us-about-education-and-learning</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/sophy-banks-power-not-doing-stuff</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/its-your-summer-reading-transition-infographic</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/joanna-blythman-power-lots-and-lots-little-projects</link>
    <link>https://www.transitionnetwork.org/blogs/rob-hopkins/2013-07/one-year-transition-and-power-alternatives-university</link>

Jim -- it looks to me like the caching is resulting in HTTP requests getting HTTP and HTTPS links at different times and HTTPS requests also getting HTTP and HTTPS links at different times -- we need to do one of these:

  1. Serve RSS with HTTP links via HTTP and HTTPS links via HTTPS
  2. Serve RSS with the same links (either all HTTP or all HTTPS) via both HTTP and HTTPS

My suggestion would be to do no redirection between HTTP and HTTPS -- let the client access the RSS via HTTP or HTTPS and then to serve RSS with HTTP links via HTTP and HTTPS links via HTTPS.

comment:24 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 2.45 to 2.5

There is NO redirection happening. There is simply caching. A redirect would solve this problem.

Drupal will use the base URL -- including the scheme (HTTP or HTTPS) -- for the relative links.

Yesterday you got a cached HTTP page, today it's cached HTTPS that answered your request.
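
A minimal sketch of that effect, assuming the cache key is the path alone (the X-Speed-Cache-Key header in comment:1 shows /blogs/feed for both the HTTP and HTTPS requests, but the real key may include more). All class and function names here are hypothetical.

```python
class PathKeyedCache:
    """Toy TTL cache keyed by path only; the request scheme is not in the key."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # path -> (expires_at, body)

    def get(self, path, now):
        entry = self.store.get(path)
        if entry and now < entry[0]:
            return entry[1]
        return None

    def put(self, path, body, now):
        self.store[path] = (now + self.ttl, body)


def render_feed(scheme, host):
    # Links are built from the base URL of whichever request reaches the backend.
    return f"<link>{scheme}://{host}/blogs/rob-hopkins</link>"


def handle(cache, scheme, host, path, now):
    body = cache.get(path, now)
    if body is None:
        body = render_feed(scheme, host)
        cache.put(path, body, now)
    return body


cache = PathKeyedCache(ttl_seconds=10)
host, path = "www.transitionnetwork.org", "/blogs/feed/rob-hopkins"
first = handle(cache, "https", host, path, now=0)   # miss: HTTPS links stored
second = handle(cache, "http", host, path, now=5)   # hit: HTTP request, HTTPS links
third = handle(cache, "http", host, path, now=20)   # expired: HTTP links this time
print(first, second, third, sep="\n")
```

Whichever scheme happens to repopulate the cache decides the links everyone sees until the entry expires.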

I'm not sure how to change the link root, but the use of Pathologic might have something to do with it. I'll investigate.


Update: I think I might be able to override the view theme variables to make the links all HTTP...


comment:25 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 1.25
  • Total Hours changed from 2.5 to 3.75

[Grr had a long post lost by Trac's silly session handling...]

So I've added some code to our Transition Extras module that forces all RSS feed links within the XML to be HTTP for only TN.org domains.

This means a request for a blog over HTTPS will always return the HTTP links for the feed items, but no changes are made to external URLs and referenced content (within the body of a post).

This means the RSS caching/dupes issue is now fixed for all future content posted. Plus, with the work on #590 and the addition of Views Content Cache, we can have really long cache times for RSS feeds.

Plus, people opening the full page of an RSS item in their browser will get HTTP first, but will not be prevented from going to HTTPS if they really want.

There should now be no dupes for new content - old content already duped won't be affected...
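
The rule described above can be sketched as follows. This is illustration only: the actual change lives in the Transition Extras module (Drupal, so PHP) and is not shown in this ticket, and `SITE_HOSTS` and `force_http_for_site` are assumed names.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as "our own"; anything else is left untouched.
SITE_HOSTS = {"transitionnetwork.org", "www.transitionnetwork.org"}


def force_http_for_site(url):
    """Rewrite https:// to http:// for the site's own hosts only."""
    parts = urlsplit(url)
    if parts.scheme == "https" and parts.hostname in SITE_HOSTS:
        return urlunsplit(parts._replace(scheme="http"))
    return url


print(force_http_for_site(
    "https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/new-report-climate-after-growth"))
print(force_http_for_site("https://example.org/some/external/page"))
```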

comment:26 follow-up: ↓ 27 Changed 3 years ago by ed

This from Simon at Lumpy Lemon who hosts Transition Culture: NB It was 11/10 - so may be out of date - but the point about www and non-www may still be valid

~

i've just subscribed in feedly, it's also showing duplicate posts, but not the same quantities as in newsblur!

e.g.
letter from america #1: twice in newsblur, four times in feedly
letter from america #2: twice in newsblur, three times in feedly
letter from america #3: twice in newsblur, twice in feedly

but i've also spotted something interesting & hopefully helpful: when i hover over the link to the original article one is https www, one is https non-www, one is http www, & one is http non-www.

so it looks like you are producing items once for each protocol, and some readers are picking up all of them while others are only picking up two (http & https).


comment:27 in reply to: ↑ 26 Changed 3 years ago by chris

Replying to ed:

but i've also spotted something interesting & hopefully helpful: when i hover over the link to the original article one is https www, one is https non-www, one is http www, & one is http non-www.

Yes that explains it, these are, correctly, all different URLs for the same article:

  1. http://transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration
  2. https://transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration
  3. http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration
  4. https://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration
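
A small sketch for spotting such variants in a list of reported URLs: it groups them by host (with any leading www. removed) and path, ignoring the scheme. The grouping rule is an assumption that suits this site, not a general definition of URL equivalence.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_variants(urls):
    """Group URLs that differ only by scheme and a leading 'www.'."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        host = parts.hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        groups[(host, parts.path)].append(url)
    return dict(groups)


ARTICLE = "/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration"
urls = [
    "http://transitionnetwork.org" + ARTICLE,
    "https://transitionnetwork.org" + ARTICLE,
    "http://www.transitionnetwork.org" + ARTICLE,
    "https://www.transitionnetwork.org" + ARTICLE,
]

for key, variants in group_variants(urls).items():
    print(key, len(variants))
```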

Different RSS URLs serve different <link> URLs:

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" http://transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "interview-paul-hawken-we-choose-path-regeneration<"
    <link>http://transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration</link>

wget -q --user-agent="Mozilla/5.0 (X11; Linux i686; rv:24.0) Gecko/20100101 Firefox/24.0" http://www.transitionnetwork.org/blogs/feed/rob-hopkins -O - | grep "interview-paul-hawken-we-choose-path-regeneration<"
    <link>http://www.transitionnetwork.org/blogs/rob-hopkins/2013-10/interview-paul-hawken-we-choose-path-regeneration</link>

When we were running with Apache and Varnish we redirected all requests for transitionnetwork.org to www.transitionnetwork.org

Jim, can we do this with BOA? Or does it need to be added to an Nginx config file?
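
For clarity, the redirect being asked for amounts to the rule below. It is a sketch of the decision only, in Python rather than BOA or Nginx configuration, and the names `CANONICAL_HOST`, `ALIAS_HOSTS` and `canonical_redirect` are assumptions.

```python
CANONICAL_HOST = "www.transitionnetwork.org"
ALIAS_HOSTS = {"transitionnetwork.org"}


def canonical_redirect(scheme, host, path):
    """Return (301, location) for alias hosts, or None to serve the request."""
    if host.lower() in ALIAS_HOSTS:
        return 301, f"{scheme}://{CANONICAL_HOST}{path}"
    return None


print(canonical_redirect("http", "transitionnetwork.org", "/blogs/feed/rob-hopkins"))
print(canonical_redirect("http", "www.transitionnetwork.org", "/blogs/feed/rob-hopkins"))
```

With a rule like this, every client ends up fetching the feed from one host, so the links in the feed can no longer vary by hostname.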


comment:28 Changed 3 years ago by jim

This is precisely what I was expecting and commented about a number of times.

So the HTTP/HTTPS difference is now covered, per comment:26.

And we're left with the www vs naked domain issue that Chris raises.

I think this is a BOA setting, though clearly it can be fixed with an Nginx tweak too.. I'll take a gander now.

Changed 3 years ago by jim

Aegir -- Site aliases settings

comment:29 follow-up: ↓ 31 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 3.75 to 3.85

OK so check out these settings https://tech.transitionnetwork.org/trac/raw-attachment/ticket/602/Selection_006.png

Looks like we can either:

  • Stop generating the aliases automatically and just have the standard www.tn.org domain THEN add NginX rewrites; OR,
  • Check the 'Use redirects instead of aliases by default' option and re-verify the platform.

comment:30 Changed 3 years ago by jim

Thoughts?

comment:31 in reply to: ↑ 29 Changed 3 years ago by chris

Replying to jim:

Looks like we can either:

  • Stop generating the aliases automatically and just have the standard www.tn.org domain THEN add NginX rewrites; OR,
  • Check the 'Use redirects instead of aliases by default' option and re-verify the platform.

The language in the screenshot makes it sound like Apache is being used (but of course it isn't). Assuming the options are based on the behaviour one would expect from Apache, I'd go for the "Use redirects instead of aliases by default" option.

comment:32 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 3.85 to 4.85

Ok so the checking of that box and reverify of the site didn't do much.

I've therefore added 4 lines of code to the Transition Extras module RSS tweaking code to detect and replace naked domains with www ones.

Tested and works, so this is fixed I hope.
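
Roughly, the replacement looks like the sketch below. The real four lines are in the Transition Extras module (PHP) and are not reproduced in this ticket, so the regular expression and the name `add_www` are assumptions for illustration.

```python
import re

# Match a <link> whose host is exactly the naked domain (no www. prefix).
NAKED_LINK = re.compile(r"(<link>https?://)transitionnetwork\.org(?=[/<])")


def add_www(feed_xml):
    """Rewrite naked-domain <link> URLs to their www equivalents."""
    return NAKED_LINK.sub(r"\1www.transitionnetwork.org", feed_xml)


before = "<link>http://transitionnetwork.org/blogs/rob-hopkins</link>"
print(add_www(before))
```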

Clearly, having a site respond to both www and naked domains with different responses (rather than redirecting one to the other) is not ideal. On that front, the option Chris and I discussed in comment:29 didn't do anything... As Chris guessed, it's an Apache thing, with NginX support by Aegir pending.


So, after much faff I found this: https://omega8.cc/how-to-manage-aliases-and-redirects-127

And simply enabled the alias redirects on just the TN.org site. This has cost us the 'dev.www.transitionnetwork.org' for now, but we're not needing that under normal circumstances, and we can always disable this if needs be for debugging. The fix was 5 seconds, finding it was 1/2 an hour!

So this is DOUBLE FIXED!

comment:33 Changed 3 years ago by ed

  • Status changed from reopened to closed
  • Resolution set to fixed

Triangulated user approval of it working. Closing ticket.
