Ticket #334 (closed maintenance: needs testing)

Opened 5 years ago

Last modified 5 years ago

Image serving on https

Reported by: ed Owned by: jim
Priority: major Milestone:
Component: Drupal modules & settings Keywords:
Cc: jim, chris, laura Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 0.975

Description

I note that images that are uploaded to the server are then served from https - e.g.:
<img alt="hands in sand image" src="https://www.transitionnetwork.org/sites/default/files/uploaded/u4/hands_in_sand.jpg" width="120" height="120" align="left">

does this complicate things? browsers not liking secure and insecure items on a page?

at the time of writing, the image was showing on
https://www.transitionnetwork.org/stories

but not showing on
http://www.transitionnetwork.org/stories

I presume that is a caching issue?

Change History

comment:1 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 0.0 to 0.1

does this complicate things? browsers not liking secure and insecure items on a page?

It means the images won't be cached by Varnish. Apart from that it's not a problem, as an http page with https items in it won't trigger warnings.

If the link was in the format src="/blah/" then it would be served over http or https depending on what the client was using. The only drawback with embedding images and linking like this is that if the content is syndicated, the links and embedded images won't work on the remote site unless the URLs are rewritten to be fully qualified.

I expect it was a caching issue that prevented the image from showing up. You can always break the cache by adding a query string to the URL.
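
As a rough illustration of the query-string trick, here is a minimal Python sketch. It assumes the cache keys on the full URL including the query string, and the parameter name `v` is made up for the example rather than anything the site uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, token, param="v"):
    """Return url with param=token in its query string (param name is hypothetical)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    # Drop any earlier cache-busting value so repeated calls do not pile up.
    query = [(key, value) for key, value in query if key != param]
    query.append((param, token))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_cache_buster("http://www.transitionnetwork.org/stories", "2"))
# prints http://www.transitionnetwork.org/stories?v=2
```

Requesting the second URL should miss the cached copy of the first, so the page is fetched fresh from the backend.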

comment:2 Changed 5 years ago by ed

ta.

there is a lag between adding things and seeing them on https, and then on http. How long does the http cache take to catch up with https?

I think this might be bamboozling some users adding content in profiles, need to keep an eye on it. Might need some explano-text somewhere.

comment:3 Changed 5 years ago by chris

I'm still unclear as to what's happening: when images are added, do they show up for logged-in users on https but not for anon users on http?

If that is what is happening, it'll be because the http version is being served from the Varnish cache.

comment:4 Changed 5 years ago by ed

in my experience (to date always as a logged in user), the image (and any change) shows up on https immediately, and on http later...

have found the lag is 30 mins - the social reporters were worried but calm now.

comment:5 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 0.1 to 0.15

Looking at the varnish configuration here:

https://www.transitionnetwork.org/admin/settings/varnish/general

We have "Varnish Cache Clearing" set to "Drupal Default" which is:

Drupal default will clear all page caches on node updates and cache flush events.

Jim - does this mean that the cache should be flushed when an image is added to a page?

comment:6 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 0.15 to 0.2

This ticket had 9 mins on it, just testing adding some more to test ticket:345

comment:7 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 0.2 to 0.3

The caches should NOT be flushed when an image is added, as the performance impact would be huge. They should expire naturally OR with the help of this module: http://drupal.org/project/expire

In summary:

  • Images with a relative path (like /files/another-ed-cat-pic.jpg) will be in the same security context as the page -- e.g. via https when the page is https, and http for http.
  • Images with a hard-coded protocol and domain (like http://www.transition...) will always be displayed in that context.
  • Since users are NOW logged in under https only, all new content should be fine -- provided relative or https links are used.
  • Old pages might have http references, which will cause complaints in many browsers.
  • Absolute URLs (i.e. with the http(s) bit) are required for assets like images to work with RSS feeds.

Hence everything should ideally be https, then everything is happy except our server's performance.
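
The relative-versus-absolute behaviour summarised above can be sketched in Python with the standard library's URL resolution. This is only an illustration: the absolute image URL is made up by joining the site host with the example path, and `is_mixed_content` is a hypothetical helper, not anything a browser or Drupal exposes.

```python
from urllib.parse import urljoin

PAGE_HTTPS = "https://www.transitionnetwork.org/stories"
PAGE_HTTP = "http://www.transitionnetwork.org/stories"

IMAGE_RELATIVE = "/files/another-ed-cat-pic.jpg"
# Illustrative hard-coded URL, not a real file on the site.
IMAGE_ABSOLUTE = "http://www.transitionnetwork.org/files/another-ed-cat-pic.jpg"


def resolve(page_url, src):
    """Resolve an image src the way a browser would, relative to the page."""
    return urljoin(page_url, src)


def is_mixed_content(page_url, src):
    """True when an https page would load the asset over plain http."""
    resolved = resolve(page_url, src)
    return page_url.startswith("https://") and resolved.startswith("http://")


print(resolve(PAGE_HTTPS, IMAGE_RELATIVE))           # follows the page: https
print(resolve(PAGE_HTTP, IMAGE_RELATIVE))            # follows the page: http
print(is_mixed_content(PAGE_HTTPS, IMAGE_ABSOLUTE))  # prints True
print(is_mixed_content(PAGE_HTTPS, IMAGE_RELATIVE))  # prints False
```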

I don't know what to do about this, though there are various simple (like http://drupal.org/project/pathologic) and horrible (like scanning all content looking for http://www.transition... and replacing it with relative or https) things we could do.

comment:8 Changed 5 years ago by chris

  • Cc chris added
  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 0.3 to 0.5

http://drupal.org/project/pathologic looks OK but it's an input filter?

I'd have thought what is needed is an output filter on the RSS feeds to change relative URLs to http ones, e.g. running a regex like this on the content:

s/="\//="http:\/\/www.transitionnetwork.org\//g

Would this do the job?

http://www.nodeid.com/how-convert-all-relative-urls-absolute-urls-feed-items
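
For anyone wanting to try the substitution outside sed, a rough Python equivalent might look like the sketch below. It is deliberately a little narrower than the expression above: it only touches `src` and `href` attributes and skips protocol-relative `//host/...` URLs. The function name `absolutise` is made up for the example, and this is not how any Drupal module does it.

```python
import re

BASE = "http://www.transitionnetwork.org"

# Matches src="/..." or href="/..." but not protocol-relative "//host/..." URLs.
ROOT_RELATIVE = re.compile(r'\b(src|href)="/(?!/)')


def absolutise(html, base=BASE):
    """Prefix root-relative src/href values with base, for use in feed items."""
    return ROOT_RELATIVE.sub(lambda m: f'{m.group(1)}="{base}/', html)


print(absolutise('<img src="/sites/default/files/uploaded/u4/hands_in_sand.jpg">'))
# prints <img src="http://www.transitionnetwork.org/sites/default/files/uploaded/u4/hands_in_sand.jpg">
```

URLs that are already fully qualified are left alone, so running it twice over the same content is harmless.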

comment:9 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.075
  • Total Hours changed from 0.5 to 0.575

In Drupal-speak, input filters ARE output filters... They filter a user's input, but in the Drupal world, the original content is NEVER altered, merely post-processed and cached before it's rendered.

Hence Pathologic will do what it says on the tin, and based on http://drupal.org/node/257026#example-use-cases I reckon it's the right tool for the job.

comment:10 Changed 5 years ago by chris

Can Pathologic simply change the relative URLs to absolute ones for RSS feeds -- we don't want it doing that for HTML pages?

comment:11 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.575 to 0.725

Yes we do.

We want every link looked at because there are thousands of pages with absolute references in them, which will make the browser warn about insecure content. RSS certainly needs absolute, but HTML needs relative. That's rewriting in both cases. Also, we can't guarantee users will use relative when adding links and images etc.

I'll look into the Pathologic setup, but if you get to the bottom of this issue you'll see RSS and SSL are now handled in version 3.x: http://drupal.org/node/516294

comment:12 Changed 5 years ago by jim

I'll have a play on my machine...

comment:13 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.725 to 0.875

OK, so Pathologic handles relative URLs fine, OR absolute fine... But not one for page views and another for RSS... In fact, having looked around the Drupal issues list, this is a problem even for Drupal 8...

In a nutshell, Drupal outputs an xml base tag to tell the readers 'use http://blah...' but some RSS readers don't support it. I think it's a case of 'tough boobies, use something good' for them...
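
To make the base tag idea concrete, here is a small Python sketch of what a well-behaved reader has to do: read `xml:base` from the feed and resolve relative links against it. The feed snippet is invented for illustration and is not actual output from the site.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree exposes the xml:base attribute under the XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# Made-up feed fragment with a relative link.
FEED = """<rss version="2.0" xml:base="http://www.transitionnetwork.org/">
  <channel>
    <item><link>/stories</link></item>
  </channel>
</rss>"""


def resolved_links(feed_xml):
    """Return every <link> value resolved against the feed's xml:base, if any."""
    root = ET.fromstring(feed_xml)
    base = root.get(XML_BASE, "")
    return [urljoin(base, link.text) for link in root.iter("link")]


print(resolved_links(FEED))
# prints ['http://www.transitionnetwork.org/stories']
```

A reader that ignores `xml:base` is left with the bare `/stories`, which is why relative links and images break there.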

Again, will play on my machine further.

comment:14 Changed 5 years ago by ed

move into maintenance out of phase 5?

comment:15 Changed 5 years ago by chris

  • Cc laura added
  • Owner changed from chris to jim
  • Type changed from enhancement to maintenance
  • Status changed from new to assigned
  • Milestone Phase 5 deleted

Type changed to maintenance, removed from Phase 5.

comment:16 Changed 5 years ago by jim

Pathologic is on LIVE waiting to be set up...

comment:17 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Status changed from assigned to closed
  • Resolution set to needs testing
  • Total Hours changed from 0.875 to 0.975

Pathologic is now enabled on LIVE... I've configured it to be gentle with HTTP(S), so please shout if any new issues arise.

It should convert ALL local paths to relative URLs, so the HTTP(S) context won't muck up IE etc.
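
To show what converting local paths to relative URLs means in practice, here is a minimal Python sketch. It is not Pathologic's code, only an illustration of the idea; the `LOCAL_HOSTS` list and the `relativise` name are assumptions made for the example.

```python
import re

# Hosts treated as "local" for this sketch (an assumption, not the module's config).
LOCAL_HOSTS = ("www.transitionnetwork.org",)


def relativise(html, hosts=LOCAL_HOSTS):
    """Rewrite src/href values pointing at a local host into root-relative paths."""
    host_pattern = "|".join(re.escape(host) for host in hosts)
    # Require the slash after the host so look-alike domains are not matched.
    local_url = re.compile(r'\b(src|href)="https?://(?:' + host_pattern + r')/')
    return local_url.sub(r'\1="/', html)


print(relativise('<img src="http://www.transitionnetwork.org/sites/default/files/uploaded/u4/hands_in_sand.jpg">'))
# prints <img src="/sites/default/files/uploaded/u4/hands_in_sand.jpg">
```

Links to other sites are left untouched, and the rewritten paths pick up http or https from whichever page they are served on.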
