Ticket #371 (closed maintenance: fixed)

Opened 5 years ago

Last modified 4 years ago

Piwik Hosting

Reported by: chris Owned by: chris
Priority: major Milestone: Maintenance
Component: Piwik Keywords:
Cc: laura, jim, ed Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 6.22

Description (last modified by chris) (diff)

Another option to consider for Piwik would be to use an instance on another server, and there happens to be one here that could be used:

https://stats.webarch.net/

Could this be considered?

Change History

comment:1 follow-up: ↓ 2 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 0.0 to 0.05

OOoooh a new server?! It's not Kiwi or Quince, what fruity little number have we here?

Mate, in an ideal world I (and all of us, I think) would like to see us stick with Piwik PROVIDED it doesn't impact the stressed Quince server. GA is an option, but if you're saying there's another server that can do this work then let's go for it!

Ahem... You might want to get the go-ahead from Laura, of course ;-)

comment:2 in reply to: ↑ 1 Changed 5 years ago by chris

Replying to jim:

OOoooh a new server?! It's not Kiwi or Quince, what fruity little number have we here?

sloe :-)

It's on the same physical server, but yes, it's a different virtual server, munin stats for it here:

http://nsa.rat.burntout.org/munin/webarch.net/hosting.webarch.net/index.html

comment:3 follow-up: ↓ 6 Changed 5 years ago by ed

If it's extra expense - No. We've paid all we have for hosting, and are at the edge of the web project budget as is, and anything that costs extra that isn't directly invaluable won't be done. Laura has 2.5K GBP for *all* office IT and site maintenance until end of March 2012.

Which brings up the Piwik question - which is suffering from being more emotive than technical - so we need to resolve it. In conversation recently, Chris remains keen to run piwik, but the others cannot see any advantage (other than it's not google) over GA, in fact it may be costing us for possibly less functionality and usability and more aggravation and cost (which may not be balancing with the 'not google' value).

But it's hanging on in there because we're feeling a bit guilty about it and we're not making a decision as a group. Which is silly.

SO:

  1. If piwik can be shown to support the requirements that Transition Network has, and google can't, then it is a better choice. If not, it's not and we can't afford it right now.

THEREFORE:

  1. Transition Network to provide requirements. These will be ongoing on the website, and forthcoming from the PSE project. Users in TN are already needing to set up their own custom URL tracking for their campaigns and this is only going to grow.

NEXT STEPS:

  1. In PSE project, Ed to set up 'Tracking requirements' ticket, in which to add TN requirements. This will come out over January once we've done the design bit, at which point we make a decision.

AGREED?

comment:4 follow-up: ↓ 5 Changed 5 years ago by jim

Agree 100% Ed. I assumed Chris was offering a service on Sloe for no extra cost.

Piwik could be disabled with a tick of a Drupal box (and a service piwik stop on LIVE). Then, thanks to less overhead -- and the fact we now have a reasonable amount of free memory according to Munin -- the number of PHP processes can be increased a little bit to give the server more headroom, increasing web throughput.

If we really need stats for PSE and GA can't do them, we can revisit Piwik as you suggest.

So, shall I disable Piwik for now?

comment:5 in reply to: ↑ 4 Changed 5 years ago by chris

Replying to jim:

Agree 100% Ed. I assumed Chris was offering a service on Sloe for no extra cost.

It's 1GB of MySQL data at the moment, so of course it has an associated hosting cost (£1.50 pcm based on the rates here http://webarch.net/hosting#uk). However, we are discussing offering free Piwik accounts in addition to free email lists to all co-op members, so perhaps this could be a way forwards?

comment:6 in reply to: ↑ 3 Changed 5 years ago by chris

Replying to ed:

If it's extra expense - No.

The cost would be a few pounds per month.

Don't delude yourself that GA is free -- there is a cost to Google (and the planet) in terms of disk space, server resources and energy. They don't charge for this as they make more money from selling users' data to advertisers. They are shrewd: they know people think they are getting a free service, but they also know that they can sell more, better targeted, adverts if they have more data about people's web usage.

What value does the Transition Network put on its users' privacy? Is giving it away to unknown advertisers, in order to save a few quid, a price it's prepared to pay?

comment:7 follow-up: ↓ 8 Changed 5 years ago by jim

What value does the Transition Network put on its users' privacy? Is giving it away to unknown advertisers, in order to save a few quid, a price it's prepared to pay?

For me, yes.

I completely understand and respect your argument, and largely accept it, Chris. But what we don't have is Google's economies of scale, nor software/hardware expertise, nor anywhere near their level of cash. Given these points, their service IS (effectively) free and WILL be more efficient if they run it with huge economies of scale, when viewed both from a server infrastructure and an environmental level.

Let's not kid ourselves that web stats are that important; they are not. They're very helpful occasionally but, when compared to other server features, are a largely disposable commodity.

And let's not kid ourselves that GA is particularly evil... It gets anonymous data and the Drupal plugin is well behaved and peer-reviewed. People can opt out, either from their profiles, or by setting the new "Do not track" feature if their browser supports it.

Finally, the server IS getting busier over time, which will mean the relative burden of statistics collection will rise with page views. So what works 'ok' today might be generating 2x the slow queries and IO overhead in 6 months' time (for example).

So, to draw a line under my comments and close - given these choices:

  • a slower and less resilient site while 'doing the right thing';
  • 'doing the right thing' and adding a not-insignificant hosting cost largely for that reason alone;
  • giving Google Analytics anonymous data and being _slightly_ 'evil', but freeing ourselves of direct costs;

GA is obviously the lesser of three evils. And if it were my choice, it'd be a clear and easy decision.

comment:8 in reply to: ↑ 7 Changed 5 years ago by chris

Replying to jim:

  • giving Google Analytics anonymous data

But it's not anonymous data we are talking about here -- perhaps this is the crux of the issue?

comment:9 follow-up: ↓ 10 Changed 5 years ago by jim

Perhaps the crux of the issue for you, Chris ;-) As I said, I just want a performant, low-cost website. Bearing in mind GA receives only the first 3 quads of an IP (because this setting in the Drupal module is turned on), I suppose with a monumental level of cross-referencing of various statistics and ad cookies etc, it might be possible for Google to de-anonymise the data - but is it likely, cost effective or good business sense for them? Doubt it.
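As a rough illustration of what "only the first 3 quads of an IP" means, the following Python sketch zeroes the last octet of an IPv4 address. It is not the Drupal module's code: the function name `mask_last_octet` and the sample address (taken from the 203.0.113.0/24 documentation range) are assumptions made for the example.

```python
import ipaddress


def mask_last_octet(address: str) -> str:
    """Return the IPv4 address with its final octet set to zero."""
    # Treat the address as a member of its /24 network and keep only the
    # network part, which is the first three quads.
    network = ipaddress.ip_network(f"{address}/24", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(mask_last_octet("203.0.113.57"))
```

Up to 256 addresses share each masked value, which is the sense in which the truncated address is less identifying than the full one.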

Do you have any links to support your assertion? I'll read with interest, but won't discuss this any further on Trac as this is not a place for debating...

comment:10 in reply to: ↑ 9 Changed 5 years ago by chris

Replying to jim:

Perhaps the crux of the issue for you, Chris ;-)

You appear to misunderstand -- you appear to believe that GA only receives "anonymous data" -- this is simply not true.

GA receives only the first 3 quads of an IP (because this setting in the Drupal module is turned on)

I didn't realise there was that option in the module settings. However, the GET that GA receives will come from the client's IP address, so Google does still get their IP address, irrespective of this setting.

Replying to jim:

And let's not kid ourselves that GA is particularly evil... It gets anonymous data and the Drupal plugin is well behaved and peer-reviewed. People can opt out, either from their profiles, or by setting the new "Do not track" feature if their browser supports it.

  1. It's clearly not "anonymous data" -- it's site users' IP addresses, their (more often than not) unique browser fingerprint (see https://panopticlick.eff.org/ ) and, for people who log in, their name, as they get redirected to the "My account" page which has it in the URL (even though the "My account" page doesn't have the GA tracking bug, users only need to follow a link to a page that does and then their username is handed over via the HTTP Referer header).
  1. "People can opt out, either from their profiles": I see this is enabled in the module, but as a user I can't find the link to this option. In any case, shouldn't handing over one's usage data to a corporation be an opt-in, not an opt-out?
  1. "or by setting the new "Do not track" feature": when I looked into this I found no evidence that GA respects this -- has anyone tested it? Also, I have the Do Not Track setting enabled in my browser but I still see the GA web bug on TN pages.

comment:11 Changed 5 years ago by jim

1 & 3) Check the settings here: https://www.transitionnetwork.org/admin/settings/googleanalytics and the module page here: http://drupal.org/project/google_analytics

2) My bad, the permission wasn't set. Registered users get a "Google Analytics configuration" field set with checkbox on their user edit page.

comment:12 Changed 5 years ago by ed

  1. Ethically, we'd all be happier without google's service, but pragmatically we may not be able to afford it.
  1. If it's a few quid a month; that's relatively small. TN would also be interested in the free account and email lists. This is budget and maintenance stuff; which is now Laura's role to decide.
  1. The other unknown is *performance*. When the site is slow, how much of that is piwik? We have presumed this until now; I'm sure we've agreed to work it out. Please provide details of the performance load so that we can assess it (Chris? Jim?).

Working on the site in what is your nighttime is very illuminating; people in different timezones will be getting punished (never a requirement I understand).

  1. Then we will assess the tracking and reporting requirements #382 in January and see if piwik can do it. If not, we remove it. If so, and it's not affecting our service (C above), we keep it and ditch google.

So for now, assuming piwik's not dead in the water, we keep tracking with it.

comment:13 follow-up: ↓ 15 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.05 to 0.2

OK, after discussion with Ed & Laura it was agreed to test disabling Piwik for 48 hours to answer the performance impact question.

I've just done this today (20 Dec 2011) by doing these tasks (times):

  • (22.19) disabled Piwik Drupal module, caches cleared
  • (22.30) - varnish restarted (to clear pages referring to Piwik from its cache)
  • (22.36) - commented out line in www-data's cron (crontab -u www-data -e)

So the reverse (ignoring the varnish restart) will reinstate Piwik. We can compare various server loads so we can finally answer the question of impact on the server.

I'll reinstate Piwik at (or just after) 11pm Thur 22nd.

comment:14 Changed 5 years ago by ed

... Further to PSE skype, this agreed:

JK to turn piwik off for 48 hours from now
JK to turn piwik on again after 48 hours
All to wait at least 48 hours
JK/CC to review munin to assess effect piwik has on servers load and processes

Ed and Laura to get the mgt needs out in #382

comment:15 in reply to: ↑ 13 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 0.2 to 0.3

Replying to jim:

  • (22.30) - varnish restarted (to clear pages referring to Piwik from its cache)

Piwik doesn't use Varnish since it's all running on port 443.

As I have said before, I think there is too much general noise for this to be a meaningful test, and I think doing it in an abnormal week (the last week before Xmas) is just going to make the results even less meaningful.

I don't think it's worth doing these things since I don't think this experiment is going to produce any data which is of any use, but without doing them it's going to be even less meaningful:

  1. Delete the apache config for Piwik -- the application is currently still available: https://stats.transitionnetwork.org/
  2. Delete the Piwik files.
  3. Delete the Piwik database.

comment:16 follow-up: ↓ 18 Changed 5 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.3 to 0.55
  1. Varnish would have copies of pages that contained the js tracking code added by the Drupal Piwik module, so it certainly needed clearing to avoid old page copies requesting from Piwik.
  2. There is already a difference I can see in absent CPU load spikes (from IO wait, User and Steal), MySQL slow queries and the increased proportion of memory used by buffers and caches. There are also fewer Apache accesses and interrupts for the last ~48 hours, though these are most likely due to there being less traffic (according to GA) recently. So there are not huge differences, but they're clearly there. I saved a copy of the Munin 24 hour charts yesterday showing the difference, though it's there on the weekly charts too.
  3. If Piwik is not being called (or hardly being called) and the cron job is not aggregating the CSV it generates -- which was a big source of slow queries/CPU and IO usage -- then the unused database will fall out of memory/cache and the files just sitting there are irrelevant to performance. The Apache config could be commented out, but since there shouldn't be any calls to Piwik (apart from a few very old caches outside the site), there's not much point. Sure, it'd be better to remove them too, but if it's hardly being called it's

Preliminary results: as expected Piwik impacts on the database, disk IO and especially CPU (probably due to the data aggregation/warehousing cron job). Not a huge amount, sure, but by a small detectable amount. We should bear this in mind for future discussions about performance, and certainly move Piwik to a separate server ASAP if we want to continue using it, as I think we do.

I'll reinstate Piwik later tonight when our guests leave.

comment:17 Changed 5 years ago by jim

I've enabled the Piwik module and cron task, so we're back as before.

comment:18 in reply to: ↑ 16 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 0.55 to 1.55

Replying to jim:

  1. There is already a difference I can see in absent CPU load spikes (from IO wait, User and Steal), MySQL slow queries and the increased proportion of memory used by buffers and caches.

I spent quite a bit of time looking at these and have saved a load of images; I could upload these if need be.

The only notable thing I could see was that the 'php' activity dropped to near zero on this graph:

https://kiwi.transitionnetwork.org/munin/webarch.net/quince.webarch.net/multimemory.html

And of course this was to be expected as it's only the cli php usage and this was specifically added to this graph for the sole purpose of tracking the memory usage of the Piwik cron job.

there are not huge differences, but they're clearly there.

Of course Piwik uses some resources, there has never been any doubt about this, I still maintain that the resources used are not so great as to justify it being dropped.

  1. If Piwik is not being called (or hardly being called) and the cron job is not aggregating the CSV it generates -- which was a big source of slow queries/CPU and IO usage -- then the unused database will fall out of memory/cache and the files just sitting there are irrelevant to performance. The Apache config could be commented out, but since there shouldn't be any calls to Piwik (apart from a few very old caches outside the site), there's not much point. Sure, it'd be better to remove them too, but if it's hardly being called it's

The point was that the main problem has been with the nightly backups, so removing 1GB of MySQL data and the static files would make some difference to that.

The decision about using Piwik or GA is not a technical one -- it's a political one -- it's akin to a decision like where to source food for a Transition Network conference, McDonalds or a local CSA project / veggie cafe co-op.

Does the Transition Network want to give its users' usage data to one of the planet's richest corporations (Google made $8.5 billion profit in 2010) so they can (mis)use this data to make money?

Or does the Transition Network want to keep its users' usage data private and support free open source software?

comment:19 follow-up: ↓ 20 Changed 5 years ago by ed

Thanks both for your work on this.

The decision is a technical, financial and ethical one. As well as that it has to be a pragmatic and quite possibly paradoxical one; where Transition Network can retain private data using open source software it will, and where it can't, it won't. When Transition Network can afford to move to Iceland, it will, until then, it can't. Maybe bits of the web service will be in different servers for some time depending on budgets, etc. etc.

It seems that the performance issues of running piwik are not that great, and that the problems we had were the nightly backups. Therefore this experiment has shown Piwik is not a significant performance problem - slight overload, yes, but not in my opinion, one that outweighs the ethical issue.

Piwik came up in that discussion as it has been a latent issue we've not discussed properly as a group; that is my responsibility - sorry.

Ed and Laura are still to clearly outline the measurement and reporting requirements #382 - then we'll know how Piwik balances with GA in that requirement.

Thanks for your ongoing patience and well mannered debate.

comment:20 in reply to: ↑ 19 Changed 5 years ago by chris

Replying to ed:

Piwik is not a significant performance problem - slight overload, yes, but not in my opinion, one that outweighs the ethical issue.

I agree, there is also the option of free Piwik hosting for Webarch Co-op members.

Ed and Laura are still to clearly outline the measurement and reporting requirements #382 - then we'll know how Piwik balances with GA in that requirement.

One thing to consider in this is that Piwik can be used to measure things that we might not want GA to measure, and in fact we are doing this already -- GA is excluded from tracking user/*/* pages for privacy reasons (though as discussed previously the effectiveness of this is debatable) -- the same privacy issues don't arise with Piwik since the user data isn't shared with an external organisation.
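The user/*/* exclusion mentioned above can be pictured as a wildcard match on the page path. The sketch below uses Python's `fnmatch` purely as a stand-in: how the Drupal module evaluates its page list is not shown in this ticket, and the function name and sample paths are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the comment above; paths matching it are not tracked.
EXCLUDED_PATTERNS = ["user/*/*"]


def is_excluded(path: str) -> bool:
    """Return True if the path matches any excluded wildcard pattern."""
    return any(fnmatchcase(path, pattern) for pattern in EXCLUDED_PATTERNS)


if __name__ == "__main__":
    for path in ("user/42/edit", "user/login", "search/node/sheffield"):
        print(path, is_excluded(path))
```

Note that a pattern with two wildcards does not cover single-segment paths such as user/login.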

comment:21 follow-up: ↓ 22 Changed 5 years ago by jim

Two points then I'm done here...

  • Agree with Chris that Piwik vs GA is political, never been in doubt. The doubt stems (and always has for me) on having Piwik run on an already busy/slow/full VPS.
  • But Chris is underplaying the visible effects of disabling Piwik. The INCREASED memory use (probably MySQL cache) alone proves we're pushing good stuff out of memory with every 5 minute Piwik cron run - and these coincide with little spikes in CPU and IO use. I can provide a PDF with these annotated graphs, but I'd rather not cost myself or TN that time!

Again for clarity: I'm saying Piwik should not be hosted on LIVE IF it continues to get busier.

Imagine Piwik is a 1-3 out of a 100 for the server in terms of overall costs, spiking every 5 minutes to 5-15 briefly as it batch processes what it's stored. That's fine, we can handle that, but - given the bloated Apache setup and the fact these 5 minute batches are pushing good caches out of memory - when we're running close to capacity Piwik's costs to the server will grow disproportionally with its 'normal' use, because everything starts to wait on memory and disk access bottlenecks.

So I'll leave it there, we should move Piwik elsewhere if possible - and especially if peak server load gets dangerously high. But otherwise there are several good things left to do on #369 which may get good results, especially moving to NGINX (the more I read about/use this the better it looks vs Apache) and MySQL tuning (been doing this on my server recently).

Ooh - last question... Can't we get Piwik to process hourly instead? Or at night? We don't need real-time tracking or quick analysis, so perhaps we change these settings.

comment:22 in reply to: ↑ 21 Changed 5 years ago by chris

Replying to jim:

we're pushing good stuff out of memory with every 5 minute Piwik cron run

The cron run is, and always has been, set to run at 5 mins past every hour, not every 5 mins:

crontab -e -u www-data

# m h  dom mon dow   command
5 * * * * /web/stats.transitionnetwork.org/piwik/misc/cron/archive.sh > /dev/null

See: https://en.wikipedia.org/wiki/Crontab

This cron run corresponds to the 'php' purple on this graph: https://kiwi.transitionnetwork.org/munin/webarch.net/quince.webarch.net/multimemory.html

I agree this could be changed to every 2 or 3 hours or even once or twice a day, if the people looking at the stats are happy with the graphs being updated less often.
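To make the difference between "5 past every hour" and "every 5 minutes" concrete, here is a small Python sketch that counts how many times a day a crontab schedule fires. It only understands a subset of cron syntax (`*`, `*/step`, single numbers and comma-separated numbers) and ignores the day and month fields. Only `5 * * * *` comes from the crontab above; the other schedules are hypothetical alternatives, and the function names are assumptions.

```python
def expand_field(field: str, upper: int) -> set[int]:
    """Expand one cron field into the set of values it matches in 0..upper-1.

    Handles only "*", "*/step", single numbers and comma-separated numbers.
    """
    values: set[int] = set()
    for part in field.split(","):
        if part == "*":
            values.update(range(upper))
        elif part.startswith("*/"):
            values.update(range(0, upper, int(part[2:])))
        else:
            values.add(int(part))
    return values


def runs_per_day(schedule: str) -> int:
    """Count daily runs for a five-field schedule, ignoring the date fields."""
    minute, hour, *_ = schedule.split()
    return len(expand_field(minute, 60)) * len(expand_field(hour, 24))


if __name__ == "__main__":
    for schedule in ("5 * * * *", "*/5 * * * *", "5 */3 * * *", "5 0 * * *"):
        print(schedule, "->", runs_per_day(schedule), "runs per day")
```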

comment:23 Changed 5 years ago by jim

Aha, thanks! yes I meant 5 past hourly...

I reckon running this cron batch job twice or even once a day would be fine... And since the Piwik data isn't mission critical, could we not back it up less than nightly to take the load off a bit? A nice, quiet, early Sunday morning for a weekly backup would be fine, I reckon...

comment:24 follow-up: ↓ 25 Changed 5 years ago by ed

Ok - therefore:

Reset Piwik to run as infrequently as possible, daily is fine by me, and at the quietest times. Management won't need anything like real-time or speedy-time stats; I only did them once per month and very much doubt anyone ever looked at my reports - Laura?

(Although this may change with the PSE's widgets)

comment:25 in reply to: ↑ 24 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.55 to 1.65

Replying to ed:

Reset Piwik to run as infrequently as possible, daily is fine by me, and at the quietest times.

OK, it's now set to generate the graphs at 5 past midnight each day -- by default it displays the graphs for yesterday.

comment:26 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.12
  • Total Hours changed from 1.65 to 1.77
  • Description modified (diff)

wiki:PiwikServer Piwik is now running on wiki:PenguinServer using Nginx.

I think we need to revisit this ticket at some point and consider:

  • Switching off GA on the transitionnetwork.org site and also the archive (and also various wordpress sites?).
  • Adding Piwik webbugs to all the sites that people would like stats for, using plugins which also allow admins to access stats, for example https://wordpress.org/extend/plugins/wp-piwik/
  • Partially anonymising the Piwik data by not saving the first part of the IP address (this is a simple setting to switch on in Piwik).
  • Adding more user accounts to https://stats.transitionnetwork.org/ so people who need to know how much traffic sites are getting can access the data, encourage Android users to install the app http://piwik.org/mobile/
  • Consider offering this as a service to other transition sites.

comment:27 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 1.77 to 2.07

I have added a list of sites that have Piwik stats generated and also some notes about how to add new sites to the wiki:PiwikServer#Sites page.

comment:28 Changed 4 years ago by ed

This is for the Ed / Chris meeting on 19/02/13.

comment:29 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.5
  • Total Hours changed from 2.07 to 3.57

At the meeting with Ed on 19/02/13 we spent around an hour on Piwik and:

Ed - anything else I have forgotten?

Today I have spent some time archiving the old information on the wiki:PiwikServer page onto the PiwikServerInstall notes page and also making the wiki:PiwikServer page more user-friendly and up to date, including generating lists of the outstanding and closed tickets, see wiki:PiwikServer#Tickets.

I think most of the items I listed on ticket:371#comment:26 have now been addressed. One further thought is that perhaps the information on the wiki:PiwikServer page, specifically the wiki:PiwikServer#Accessingthestats and wiki:PiwikServer#Sites sections, should be copied to http://wiki.transitionnetwork.org/ as this information isn't just for Transition Technologists?

I still need to address the error I reported on ticket:477.

comment:30 follow-up: ↓ 31 Changed 4 years ago by ed

That seems to be it.

  1. Privacy issue and cookies: my understanding was (Chris) would put some text together? This on #258
  2. Moving to piwik - we agreed to move off GA in March if piwik was OK and it seems OK - shall we set a date?

comment:31 in reply to: ↑ 30 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 3.57 to 3.82

Replying to ed:

  1. Privacy issue and cookies: my understanding was (Chris) would put some text together? This on #258

OK, I have read through ticket:258 and updated https://wiki.transitionnetwork.org/Privacy#Draft_New_Privacy_Text will follow up on ticket:258.

  1. Moving to piwik - we agreed to move off GA in March if piwik was OK and it seems OK - shall we set a date?

Sure, sorry we have missed 1st March for this, 1st April?!

comment:32 Changed 4 years ago by ed

  1. Date: I'm a fan of silly dates, but there's no need to hang around. I say Wednesday 6th March - gives Jim a chance to discuss it in Ttech meet, then remove the module, and both of you to add the work in the invoices we'll run next week.

comment:33 Changed 4 years ago by ed

GA turned off during ttech call. Chris to do write ups as agreed above.

comment:34 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.1
  • Total Hours changed from 3.82 to 4.92

Working out what cookies we set...

I created a new user account on my computer and, using the latest Firefox with only the Live HTTP headers add-on installed, requested the network site.

GET /

I visited http://www.transitionnetwork.org/ and a cookie was set by wiki:PiwikServer but not by wiki:PuffinServer. The GET headers and the response headers for the front page follow; no cookies were set on any HTML, CSS, images or JavaScript:

GET / HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 13 Mar 2013 13:34:41 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding, Cookie
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
Cache-Control: public, max-age=0
Last-Modified: Wed, 13 Mar 2013 13:18:55 +0000
Expires: Sun, 11 Mar 1984 12:00:00 GMT
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: EXPIRED
X-Speed-Cache-Key: /
X-NoCache: Cache
X-GeoIP-Country-Code: GB
X-GeoIP-Country-Name: United Kingdom
X-Server-Name: www.transitionnetwork.org
Content-Encoding: gzip

GET /piwik.js

The initial wiki:PiwikServer GET and response didn't set a cookie:

GET /piwik.js HTTP/1.1
Host: stats.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: */*
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.transitionnetwork.org/
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx/1.2.7
Date: Wed, 13 Mar 2013 13:31:25 GMT
Content-Type: application/x-javascript; charset=utf-8
Content-Length: 21596
Last-Modified: Fri, 08 Mar 2013 13:52:30 GMT
Connection: keep-alive
Accept-Ranges: bytes

GET piwik.php

The following Piwik GET, which happens after the browser has loaded the Javascript requested above, sets a cookie:

GET /piwik.php?action_name=Welcome%20%7C%20Transition%20Network&idsite=1&rec=1&r=461589&h=13&m=27&s=51&url=http%3A%2F%2Fwww.transitionnetwork.org%2F&_id=d7d58656cadf1014&_idts=1363181271&_idvc=1&_idn=1&_refts=0&_viewts=1363181271&pdf=0&qt=1&realp=0&wma=1&dir=0&fla=1&java=0&gears=0&ag=0&cookie=1&res=2048x1536 HTTP/1.1
Host: stats.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.transitionnetwork.org/
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx/1.2.7
Date: Wed, 13 Mar 2013 13:31:25 GMT
Content-Type: image/gif
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.3.22-1~dotdeb.0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
X-Piwik-Long-Cache: MISS
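The query string in the piwik.php request above is easier to read once it has been percent-decoded. This Python sketch unpacks an abbreviated copy of that request target (several parameters are omitted to keep it short) using only the standard library. The function name is an assumption, and reading `_idts` as a Unix timestamp is a guess based on its value.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qsl, urlsplit

# Abbreviated copy of the request line above (several parameters omitted).
REQUEST_TARGET = (
    "/piwik.php?action_name=Welcome%20%7C%20Transition%20Network"
    "&idsite=1&rec=1&url=http%3A%2F%2Fwww.transitionnetwork.org%2F"
    "&_id=d7d58656cadf1014&_idts=1363181271&_idvc=1&res=2048x1536"
)


def tracking_params(target: str) -> dict[str, str]:
    """Return the percent-decoded query parameters of a request target."""
    return dict(parse_qsl(urlsplit(target).query))


if __name__ == "__main__":
    params = tracking_params(REQUEST_TARGET)
    for name, value in params.items():
        print(f"{name} = {value}")
    # _idts looks like a Unix timestamp; show it as a UTC date and time.
    print(datetime.fromtimestamp(int(params["_idts"]), tz=timezone.utc))
```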

POST /

I then entered sheffield in the search box and submitted the form. This uses a POST; these are the headers submitted. Note the two cookies set by the Piwik JavaScript, _pk_id.1.2dc7 and _pk_ses.1.2dc7:

POST / HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.transitionnetwork.org/
Cookie: _pk_id.1.2dc7=d7d58656cadf1014.1363181271.1.1363181271.1363181271.; _pk_ses.1.2dc7=*
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 121
search_theme_form=sheffield&op.x=11&op.y=14&form_build_id=form-8784713f6e2bd7994b108bf4b6ddfb4a&form_id=search_theme_form
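The _pk_id value in the Cookie header above appears to pack several dot-separated fields: the first matches the `_id` parameter, the second `_idts` and the third `_idvc` from the earlier piwik.php request. The Python sketch below splits the header and the value apart. The field names in `PiwikId` are guesses based on that correspondence, not something documented in this ticket.

```python
from typing import NamedTuple


class PiwikId(NamedTuple):
    # Field names are guesses based on the matching request parameters.
    visitor_id: str
    created_ts: int
    visit_count: int
    current_visit_ts: int
    last_visit_ts: int


def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a Cookie header value into a name -> value mapping."""
    cookies = {}
    for pair in header.split(";"):
        name, _, value = pair.strip().partition("=")
        cookies[name] = value
    return cookies


def parse_pk_id(value: str) -> PiwikId:
    """Split a _pk_id value on dots, ignoring the trailing empty field."""
    visitor_id, created, count, current, last = value.split(".")[:5]
    return PiwikId(visitor_id, int(created), int(count), int(current), int(last))


if __name__ == "__main__":
    header = (
        "_pk_id.1.2dc7=d7d58656cadf1014.1363181271.1.1363181271.1363181271.; "
        "_pk_ses.1.2dc7=*"
    )
    cookies = parse_cookie_header(header)
    print(parse_pk_id(cookies["_pk_id.1.2dc7"]))
```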

The server then returned a 302 redirect and set a short-lived cookie, OctopusNoCacheID, which expires at 13:44:19, 14 seconds after the response's Date header:

HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Wed, 13 Mar 2013 13:44:05 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
Set-Cookie: OctopusNoCacheID=POSTBOND3ce900a146f2bc88a7f791716c4a7b36; expires=Wed, 13-Mar-2013 13:44:19 GMT; path=/; domain=.transitionnetwork.org
Last-Modified: Wed, 13 Mar 2013 13:44:04 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
Etag: "1363182244"
Location: http://www.transitionnetwork.org/search/node/sheffield
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache-Key: /
X-NoCache: Method
X-GeoIP-Country-Code: GB
X-GeoIP-Country-Name: United Kingdom
X-Server-Name: www.transitionnetwork.org
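One way to check how short-lived the OctopusNoCacheID cookie is would be to subtract the response's Date header from the cookie's expires attribute. The Python sketch below does that with the two values copied from the headers above. The function name is an assumption, and the format strings only cover the two date layouts seen here.

```python
from datetime import datetime, timedelta

# Values copied from the 302 response above.
RESPONSE_DATE = "Wed, 13 Mar 2013 13:44:05 GMT"
COOKIE_EXPIRES = "Wed, 13-Mar-2013 13:44:19 GMT"


def cookie_lifetime(date_header: str, expires_attr: str) -> timedelta:
    """Return expires minus Date; both inputs are GMT strings."""
    sent = datetime.strptime(date_header, "%a, %d %b %Y %H:%M:%S GMT")
    expires = datetime.strptime(expires_attr, "%a, %d-%b-%Y %H:%M:%S GMT")
    return expires - sent


if __name__ == "__main__":
    print(cookie_lifetime(RESPONSE_DATE, COOKIE_EXPIRES).total_seconds())
```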

GET /search/node/sheffield

Following the instruction that the search results have moved, in the Location: field above, Firefox does another GET and 3 cookies are sent with this request, the two Piwik ones and the OctopusNoCacheID one:

GET /search/node/sheffield HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.transitionnetwork.org/
Cookie: _pk_id.1.2dc7=d7d58656cadf1014.1363181271.1.1363181271.1363181271.; _pk_ses.1.2dc7=*; OctopusNoCacheID=POSTBOND3ce900a146f2bc88a7f791716c4a7b36
Connection: keep-alive

All the requests for CSS, images and JavaScript were then submitted with these 3 cookies.

GET /piwik.php

As part of loading the search results page, another request is sent to the Piwik server. Note that no Piwik cookie is sent: Piwik sets cookies on the www.transitionnetwork.org site but not on stats.transitionnetwork.org. However, as Octopus set a cookie domain of .transitionnetwork.org, requests to any *.transitionnetwork.org site will be sent this cookie, even though no other site will have a use for it:

GET /piwik.php?search=sheffield&search_count=36&idsite=1&rec=1&r=613466&h=13&m=37&s=17&url=http%3A%2F%2Fwww.transitionnetwork.org%2Fsearch%2Fnode%2Fsheffield&urlref=http%3A%2F%2Fwww.transitionnetwork.org%2F&_id=d7d58656cadf1014&_idts=1363181271&_idvc=1&_idn=0&_refts=0&_viewts=1363181271&pdf=0&qt=1&realp=0&wma=1&dir=0&fla=1&java=0&gears=0&ag=0&cookie=1&res=2048x1536 HTTP/1.1
Host: stats.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.transitionnetwork.org/search/node/sheffield
Cookie: OctopusNoCacheID=POSTBOND3ce900a146f2bc88a7f791716c4a7b36
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx/1.2.7
Date: Wed, 13 Mar 2013 13:40:51 GMT
Content-Type: image/gif
Transfer-Encoding: chunked
Connection: keep-alive
X-Powered-By: PHP/5.3.22-1~dotdeb.0
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-cache
X-Piwik-Long-Cache: MISS
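
Why the Octopus cookie reaches stats.transitionnetwork.org while the Piwik cookies do not comes down to cookie domain matching. The following is a simplified Python sketch of that rule, not a full RFC 6265 implementation (it ignores path, secure and expiry). It assumes the Piwik cookies are host-only for www.transitionnetwork.org, and the function name is made up for illustration:

```python
def cookie_is_sent(cookie_domain, host_only, request_host):
    """Return True if a browser would attach the cookie to a request for request_host."""
    cookie_domain = cookie_domain.lstrip(".").lower()
    request_host = request_host.lower()
    if host_only:
        # No Domain attribute: only the exact host that set the cookie gets it back.
        return request_host == cookie_domain
    # Domain attribute set: the domain itself and every subdomain match.
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)


# OctopusNoCacheID was set with domain=.transitionnetwork.org
octopus_to_stats = cookie_is_sent(
    ".transitionnetwork.org", False, "stats.transitionnetwork.org")

# Assumption: the Piwik cookies were set without a Domain attribute on www.
piwik_to_stats = cookie_is_sent(
    "www.transitionnetwork.org", True, "stats.transitionnetwork.org")

print(octopus_to_stats, piwik_to_stats)
```

Under these assumptions the sketch agrees with the capture above: only OctopusNoCacheID appears in the Cookie header sent to the stats host.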

POST /user/login

I then went to the login page at https://www.transitionnetwork.org/user/login and entered my details and submitted the form:

POST /user/login HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.transitionnetwork.org/user/login
Cookie: _pk_id.1.2dc7=d7d58656cadf1014.1363181271.2.1363184892.1363181837.; _pk_ses.1.2dc7=*
Connection: keep-alive
Content-Type: application/x-www-form-urlencoded
Content-Length: 128
name=chris%40webarchitects.co.uk&pass=atriumsux&form_build_id=form-937f2d15c0edb210d570422b3d06baf8&form_id=user_login&op=Log+in

This returns a 302 redirect which sets an authenticated session cookie, SESS9c2fa34093f48e760e86f54157929612. Note that this is a secure cookie, but it also has the domain set, so it will be sent to any site running with HTTPS on a *.transitionnetwork.org domain:

HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Wed, 13 Mar 2013 14:35:33 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
X-Local-Proto: https
Set-Cookie: OctopusNoCacheID=NOCACHEBOND962e934d376dbdd7beaf097d2fb5f612; expires=Wed, 13-Mar-2013 14:35:47 GMT; path=/; domain=.transitionnetwork.org
Set-Cookie: OctopusNoCacheID=POSTBOND962e934d376dbdd7beaf097d2fb5f612; expires=Wed, 13-Mar-2013 14:35:47 GMT; path=/; domain=.transitionnetwork.org
Set-Cookie: SESS9c2fa34093f48e760e86f54157929612=f609c04475bc457c51d43d097f5c3b37; expires=Thu, 14-Mar-2013 14:35:33 GMT; path=/; domain=.transitionnetwork.org; secure; HttpOnly
Last-Modified: Wed, 13 Mar 2013 14:35:32 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
Etag: "1363185332"
Location: https://www.transitionnetwork.org/users/chris-croome
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache-Key: /user/login
X-NoCache: Skip
X-This-Proto: https
X-Server-Name: www.transitionnetwork.org
Vary: Accept-Encoding
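
To compare the attributes on the cookies set by this response, the Set-Cookie values can be split into name, value and attributes. This is a small hand-rolled Python sketch (the function name is illustrative, and the parsing is deliberately naive: it only splits on semicolons), using two header values copied from above:

```python
def parse_set_cookie(header_value):
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [part.strip() for part in header_value.split(";")]
    name, _, value = parts[0].partition("=")
    attributes = {}
    for part in parts[1:]:
        key, _, attr_value = part.partition("=")
        # Flags such as "secure" and "HttpOnly" carry no value; store True.
        attributes[key.lower()] = attr_value if attr_value else True
    return name, value, attributes


session = parse_set_cookie(
    "SESS9c2fa34093f48e760e86f54157929612=f609c04475bc457c51d43d097f5c3b37; "
    "expires=Thu, 14-Mar-2013 14:35:33 GMT; path=/; "
    "domain=.transitionnetwork.org; secure; HttpOnly"
)
octopus = parse_set_cookie(
    "OctopusNoCacheID=POSTBOND962e934d376dbdd7beaf097d2fb5f612; "
    "expires=Wed, 13-Mar-2013 14:35:47 GMT; path=/; "
    "domain=.transitionnetwork.org"
)

for name, _, attrs in (session, octopus):
    print(name, "secure" in attrs, "httponly" in attrs, attrs.get("domain"))
```

Both cookies share the .transitionnetwork.org domain, but only the session cookie carries the secure and HttpOnly flags.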

GET /users/chris-croome

After login you are redirected to your user page, so there is a GET for it:

GET /users/chris-croome HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.transitionnetwork.org/user/login
Cookie: _pk_id.1.2dc7=d7d58656cadf1014.1363181271.2.1363184892.1363181837.; _pk_ses.1.2dc7=*; OctopusNoCacheID=POSTBOND962e934d376dbdd7beaf097d2fb5f612; SESS9c2fa34093f48e760e86f54157929612=f609c04475bc457c51d43d097f5c3b37
Connection: keep-alive

The response sets a further cookie, LOGGED_IN. This one isn't secure; it's used to redirect requests for HTTP pages to HTTPS pages, to avoid confusing people who log in, then visit an HTTP page and find they are not logged in:

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 13 Mar 2013 14:35:35 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Vary: Accept-Encoding, Accept-Encoding
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
X-Local-Proto: https
X-Accel-Expires: 1
Set-Cookie: LOGGED_IN=1; path=/; domain=.transitionnetwork.org
Last-Modified: Wed, 13 Mar 2013 14:35:33 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: BYPASS
X-Speed-Cache-UID: f609c04475bc457c51d43d097f5c3b37
X-Speed-Cache-Key: /users/chris-croome
X-NoCache: Skip
X-This-Proto: https
X-Server-Name: www.transitionnetwork.org
Content-Encoding: gzip
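
The redirect that the LOGGED_IN cookie enables can be sketched as follows. The site's real configuration is not shown on this ticket, so this Python function is purely hypothetical and only illustrates the decision described above:

```python
def https_redirect_for(scheme, host, path, cookies):
    """Return the HTTPS URL to redirect to, or None if no redirect is needed."""
    # LOGGED_IN is not a secure cookie, so it is also sent over plain HTTP,
    # which lets the server notice that the visitor has an HTTPS session.
    if scheme == "http" and cookies.get("LOGGED_IN") == "1":
        return "https://" + host + path
    return None


logged_in_target = https_redirect_for(
    "http", "www.transitionnetwork.org", "/search/node/sheffield",
    {"LOGGED_IN": "1"},
)
anonymous_target = https_redirect_for(
    "http", "www.transitionnetwork.org", "/search/node/sheffield", {},
)
print(logged_in_target, anonymous_target)
```

The secure session cookie could not be used for this check, because browsers do not send secure cookies with HTTP requests.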

GET /sites/all/modules/contrib/admin_menu/admin_menu.css

All subsequent requests have 5 cookies: the two Piwik ones, the Octopus one, the session cookie and the LOGGED_IN cookie:

GET /sites/all/modules/contrib/admin_menu/admin_menu.css?i HTTP/1.1
Host: www.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: https://www.transitionnetwork.org/users/chris-croome
Cookie: _pk_id.1.2dc7=d7d58656cadf1014.1363181271.2.1363184892.1363181837.; _pk_ses.1.2dc7=*; OctopusNoCacheID=POSTBOND962e934d376dbdd7beaf097d2fb5f612; SESS9c2fa34093f48e760e86f54157929612=f609c04475bc457c51d43d097f5c3b37; LOGGED_IN=1
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx
Date: Wed, 13 Mar 2013 14:35:36 GMT
Content-Type: text/css
Transfer-Encoding: chunked
Connection: keep-alive
Last-Modified: Mon, 18 Feb 2013 10:57:08 GMT
Vary: Accept-Encoding, Accept-Encoding, Accept-Encoding
Expires: Thu, 31 Dec 2037 23:55:55 GMT
Cache-Control: max-age=315360000
Content-Encoding: gzip

Next time I revisit this ticket I'll enable the Do Not Track option in the browser and look at the results.

I also need to write up the above in a short paragraph!

Note that I logged out before posting the headers above, as the secure session cookie could otherwise be used by a third party to log in to the site.

comment:35 Changed 4 years ago by ed

brilliant! a paragraph summary will be excellent.

comment:36 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.2
  • Total Hours changed from 4.92 to 6.12

Doing some more testing with the DNT header: I cleared out all the cookies in the browser, enabled the "Tell sites I do not want to be tracked" option and loaded the front page.

GET /

GET / HTTP/1.1
Host: transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 15 Mar 2013 13:21:57 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
Vary: Accept-Encoding, Cookie
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
Cache-Control: public, max-age=0
Last-Modified: Fri, 15 Mar 2013 13:21:55 +0000
Expires: Sun, 11 Mar 1984 12:00:00 GMT
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: MISS
X-Speed-Cache-Key: /
X-NoCache: Cache
X-GeoIP-Country-Code: GB
X-GeoIP-Country-Name: United Kingdom
X-Server-Name: www.transitionnetwork.org
Content-Encoding: gzip

No cookie was set by the front page or any of the images, CSS or Javascript. I then clicked on http://transitionnetwork.org/support/what-transition-initiative and one of the requests did result in a cookie being set from transitionsc.org (see ticket:519 and note that the time to write up that ticket is included on this ticket). YouTube also set some cookies on the embedded content from their servers.

I then clicked the login link.

GET /user/login

Following a link from the HTTP site to the login page results in a redirect to HTTPS and an OctopusNoCacheID cookie being set:

GET /user/login?destination=support/what-transition-initiative HTTP/1.1
Host: transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://transitionnetwork.org/support/what-transition-initiative
Connection: keep-alive

HTTP/1.1 302 Moved Temporarily
Server: nginx
Date: Fri, 15 Mar 2013 13:54:10 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: close
X-Cookie-Domain: .transitionnetwork.org
X-Backend: C
X-Allow-Redis: YES
X-Purge-Level: 6
Set-Cookie: OctopusNoCacheID=NOCACHEBOND61cd8a41d7c0b62a0ddcd1073227df74; expires=Fri, 15-Mar-2013 13:54:25 GMT; path=/; domain=.transitionnetwork.org
X-Accel-Expires: 1
Last-Modified: Fri, 15 Mar 2013 13:54:10 +0000
Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0
Etag: "1363355650"
Location: https://transitionnetwork.org/user/login?destination=support/what-transition-initiative
X-Engine: Octopus 1.0 ET
X-Device: normal
X-Speed-Cache: BYPASS
X-Speed-Cache-Key: /user/login?destination=support/what-transition-initiative
X-NoCache: Skip
X-GeoIP-Country-Code: GB
X-GeoIP-Country-Name: United Kingdom
X-Server-Name: www.transitionnetwork.org

The cookie set above was returned to the server with all the subsequent requests for images, CSS and Javascript.

GET /piwik.js

The request for the Piwik web tracking Javascript:

GET /piwik.js HTTP/1.1
Host: stats.transitionnetwork.org
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:19.0) Gecko/20100101 Firefox/19.0
Accept: */*
Accept-Language: en-gb,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: https://transitionnetwork.org/user/login?destination=support/what-transition-initiative
Cookie: OctopusNoCacheID=NOCACHEBOND61cd8a41d7c0b62a0ddcd1073227df74
Connection: keep-alive

HTTP/1.1 200 OK
Server: nginx/1.2.7
Date: Fri, 15 Mar 2013 13:50:46 GMT
Content-Type: application/x-javascript; charset=utf-8
Content-Length: 21596
Last-Modified: Fri, 15 Mar 2013 12:30:44 GMT
Connection: keep-alive
Accept-Ranges: bytes

This didn't result in a follow-up request to /piwik.php, as happened when the DNT header wasn't set, so the Do Not Track support is clearly working correctly.
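
The ticket does not show how Piwik decides to honour Do Not Track, so the following is only a sketch of the general idea in Python: treat a request carrying `DNT: 1` as opted out. The function name and the header dictionaries are hypothetical:

```python
def should_track(request_headers):
    """Return False when the request carries the header DNT: 1."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalised = {name.lower(): value.strip()
                  for name, value in request_headers.items()}
    return normalised.get("dnt") != "1"


with_dnt = should_track({"Host": "stats.transitionnetwork.org", "DNT": "1"})
without_dnt = should_track({"Host": "stats.transitionnetwork.org"})
print(with_dnt, without_dnt)
```

In the captures on this ticket the opt-out showed up as the absence of the tracking request, which is the behaviour a check like this would produce.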

Summary

I have written up what cookies we set and what they are used for here:

This is the current text on that wiki page:

Cookies

The Transition Network site uses cookies for 3 purposes:

  • Tracking: if you haven't opted out of Piwik tracking, haven't set the "Do Not Track" option in your web browser and don't have a plugin that blocks Javascript and / or web trackers, then a pair of Piwik cookies will be set in your browser to identify all requests you make to the site.
  • Performance: one of the applications we use to deploy the site (Octopus) uses a cookie to manage how long objects are cached for.
  • Authentication: when you log in to the site a cookie is set to say that you have been authenticated; this cookie is secure and is only transmitted via HTTPS. A further insecure cookie is set to identify that you have been authenticated, and this is used to redirect you from the HTTP site to the HTTPS one, to prevent the potential for confusion.

The only cookies that are essential for the functionality of the site are the authentication ones. If these are blocked then you will be unable to do things that authenticated users can do, for example posting comments in the forum.

Additional cookies are set by third party sites on things like embedded images, videos and maps.
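
The three purposes in the wiki text can be mapped back onto the cookie names seen in the header captures on this ticket. This Python sketch does that by name prefix; the prefix table is an assumption based only on the names that appear above, and the Cookie header is the one from the admin_menu.css request:

```python
# Assumed mapping from cookie-name prefix to the purpose given in the wiki text.
PURPOSE_BY_PREFIX = [
    ("_pk_", "tracking"),
    ("OctopusNoCacheID", "performance"),
    ("SESS", "authentication"),
    ("LOGGED_IN", "authentication"),
]


def classify_cookies(cookie_header):
    """Map each cookie name in a Cookie header to its purpose."""
    purposes = {}
    for pair in cookie_header.split(";"):
        name = pair.strip().partition("=")[0]
        if not name:
            continue
        purposes[name] = next(
            (purpose for prefix, purpose in PURPOSE_BY_PREFIX
             if name.startswith(prefix)),
            "unknown",
        )
    return purposes


result = classify_cookies(
    "_pk_id.1.2dc7=d7d58656cadf1014.1363181271.2.1363184892.1363181837.; "
    "_pk_ses.1.2dc7=*; "
    "OctopusNoCacheID=POSTBOND962e934d376dbdd7beaf097d2fb5f612; "
    "SESS9c2fa34093f48e760e86f54157929612=f609c04475bc457c51d43d097f5c3b37; "
    "LOGGED_IN=1"
)
for name, purpose in result.items():
    print(name, purpose)
```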

comment:37 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Status changed from new to closed
  • Resolution set to fixed
  • Total Hours changed from 6.12 to 6.22

Ed has updated the privacy policy, see ticket:258#comment:36, so this ticket can also be closed.

I have changed the listing of closed Piwik tickets so we have a listing of their titles here to make things easier to find:

comment:38 Changed 4 years ago by chris

  • Milestone set to Maintenance