Ticket #160 (closed enhancement: fixed)

Opened 6 years ago

Last modified 5 years ago

install and configure piwik (not awstats)

Reported by: chris
Owned by: chris
Priority: critical
Milestone: Phase 4
Component: Dev server
Keywords:
Cc: ed, jim
Estimated Number of Hours: 4.0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 9.65

Description

http://awstats.sf.net/ for wiki, static and www apache logs and also exim mail logs.

Change History

comment:1 Changed 6 years ago by chris

  • Component changed from Drupal modules & settings to Dev server

comment:2 Changed 6 years ago by ed

  • Milestone set to Phase 4

comment:3 Changed 5 years ago by chris

This might be a better option -- easier for non-tech people to read stats?

Piwik aims to be an open source alternative to Google Analytics, and is already used on more than 150,000 websites.

http://piwik.org/

comment:4 Changed 5 years ago by ed

your call :)

comment:5 Changed 5 years ago by chris

I think you need to decide, I could always install both though, take a look at the online demos:

http://www.nltechno.com/awstats/awstats.pl?config=destailleur.fr

http://demo.piwik.org/

I think awstats is fine for people used to reading raw log files, piwik is probably best for people used to using google stats?

comment:6 Changed 5 years ago by chris

A couple of other thoughts, we could install piwik on the dev server since all tracking is done via some javascript code to add to web pages, this would mean that it would not add additional load to the live server.

Also since piwik doesn't use the server logs, people with javascript disabled won't generate any stats, whereas awstats would pick this info up (also file downloads and images won't be tracked by piwik).

So it's swings and roundabouts, piwik does allow you to generate loads of reports and graphs...

Finally I don't think the web server is sending out the newsletters? If it is then I could look at generating mailstats with awstats.

comment:7 Changed 5 years ago by chris

Also there is a piwik drupal module: http://drupal.org/project/piwik

comment:8 Changed 5 years ago by chris

A piwik.transitionnetwork.org server could also be used to track multiple sites: we could even offer it as a service for other transition sites to use as an alternative to Google.

Existing stats can be imported:

http://piwik.org/blog/2011/02/exporting-google-analytics-to-piwik-google2piwik/

There is also an option with the Drupal plugin where we could allow users to opt out of tracking, see this screenshot:

http://drupal.org/node/951246

One further consideration: a Piwik install would be more work than an awstats install, and I would suggest if we go for it we set it up as piwik.transitionnetwork.org and consider which server to run it on.

comment:9 Changed 5 years ago by jim

My 2p: Never liked AWStats because it's sooo 1996. I like the look of Piwik, seems to be a true alternative to GA - much more modern and capable.

As Chris says, Piwik also can (and probably should) be hosted on another server, which in the longer term gives us more flexibility. Plus we can import from GA, which is handy.

My vote: Piwik.

Stupid name though...

comment:10 Changed 5 years ago by ed

if anyone here thinks, after that list of excellent benefits, that we shouldn't use piwik, they should get their head examined. My vote too.

comment:11 Changed 5 years ago by chris

  • Cc jim added

OK Piwik it is then!

I suggest we set it up on the dev server to start with to avoid additional load on the live server.

Jim -- are you OK doing the Drupal end of this if I do the Piwik end?

comment:12 Changed 5 years ago by jim

Yes, the module will arrive with the next updates - should be next week I'd imagine.

comment:13 Changed 5 years ago by ed

  • Priority changed from minor to critical
  • Type changed from defect to enhancement

comment:14 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.5
  • Status changed from new to accepted
  • Total Hours changed from 0.0 to 1.5

I have done an initial install here: https://piwik.transitionnetwork.org/

And started to document it here: wiki:PiwikServer

comment:15 Changed 5 years ago by jim

Drupal module now in SVN, waiting on #263...

comment:16 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.5 to 1.6

Can we start by testing Piwik on the dev server? I have added it as a site to track on https://piwik.transitionnetwork.org/ -- is the Drupal module installed on the dev server?

Also should I add tracking for the wiki.transitionnetwork.org site?

comment:17 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.6 to 1.7

For sorting the Google Analytics data import we need a "Google Analytics Account with read or admin rights." and also "Google Apps for domain users – Google Analytics API currently does not support Google Apps for your Domain Accounts. It is not possible to export data from account@… even if you have access via web interface. However you can grant privileges to your Gmail account, and use it to perform the export."

http://piwik.org/blog/2011/02/exporting-google-analytics-to-piwik-google2piwik/#toc-google2piwik-requirements

Does anyone have a Gmail account that they could use for this purpose?

I'll need values for these fields in the import script:

user_login = user
user_pass  = password

comment:18 Changed 5 years ago by jim

Ed has the GA account. FYI Piwik now on DEV. Settings here: https://dev.transitionnetwork.org.webarch.net/admin/settings/piwik

comment:19 Changed 5 years ago by chris

Piwik now on DEV. Settings here: https://dev.transitionnetwork.org.webarch.net/admin/settings/piwik

Cool, when I access that page I get:

Access denied
You are not authorized to access this page. 

Any idea what causes this?

comment:20 Changed 5 years ago by jim

Sorry Piwik had Drupal permissions to set, now done so you should be ok now.

comment:21 Changed 5 years ago by chris

I still have the same Access denied message...

comment:22 Changed 5 years ago by ed

My personal acct: "edmittance" is currently associated to the GA for the site. We have a shared TT one which we could use: "transitiontownsnetwork"

what's best?

comment:23 Changed 5 years ago by jim

@Chris: Perms now set BUT something odd is happening - I've been saving the permissions page but it just doesn't update. However the role-specific page does save, and that has a lot fewer input fields in it. Are we hitting some server-side security? Like the max post size shenzhen you disabled on LIVE?

You should get in now anyway.

comment:24 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 1.7 to 1.9

My personal acct: "edmittance" is currently associated to the GA for the site. We have a shared TT one which we could use: "transitiontownsnetwork"

what's best?

I guess it would be best to use the "transitiontownsnetwork" one if that can easily be associated with the GA for the site, is this easy to do?

If not I'm OK about using the "edmittance" one if you are -- I won't read your email! Also you might want to change the passwd before and afterwards if we do use this one -- I'll want to be locked out of it when we are done!

Regarding Piwik module settings, I'd suggest we select the option that lets users opt out here:

Custom tracking settings:

  • Users cannot control whether they are tracked or not.
  • Track users by default, but let individual users to opt out.
  • Do not track users by default, but let individual users to opt in.

https://dev.transitionnetwork.org.webarch.net/admin/settings/piwik#edit-piwik-custom-0

I have switched the dev site to this for testing, if we agree to use this setting on the live server it'll be something to mention in the privacy settings update.

comment:25 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 1.9 to 1.95

There is a Piwik add on that enables internal site searches to be tracked: https://github.com/BeezyT/piwik-sitesearch/wiki

I guess this might be of use to see what people are searching for on the site, shall I look at installing it?

comment:26 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.45
  • Total Hours changed from 1.95 to 2.4

Regarding the opt out of tracking option, I don't get an option to do this and I guess it's related to this:

Only users with opt-in or out of tracking permission are allowed to set their own preference.

So I guess we need to work out where that is set at some point.

The Piwik site now has some stats from the dev site (the ones from overseas were me testing with tor):

https://piwik.transitionnetwork.org/index.php?module=CoreHome&action=index&idSite=2&period=day&date=today#module=Dashboard&action=embeddedIndex&idSite=2&period=day&date=yesterday

I'm going to work on ticket:166#comment:8 for a while now.

I haven't forgotten I need to also look at this later:

something odd is happening - I've been saving the permissions page but it just doesn't update. However the role-specific page does save, and that has a lot fewer input fields in it. Are we hitting some server-side security? Like the max post size shenzhen you disabled on LIVE?

comment:28 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 2.4 to 2.5

Urm this is a public trac install, can you change the password on the account and email me a new one? We should also have a think where this password is used and also change it in those places and I'll look at how to edit it out of the Mysql tables for Trac...

comment:29 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 2.5 to 2.7

Info here on editing Trac comments at a SQL level:

http://trac.edgewall.org/ticket/454#comment:70

aptitude install sqlite3 
cd /web/tech.transitionnetwork.org/trac/db
sqlite3 trac.db "SELECT * FROM ticket_change WHERE ticket=160" | less
sqlite3 trac.db "DELETE FROM ticket_change WHERE ticket=160 and time=1308747812"

And the comment has gone :-)
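
The same delete step could be sketched in Python with the standard sqlite3 module. This is only an illustration run against a throwaway in-memory table: the column list is an assumption about the ticket_change layout, and the author and values in the sample rows are made up (only the ticket number and timestamp come from the commands above).

```python
import sqlite3

def delete_ticket_change(conn, ticket, timestamp):
    """Delete ticket_change rows for one ticket at one timestamp; return rows removed."""
    cur = conn.execute(
        "DELETE FROM ticket_change WHERE ticket = ? AND time = ?",
        (ticket, timestamp),
    )
    conn.commit()
    return cur.rowcount

# Throwaway in-memory database standing in for trac.db (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket_change "
    "(ticket INTEGER, time INTEGER, author TEXT, field TEXT, oldvalue TEXT, newvalue TEXT)"
)
conn.executemany(
    "INSERT INTO ticket_change VALUES (?, ?, ?, ?, ?, ?)",
    [
        (160, 1308747812, "someone", "comment", "27", "text to remove"),  # made-up row
        (160, 1308750000, "someone", "comment", "28", "text to keep"),    # made-up row
    ],
)

removed = delete_ticket_change(conn, 160, 1308747812)
remaining = conn.execute(
    "SELECT COUNT(*) FROM ticket_change WHERE ticket = 160"
).fetchone()[0]
print(removed, remaining)
```

Against a real trac.db it would be sensible to copy the file first, since the delete cannot be undone.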

comment:30 Changed 5 years ago by ed

oops who's the cnut in a hat? me? sorry.

p/word changed.

also remembered that agreed to do a bunch of p/word changing around the various things. perhaps we could discuss this at conference.

comment:31 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 2.7 to 2.9

For the Google Analytics data import we need Python 2.6 and we are running with Python 2.5.2, installing a copy of 2.6 from source in /usr/local seems like the best bet:

I'll give this a go tomorrow.

comment:32 Changed 5 years ago by ed

is there anything ed should be doing about this ticket? like checking it out and going 'ooh' or anything useful like that?

comment:33 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 2.9 to 3.1

is there anything ed should be doing about this ticket? like checking it out and going 'ooh' or anything useful like that?

Well, you can have a play with the web interface: https://piwik.transitionnetwork.org/ and see if there are any additional plugins that might be needed, http://piwik.org/faq/plugins/

Also you could look at how to set up accounts for other users, and whether any of the user interface stuff needs documenting here: wiki:PiwikServer. Let me know if you need help with anything. Also have a think about who might be given accounts and what process there should be for this (if any).

Perhaps also consider how and what info to make available to the public from it. I think a lot of the data could be made available to all, as long as IP addresses and user agent strings are not made public -- it's essential to anonymise these.

At some point in the future we could consider offering it as an alternative to GA for local transition projects, the advantage for us if lots took this up would be that we would have lots more data to ponder over, the advantage for the local groups would be that google wouldn't have this data, the disadvantage for us would be that we would have to help people set it up and also it would add additional load to the server, though at some point it might deserve a dedicated server. This is really a political issue -- do we want and need to have lots of stats about transition sites? Do we want to start offering alternatives to corporate services for local groups?

We also need to decide which server to run this site on, some pro and con thoughts...

  • dev - won't add any load to live server, but when there is a problem with dev or simply apache is being restarted to test something the site will be down -- potentially less reliable, also it'll change the nature of the dev machine as keeping apache running on it will be vital for the live site
  • live - will increase the load on the server, would mean that the live server wasn't dependent on the dev server

I started off thinking it was best on the dev server for load reasons but now I'm thinking the live one makes more sense for reliability reasons -- thoughts?

At the moment I have set it so that the stats code is accessed via https only, and I think this is OK: it means we don't need to configure Varnish not to cache it (there wouldn't be any performance gain in having Varnish transparently reverse proxy it), and we don't have to worry about configuring Piwik to work on port 80 and port 443 and still be secure.

Ed -- does that give you enough to think about ;-)

My Piwik TODO:

comment:34 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.5
  • Total Hours changed from 3.1 to 4.6

My Piwik TODO:

  • Set it up to write static reports

Done: wiki:PiwikServer#Cron

  • Sort GA import

Done and documented, it wasn't so simple and if I had remembered that Trac uses Python I'd have been more hesitant about installing a Python from source than I was... wiki:PiwikServer#Python

The import for just over one month is currently running, I think we need to decide about which server to run Piwik on before I import more data. I suggest we:

Would that be OK?

comment:35 Changed 5 years ago by ed

had a look - looks fun - hard to tell, tbh, as there are patches of data and it's not immediately clear what is and isn't working.

  1. doesn't look like we need more plugins right now. I reckon this will take a bit of getting used to and there will be some tweaking in phase 5 (sept/oct) anyway
  2. new accts - see above - I'll work it out over the summer and then see who is interested
  3. offering to TIs - waaaaaay off into one of many futures :)
  4. on which server - your call - sounds to me like LIVE is your choice - happy for whatever you think is best
  5. stats.transitionnetwork sounds cool
  6. Q about varnish - pls confirm that this will capture all requests, including those that are cached
  7. we only *need* to measure LIVE for now - how about we focus on that and see if there is value in extending this to DEV or NEWS or others?
  8. we will also need to measure #271
  9. we will need to run this in parallel with GA for a while until happy it's working

comment:36 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 4.6 to 4.8

GA imported stats from one month: https://piwik.transitionnetwork.org/index.php?module=CoreHome&action=index&date=last30&period=range&idSite=1#module=Dashboard&action=embeddedIndex&date=last30&period=range&idSite=1

  4. on which server - your call - sounds to me like LIVE is your choice - happy for whatever you think is best

OK, I'll set it up on the live server.

  6. Q about varnish - pls confirm that this will capture all requests, including those that are cached

Yes, if the tracking code is accessed via HTTPS then it won't touch Varnish.

  7. we only *need* to measure LIVE for now - how about we focus on that and see if there is value in extending this to DEV or NEWS or others?

OK.

  8. we will also need to measure #271

Measure what the SR's are doing? If so I think that can be done if they have a role and we enable tracking here: https://dev.transitionnetwork.org/admin/settings/piwik#edit-piwik-visibility-roles-0-wrapper

Or did you mean measure the traffic their blog posts get?

  9. we will need to run this in parallel with GA for a while until happy it's working

OK, I think that should be OK, unless Jim thinks the modules will conflict?

That also means we need a parallel privacy policy?

comment:37 Changed 5 years ago by jim

Don't see a problem running Piwik and GA at the same time, apart from slower loads/more net access for users.

comment:38 Changed 5 years ago by ed

  1. just for a bit... a gentle overlap... 1,2 weeks... if nothing else to see if the data matches

comment:39 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 4.8 to 5.8

Live install has been done and documented: wiki:PiwikServer#LiveServer

Apart from the GA import, this is going to take some time due to the Python install and the time it takes to run due to google limits but I'll see if I can get it done tonight:

For websites with low to average traffic volumes, it has the capacity of processing about 2,000 days's worth of GA data in 24 hours.
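
As a rough back-of-envelope check on that quoted rate, the run time for a given date range can be estimated as below. This is only a sketch: it assumes the quoted 2,000 days per 24 hours holds for this site, and the date range in the example is an illustration, not the range actually used.

```python
from datetime import date

def estimated_import_hours(start, end, days_per_24h=2000):
    """Estimate import run time in hours for an inclusive date range."""
    days = (end - start).days + 1
    return days * 24.0 / days_per_24h

# Illustrative range only (assumed dates).
hours = estimated_import_hours(date(2010, 2, 1), date(2011, 6, 30))
print("about {:.1f} hours".format(hours))
```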

Jim -- I think we can now enable the Piwik module on the live server, the URL to use for http and https is https://stats.transitionnetwork.org/

comment:40 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.2
  • Total Hours changed from 5.8 to 7.0

OK, I did a source install of Python 2.6 in the end as doing it with unstable debs was going to result in a mess, notes here: wiki:PiwikServer#GAImport

The GA import is running, I didn't know what date to start it from so I played safe and picked 2008, and it started pulling in from 2010-02 onwards, it'll now take a while to run...

But I think it means that this ticket is almost ready to be closed :-)

comment:41 Changed 5 years ago by jim

Piwik module enabled and perms set for devs.

However, something odd is happening. When I save the settings for Piwik at https://www.transitionnetwork.org/admin/settings/piwik I get "The configuration options have been saved" but they haven't been: the fields are empty. The same thing happened with GMaps Location not saving the settings earlier today.

I think we have a problem, and one responsible for other errors we've been seeing. More on #286...

comment:42 Changed 5 years ago by jim

Piwik enabled on LIVE since #286 got some love... Site ID 1, yes?

comment:43 Changed 5 years ago by ed

1 seems like a good number to me. The NUMBER ONE website, ever.

comment:44 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 7.0 to 7.2

The Piwik web bug seems to be working, we have some stats here https://stats.transitionnetwork.org/

The google import didn't pull in any data, whereas it did on the dev server, so I'm running it again.

comment:45 Changed 5 years ago by chris

Run from 2010-02 failed:

Exporting 2010-03-07
VISIT: Initialize
Traceback (most recent call last):
  File "./google2piwik.py", line 636, in <module>
    export_period(start_date, end_date)
  File "./google2piwik.py", line 70, in export_period
    export_day(str(currentdate), fetcher)
  File "./google2piwik.py", line 113, in export_day
    simulator.initialize(fetcher, "ga:latitude,ga:longitude,ga:hour,ga:flashVersion,ga:javaEnabled,ga:language,ga:screenResolution", "ga:visits")
  File "./google2piwik.py", line 372, in initialize
    self.visits[index].first(visit)
IndexError: list index out of range

I think we might need to run a month at a time.

comment:46 Changed 5 years ago by chris

Started another run with these dates:

[export]
start = 2010-03-07
end   = 2010-04-07

But it failed with the same error message, changed the dates to:

[export]
start = 2010-03-08
end   = 2010-04-08

And it seems to be running OK again.
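
Running a month at a time could be scripted along these lines. This is a minimal sketch, not part of google2piwik: the helper names are made up, and it assumes the end date in the [export] section is inclusive, so consecutive windows do not overlap.

```python
from datetime import date, timedelta

def export_windows(start, end, days=31):
    """Yield (start, end) date pairs covering start..end in chunks of at most `days` days."""
    current = start
    while current <= end:
        window_end = min(current + timedelta(days=days - 1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)

def config_snippet(start, end):
    """Format one window in the [export] style shown above."""
    return "[export]\nstart = {}\nend   = {}\n".format(start.isoformat(), end.isoformat())

# Example range; the dates are only for illustration.
windows = list(export_windows(date(2010, 3, 8), date(2010, 6, 10)))
for w_start, w_end in windows:
    print(config_snippet(w_start, w_end))
```

If a particular day triggers the IndexError, the range can be split around it by calling export_windows once for the dates before it and once for the dates after it.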

comment:47 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 7.2 to 7.3

I have created accounts for jim and laura, best thing when giving more people access to the stats is to create new accounts rather than sharing the admin account details (don't even think about it ;-)

comment:48 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 7.3 to 7.4

Ed, are we set up on feedburner? If so we can get some stats from there also:

In the Piwik dashboard, you can add the 'Feedburner' widget. Feedburner is a
service that publishes your RSS/Atom feeds, and tracks statistics about how
many users are registered to the feed, how many view each article, and how
many clicks there are on each entry. You must register on feedburner.com and
publish your feed (there is a step-by-step instruction). Once you have
published your feed, click on "Publicize" > "Awareness API" - activate this
option, and then put your feed's name in the Piwik Feedburner widget. You
should now see your feed usage statistics in the Piwik dashboard!

http://piwik.org/faq/how-to/#faq_99

comment:49 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 7.4 to 7.5

The last date imported before the server hung was 2010-04-20 so this is where to start from when the load issue has been sorted: ticket:287

comment:50 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 7.5 to 7.6

It just did 5 more days before falling over again:

Exporting 2010-04-26
VISIT: Initialize
VISIT: Fetch 1                                                             99 perc. finished. estimated      1 seconds left.
VISIT: Fetch 2                                                             99 perc. finished. estimated      1 seconds left.
VISIT: Fetch 3                                                             99 perc. finished. estimated      1 seconds left.
VISIT: Fetch landing, exits                                                99 perc. finished. estimated      1 seconds left.
ACTION: Export paths
Traceback (most recent call last):
  File "./google2piwik.py", line 636, in <module>
    export_period(start_date, end_date)
  File "./google2piwik.py", line 70, in export_period
    export_day(str(currentdate), fetcher)
  File "./google2piwik.py", line 128, in export_day
    action_manager.export(config.SITE_BASE_URL)
  File "/web/stats.transitionnetwork.org/google2piwik-1.2.5/action.py", line 77, in export
    self.actions[action].export(base_path)
  File "/web/stats.transitionnetwork.org/google2piwik-1.2.5/action.py", line 38, in export
    self.id_action_url = sql.insert_log_action(type_url)
  File "/web/stats.transitionnetwork.org/google2piwik-1.2.5/sql.py", line 73, in insert_log_action
    cursor.execute(INSERT_LOG_ACTION, values)
  File "build/bdist.linux-x86_64/egg/MySQLdb/cursors.py", line 174, in execute
  File "build/bdist.linux-x86_64/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')

Going to follow up on ticket:287

comment:51 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 7.6 to 7.9

Reading through the comments here:

http://serverfault.com/questions/280916/unable-to-remove-limit-on-memory-usage-for-php-script

Makes me wonder if it wasn't Piwik which brought down the server today.

After upping the memory limits in my.cnf, see ticket:287, it did a 2 month import before failing with:

Exporting 2010-06-10
VISIT: Initialize
Traceback (most recent call last):
  File "./google2piwik.py", line 636, in <module>
    export_period(start_date, end_date)
  File "./google2piwik.py", line 70, in export_period
    export_day(str(currentdate), fetcher)
  File "./google2piwik.py", line 113, in export_day
    simulator.initialize(fetcher, "ga:latitude,ga:longitude,ga:hour,ga:flashVersion,ga:javaEnabled,ga:language,ga:screenResolution", "ga:visits")
  File "./google2piwik.py", line 372, in initialize
    self.visits[index].first(visit)
IndexError: list index out of range

This error repeated when I tried it starting from 2010-06-10, I tried to set it off again from 2010-06-11, but the same error:

Exporting 2010-06-11
VISIT: Initialize
Traceback (most recent call last):
  File "./google2piwik.py", line 636, in <module>
    export_period(start_date, end_date)
  File "./google2piwik.py", line 70, in export_period
    export_day(str(currentdate), fetcher)
  File "./google2piwik.py", line 113, in export_day
    simulator.initialize(fetcher, "ga:latitude,ga:longitude,ga:hour,ga:flashVersion,ga:javaEnabled,ga:language,ga:screenResolution", "ga:visits")
  File "./google2piwik.py", line 372, in initialize
    self.visits[index].first(visit)
IndexError: list index out of range

One for another day...

comment:52 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 7.9 to 8.0

Was getting a lot of these in the logwatch email:

Jul  2 14:15:57 quince suhosin[16508]: ALERT - configured GET variable value length limit exceeded - dropped variable 'urlref' (attacker 'X.X.X.X', file '/web/stats.transitionnetwork.org/piwik/piwik.php')

So I have changed these settings in /etc/php5/apache2/conf.d/suhosin.ini:

;suhosin.get.max_value_length = 512
suhosin.get.max_value_length = 2048
;suhosin.get.max_vars = 100
suhosin.get.max_vars = 500
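
To illustrate why a long referrer trips the old limit, here is a small sketch that lists query parameters whose value is longer than a given length. It is not how suhosin itself is implemented (for instance, whether it counts encoded or decoded characters is not checked here), and the request URL is a made-up example.

```python
from urllib.parse import urlsplit, parse_qsl, quote

def oversized_params(url, max_value_length):
    """Return names of query parameters whose decoded value is longer than the limit."""
    query = urlsplit(url).query
    return [name for name, value in parse_qsl(query, keep_blank_values=True)
            if len(value) > max_value_length]

# Hypothetical tracking request with a long referrer URL in 'urlref'.
long_referrer = "http://example.org/search?q=" + "x" * 600
request = ("https://stats.example.org/piwik.php?idsite=1&urlref="
           + quote(long_referrer, safe=""))

print(oversized_params(request, 512))   # old limit
print(oversized_params(request, 2048))  # new limit
```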

comment:53 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 8.0 to 8.3

For some reason I don't understand the secure flag isn't being set on Piwik cookies, I have done some searching around about this and it seems it should be so I have raised a ticket here about it:

comment:54 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.02
  • Total Hours changed from 8.3 to 8.32

I have set off the GA import again starting from 2010-06-12

comment:55 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 8.32 to 8.42

Regarding the memory limit errors I have also upped it in /etc/php5/cli/php.ini and perhaps this will solve that.

The GA import failed again on 2010-06-16 so I have set it off again from 2010-06-17.

How important is the import of old data?

comment:56 Changed 5 years ago by ed

we won't die without the old data on piwik, but I need to have it somewhere - so if it can't come into piwik, it'll have to stay on google (although we can unplug the google code)...

how far back can you import?

why is it breaking all the time?

maybe it's better to leave the archive on google?

comment:57 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.08
  • Total Hours changed from 8.42 to 8.5
  • Summary changed from install and configure awstats to install and configure piwik (not awstats)

why is it breaking all the time?

Dunno, but the last run is still going and it's up to 2011-02-24 so I think it's going to be OK now :-)

I have just realised that I forgot to set the Piwik site to UTC rather than BST so roughly half of the imported data (hits during months when BST applied) is going to be one hour out, but I don't think we really need to worry about this.

BTW loads of hits via this article today (163!): http://earlyretirementextreme.com/the-ethics-of-ere-and-of-dropping-out-of-the-system.html

comment:58 Changed 5 years ago by jim

Erm, any chance you can do the import overnight - the server is pretty much crippled.

comment:59 Changed 5 years ago by ed

site's completely unusable. is it only this that is happening?

comment:60 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 8.5 to 8.7

Yikes, sorry I was watching it via top but then had to go get the kids from school...

It looks like perhaps the google import code has a memory leak and perhaps also combined with a small spike in apache requests the server ran out of memory and stopped responding -- apache memory usage spiked at 3Gb when the machine has 2Gb -- I have reduced the max number of apache processes (MaxClients) to 30 from 40.

https://kiwi.transitionnetwork.org/munin/webarch.net/quince.webarch.net.html

I'll try to work out which day the GA import got up to and run it late one night.

comment:61 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 8.7 to 9.0

Hmm, I seem to have caused apache to stop responding just now by just trying to browse Piwik data to see which date the import got up to. I think ideally we would have piwik on a dedicated virtual server -- all the mysql processing and fancy ajax stuff seems to come at a cost!

BTW each apache process uses between 340 and 390 Mb of RAM (number from top, somewhat deceptive since some memory will be shared), also there have been times recently when there have been more than 30 apache processes running, see:

https://kiwi.transitionnetwork.org/munin/webarch.net/quince.webarch.net-apache_processes.html
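
A quick bit of arithmetic shows why the per-process figure from top has to be read with care. The sketch below takes the 340 Mb figure at face value and assumes the machine's 2Gb means 2048 Mb; the function names are just for illustration.

```python
def worst_case_mb(processes, per_process_mb):
    """Total memory if every process really used per_process_mb with nothing shared."""
    return processes * per_process_mb

def processes_that_fit(ram_mb, per_process_mb):
    """How many such processes would fit in RAM if nothing were shared."""
    return ram_mb // per_process_mb

print(worst_case_mb(30, 340))         # 30 processes at the lower top figure
print(processes_that_fit(2048, 340))  # processes that fit in 2048 Mb unshared
```

Since 30 or more processes do run on the machine, most of each process's reported memory must be shared; the useful number for sizing MaxClients is the unshared memory per process, which top does not show directly.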

In terms of the Piwik import it looks like we only have data to sometime in Jun 2010 imported, which is odd since it had got up to Feb 2011 before I went out.

comment:62 Changed 5 years ago by ed

hmm.

If it's going to take up too much space, or require extra hardware, we'll have to review. Perhaps for starters we don't import *anything* and see how it works on LIVE over the next couple of months?

comment:63 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 9.0 to 9.2

If it's going to take up too much space, or require extra hardware, we'll have to review.

Its database is going to get big, I think this is unavoidable, but we don't have a shortage of space on the server at the moment. I think the issue is going to be the load when people are looking at the stats; however, as not many people will do this and I expect they won't do it that often, it should be OK.

This is what we have at the moment in terms of database size. I'm not sure to what extent the 381Mb of stats data is from the GA import or from the last week -- we should check it next week and see how much it's grown to get an idea of how big it's going to get and how quickly.

mysql> SELECT table_schema "Data Base Name", sum( data_length + index_length ) / 1024 / 1024 
       "Data Base Size in MB" FROM information_schema.TABLES GROUP BY table_schema ;
+--------------------+----------------------+
| Data Base Name     | Data Base Size in MB |
+--------------------+----------------------+
| information_schema |           0.00390625 | 
| live               |         712.58561993 | 
| live_sharingengine |         504.50749397 | 
| live_workspaces    |          23.13003349 | 
| mysql              |           0.50284767 | 
| stats              |         381.30146790 | 
| transwiki          |           1.30584908 | 
+--------------------+----------------------+
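
Once a second measurement is available, a simple linear projection gives a first idea of growth. This is a sketch only: it assumes growth stays constant, and the second measurement in the example is a made-up placeholder, not a real reading.

```python
def projected_size_mb(first_mb, second_mb, days_between, days_ahead):
    """Linear projection from two size measurements taken days_between days apart."""
    daily_growth = (second_mb - first_mb) / days_between
    return second_mb + daily_growth * days_ahead

# 381.3 is the 'stats' figure above; 420.0 a week later is a hypothetical placeholder.
estimate = projected_size_mb(381.3, 420.0, 7, 365)
print("projected size in a year: {:.0f} MB".format(estimate))
```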

Perhaps for starters we don't import *anything* and see how it works on LIVE over the next couple of months?

I'm happy with that.

comment:64 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 9.2 to 9.5

The nightly mysql backup failed for the dev server due to a corrupt mysql table:

Warning: mysqldump: Got error: 144: Table './piwik/archive_blob_2011_06' is marked as crashed and last (automatic?) repair failed when using LOCK TABLES

Warning: Failed to dump mysql databases piwik

It was fixed like this:

mysql> CHECK TABLE archive_blob_2011_06;
+----------------------------+-------+----------+-----------------------------------------------------------------------------------+
| Table                      | Op    | Msg_type | Msg_text                                                                          |
+----------------------------+-------+----------+-----------------------------------------------------------------------------------+
| piwik.archive_blob_2011_06 | check | warning  | Table is marked as crashed and last repair failed                                 | 
| piwik.archive_blob_2011_06 | check | error    | Can't read indexpage from filepos: -1                                             | 
| piwik.archive_blob_2011_06 | check | Error    | Incorrect key file for table './piwik/archive_blob_2011_06.MYI'; try to repair it | 
| piwik.archive_blob_2011_06 | check | error    | Corrupt                                                                           | 
+----------------------------+-------+----------+-----------------------------------------------------------------------------------+
4 rows in set (0.37 sec)

mysql> REPAIR TABLE archive_blob_2011_06;
+----------------------------+--------+----------+----------+
| Table                      | Op     | Msg_type | Msg_text |
+----------------------------+--------+----------+----------+
| piwik.archive_blob_2011_06 | repair | status   | OK       | 
+----------------------------+--------+----------+----------+
1 row in set (4.61 sec)

mysql> CHECK TABLE archive_blob_2011_06;
+----------------------------+-------+----------+----------+
| Table                      | Op    | Msg_type | Msg_text |
+----------------------------+-------+----------+----------+
| piwik.archive_blob_2011_06 | check | status   | OK       | 
+----------------------------+-------+----------+----------+
1 row in set (0.34 sec)
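
The CHECK / REPAIR / CHECK sequence above could be scripted along these lines. This is only a sketch: it takes rows in the same four-column shape that CHECK TABLE prints and works out which tables would need a REPAIR TABLE. The function names are made up for illustration, and treating every "warning" row as a problem is an assumption (some warnings are harmless). REPAIR TABLE is appropriate here because the .MYI file in the error message indicates a MyISAM table.

```python
# Msg_type values treated as a sign of trouble (an assumption of this sketch).
PROBLEM_TYPES = {"error", "warning"}


def tables_needing_repair(check_rows):
    """Return the tables that have at least one error/warning row.

    Each row is (Table, Op, Msg_type, Msg_text), as printed by CHECK TABLE.
    """
    damaged = []
    for table, _op, msg_type, _msg_text in check_rows:
        # MySQL prints both 'error' and 'Error', so compare case-insensitively.
        if msg_type.lower() in PROBLEM_TYPES and table not in damaged:
            damaged.append(table)
    return damaged


def repair_statements(tables):
    """Build one REPAIR TABLE statement per damaged table."""
    return [f"REPAIR TABLE {table};" for table in tables]


# Rows copied from the first CHECK TABLE output above.
rows = [
    ("piwik.archive_blob_2011_06", "check", "warning",
     "Table is marked as crashed and last repair failed"),
    ("piwik.archive_blob_2011_06", "check", "error",
     "Can't read indexpage from filepos: -1"),
    ("piwik.archive_blob_2011_06", "check", "Error",
     "Incorrect key file for table './piwik/archive_blob_2011_06.MYI'; try to repair it"),
    ("piwik.archive_blob_2011_06", "check", "error", "Corrupt"),
]

for statement in repair_statements(tables_needing_repair(rows)):
    print(statement)
```

The statements are only printed, not executed, so they can be reviewed before being run in the mysql client.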

This is something to watch out for on the live server -- when it happens the web interface is only going to display "No data". There is a feature request to address this:

We should detect such error in the UI, and display a warning message in red / yellow, telling the user that statistics are not tracked because tables are crashed.
http://dev.piwik.org/trac/ticket/2194

Once it's repaired the backup can then be run manually using ninjahelper.

Regarding ticket:160#comment:53 -- not having secure cookies is the expected behaviour, see http://dev.piwik.org/trac/ticket/2543 -- however, I must admit that I still don't fully understand why.

comment:65 Changed 5 years ago by ed

  1. so is piwik interfering with all the back ups? or just its own? it's beginning to sound like a bit of a lamer, if you don't mind me saying... manual backups, 'no data' showing in UI etc....
  2. given the import process is not working, let's either (a) not import and keep the GA as archive, (b) Jim suggested he could import it on his local machine and upload from there

comment:66 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 9.5 to 9.55
  1. so is piwik interfering with all the back ups? or just its own?

When a table is corrupted it prevents all mysql backups from taking place. I do get an email when this happens. I'm not sure what caused it or how often it might happen, so I suggest we suck it and see on this...

  1. given the import process is not working, let's either (a) not import and keep the GA as archive, (b) Jim suggested he could import it on his local machine and upload from there

Well, the import process did work, it's just that it overloaded the machine. I could look at running it with nice and doing it after the conference?
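
A minimal sketch of what "running it with nice" could look like, assuming a Linux box where ionice (from util-linux) is available. The script name import_ga_data.sh is a placeholder, since the actual import command isn't named in this ticket, and the helper function is made up for illustration. Note that nice only lowers the priority of the import process itself; it would not lower the priority of the work the import triggers inside mysqld.

```python
import shlex


def low_priority_command(command, niceness=19, idle_io=True):
    """Prefix a command so it runs at low CPU (and optionally I/O) priority."""
    if not 0 <= niceness <= 19:
        raise ValueError("niceness must be between 0 and 19")
    prefix = ["nice", "-n", str(niceness)]
    if idle_io:
        # ionice -c 3 selects the 'idle' I/O scheduling class on Linux.
        prefix += ["ionice", "-c", "3"]
    return prefix + list(command)


# Placeholder command: the real import script is not named in the ticket.
cmd = low_priority_command(["./import_ga_data.sh"])
print(" ".join(shlex.quote(part) for part in cmd))
```

The resulting list can be passed straight to subprocess.run, or the printed line pasted into a cron entry for a quiet time of day.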

In terms of importing it on another machine, I did think about this but then we would have two databases to merge -- this might be simple, or not, I haven't looked into it...

I think we didn't appreciate the complexity of Piwik or the load implications when we decided to use it... But it might still be worth it -- how are the stats looking here?

comment:67 Changed 5 years ago by ed

Stats are looking OK, but I'm not sure it's all there. Try choosing a date range - doesn't work. Perhaps this is my not knowing how to use it, but it's not filling me with trust yet.

As agreed - we won't import any data unless it's possible to do it on downtime, and not break the server.

comment:68 Changed 5 years ago by ed

  • Status changed from accepted to closed
  • Resolution set to fixed

As discussed on Skype between ed, chris and jim, we're going to run Piwik without any further data imports across July/August and review in Phase 5 in Sept. Closing this one in favour of a new Phase 5 ticket.

comment:69 Changed 5 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 9.55 to 9.65

Piwik upgraded to 1.5.1
