[[PageOutline(2-5, Table of Contents, floated)]]

= Web Server Log File Analysis =

The PiwikServer stats are very good at tracking people who consent to be tracked, but they don't track bots or abusive visitors who don't want to be tracked. The difference in traffic is stark. For example, on 26th June 2013 the JavaScript-based Piwik stats reported:

 * 1106 visits
 * 3314 page views

Whereas the raw log files showed roughly ten times as many page views:

 * 7155 visits
 * 32688 page views

Therefore, to get a handle on what the web servers are doing, as opposed to what real people are doing on the site, we need some tools other than Piwik. This page was created on ticket:555.

== webalizer ==

'''Note these logs are no longer generated'''

These logs are password protected and available at https://penguin.transitionnetwork.org/webalizer/puffin/ and they are good for getting an overview of bandwidth, hits and visitors, eg:

[[Image(puffin_webalizer_daily_usage_201307.png)]]

PenguinServer is set up to get a copy of the Nginx awstats.log file every day via logrotate, see AwStatsInstall#Copyingthelogstopenguin, and on Penguin there is a puffin user account which has this crontab:

{{{
05 07 * * * /usr/local/bin/puffin-webalizer
}}}

Which runs {{{/usr/local/bin/puffin-webalizer}}} which contains:

{{{
#!/bin/bash
DATE=$(date "+%Y-%m-%d")
LOG_FILE=/home/puffin/nginx/puffin-nginx-$DATE.log
STATS_DIR=/web/penguin.transitionnetwork.org/www/webalizer/puffin
cd $STATS_DIR
webalizer -p -n transitionnetwork.org -o $STATS_DIR $LOG_FILE
}}}

== logstalgia ==

This allows a real-time display of log files. Install logstalgia on your local machine, for example:

{{{
sudo aptitude install logstalgia
}}}

And then pipe the logs into it via ssh. For example, these are the commands to see a real-time display from the 3 servers:

{{{
ssh puffin.webarch.net sudo tail -f /var/log/nginx/access.log | logstalgia --sync
ssh parrot.webarch.net sudo tail -f /home/*/logs/access.log | logstalgia --sync
ssh penguin.webarch.net sudo tail -f /var/log/nginx/*.access.log | logstalgia --sync
}}}

The following screenshot doesn't do it justice; the last two numbers of each IP address have been removed from this image:

[[Image(logstalgia-puffin.png)]]

For more info see https://code.google.com/p/logstalgia/ and the videos here https://www.youtube.com/user/Logstalgia

== goaccess ==

To get an overview of a log file you can run {{{goaccess}}} on the server to load a specific log file. For example, on puffin this is the current log:

{{{
goaccess -f /var/log/nginx/access.log
}}}

And this is yesterday's:

{{{
goaccess -f /var/log/nginx/access.log.1
}}}

This displays totals like this:

[[Image(goaccess-puffin.png)]]

When we upgrade to Wheezy, see ticket:535, we should set up goaccess to generate an HTML / email report per day.

For more information see the goaccess web site at http://goaccess.prosoftcorp.com/
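
For a quick check of raw totals without goaccess, something like the following could be used. This is only a rough sketch: the {{{log_totals}}} function name is made up for this example, it assumes the default Nginx combined log format (client IP as the first field), and unique IP addresses are only an approximation of visits, not the same measure that Piwik reports.

```shell
#!/bin/bash
# Hypothetical helper, not part of the server setup described above.
# Prints the total number of requests and the number of distinct client IPs
# in an access log, assuming the IP address is the first field on each line.
log_totals() {
    local log_file="$1"
    awk '{ hits++; ips[$1] = 1 }
         END {
             n = 0
             for (ip in ips) n++
             printf "requests: %d\nunique IPs: %d\n", hits, n
         }' "$log_file"
}

# Example usage on puffin (path taken from the goaccess examples above):
# log_totals /var/log/nginx/access.log
```

Note that this counts every request, including images, CSS and bot traffic, so the numbers will be higher than page views.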