Last modified on 09/01/15 18:41:49

Table of Contents

  1. webalizer
  2. logstalgia
  3. goaccess

Web Server Log File Analysis

The PiwikServer stats are very good at tracking people who consent to be tracked. However, they don't track bots or abusive visitors who don't want to be tracked. The difference in traffic is stark: for example, on 26th June 2013 the JavaScript-based Piwik stats reported:

  • 1106 visits
  • 3314 page views

Whereas the raw log files showed roughly ten times as many page views:

  • 7155 visits
  • 32688 page views

Therefore, to get a handle on what the web servers are doing, as opposed to what real people are doing on the site, we need some tools other than Piwik. This page was created on ticket:555.
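As a rough cross-check of figures like these, raw counts can be pulled straight from an access log. The sketch below assumes the standard combined log format (client IP in the first field, timestamp in the fourth) and treats distinct client IPs as a stand-in for visits. How the raw-log figures above were actually derived is not recorded on this page, and the function names count_hits and count_ips are made up for illustration.

```shell
#!/bin/bash
# Count requests and distinct client IPs for one day in a combined-format
# access log. The day must be written as it appears in the log, e.g. 26/Jun/2013.

count_hits() {
    # $1 = day (dd/Mon/yyyy), $2 = log file
    # Field 4 looks like "[26/Jun/2013:10:00:00", so match on its prefix.
    awk -v day="$1" 'index($4, "[" day ":") == 1 { n++ } END { print n + 0 }' "$2"
}

count_ips() {
    # Same filter, but count each client IP (field 1) only once.
    awk -v day="$1" 'index($4, "[" day ":") == 1 && !seen[$1]++ { n++ } END { print n + 0 }' "$2"
}
```

For example, count_hits 26/Jun/2013 /var/log/nginx/access.log prints the number of requests logged on that day. Note that this counts every request, including images and other assets, so it will not match a page view figure exactly.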

webalizer

Note: these logs are no longer generated.

These logs are password protected and available at https://penguin.transitionnetwork.org/webalizer/puffin/ and they are good for getting an overview of bandwidth, hits and visitors, for example:

Puffin Webalizer stats 2013-07-12

PenguinServer is set up to get a copy of the Nginx awstats.log file every day via logrotate (see AwStatsInstall#Copyingthelogstopenguin). On Penguin there is a puffin user account which has this crontab:

05 07 * * * /usr/local/bin/puffin-webalizer

This runs /usr/local/bin/puffin-webalizer at 07:05 every day; the script contains:

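#!/bin/bash

# Today's copy of the Puffin Nginx log
DATE=$(date "+%Y-%m-%d")
LOG_FILE=/home/puffin/nginx/puffin-nginx-$DATE.log
STATS_DIR=/web/penguin.transitionnetwork.org/www/webalizer/puffin

# Stop if the output directory is missing rather than writing elsewhere
cd "$STATS_DIR" || exit 1
# -p: incremental mode, -n: hostname for the report, -o: output directory
webalizer -p -n transitionnetwork.org -o "$STATS_DIR" "$LOG_FILE"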
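
Since the script depends on the dated log copy having arrived, a small guard could avoid running webalizer against a missing or empty file. This is only a sketch: the helper names log_path_for and log_ready are hypothetical, and the path pattern is simply the one used in the script above.

```shell
#!/bin/bash
# Hypothetical helpers for checking the dated log before running webalizer.

# Build the dated log path in the same way as the puffin-webalizer script.
log_path_for() {
    # $1 = directory holding the copied logs, $2 = date as YYYY-MM-DD
    printf '%s/puffin-nginx-%s.log\n' "$1" "$2"
}

# Succeed only if the file exists and is not empty.
log_ready() {
    [ -s "$1" ]
}
```

In the script this might be used as LOG_FILE=$(log_path_for /home/puffin/nginx "$DATE") followed by log_ready "$LOG_FILE" || exit 1, placed before the webalizer call.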

logstalgia

This allows a real-time display of log files. Install logstalgia on your local machine, for example:

sudo aptitude install logstalgia

Then pipe the logs into it via ssh. For example, these are the commands to see a real-time display from the three servers:

ssh puffin.webarch.net 'sudo tail -f /var/log/nginx/access.log' | logstalgia --sync
ssh parrot.webarch.net 'sudo tail -f /home/*/logs/access.log' | logstalgia --sync
ssh penguin.webarch.net 'sudo tail -f /var/log/nginx/*.access.log' | logstalgia --sync

The following screenshot doesn't do it justice. The last two numbers of each IP address have been removed from this image:

Logstalgia display of Nginx access logs on Puffin
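
If a display is going to be shared, the addresses could also be masked in the log stream itself rather than in the image afterwards. The following is a sketch under assumptions, not how the screenshot above was produced: it assumes each log line starts with an IPv4 client address, and the function name mask_ips is made up. It zeroes the last two octets and flushes after every line so that a live tail is not held back by buffering.

```shell
#!/bin/bash
# Replace the last two octets of a leading IPv4 address with zeros.
# Lines that do not start with a dotted-quad address pass through unchanged.
mask_ips() {
    awk '{
        n = split($1, octet, ".")
        if (n == 4 && $1 ~ /^[0-9.]+$/) {
            # Rebuild the line with the masked address, keeping the rest as is.
            $0 = octet[1] "." octet[2] ".0.0" substr($0, length($1) + 1)
        }
        print
        fflush()   # keep the output live when used in a pipeline
    }'
}
```

It would sit between the ssh command and logstalgia, for example: ssh puffin.webarch.net 'sudo tail -f /var/log/nginx/access.log' | mask_ips | logstalgia --sync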

For more info see https://code.google.com/p/logstalgia/ and the videos here https://www.youtube.com/user/Logstalgia

goaccess

To get an overview of a log file you can use goaccess on the server to load a specific log file. For example, on puffin, this is the current log:

goaccess -f /var/log/nginx/access.log

And this is yesterday's:

goaccess -f /var/log/nginx/access.log.1

This displays totals like this:

Goaccess display of Nginx log file on Puffin

When we upgrade to Wheezy (see ticket:535) we should set up goaccess to generate an HTML / email report per day.
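
A daily report could be produced by a small script run from cron, in the same way as the webalizer one. The sketch below assumes a goaccess 1.x release, where -o writes the report to a file and --log-format=COMBINED selects the combined log format; older packaged versions take different options, so check goaccess --help on the server first. The function names and the output directory are placeholders, and the email step is left out.

```shell
#!/bin/bash
# Sketch of a daily goaccess HTML report (assumes goaccess 1.x options).

# Build a dated report path, e.g. <dir>/goaccess-2015-01-09.html
report_path() {
    # $1 = output directory (placeholder), $2 = date as YYYY-MM-DD
    printf '%s/goaccess-%s.html\n' "$1" "$2"
}

# Write an HTML report for one log file into the given directory.
generate_report() {
    # $1 = log file to analyse, $2 = output directory
    local out
    out=$(report_path "$2" "$(date +%Y-%m-%d)")
    goaccess -f "$1" --log-format=COMBINED -o "$out"
}
```

For example, generate_report /var/log/nginx/access.log.1 /path/to/reports would summarise yesterday's log into a file named after today's date.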

For more information see the goaccess web site at http://goaccess.prosoftcorp.com/
