Version 10 (modified by chris, 3 years ago)

AWStats

The AWStats install was abandoned because every attempt to generate stats from the Nginx logs failed; see ticket:555#comment:70. The plan is to use wiki:PiwikServer instead.

Copying the logs to penguin

The nginx logs on wiki:PuffinServer are copied to wiki:PenguinServer using scp. In /etc/logrotate.d/nginx the nginx-logs script is run just before the logs are rotated:

        prerotate
                /usr/local/bin/nginx-logs
                /usr/local/bin/50x-errors chris@webarchitects.co.uk
                if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
                        run-parts /etc/logrotate.d/httpd-prerotate; \
                fi \
        endscript

The /usr/local/bin/nginx-logs script contains:

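#!/bin/bash

# Copy the current nginx access log to penguin under a dated file name.
DATE=$(date "+%Y-%m-%d")
LOG_FILE=/var/log/nginx/access_combined.log
REMOTE_FILE="puffin-nginx-$DATE.log"

# The "penguin" host alias comes from /root/.ssh/config
scp "$LOG_FILE" "penguin:nginx/$REMOTE_FILE"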
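The copy step above can be sketched with some basic error checking. This is a minimal sketch and not the script that was deployed: the function names remote_name and copy_log are hypothetical, and it assumes the same penguin host alias and nginx/ target directory as the script above.

```shell
#!/bin/bash
# Sketch of a more defensive nginx-logs (hypothetical function names).

LOG_FILE=/var/log/nginx/access_combined.log

# Build the remote file name for a date given as YYYY-MM-DD.
remote_name() {
    printf 'puffin-nginx-%s.log\n' "$1"
}

# Copy a log file to penguin.
# $1 = local log file, $2 = date (YYYY-MM-DD).
# Returns non-zero if the log is unreadable or scp fails.
copy_log() {
    if [ ! -r "$1" ]; then
        echo "nginx-logs: cannot read $1" >&2
        return 1
    fi
    scp -q "$1" "penguin:nginx/$(remote_name "$2")"
}

# Usage (not executed here):
#   copy_log "$LOG_FILE" "$(date '+%Y-%m-%d')"
```

Keeping the file-name logic in its own function makes it easy to check that the name matches what is expected on penguin before any copy is attempted.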

It depends on the /root/.ssh/config file on puffin containing:

Host penguin
  Hostname penguin.webarch.net
  User puffin

It also depends on the puffin root user's public key being installed on penguin.

On penguin the /etc/ssh/sshd_config file contains:

AllowGroups sudo sshaccess

The puffin user on penguin was created with no password, in the sshaccess group:

adduser --disabled-password --ingroup sshaccess puffin

Then the SSH public key was copied to /home/puffin/.ssh/authorized_keys on penguin, and a from= option was added at the start of the key line so that the key is only accepted for connections from puffin:

from="puffin.webarch.net" ssh-rsa AAAA...

The log files created on penguin are in /home/puffin/nginx/ and are named after the date they were created, for example:

-rw-r----- 1 puffin puffin  11M Jun 22 06:25 puffin-nginx-2013-06-22.log

Processing the logs

We don't want to keep IP addresses in the log files; they need to be removed so the stats can be made public. The Perl script from http://wiki.opennicproject.org/Tier2ConfigObfuscatingLogs looks like it will do the job:

#! /usr/bin/perl
#
# blurAddys.pl - Obfuscate IP addresses in a file
#
# cat some.log | blurAddys.pl > some_blurred.log
#
#####################################################################
use strict;

while(<STDIN>)
{
	s/\d{1,3}(\.|-)\d{1,3}(\.|-)\d{1,3}(\.|-)\d{1,3}/XX$1XX$2XX$3XX/g;
	#s/([0-9A-Fa-f]{4}:[0-9A-Fa-f:]+:[0-9A-Fa-f]{1,4})([^:0-9A-Fa-f])/XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX$2/g;
	print $_;
}

The regular expression for IPv6 addresses was also matching the timestamp (for example 2013:13:02:57), so it has been commented out. As a result only IPv4 addresses are obfuscated.

The Perl script above was saved to /usr/local/bin/blurAddys.pl
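The behaviour described above can be checked by hand. The following is a sketch only: blur_ipv4 is a sed re-implementation of the IPv4 substitution written for illustration (it is not blurAddys.pl itself), and matches_ipv6_pattern is a hypothetical helper that applies the commented-out IPv6 pattern with grep.

```shell
# sed equivalent of the IPv4 substitution in blurAddys.pl (illustration only).
blur_ipv4() {
    sed -E 's/[0-9]{1,3}([.-])[0-9]{1,3}([.-])[0-9]{1,3}([.-])[0-9]{1,3}/XX\1XX\2XX\3XX/g'
}

# Succeeds if the commented-out IPv6 pattern matches anything on stdin.
matches_ipv6_pattern() {
    grep -Eq '[0-9A-Fa-f]{4}:[0-9A-Fa-f:]+:[0-9A-Fa-f]{1,4}[^:0-9A-Fa-f]'
}

line='"95.211.87.85" www.transitionnetwork.org [01/Jul/2013:13:02:57 +0100] "GET / HTTP/1.0" 200'

# The IPv4 address is replaced, the rest of the line is left alone.
printf '%s\n' "$line" | blur_ipv4

# The IPv6 pattern matches "2013:13:02:57 " in the timestamp.
if printf '%s\n' "$line" | matches_ipv6_pattern; then
    echo "IPv6 pattern matches the timestamp"
fi
```

The deployed script is used the same way, as a filter: cat some.log | /usr/local/bin/blurAddys.pl > some_blurred.log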

AWStats install

The Debian package was installed:

aptitude install awstats

The following NEW packages will be installed:
  awstats libnet-xwhois-perl{a}

The AWStats config files are in /etc/awstats. The example file was copied:

cp /etc/awstats/awstats.conf /etc/awstats/awstats.www.transitionnetwork.org.conf

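The LogFile variable was changed so that the logs are anonymised as they are read. Each date tag carries its own -0 hour offset, so the hyphens in the file name have to be written literally between the tags:

LogFile="cat /home/puffin/nginx/puffin-nginx-%YYYY-0-%MM-0-%DD-0.log | /usr/local/bin/blurAddys.pl |"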
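To check which file name a LogFile template resolves to, the tag expansion can be imitated by hand. This is a rough sketch based on the AWStats documentation, where %YYYY-n, %MM-n and %DD-n are replaced by the year, month and day as of n hours ago; the expand_logfile_tags function is hypothetical and only handles the -0 offset.

```shell
# Expand %YYYY-0, %MM-0 and %DD-0 in a template (sketch, -0 offset only).
# $1 = template, $2 = year, $3 = month, $4 = day
expand_logfile_tags() {
    printf '%s\n' "$1" | sed -e "s/%YYYY-0/$2/g" -e "s/%MM-0/$3/g" -e "s/%DD-0/$4/g"
}

# With literal hyphens between the tags the result matches the files on penguin.
expand_logfile_tags 'puffin-nginx-%YYYY-0-%MM-0-%DD-0.log' 2013 06 22

# Without them the date parts run together.
expand_logfile_tags 'puffin-nginx-%YYYY-0%MM-0%DD-0.log' 2013 06 22
```

The first call prints puffin-nginx-2013-06-22.log and the second prints puffin-nginx-20130622.log, so only the first form matches the file names created by the nginx-logs script.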

The log format is defined in /var/aegir/config/server_master/nginx.conf on wiki:PuffinServer as follows:

 ## Log Format
  log_format        main '"$proxy_add_x_forwarded_for" $host [$time_local] '
                         '"$request" $status $body_bytes_sent '
                         '$request_length $bytes_sent "$http_referer" '
                         '"$http_user_agent" $request_time "$gzip_ratio"';

  client_body_temp_path  /var/lib/nginx/body 1 2;
  access_log             /var/log/nginx/access.log main buffer=32k;
  error_log              /var/log/nginx/error.log crit;

This is an example line from the nginx logs:

"95.211.87.85" www.transitionnetwork.org [01/Jul/2013:13:02:57 +0100] "GET / HTTP/1.0" 200 47900 118 48601 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" 0.059 "-"

The corresponding LogFormat definition for AWStats is:

LogFormat="%host %other %time1 %methodurl %code %bytesd %other %other %refererquot %uaquot %extra1 %gzipratio"
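The correspondence between the nginx log_format variables and the AWStats LogFormat tags can be listed side by side as a quick sanity check. This is only a sketch: the two lists are copied from the definitions above, the helper names are hypothetical, and it checks the number and order of fields, not whether AWStats accepts each field's content (for example the quoted first field).

```shell
# nginx log_format variables, in order (without the leading $).
NGINX_VARS='proxy_add_x_forwarded_for host time_local request status body_bytes_sent request_length bytes_sent http_referer http_user_agent request_time gzip_ratio'

# AWStats LogFormat tags, in order.
AWSTATS_TAGS='%host %other %time1 %methodurl %code %bytesd %other %other %refererquot %uaquot %extra1 %gzipratio'

# Count the whitespace-separated words in $1 (relies on word splitting).
count_words() {
    set -- $1
    echo "$#"
}

# Print one "nginx variable -> AWStats tag" pair per line.
print_mapping() {
    set -- $AWSTATS_TAGS
    for var in $NGINX_VARS; do
        printf '%-28s -> %s\n' "\$$var" "$1"
        shift
    done
}

print_mapping
```

Both lists contain twelve entries, so every nginx field has a tag; the two sizes that AWStats does not need ($request_length and $bytes_sent) and the host name are mapped to %other.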

Other values that were changed in /etc/awstats/awstats.www.transitionnetwork.org.conf:

SiteDomain="transitionnetwork.org"

DNSLookup=0