Version 11 (modified by chris, 3 years ago)
AWStats
The install of AWStats was abandoned because every attempt to generate stats from the Nginx logs failed, see ticket:555#comment:70. The plan is to use other tools instead, see wiki:WebServerLogs and wiki:PiwikServer.
Copying the logs to penguin
The nginx logs on wiki:PuffinServer are copied to wiki:PenguinServer using scp. In /etc/logrotate.d/nginx the nginx-logs script is run just before the logs are rotated:
prerotate
    /usr/local/bin/nginx-logs
    /usr/local/bin/50x-errors chris@webarchitects.co.uk
    if [ -d /etc/logrotate.d/httpd-prerotate ]; then \
        run-parts /etc/logrotate.d/httpd-prerotate; \
    fi \
endscript
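The /usr/local/bin/nginx-logs script contains:
#!/bin/bash
# Copy the current nginx access log to penguin under a dated file name
DATE=$(date "+%Y-%m-%d")
LOG_FILE=/var/log/nginx/access_combined.log
REMOTE_FILE=puffin-nginx-$DATE.log
scp "$LOG_FILE" "penguin:nginx/$REMOTE_FILE"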
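The naming scheme used by the script can be checked without copying anything. The following is a minimal sketch, not part of the installed script: remote_target is a hypothetical helper that only prints the scp destination for a given date (defaulting to today), using the penguin host alias and nginx/ directory from the script above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nginx-logs): print the scp destination
# that the script would use for a given date, defaulting to today.
remote_target() {
    day="${1:-$(date "+%Y-%m-%d")}"
    printf 'penguin:nginx/puffin-nginx-%s.log\n' "$day"
}

remote_target 2013-06-22
```

For 2013-06-22 this prints penguin:nginx/puffin-nginx-2013-06-22.log, the same file name pattern as the files that end up in /home/puffin/nginx/ on penguin.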
It depends on the /root/.ssh/config file on puffin containing:
Host penguin
    Hostname penguin.webarch.net
    User puffin
It also depends on the puffin root user's public key being on penguin.
On penguin the /etc/ssh/sshd_config file contains:
AllowGroups sudo sshaccess
The puffin user on penguin was created with no password and in the sshaccess group:
adduser --disabled-password --ingroup sshaccess puffin
Then the ssh public key was copied to /home/puffin/.ssh/authorized_keys on penguin, and a from= option was added at the start of the key line so that the key can only be used for connections from puffin:
from="puffin.webarch.net" ssh-rsa AAAA...
The log files created on penguin are in /home/puffin/nginx/ and are named after the date on which they were created, e.g.:
-rw-r----- 1 puffin puffin 11M Jun 22 06:25 puffin-nginx-2013-06-22.log
Processing the logs
We don't want to keep IP addresses in the log files; they need to be removed so that the stats can be made public. The perl script at http://wiki.opennicproject.org/Tier2ConfigObfuscatingLogs looks like it will do the job:
#! /usr/bin/perl
#
# blurAddys.pl - Obfuscate IP addresses in a file
#
# cat some.log | blurAddys.pl > some_blurred.log
#
#####################################################################
use strict;
while(<STDIN>) {
  s/\d{1,3}(\.|-)\d{1,3}(\.|-)\d{1,3}(\.|-)\d{1,3}/XX$1XX$2XX$3XX/g;
  #s/([0-9A-Fa-f]{4}:[0-9A-Fa-f:]+:[0-9A-Fa-f]{1,4})([^:0-9A-Fa-f])/XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX$2/g;
  print $_;
}
The regular expression for IPv6 addresses also matched the date and time in each log line, so it has been commented out.
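The clash with the date can be reproduced from the shell. This is a sketch, assuming a grep that supports -o and -E (GNU and BSD grep do); the pattern is the IPv6 expression from blurAddys.pl with its capture groups removed, and match_ipv6 is a hypothetical helper name.

```shell
#!/bin/sh
# The IPv6 expression from blurAddys.pl, without the capture groups.
ipv6_pattern='[0-9A-Fa-f]{4}:[0-9A-Fa-f:]+:[0-9A-Fa-f]{1,4}[^:0-9A-Fa-f]'

# Hypothetical helper: print only the part of each input line that matches.
match_ipv6() {
    grep -oE "$ipv6_pattern"
}

# The timestamp field of an nginx log line contains no IPv6 address,
# but the year followed by the time looks like colon-separated hex groups.
printf '%s\n' '[01/Jul/2013:13:02:57 +0100]' | match_ipv6
```

The match starts at the year and runs to the end of the time (2013:13:02:57), so with the IPv6 rule enabled that part of every timestamp would be replaced.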
The blurAddys.pl script was saved to /usr/local/bin/blurAddys.pl.
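The effect of the IPv4 substitution can be previewed without Perl. The following is a sketch only, assuming a sed that supports -E; blur_ipv4 is a hypothetical name, and the expression is a sed translation of the active rule in blurAddys.pl, not the script itself.

```shell
#!/bin/sh
# Hypothetical sed translation of the IPv4 rule in blurAddys.pl:
# four groups of 1-3 digits separated by "." or "-" become XX groups,
# keeping the original separators.
blur_ipv4() {
    sed -E 's/[0-9]{1,3}([.-])[0-9]{1,3}([.-])[0-9]{1,3}([.-])[0-9]{1,3}/XX\1XX\2XX\3XX/g'
}

printf '%s\n' '"95.211.87.85" www.transitionnetwork.org [01/Jul/2013:13:02:57 +0100] "GET / HTTP/1.0" 200' | blur_ipv4
```

The address becomes "XX.XX.XX.XX" while the host name, timestamp, request and status code are left unchanged.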
AWStats install
The Debian package was installed:
aptitude install awstats
The following NEW packages will be installed:
  awstats libnet-xwhois-perl{a}
The AWStats config files are in /etc/awstats. The example file was copied:
cp /etc/awstats/awstats.conf /etc/awstats/awstats.www.transitionnetwork.org.conf
The LogFile variable was changed so that the logs are anonymised as they are read:
LogFile="cat /home/puffin/nginx/puffin-nginx-%YYYY-0%MM-0%DD-0.log | /usr/local/bin/blurAddys.pl |"
The log format is defined in /var/aegir/config/server_master/nginx.conf on wiki:PuffinServer as follows:
## Log Format
log_format main '"$proxy_add_x_forwarded_for" $host [$time_local] '
                '"$request" $status $body_bytes_sent '
                '$request_length $bytes_sent "$http_referer" '
                '"$http_user_agent" $request_time "$gzip_ratio"';
client_body_temp_path /var/lib/nginx/body 1 2;
access_log /var/log/nginx/access.log main buffer=32k;
error_log /var/log/nginx/error.log crit;
This is an example line from the nginx logs:
"95.211.87.85" www.transitionnetwork.org [01/Jul/2013:13:02:57 +0100] "GET / HTTP/1.0" 200 47900 118 48601 "-" "Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)" 0.059 "-"
This format was defined for AWStats as follows:
LogFormat="%host %other %time1 %methodurl %code %bytesd %other %other %refererquot %uaquot %extra1 %gzipratio"
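To make the correspondence easier to check, the two definitions can be lined up position by position. This is a sketch, with both field lists copied from the nginx log_format and the AWStats LogFormat on this page; pair_fields is a hypothetical helper, and the pairing is purely positional and says nothing about whether AWStats accepts each field as written.

```shell
#!/bin/sh
# Field lists copied from the nginx log_format and the AWStats LogFormat.
# Single quotes keep the nginx variable names literal.
nginx_fields='"$proxy_add_x_forwarded_for" $host [$time_local] "$request" $status $body_bytes_sent $request_length $bytes_sent "$http_referer" "$http_user_agent" $request_time "$gzip_ratio"'
awstats_tags='%host %other %time1 %methodurl %code %bytesd %other %other %refererquot %uaquot %extra1 %gzipratio'

# Hypothetical helper: print each nginx field next to the AWStats tag
# that sits in the same position.
pair_fields() {
    set -f                  # no globbing: [$time_local] looks like a glob pattern
    set -- $awstats_tags    # positional parameters now hold the AWStats tags
    for field in $nginx_fields; do
        printf '%-30s %s\n' "$field" "$1"
        shift
    done
    set +f
}

pair_fields
```

Both lists have twelve entries, so every nginx field has a tag: for example [$time_local] lines up with %time1, and $request_length and $bytes_sent both line up with %other.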
Other values in /etc/awstats/awstats.www.transitionnetwork.org.conf which were changed:
SiteDomain="transitionnetwork.org"
DNSLookup=0