Version 69 (modified by chris, 3 years ago)
Puffin
puffin.webarch.net is an 8 GB RAM, 14 CPU core Debian Squeeze virtual server which replaced NewLiveServer and DevelopmentServer for running the Transition Network Drupal sites. It went live in early 2013.
This server is due to be upgraded from LennyToSqueeze on ticket:535 in May 2013.
It was agreed to call this server puffin at the ttech meeting on 22nd November 2012, see ticket:463. The install and initial configuration of this server was tracked on ticket:466, see also the other PuffinServer#migrationtickets. Other services from the old server were migrated to PenguinServer.
Munin Stats
There are munin stats for the server available here
See ticket:555#comment:13 for the notes regarding the installation of the mysql munin stats package.
Load Spikes
The server has been suffering from load spikes that make the site unresponsive for clients. You can see the current status on the puffin Munin load graph; note the Max values for the last day, week, month and year.
When the load reaches 3.88, robots are served 403 Forbidden responses; when it reaches 18.88, maintenance tasks are killed; and when it reaches 72.2, the web server (nginx and php-fpm) is stopped until the 5 minute load average falls below 44.4.
The default thresholds have been changed because they were causing the server to shut down for 15 minutes at a time far too often.
The current thresholds are generated from these variables in /root/.barracuda.cnf, the commented out values are the default ones:
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
_LOAD_LIMIT_ONE=7220
_LOAD_LIMIT_TWO=4440
These variables are used by the /var/xdrago/second.sh script, which is run every minute via cron and contains the following variables:
ONEX_LOAD=`awk '{print $1*100}' /proc/loadavg`
FIVX_LOAD=`awk '{print $2*100}' /proc/loadavg`
CTL_ONEX_SPIDER_LOAD=388
CTL_FIVX_SPIDER_LOAD=388
CTL_ONEX_LOAD=7220
CTL_FIVX_LOAD=4440
CTL_ONEX_LOAD_CRIT=1888
CTL_FIVX_LOAD_CRIT=1555
These values translate to the following loads for comparison to the Munin graphs:
- ONEX_LOAD: load average over the last minute times 100
- FIVX_LOAD: load average over the last 5 minutes times 100
- CTL_ONEX_SPIDER_LOAD: 3.88
- CTL_FIVX_SPIDER_LOAD: 3.88
- CTL_ONEX_LOAD: 72.20
- CTL_FIVX_LOAD: 44.40
- CTL_ONEX_LOAD_CRIT: 18.88
- CTL_FIVX_LOAD_CRIT: 15.55
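To compare the second.sh values with the Munin graph, divide them by 100. Below is a small sketch that does the conversion in both directions. It also reads the current loads from /proc/loadavg using the same awk expressions as second.sh. The helper names are made up for illustration and are not part of BOA.

```sh
#!/bin/sh
# Sketch: convert between the "load x 100" integers used in
# /var/xdrago/second.sh and the decimal load averages shown by Munin.
# The function names are hypothetical helpers, not part of BOA.

# x100_to_load 7220 prints 72.20
x100_to_load() {
  awk -v v="$1" 'BEGIN { printf "%.2f\n", v / 100 }'
}

# load_to_x100 3.88 prints 388 (rounded to the nearest integer)
load_to_x100() {
  awk -v v="$1" 'BEGIN { printf "%d\n", v * 100 + 0.5 }'
}

# current_loads_x100 [FILE] prints the 1 and 5 minute load averages
# times 100, using the same expressions as ONEX_LOAD and FIVX_LOAD.
# FILE defaults to /proc/loadavg.
current_loads_x100() {
  awk '{ print $1 * 100, $2 * 100 }' "${1:-/proc/loadavg}"
}
```

For example, running `x100_to_load 1555` shows that CTL_FIVX_LOAD_CRIT corresponds to a load of 15.55 on the Munin graph.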
And the logic, translated into English, is:
- If the load average over the last minute is greater than 3.88 and less than 72.20, and the nginx high load config isn't in use, then start using it.
- Else if the load average over the last 5 mins is greater than 3.88 and less than 44.40, and the nginx high load config isn't in use, then start using it.
- Else if the load average over the last minute is less than 3.88, the load average over the last 5 mins is less than 3.88, and the nginx high load config is in use, then stop using it.
- If the load average over the last minute is greater than 18.88, then if /var/run/boa_run.pid exists wait a second, otherwise kill some maintenance jobs: killall -9 php drush.php wget
- Else if the load average over the last 5 mins is greater than 15.55, then if /var/run/boa_run.pid exists wait a second, otherwise kill some maintenance jobs: killall -9 php drush.php wget
- If the load average over the last minute is greater than 72.20, then kill the web server: killall -9 nginx and killall -9 php-fpm php-cgi
- Else if the load average over the last 5 mins is greater than 44.40, then kill the web server: killall -9 nginx and killall -9 php-fpm php-cgi
- Else restart all the services via /var/xdrago/proc_num_ctrl.cgi
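The same logic can be written as a dry-run shell function. This helps when working out what second.sh would do at a given load. It is a sketch based on the English description above, not a copy of second.sh. The function and the action labels are hypothetical, and it only prints what would happen. The thresholds are the second.sh values listed earlier in this section.

```sh
#!/bin/sh
# Dry-run sketch of the second.sh decision logic described above.
# It only PRINTS hypothetical action labels; it never kills or restarts
# anything. Thresholds are the second.sh values listed above.
CTL_ONEX_SPIDER_LOAD=388
CTL_FIVX_SPIDER_LOAD=388
CTL_ONEX_LOAD=7220
CTL_FIVX_LOAD=4440
CTL_ONEX_LOAD_CRIT=1888
CTL_FIVX_LOAD_CRIT=1555

# in_range VALUE LOW HIGH: true when LOW < VALUE < HIGH
in_range() { [ "$1" -gt "$2" ] && [ "$1" -lt "$3" ]; }

# load_actions ONEX FIVX HIGH_LOAD_CONF PID_EXISTS
#   ONEX, FIVX: 1 and 5 minute load averages times 100 (integers)
#   HIGH_LOAD_CONF, PID_EXISTS: "yes" or "no"
# Example: load_actions 500 200 no no
load_actions() {
  local onex=$1 fivx=$2 highload=$3 pid=$4

  # 1. Turn the nginx high load config (403 for robots) on or off.
  if [ "$highload" = no ] &&
     { in_range "$onex" "$CTL_ONEX_SPIDER_LOAD" "$CTL_ONEX_LOAD" ||
       in_range "$fivx" "$CTL_FIVX_SPIDER_LOAD" "$CTL_FIVX_LOAD"; }; then
    echo enable-high-load-config
  elif [ "$highload" = yes ] &&
       [ "$onex" -lt "$CTL_ONEX_SPIDER_LOAD" ] &&
       [ "$fivx" -lt "$CTL_FIVX_SPIDER_LOAD" ]; then
    echo disable-high-load-config
  fi

  # 2. Kill maintenance jobs (php, drush.php, wget) unless
  #    /var/run/boa_run.pid exists, in which case just wait.
  if [ "$onex" -gt "$CTL_ONEX_LOAD_CRIT" ] || [ "$fivx" -gt "$CTL_FIVX_LOAD_CRIT" ]; then
    if [ "$pid" = yes ]; then echo wait; else echo kill-maintenance-jobs; fi
  fi

  # 3. Kill nginx and php-fpm at very high load, otherwise let
  #    proc_num_ctrl.cgi restart the services.
  if [ "$onex" -gt "$CTL_ONEX_LOAD" ] || [ "$fivx" -gt "$CTL_FIVX_LOAD" ]; then
    echo kill-web-server
  else
    echo run-proc-num-ctrl
  fi
}
```

The two "if ... else if ..." pairs in the maintenance and web server rules lead to the same action, so the sketch merges each pair into a single test joined with "or".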
Tickets
Most of the "live server" tickets relate to puffin, but the older ones are for previous servers.
Current live server tickets
Closed live server tickets
Console Access
There is a Xen shell available for console access, see wiki:XenShell.
Barracuda Octopus Aegir
The server uses Octopus to manage Aegir and the updates to the Transition Network Drupal site. This system is installed and upgraded using Barracuda; the Barracuda Octopus Aegir combination is documented on the BOA wiki.
The BOA install script output has been saved on ticket:466#comment:22
MariaDB
BOA installs MariaDB as the MySQL server using the debs from the MariaDB site.
We have set MySQL to use a RAM disk for temp tables, see ticket:591
php-fpm
Please note that the version of php-fpm that the http://transitionnetwork.org/ site needs to be running to work properly is:
/etc/init.d/php53-fpm
The config file for it is /opt/local/etc/php53-fpm.conf and when it is running it is listed in top and ps as php-fpm:
ps -lA | grep php
1 S 0 29482 1 0 80 0 - 188067 - ? 00:00:00 php-fpm
5 S 33 29483 29482 2 80 0 - 205351 - ? 00:01:32 php-fpm
5 S 33 29484 29482 2 80 0 - 199726 - ? 00:01:28 php-fpm
...
Please note the settings that we changed from the default BOA ones in /opt/local/etc/php53-fpm.conf below.
When the server boots, another version of php-fpm is also started, which is listed in top and ps as php5-fpm:
/etc/init.d/php5-fpm
It is configured via files in /etc/php5/fpm/ and should be stopped if it is found to be running:
/etc/init.d/php5-fpm stop
It was stopped from running at runlevel 2 by deleting this symlink (see ticket:560#comment:17):
/etc/rc2.d/S01php5-fpm -> ../init.d/php5-fpm
But that didn't solve the problem, see ticket:580.
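Removing the symlink did not stop php5-fpm from starting, so it is worth checking for it by its exact process name. Here is a sketch with a made-up function name. It reads process names on stdin so that it can be fed from ps.

```sh
#!/bin/sh
# Sketch: report whether the unwanted php5-fpm (the /etc/init.d/php5-fpm
# one) is running. Process names are read one per line on stdin.
# The function name is hypothetical.
has_stray_php5_fpm() {
  # The php53-fpm processes the site needs are listed as "php-fpm",
  # so only an exact "php5-fpm" name counts as a match.
  grep -q '^[[:space:]]*php5-fpm[[:space:]]*$'
}
```

For example: ps -A -o comm= | has_stray_php5_fpm && /etc/init.d/php5-fpm stop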
Upgrading BOA
The steps are documented in UPGRADE.txt. To upgrade everything, run these commands (the process can take around 30 mins):
sudo -i
cd
wget -q -U iCab http://files.aegir.cc/BOA.sh.txt
bash BOA.sh.txt
barracuda up-stable
octopus up-stable all
nginx config changes
To get the nginx and php-fpm Munin stats working, the following code, starting with the ## chris comment, needs to be added to the nginx default server section of /var/aegir/config/server_master/nginx.conf:
#######################################################
###  nginx default server
#######################################################
server {
  limit_conn gulag 32; # like mod_evasive - this allows max 32 simultaneous connections from one IP address
  listen *:80;
  server_name _;
  location / {
    root /var/www/nginx-default;
    index index.html index.htm;
  }
  ## chris
  location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    allow 81.95.52.103;
    deny all;
  }
  location ~ ^/(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    deny all;
  }
}
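Once nginx and php-fpm have been restarted with these changes, the status pages can be checked by hand before looking at Munin. Below is a sketch that extracts a single number from each page; the function names are hypothetical. It assumes the pages are requested from the server itself, since 127.0.0.1 is the only address allowed for both locations.

```sh
#!/bin/sh
# Sketch: pull single numbers out of the two status pages enabled above.
# Example use on puffin itself:
#   wget -qO- http://127.0.0.1/nginx_status | nginx_active_connections
#   wget -qO- http://127.0.0.1/status | fpm_status_field 'active processes'
# Both function names are hypothetical.

# Print the "Active connections" count from nginx stub_status output.
nginx_active_connections() {
  awk '/^Active connections:/ { print $3 }'
}

# Print the value of one "name:   value" line of the php-fpm status page.
# Matching on the line prefix keeps values that contain ":" intact.
fpm_status_field() {
  awk -v k="$1" 'index($0, k ":") == 1 {
    v = substr($0, length(k) + 2)
    sub(/^ +/, "", v)
    print v
  }'
}
```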
The logs analysed on penguin (see wiki:WebServerLogs) are generated by adding the following to the http section of the /etc/nginx/nginx.conf file:
# log for awstats
log_format apache '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent"';
access_log /var/log/nginx/awstats.log apache;
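With this log_format, the HTTP status code is the 9th whitespace-separated field of each line. That makes it easy to see, for example, how many 403 Forbidden responses were served during a load spike. A sketch follows. It assumes normal "METHOD /path PROTOCOL" request lines, and the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: count responses per HTTP status code in a log written with the
# "apache" log_format above, most frequent first. Reads the files given
# as arguments, or stdin if there are none. The function name is
# hypothetical.
status_counts() {
  awk '{ print $9 }' "$@" | sort | uniq -c | sort -rn
}
# Example: status_counts /var/log/nginx/awstats.log
```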
php-fpm config changes
And the following lines need uncommenting in /opt/local/etc/php53-fpm.conf:
pm.status_path = /status
ping.path = /ping
Also the following lines are changed from the default BOA settings (see ticket:555 for the notes on this):
emergency_restart_threshold = 0
emergency_restart_interval = 0
pm.max_children = 90
pm.start_servers = 20
pm.min_spare_servers = 10
pm.max_spare_servers = 20
pm.max_requests = 0
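These edits have to be redone after upgrades, so a quick way to confirm they are still in place can be useful. Below is a sketch with hypothetical function names. It ignores commented-out lines (starting with ";" or "#"), and the last assignment of a key wins. It should also work on the "key = value" lines in /etc/mysql/my.cnf, although it does not take [section] headers into account.

```sh
#!/bin/sh
# Sketch: check that hand-edited "key = value" settings are still set.
# The function names are hypothetical.

# conf_value FILE KEY: print the value of KEY in FILE (last one wins).
conf_value() {
  awk -v k="$2" '
    /^[ \t]*[;#]/ { next }              # skip commented-out lines
    {
      n = index($0, "=")
      if (n == 0) next                  # not a key = value line
      key = substr($0, 1, n - 1)
      val = substr($0, n + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == k) { found = 1; last = val }
    }
    END { if (found) print last }' "$1"
}

# check_settings FILE KEY=VALUE...: print each mismatch, fail if any.
# Example:
#   check_settings /opt/local/etc/php53-fpm.conf pm.max_children=90 \
#     pm.start_servers=20 pm.min_spare_servers=10 pm.max_spare_servers=20 \
#     pm.max_requests=0 emergency_restart_threshold=0 emergency_restart_interval=0
check_settings() {
  local file=$1 status=0 pair key want have
  shift
  for pair in "$@"; do
    key=${pair%%=*}
    want=${pair#*=}
    have=$(conf_value "$file" "$key")
    if [ "$have" != "$want" ]; then
      echo "$key: expected '$want', found '${have:-(unset)}'"
      status=1
    fi
  done
  return "$status"
}
```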
After the edits above have been made, nginx and php-fpm need restarting:
/etc/init.d/php53-fpm reload
/etc/init.d/nginx restart
It is then best to check the error log:
tail -f /var/log/php/php53-fpm-error.log
These fixes can be tested like this:
cd /etc/munin/plugins
munin-run phpfpm_connections
munin-run phpfpm_status
munin-run nginx_status
munin-run nginx_request
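A Munin plugin normally prints its data as "field.value N" lines. The four checks can therefore be run in one go, listing any plugin that prints no such line. Below is a sketch. The function name and the MUNIN_RUN override are hypothetical; the override only exists so the command can be swapped out for testing.

```sh
#!/bin/sh
# Sketch: run several Munin plugins and report any that print no
# "field.value" line. MUNIN_RUN (hypothetical) defaults to munin-run.
# Example (as root):
#   check_munin_plugins phpfpm_connections phpfpm_status nginx_status nginx_request
check_munin_plugins() {
  local p failed=0
  for p in "$@"; do
    if ! ${MUNIN_RUN:-munin-run} "$p" 2>/dev/null | grep -q '\.value '; then
      echo "FAILED: $p"
      failed=1
    fi
  done
  return "$failed"
}
```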
mysql config changes
These settings in /etc/mysql/my.cnf are changed from the default and need checking after each upgrade:
max_connections = 75
max_user_connections = 75
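Edits to my.cnf are only read when MariaDB starts, so it can also be worth confirming that the running server uses these values. Below is a sketch with a hypothetical function name. It compares "name TAB value" lines on stdin with the expected values. The mysql invocation in the comment uses the /etc/mysql/debian.cnf credentials that the backupninja MySQL task also relies on.

```sh
#!/bin/sh
# Sketch: compare "name<TAB>value" lines on stdin, as printed by
#   mysql --defaults-file=/etc/mysql/debian.cnf -N -B \
#     -e "SHOW VARIABLES LIKE 'max%connections'"
# with expected NAME=VALUE pairs. The function name is hypothetical.
compare_vars() {
  local input pair name want have status=0
  input=$(cat)
  for pair in "$@"; do
    name=${pair%%=*}
    want=${pair#*=}
    have=$(printf '%s\n' "$input" | awk -F '\t' -v k="$name" '$1 == k { print $2 }')
    if [ "$have" != "$want" ]; then
      echo "$name: expected '$want', found '${have:-(unset)}'"
      status=1
    fi
  done
  return "$status"
}
# Example: ... | compare_vars max_connections=75 max_user_connections=75
```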
xdrago shell script changes
To stop log files being clobbered, two shell scripts need editing and some lines commenting out (see ticket:555#comment:22). Open both files and run the substitution in each one (use :n to move to the second file):
vim /var/xdrago/graceful.sh /var/xdrago/clear.sh
:1,$s/echo rotate/# echo rotate/gc
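A non-interactive alternative to the vim substitution is sed. Below is a sketch with a hypothetical function name. Unlike the vim command, it does not ask for confirmation. It keeps a .bak copy of each file so the change can be reviewed with diff, and it skips lines that are already commented out, so running it twice is harmless.

```sh
#!/bin/sh
# Sketch: comment out "echo rotate" in the given files, keeping FILE.bak
# copies. Lines already containing "# echo rotate" are left alone.
# The function name is hypothetical.
# Example: comment_out_rotate /var/xdrago/graceful.sh /var/xdrago/clear.sh
comment_out_rotate() {
  sed -i.bak '/#[[:space:]]*echo rotate/!s/echo rotate/# echo rotate/g' "$@"
}
```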
To adjust the restarting of nginx and the killing of nginx and php-fpm under heavy loads, edit /var/xdrago/second.sh and change these values (see ticket:563#comment:9 and ticket:555#comment:52):
CTL_ONEX_SPIDER_LOAD=1940
CTL_FIVX_SPIDER_LOAD=1940
CTL_ONEX_LOAD=7220
CTL_FIVX_LOAD=4440
CTL_ONEX_LOAD_CRIT=9440
CTL_FIVX_LOAD_CRIT=7775
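These values can also be set without an editor, for example if second.sh has been reset. Below is a sketch with a hypothetical function name. It only changes lines that start with "NAME=", so CTL_ONEX_LOAD does not also match CTL_ONEX_LOAD_CRIT, and it keeps a .bak copy of the script.

```sh
#!/bin/sh
# Sketch: set NAME=VALUE in a shell script such as /var/xdrago/second.sh.
# The function name is hypothetical.
set_ctl() {
  local file=$1 name=$2 value=$3
  # Accept only a plain upper-case variable name and an integer value,
  # so the arguments are safe to place inside the sed expression.
  case $name in ''|*[!A-Z_]*) return 2 ;; esac
  case $value in ''|*[!0-9]*) return 2 ;; esac
  sed -i.bak "s/^${name}=.*/${name}=${value}/" "$file"
}
# Example, using the values above:
#   for kv in CTL_ONEX_SPIDER_LOAD=1940 CTL_FIVX_SPIDER_LOAD=1940 \
#             CTL_ONEX_LOAD=7220 CTL_FIVX_LOAD=4440 \
#             CTL_ONEX_LOAD_CRIT=9440 CTL_FIVX_LOAD_CRIT=7775; do
#     set_ctl /var/xdrago/second.sh "${kv%%=*}" "${kv#*=}"
#   done
```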
Upgrade tickets
- BOA-2.0.9 ticket:547
- BOA-2.0.8 ticket:530
- BOA-2.0.7 ticket:529
- ticket:466#comment:26
System Updates
Don't use the regular Debian tools for updating packages; instead, run:
barracuda up-stable system
After running the above command to update the system, you also need to follow the steps documented above at PuffinServer#UpgradingBOA for php-fpm to get the Munin stats working again.
See also ticket:548#comment:33 for the steps that need to be followed after this to get BOA to work with the Session443 plugin.
CSF / LFD
To restart the firewall script:
csf -r
False positives
BOA installs CSF / LFD, which automatically blocks IP addresses after too many failed SSH login attempts. If someone is blocked who shouldn't be, they can be unblocked like this:
csf -dr 81.95.52.66
To check whether an IP address is blocked:
csf -g 81.95.52.66
See this ticket for problems caused by CSF / LFD blocking the monitoring server: ticket:544
Blocklists
Blocklists are configured in /etc/csf/csf.blocklists and some were enabled on ticket:589
Backupninja
backupninja has been installed and configured to back up to another server in the Sheffield colo. Three backup tasks have been configured in /etc/backup.d/:
- 10.sys, which backs up system settings, such as the list of installed packages
- 20.mysql, which dumps all the MySQL databases into /var/backups/mysql, using /etc/mysql/debian.cnf for authentication
- 90.rdiff, which is set to back up all these directories:
include = /var/spool/cron/crontabs
include = /var/backups
include = /var/aegir
include = /etc
include = /root
include = /home
include = /usr/local/
include = /var/lib/dpkg/status*
include = /opt
include = /srv
include = /data
exclude = /home/*/.gnupg
exclude = /home/*/.local/share/Trash
exclude = /home/*/.Trash
exclude = /home/*/.thumbnails
exclude = /home/*/.beagle
exclude = /home/*/.aMule
exclude = /home/*/gtk-gnutella-downloads
exclude = /var/cache/backupninja/duplicity
Postfix
Two changes were made to the default Postfix install: it was set to send root emails out (see ticket:466#comment:23), and it was configured to use TLS with the Transition Network cert (see ticket:466#comment:25).
Nginx
The only change made to the default nginx configuration was to move the key and cert it was using out of the way and symlink to the *.transitionnetwork.org ones, see ticket:466#comment:25.
Handy commands
JK added some Bash aliases for getting around the system quickly.
For root:
alias cdtn='cd /data/disk/tn/' # cd to tn directory
alias totn='su -s /bin/bash tn' # log into the tn user
# show file usages
alias duf='du -sk * | sort -n | perl -ne '\''($s,$f)=split(m{\t});for (qw(K M G)) {if($s<1024) {printf("%.1f",$s);print "$_\t$f"; last};$s=$s/1024}'\'
For tn:
alias la='ls -Al --color=auto'
alias lc='ls -ltcr --color=auto'
alias lk='ls -lSr --color=auto'
alias ll='ls -la --group-directories-first --color=auto'
alias lr='ls -lR --color=auto'
alias ls='ls -hF --color=auto'
alias lt='ls -ltr --color=auto'
alias lu='ls -ltur --color=auto'
alias lx='ls -lXB --color=auto'
Vim config
To make vim the default editor for root the following was added to /root/.bashrc:
export EDITOR="vim"
To make config files nicer to read in vim the following was added to /root/.vimrc:
syntax on
And a /root/.vim/filetype.vim file was created with the following in it:
au BufRead,BufNewFile /etc/mysql/my.cnf set ft=mycnf
autocmd BufRead,BufNewFile /etc/php5/fpm/* set syntax=dosini
autocmd BufRead,BufNewFile /opt/local/etc/php53-fpm.conf set syntax=dosini
au BufRead,BufNewFile /etc/nginx/*,/etc/nginx/conf.d/*,/var/aegir/config/server_master/nginx/*/* set ft=nginx
au BufRead,BufNewFile /data/disk/tn/config/server_master/nginx/vhost.d/* set ft=nginx
A /root/.vim/syntax/ directory was also created, containing mycnf.vim, downloaded from http://cvs.pld-linux.org/cgi-bin/cvsweb.cgi/packages/vim-syntax-mycnf/, and nginx.vim, downloaded from http://www.vim.org/scripts/script.php?script_id=1886
Migration Tickets
Tickets created during the migration of the http://www.transitionnetwork.org/ site from NewLiveServer to this server:
- ticket:466 Puffin install and configuration
- ticket:472 Script to copy files from NewLiveServer to puffin
- ticket:479 Transfer live transitionnetwork.org site to puffin
- ticket:480 Transfer news.transitionnetwork.org to puffin
- ticket:483 Nginx 502 Bad Gateway Errors with BOA (see the summary on ticket:483#comment:46)
- ticket:487 robots.txt files for development sites