Ticket #670 (closed maintenance: fixed)

Opened 3 years ago

Last modified 11 months ago

Roll back performance customisations and use stock BOA settings where possible

Reported by: jim Owned by: jim
Priority: minor Milestone: Maintenance
Component: Live server Keywords:
Cc: ed, chris, jim, sam, planetlarg Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 6.41

Description

Issue
Given so much has changed since the initial issues on the server, I now strongly recommend reverting all settings changes that do not add features back to stock BOA settings after the next BOA release.

These would include all MySQL, PHP, FPM, Redis and other settings that have been changed for performance reasons, or to combat the situation where there were hardware/IO issues with the underlying server. I'm most interested in FPM and MySQL settings.

The next version of BOA will include some improvements we need (see 629: Upgrades to BOA), which should handle load on our server with a lot of CPU cores. With this in place we'll be able to revert more easily to stock settings.

Rationale
I'm not talking about rolling back changes that provide us with features or mission-critical capabilities, just the changes to the subsystems I list above for performance reasons.

It's my belief that these enhancements no longer match the needs of the server since the changes to the filesystem and the underlying hardware fixes have been completed. They also represent an ongoing risk around updates and future planning -- plus it's possible they might mean Puffin's web services need more memory than they otherwise would, costing TN more than it should need to spend on hardware.

Proposed solution

  1. Await the next version of BOA, and set up the enhanced load settings per the documentation.
  2. Revert all other changes to conf files for MySQL, PHP, FPM, Redis that do not add a feature or are not mission-critical.
  3. Review /root/.barracuda.cnf and turn off any overrides and customisations we don't now need as a result of 2).
  4. Run the BOA BOND.sh script to do the tuning of the server appropriate to the memory requirements. This will tune for the current levels on first pass.
  5. Review Munin and site performance. If we need to make any tweaks then we can do a minimal set as required -- keeping an eye on memory usage.
  6. Once a few days have gone by I would hope that the overall memory use will be lower, OR with more cached data. At this point we can either re-run the barracuda installer with the _RESERVED_RAM set to 1-4 Gb, or simply reduce the memory available to Puffin.
  7. Repeat from 4, using BOND.sh to optimise for the new memory footprint.

Clearly, it's possible no memory savings can be made, or just 1Gb or so is sensible. Either way, rolling back the changes made for a system that has changed immensely is worth attempting to compare current (tweaked) performance to the stock system. Since current settings are all documented and can be backed up, we should be able to test this with no risk and the ability to roll back as needed.

Next steps

  • Chris and Ed to give their thoughts.
  • Ed to green-light before we proceed in too much detail or take any action.
  • Chris and Jim to establish the changes and outcomes.
  • Chris, Jim and whoever to do the new optimisation process.

Attachments

csf.allow.txt (1.1 KB) - added by chris 3 years ago.
/etc/csf/csf.allow
csf.blocklists.txt (3.2 KB) - added by chris 3 years ago.
/etc/csf/csf.blocklists
csf.conf.txt (71.0 KB) - added by chris 3 years ago.
/etc/csf/csf.conf
my.cnf.txt (3.7 KB) - added by chris 3 years ago.
/etc/mysql/my.cnf
nginx.conf.txt (7.5 KB) - added by chris 3 years ago.
/var/aegir/config/server_master/nginx.conf
php53-fpm.conf.txt (20.4 KB) - added by chris 3 years ago.
/opt/local/etc/php53-fpm.conf
second.sh.txt (2.0 KB) - added by chris 3 years ago.
/var/xdrago/second.sh
puffin-2013-01-13-multips_memory-year.png (39.4 KB) - added by chris 3 years ago.
puffin-2013-01-13-phpfpm_average-week.png (30.5 KB) - added by chris 3 years ago.
puffin-2013-01-13-phpfpm_status-week.png (24.5 KB) - added by chris 3 years ago.
puffin-2013-01-13-phpfpm_status-year.png (21.3 KB) - added by chris 3 years ago.
tmp_phpfpm_average-year-1317618564.png (24.7 KB) - added by chris 3 years ago.
puffin-2014-01-15-mysql_qcache_mem-year.png (25.3 KB) - added by chris 3 years ago.
redis.conf.txt (22.7 KB) - added by chris 3 years ago.
/etc/redis/redis.conf
barracuda.cnf.txt (2.0 KB) - added by chris 3 years ago.
/root/.barracuda.cnf
tn_visitors_2014-03-02.png (17.4 KB) - added by chris 2 years ago.

Change History

comment:1 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 0.0 to 0.2

comment:2 Changed 3 years ago by jim

  • Cc jim added

CC me...

comment:3 Changed 3 years ago by chris

On Sat 11-Jan-2014 at 08:51:30PM -0000, Transiton Technology Trac wrote:

Given so much has changed since the initial issues on the server, I now
strongly recommend reverting all settings changes that do not add features
back to stock BOA settings after the next BOA release.

I'm happy to give this a try, the main thing we need to watch for is the number of php-fpm processes.

Last edited 3 years ago by chris

comment:4 Changed 3 years ago by chris

Can you post copies of all the files that will be clobbered on the next BOA upgrade to Trac so we have them available for reference?

The main thing that I expect will change is that there will be a dramatic shift of memory allocation away from MySQL and to php-fpm.

Last edited 3 years ago by chris

comment:5 Changed 3 years ago by ed

  • Cc sam added
  • Owner changed from ed to jim
  • Status changed from new to assigned

I'm fine with this if it is about how we are needing fewer specialisations for BOA - particularly around the handover, and watching JK's epic Sherlock impersonation over the weekend on #610 - and this will make it more standard, and therefore more handover-able.

Jim and Chris to work together *very very* closely and document the arse of it please.

Adding Sam cc

Changed 3 years ago by chris

/etc/csf/csf.allow

Changed 3 years ago by chris

/etc/csf/csf.blocklists

Changed 3 years ago by chris

/etc/csf/csf.conf

Changed 3 years ago by chris

/etc/mysql/my.cnf

Changed 3 years ago by chris

/var/aegir/config/server_master/nginx.conf

Changed 3 years ago by chris

/opt/local/etc/php53-fpm.conf

Changed 3 years ago by chris

/var/xdrago/second.sh

comment:6 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 2.16
  • Total Hours changed from 0.2 to 2.36

I have attached all the files that I think have been changed from the default BOA settings, but it is possible that I might have missed one or two, Jim can you check this list?

They have been posted here so that when the next BOA upgrade clobbers all these files we will, if needs be, be able to revert the clobbering.

I have also done some updating of wiki:PuffinServer but more is needed.

/etc/csf/csf.allow

A backup of this file has also been created on the server for future diffing:

  • /etc/csf/csf.allow.2013-01-13.bak

In this file we have allowed some specific IP addresses, two for munin (though at the moment only the penguin one is needed):

tcp:in:d=4949:s=81.95.52.102 # munin.webarch.net
tcp:in:d=4949:s=81.95.52.111 # penguin.webarch.net

And we have allowed the Webarchitects monitoring server, this enables email alerts to be sent to the Webarchitects sysadmins when a service which is being monitored goes down:

81.95.52.66 # webarch monitoring server - Manually allowed - Wed Aug  7 10:56:54 2013

See ticket:544 for more on this.
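
The two kinds of csf.allow entry shown above (a filter restricted to one port and one source address, and a plain address) can be told apart mechanically. The following is a minimal sketch, assuming the colon-separated layout `protocol:direction:d=port:s=address` seen in the Munin lines; `AllowRule` and `parse_allow_line` are hypothetical names for illustration, and only IPv4 entries are handled.

```python
from typing import NamedTuple, Optional


class AllowRule(NamedTuple):
    protocol: Optional[str]   # "tcp" or "udp", None for a plain address
    direction: Optional[str]  # "in" or "out", None for a plain address
    port: Optional[int]       # port from the "d=" / "s=" port field
    address: str              # IPv4 address
    comment: str              # text after the "#", if any


def parse_allow_line(line: str) -> Optional[AllowRule]:
    """Parse one csf.allow line; return None for blank or comment-only lines."""
    rule, _, comment = line.partition("#")
    rule, comment = rule.strip(), comment.strip()
    if not rule:
        return None
    parts = rule.split(":")
    if len(parts) == 1:
        # Plain address: no port or protocol restriction.
        return AllowRule(None, None, None, rule, comment)
    protocol, direction, port_field, address_field = parts
    # Each of the last two fields looks like "d=4949" or "s=81.95.52.102".
    port = int(port_field.split("=", 1)[1])
    address = address_field.split("=", 1)[1]
    return AllowRule(protocol, direction, port, address, comment)


munin = parse_allow_line("tcp:in:d=4949:s=81.95.52.102 # munin.webarch.net")
monitor = parse_allow_line("81.95.52.66 # webarch monitoring server")
print(munin)
print(monitor)
```

The Munin rule only opens TCP port 4949 to one source address, whereas the monitoring server entry is allowed without any port restriction.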

/etc/csf/csf.blocklists

A backup of this file has also been created on the server for future diffing:

  • /etc/csf/csf.blocklists.2013-01-13.bak

In this file we have enabled various blacklists, specifically:

# Spamhaus Don't Route Or Peer List (DROP)
# Details: http://www.spamhaus.org/drop/
SPAMDROP|86400|0|http://www.spamhaus.org/drop/drop.lasso

# Spamhaus Extended DROP List (EDROP)
# Details: http://www.spamhaus.org/drop/
SPAMEDROP|86400|0|http://www.spamhaus.org/drop/edrop.lasso

# DShield.org Recommended Block List
# Details: http://dshield.org
DSHIELD|86400|0|http://feeds.dshield.org/block.txt

# BOGON list
# Details: http://www.team-cymru.org/Services/Bogons/
BOGON|86400|0|http://www.cymru.com/Documents/bogon-bn-agg.txt

# Project Honey Pot Directory of Dictionary Attacker IPs
# Details: http://www.projecthoneypot.org
HONEYPOT|86400|0|http://www.projecthoneypot.org/list_of_ips.php?t=d&rss=1

# BruteForceBlocker IP List
# Details: http://danger.rulez.sk/index.php/bruteforceblocker/
BFB|86400|0|http://danger.rulez.sk/projects/bruteforceblocker/blist.php

# OpenBL.org 30 day List
# Details: http://www.openbl.org
OPENBL|86400|0|http://www.us.openbl.org/lists/base_30days.txt

# Autoshun Shun List
# Details: http://www.autoshun.org/
AUTOSHUN|86400|0|http://www.autoshun.org/files/shunlist.csv

The enabling of these blacklists was done on ticket:589.
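
Each active line in csf.blocklists has four pipe-separated fields. The sketch below assumes they mean name, refresh interval in seconds, maximum number of addresses to load (with 0 taken to mean no limit) and source URL; that reading comes from the comments in the stock csf.blocklists file and should be checked against the installed version. `Blocklist` and `parse_blocklists` are hypothetical names.

```python
from typing import List, NamedTuple


class Blocklist(NamedTuple):
    name: str
    refresh_seconds: int
    max_entries: int  # 0 is assumed to mean "no limit"
    url: str


def parse_blocklists(text: str) -> List[Blocklist]:
    """Return the enabled (uncommented) blocklist definitions."""
    lists = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, interval, max_entries, url = line.split("|", 3)
        lists.append(Blocklist(name, int(interval), int(max_entries), url))
    return lists


SAMPLE = """\
# Spamhaus Don't Route Or Peer List (DROP)
SPAMDROP|86400|0|http://www.spamhaus.org/drop/drop.lasso

# DShield.org Recommended Block List
DSHIELD|86400|0|http://feeds.dshield.org/block.txt
"""

enabled = parse_blocklists(SAMPLE)
for entry in enabled:
    print(entry.name, "refreshed every", entry.refresh_seconds // 3600, "hours")
```

With the value 86400 used throughout this file, every list is refreshed once a day.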

/etc/csf/csf.conf

A backup of this file has also been created on the server for future diffing:

  • /etc/csf/csf.conf.2013-01-13.bak

This file has various amendments. The following list is based on a diff with the oldest backup; the settings listed are either ones that were changed before we set _CUSTOM_CONFIG_CSF=YES in /root/.barracuda.cnf and look significant, or ones which have clearly been changed manually. This means that not all of these settings will be clobbered by the next BOA upgrade. This is how the diff was done:

cd /etc/csf
diff csf.conf-pre-BOA-2.0.4-121215-1555 csf.conf | vim -
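
A line-based diff of csf.conf also reports comment changes, so a comparison at the level of individual settings can be easier to read. This is a minimal sketch, assuming every setting has the form `NAME = "value"` as in the excerpts in this comment; `parse_settings` and `changed_settings` are hypothetical helpers, and the "old" values in the sample are invented for illustration, not taken from the real backup.

```python
import re
from typing import Dict, Optional, Tuple

# Matches lines such as: TESTING = "0"
SETTING = re.compile(r'^([A-Z0-9_]+)\s*=\s*"(.*)"$')


def parse_settings(text: str) -> Dict[str, str]:
    """Collect NAME = "value" pairs, ignoring comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        match = SETTING.match(raw.strip())
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def changed_settings(
    old: Dict[str, str], new: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each setting that differs to its (old, new) values."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


# Illustrative input only: the old values here are assumptions.
OLD = 'TESTING = "1"\nICMP_IN = "1"\nDENY_IP_LIMIT = "100"\n'
NEW = '# Disallow pings\nTESTING = "0"\nICMP_IN = "0"\nDENY_IP_LIMIT = "100"\n'

changes = changed_settings(parse_settings(OLD), parse_settings(NEW))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

In practice the two strings would be read from the backup and the live file in /etc/csf.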

To allow Mosh connections:

# Allow incoming UDP ports
UDP_IN = "20,21,53,123,161,33434:33523,60000:60040"

# Allow outgoing UDP ports
# To allow outgoing traceroute add 33434:33523 to this list
UDP_OUT = "20,21,53,113,123,161,33434:33523,60000:60040"

See ticket:673.
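
The port lists mix single ports with `low:high` ranges, so it can be handy to check whether a given port is covered. A minimal sketch follows, using the UDP_IN value above; `parse_port_list` and `port_allowed` are hypothetical helpers, not part of csf.

```python
from typing import List, Tuple


def parse_port_list(spec: str) -> List[Tuple[int, int]]:
    """Turn "53,60000:60040" into [(53, 53), (60000, 60040)]."""
    ranges = []
    for item in spec.split(","):
        low, _, high = item.strip().partition(":")
        # A single port is treated as a range of length one.
        ranges.append((int(low), int(high or low)))
    return ranges


def port_allowed(spec: str, port: int) -> bool:
    """True if the port falls inside any listed port or range."""
    return any(low <= port <= high for low, high in parse_port_list(spec))


UDP_IN = "20,21,53,123,161,33434:33523,60000:60040"

print(port_allowed(UDP_IN, 60001))  # inside the range added for Mosh
print(port_allowed(UDP_IN, 60041))  # just outside that range
```

The range 60000:60040 opens 41 UDP ports for Mosh sessions.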

Enable email alerts to be sent to me for monitoring:

LF_ALERT_TO = "chris@webarchitects.co.uk"

X_ARF_TO = "chris@webarchitects.co.uk"

Switch off testing and auto updates:

TESTING = "0"

AUTO_UPDATES = "0"

TCP ports:

TCP_IN = "20,21,22,37,53,80,443,2401,5280,9418,30000:50000"

TCP_OUT = "20,21,22,25,37,53,80,110,143,443,465,587,873,993,995,1129,2401,3306,5280,9418,11371,27017,30000:50000"

Disallow pings:

ICMP_IN = "0"

Ensure that the server isn't vulnerable to a DOS which exploits the behaviour of csf IP blocking:

DENY_IP_LIMIT = "100"

Port flood settings:

SYNFLOOD = "1"

CONNLIMIT = "22;19,80;19,443;19,53;5"

PORTFLOOD = "22;tcp;9;29,1433;tcp;1;900"
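
The CONNLIMIT and PORTFLOOD values pack several rules into one string. The sketch below assumes the layout described in the csf documentation: CONNLIMIT entries are `port;max concurrent connections per IP`, and PORTFLOOD entries are `port;protocol;hit count;interval in seconds`. The function names are hypothetical.

```python
from typing import Dict, List


def parse_connlimit(spec: str) -> Dict[int, int]:
    """Return {port: maximum concurrent connections per IP address}."""
    limits = {}
    for item in spec.split(","):
        port, limit = item.split(";")
        limits[int(port)] = int(limit)
    return limits


def parse_portflood(spec: str) -> List[dict]:
    """Return one dict per rule: port, protocol, hits, seconds."""
    rules = []
    for item in spec.split(","):
        port, protocol, hits, seconds = item.split(";")
        rules.append(
            {
                "port": int(port),
                "protocol": protocol,
                "hits": int(hits),
                "seconds": int(seconds),
            }
        )
    return rules


connlimit = parse_connlimit("22;19,80;19,443;19,53;5")
portflood = parse_portflood("22;tcp;9;29,1433;tcp;1;900")

for port, limit in connlimit.items():
    print(f"port {port}: at most {limit} concurrent connections per IP")
for rule in portflood:
    print(
        f"port {rule['port']}/{rule['protocol']}: more than {rule['hits']} "
        f"new connections in {rule['seconds']}s triggers a block"
    )
```

Read this way, SSH, HTTP and HTTPS are each limited to 19 concurrent connections per address, and DNS to 5.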

Logging:

DROP_OUT_LOGGING = "1"

LOGFLOOD_ALERT = "1"

We are not running an IMAP or POP3 server and we are not using Apache:

LF_POP3D = "0"

LF_IMAPD = "0"

LF_HTACCESS = "0"

LF_MODSEC = "0"

LT_EMAIL_ALERT = "0"

Distributed attack settings:

LF_DISTATTACK = "1"

LF_DISTATTACK_UNIQ = "3"

LF_DISTFTP = "5"

LF_DISTFTP_UNIQ = "5"

LF_DISTFTP_PERM = "900"

Process time tracking:

PT_LIMIT = "0"

User process tracking:

PT_USERPROC = "0"

PT_USERMEM = "0"

PT_USERTIME = "0"

PT_USERKILL_ALERT = "0"

Forkbomb:

PT_FORKBOMB = "250"

Port scan tracking:

PS_INTERVAL = "120"
PS_LIMIT = "19"

User ID tracking:

UID_INTERVAL = "0"
UID_LIMIT = "10"

UID_PORTS = "0:65535,ICMP"

We only have CSF on one server:

CLUSTER_BLOCK = "0"

/etc/mysql/my.cnf

A backup of this file has also been created on the server for future diffing:

  • /etc/mysql/my.cnf.2013-01-13.bak

All the changes to /etc/mysql/my.cnf should be linked from ticket:587, reading through the file these are the ones that stand out:

[mysqld]

tmpdir                  = /run/shm/mysql

join_buffer_size        = 256M

key_buffer_size         = 256M

max_connections         = 40
max_user_connections    = 40

query_cache_limit       = 2M

query_cache_size        = 768M

query_cache_min_res_unit = 1K

sort_buffer_size        = 512K

bulk_insert_buffer_size = 256K

table_open_cache        = 6144

table_definition_cache  = 6144

table_cache             = 20480

tmp_table_size          = 2048M

max_heap_table_size     = 4096M 

max_tmp_tables          = 32768

open_files_limit        = 196608

innodb_buffer_pool_size = 1536M
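
To get a feel for how much memory these values can claim, the sizes can be added up. This is a rough sketch using a common rule-of-thumb estimate (global buffers plus per-connection buffers multiplied by max_connections); which settings count as global or per-connection is an assumption made here, temporary tables are ignored, and real usage is normally well below the worst case because per-connection buffers are only allocated when needed.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
MIB = 1024 ** 2


def to_bytes(value: str) -> int:
    """Convert a my.cnf size such as "256M" or "512K" to bytes."""
    value = value.strip()
    suffix = value[-1].upper()
    if suffix in UNITS:
        return int(value[:-1]) * UNITS[suffix]
    return int(value)


# Values copied from the my.cnf excerpt above.
settings = {
    "key_buffer_size": "256M",
    "query_cache_size": "768M",
    "innodb_buffer_pool_size": "1536M",
    "join_buffer_size": "256M",
    "sort_buffer_size": "512K",
}
max_connections = 40

# Assumed grouping: allocated once for the server versus once per connection.
GLOBAL = ["key_buffer_size", "query_cache_size", "innodb_buffer_pool_size"]
PER_CONNECTION = ["join_buffer_size", "sort_buffer_size"]

global_mib = sum(to_bytes(settings[k]) for k in GLOBAL) / MIB
per_connection_mib = sum(to_bytes(settings[k]) for k in PER_CONNECTION) / MIB
worst_case_mib = global_mib + per_connection_mib * max_connections

print(f"global buffers:       {global_mib:.1f} MiB")
print(f"per connection:       {per_connection_mib:.1f} MiB")
print(f"theoretical maximum:  {worst_case_mib:.1f} MiB")
```

The global buffers alone come to 2560 MiB (2.5 GiB); the large join_buffer_size dominates the theoretical per-connection figure.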

/var/aegir/config/server_master/nginx.conf

A backup of this file has also been created on the server for future diffing:

  • /var/aegir/config/server_master/nginx.conf.2013-01-13.bak

The changes made to this file are to enable Munin graphs based on Nginx and php-fpm status and they are documented here: wiki:PuffinServer#nginxconfigchanges

  location /nginx_status {
    stub_status on;
    access_log   off;
    allow 127.0.0.1;
    allow 81.95.52.103;
    deny all;
  }
  location ~ ^/(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    deny all;
  }
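
The `/nginx_status` location returns a small plain-text report that Munin turns into graphs. The sketch below shows how that report could be parsed, assuming the usual four-line layout of the stub_status module; the numbers in `SAMPLE` are invented for illustration and `parse_stub_status` is a hypothetical helper.

```python
import re
from typing import Dict

# Illustrative output only: these numbers are made up.
SAMPLE = (
    "Active connections: 2 \n"
    "server accepts handled requests\n"
    " 120 120 340 \n"
    "Reading: 0 Writing: 1 Waiting: 1 \n"
)


def parse_stub_status(text: str) -> Dict[str, int]:
    """Extract the counters from an Nginx stub_status report."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    states = re.search(
        r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unexpected stub_status format")
    accepts, handled, requests = (int(v) for v in totals.groups())
    reading, writing, waiting = (int(v) for v in states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }


status = parse_stub_status(SAMPLE)
print(status)
```

Because the location only allows 127.0.0.1 and 81.95.52.103, the report would have to be fetched from the server itself or from that address.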

/opt/local/etc/php53-fpm.conf

A backup of this file has also been created on the server for future diffing:

  • /opt/local/etc/php53-fpm.conf.2013-01-13.bak

The changes to this file, (see wiki:PuffinServer#php-fpmconfigchanges) relate to enabling the Munin graphs:

pm.status_path = /status

ping.path = /ping

And to reducing the number of php-fpm processes:

pm.start_servers = 4

pm.max_spare_servers = 4

/var/xdrago/second.sh

The changes in this file, (see wiki:PuffinServer#xdragoshellscriptchanges) are to increase the suicide thresholds:

CTL_ONEX_SPIDER_LOAD=2716
CTL_FIVX_SPIDER_LOAD=2716
CTL_ONEX_LOAD=10108
CTL_FIVX_LOAD=6216
CTL_ONEX_LOAD_CRIT=13216
CTL_FIVX_LOAD_CRIT=10885

See ticket:555 for background info.

comment:7 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.35
  • Total Hours changed from 2.36 to 2.71

Memory: MySQL vs php-fpm

I suspect that the key change that reverting to the default BOA configs will result in is a dramatic shift of memory allocation away from MySQL to php-fpm.

See the changes from the default setting documented above:

This week (not an unusual week as far as I'm aware) the site has been running with an average of 1.22 php-fpm processes and has spiked to a max of around 12 active processes:


Each process takes around 90MB of RAM (though some of this might be shared between processes?):


Times when the default BOA settings have set the number of php-fpm processes a lot higher (it's currently set to 5) can be seen in this graph:


At those times the higher minimum number of php-fpm processes resulted in a higher overall memory usage by php-fpm:


Since, most of the time, there is no need for a lot of php-fpm processes the minimum number of processes has been dramatically reduced and the memory that this has saved has been allocated to MySQL via large increases in the cache settings.
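
The trade-off described in this comment can be put into numbers. This is a back-of-the-envelope sketch using the figures quoted above (around 90MB per process, an average of 1.22 processes and a peak of about 12); it treats the per-process figure as unshared memory, so the results are upper bounds if some of that memory is in fact shared.

```python
PROCESS_MB = 90  # approximate memory per php-fpm process, from the graph


def fpm_memory_mb(processes: float) -> float:
    """Upper-bound memory estimate for a number of php-fpm processes."""
    return processes * PROCESS_MB


average_mb = round(fpm_memory_mb(1.22), 1)
peak_mb = fpm_memory_mb(12)

print(f"average week: about {average_mb} MB")
print(f"peak of 12 processes: about {peak_mb} MB")
print(f"difference available for caches most of the time: about {peak_mb - average_mb:.1f} MB")
```

On these figures, keeping a dozen processes running permanently would tie up roughly 1GB that is only needed during spikes.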

Last edited 3 years ago by chris

comment:8 Changed 3 years ago by chris

Oops, all the filenames used for the Munin graphs above should have 2014 in them not 2013...

comment:9 follow-up: ↓ 10 Changed 3 years ago by jim

Good info, thanks Chris... Question: according to the chart, memory per FPM process was at its lowest in March when we commissioned the server and the settings were default -- what's changed between then and now do you think?

I wonder if FPM's usage is shared as you say, or other settings/caches/buffers have an impact on it.

The purpose of this ticket is a sanity check and to establish a) if we need our current customisations, b) if they can be improved, c) if the lessons can be learned and passed back to the BOA project.

Changed 3 years ago by chris

comment:10 in reply to: ↑ 9 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 2.71 to 2.96

Replying to jim:

Question: according to the chart, memory per FPM process was at its lowest in March when we commissioned the server and the settings were default -- what's changed between then and now do you think?

I don't know, looking at the Timeline (note annoying use of US date format)
/trac/timeline?from=06%2F01%2F13&daysback=30&authors=&milestone=on&ticket=on&changeset=on&wiki=on&update=Update it could be the upgrade to BOA 2.0.9?


Changed 3 years ago by chris

Changed 3 years ago by chris

/etc/redis/redis.conf

comment:11 Changed 3 years ago by chris

  • Cc planetlarg added
  • Add Hours to Ticket changed from 0.0 to 0.25
  • Component changed from Unassigned to Live server
  • Total Hours changed from 2.96 to 3.21

Nick added as a CC.

This graph illustrates the additional memory we have allocated to MySQL and the tweaks we have made varying the RAM allocated to the query cache between 1GB and 0.5GB:


/etc/redis/redis.conf

A backup of this file has also been created on the server for future diffing:

  • /etc/redis/redis.conf.2014-01-15.bak

Looking at a diff with the oldest backup in /etc/redis/:

diff redis.conf-pre-BOA-2.0.5-130108-1232 redis.conf | vim -

281c281
< maxmemory 512MB
---
> maxmemory 1024MB

We have doubled the memory available to Redis to 1GB.

maxmemory 1024MB

Changed 3 years ago by chris

/root/.barracuda.cnf

comment:12 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.85
  • Total Hours changed from 3.21 to 4.06

/root/.barracuda.cnf

I have backed up the /root/.barracuda.cnf file to /root/.barracuda.cnf.2014-01-17.bak and removed the _NEWRELIC_KEY variable from /root/.barracuda.cnf as it's no longer needed and attached it here:

In order to reverse all the customisation that we have done I think these are the things we need to look at in /root/.barracuda.cnf before the next BOA update, ticket:629

_XTRAS_LIST="PDS CSF CHV"

The options for this are listed here: http://drupalcode.org/project/barracuda.git/blob/HEAD:/docs/NOTES.txt#l2

Xtras included with "ALL" wildcard:

CGP --- Collectd Graph Panel
CHV --- Chive DB Manager
CSF --- Firewall
CSS --- Compass Tools
FTP --- Pure-FTPd server with forced FTPS
PDS --- Fast DNS Cache Server (pdnsd)
WMN --- Webmin Control Panel

Xtras which need to be listed explicitly:

BDD --- SQL Buddy DB Manager
BND --- Bind9 DNS Server
BZR --- Bazaar
FMG --- FFmpeg support
GIT --- Latest Git from sources
SR1 --- Apache Solr 1 with Jetty 7
SR3 --- Apache Solr 3 with Jetty 8
SR4 --- Apache Solr 4 with Jetty 8 or 9

Is anyone using the Chive DB Manager? It is available here:

Chive is a web interface to MySQL, see http://www.chive-project.com/

If Chive would be useful to people I can add some documentation about it to wiki:PuffinServer.

I'm not sure why we have an FTP server running when we don't have FTP set in _XTRAS_LIST? See the note at the end of ticket:674#comment:4

_AUTOPILOT=YES

We really need to change this to NO as I think this is the cause of the problems with the Debian upgrade, see ticket:535#comment:23

_PHP_FPM_WORKERS=AUTO

If we end up with a lot of unneeded PHP-FPM processes, see ticket:670#Memory:MySQLvsphp-fpm we might want to set this to 4 or so.

_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3

We are using the default version of PHP, see http://drupalcode.org/project/barracuda.git/blob/HEAD:/BARRACUDA.sh.txt the other options are 5.5, 5.4, and 5.2.

_LOAD_LIMIT_ONE=8664
_LOAD_LIMIT_TWO=5328

See wiki:PuffinServer#LoadSpikes for notes on these thresholds.
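
For a sense of scale, the custom limits can be compared with the commented-out values 1444 and 888 that appear next to them in the full /root/.barracuda.cnf listing further down this ticket. The sketch below assumes those commented-out values are the earlier (possibly stock) settings, which the ticket does not state explicitly.

```python
custom = {"_LOAD_LIMIT_ONE": 8664, "_LOAD_LIMIT_TWO": 5328}
# Assumption: the commented-out values are the earlier settings.
commented_out = {"_LOAD_LIMIT_ONE": 1444, "_LOAD_LIMIT_TWO": 888}

ratios = {name: custom[name] / commented_out[name] for name in custom}

for name, ratio in ratios.items():
    print(f"{name}: {commented_out[name]} -> {custom[name]} ({ratio:g}x)")
```

Both limits turn out to have been raised by the same factor.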

#_CUSTOM_CONFIG_SQL=NO
_CUSTOM_CONFIG_SQL=YES

We have a customised /etc/mysql/my.cnf, see ticket:670#etcmysqlmy.cnf

#_CUSTOM_CONFIG_PHP_5_3=NO
_CUSTOM_CONFIG_PHP_5_3=YES

We have a customised /opt/local/etc/php53-fpm.conf see ticket:670#optlocaletcphp53-fpm.conf

_SYSTEM_UPGRADE_ONLY=YES

Should this be set to NO? As it is it will cause Aegir Master Instance upgrades to be skipped, see http://drupalcode.org/project/barracuda.git/blob/HEAD:/BARRACUDA.sh.txt#l346

_SQUEEZE_TO_WHEEZY=YES

This can be changed to NO since we are on Wheezy now.

Anything else I have missed?

comment:13 follow-up: ↓ 14 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 4.06 to 4.11

Chris, my time is running out so I'm going to leave this in your capable hands -- though if there are specific questions or tasks I'll answer/do them when they come up.

All the above looks good... The only thing we've added to the BOA setup (though it's not a change from stock as BOA supports this) within the Aegir/Drupal world is the /data/conf/override.global.inc file to do some Session 443 and developer tweaks.

So I'm presently happy with this if you are.

Last edited 3 years ago by jim

comment:14 in reply to: ↑ 13 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 4.11 to 4.36

Replying to jim:

So I'm presently happy with this if you are.

I'm happy that we have documented all the tweaks that have been made and I'm happy to see what the default settings would be, but I expect after the next BOA upgrade we will need to redo these changes:

But I'd be happy to find that the above tweaks were not needed.

comment:15 Changed 3 years ago by chris

Reviewing the changes we need to make to /root/.barracuda.cnf prior to tonight's upgrade, see ticket:707.

I think we can change this:

#_XTRAS_LIST="PDS CSF CHV"
_XTRAS_LIST="PDS CSF"

As we don't need Chive do we?

CHV --- Chive DB Manager

Note that:

### Note that removing any item from this
### list once it is already installed, will
### NOT uninstall anything.

So Chive needs manually uninstalling. On the other hand the upgrade will result in:

  • Use Two-Factor-like Authentication logic for Chive DB Manager access.

Which will make it a lot more secure as someone will need to ping their IP address from the server before they can use Chive, but since people with ssh access can use the MySQL command line I'm still not sure Chive is needed?

I'll include it for now as the removal process would be additional work.

The full list of options is here: http://drupalcode.org/project/barracuda.git/blob/HEAD:/BARRACUDA.sh.txt

This has been changed to NO:

_CUSTOM_CONFIG_PHP_5_3=NO
#_CUSTOM_CONFIG_PHP_5_3=YES

I haven't changed this:

_CUSTOM_CONFIG_CSF=YES

As I really think it would be a waste of time to redo all the tweaks to the firewall, see ticket:670#comment:6

This has been changed:

#_SYSTEM_UPGRADE_ONLY=YES
_SYSTEM_UPGRADE_ONLY=NO

This has been changed:

#_BUILD_FROM_SRC=YES
_BUILD_FROM_SRC=NO

This has been changed:

_CUSTOM_CONFIG_SQL=NO
#_CUSTOM_CONFIG_SQL=YES

But I expect we will want to change this back to YES and use the existing MySQL config as a lot of time has been invested in it.

This has been changed as we are using Wheezy:

#_SQUEEZE_TO_WHEEZY=YES
_SQUEEZE_TO_WHEEZY=NO

This is the resulting updated file:

###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
### NOTE: the group of settings displayed bellow will *not* be overriden
### on upgrade by the Barracuda script nor by this configuration file.
### They can be defined only on initial Barracuda install.
###
_HTTP_WILDCARD=YES
_MY_OWNIP="81.95.52.103"
#_MY_OWNIP=""
_MY_HOSTN="puffin.webarch.net"
#_MY_HOSTN=""
_MY_FRONT="master.puffin.webarch.net"
_THIS_DB_HOST=localhost
#_THIS_DB_HOST=FQDN
_SMTP_RELAY_TEST=YES
_SMTP_RELAY_HOST=""
_LOCAL_NETWORK_IP=""
_LOCAL_NETWORK_HN=""
###
### NOTE: the group of settings displayed bellow
### will *override* all listed settings in the Barracuda script,
### both on initial install and upgrade.
###
_MY_EMAIL="chris@webarchitects.co.uk"
_XTRAS_LIST="PDS CSF CHV"
_AUTOPILOT=NO
_DEBUG_MODE=NO
_DB_SERVER=MariaDB
_SSH_PORT=22
_LOCAL_DEBIAN_MIRROR="ftp.debian.org"
_LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
_FORCE_GIT_MIRROR=""
_DNS_SETUP_TEST=YES
_NGINX_EXTRA_CONF=""
_NGINX_WORKERS=AUTO
_PHP_FPM_WORKERS=AUTO
#_BUILD_FROM_SRC=YES
_BUILD_FROM_SRC=NO
_PHP_MODERN_ONLY=YES
_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
_LOAD_LIMIT_ONE=8664
_LOAD_LIMIT_TWO=5328
_CUSTOM_CONFIG_CSF=YES
_CUSTOM_CONFIG_SQL=NO
#_CUSTOM_CONFIG_SQL=YES
_CUSTOM_CONFIG_REDIS=NO
_CUSTOM_CONFIG_PHP_5_2=NO
_CUSTOM_CONFIG_PHP_5_3=NO
#_CUSTOM_CONFIG_PHP_5_3=YES
_SPEED_VALID_MAX=3600
_NGINX_DOS_LIMIT=300
#_SYSTEM_UPGRADE_ONLY=YES
_SYSTEM_UPGRADE_ONLY=NO
_USE_MEMCACHED=NO
_NEWRELIC_KEY=
_USE_STOCK=NO
###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
_EXTRA_PACKAGES=
_PHP_EXTRA_CONF=""
_STRONG_PASSWORDS=NO
_DB_BINARY_LOG=NO
_DB_ENGINE=InnoDB
_NGINX_LDAP=NO
_PHP_GEOS=NO
_PHP_MONGODB=NO
_AEGIR_UPGRADE_ONLY=NO
### Squeeze to Wheezy upgrade config
### See /trac/ticket/535
#_SQUEEZE_TO_WHEEZY=YES
_SQUEEZE_TO_WHEEZY=NO
_NGINX_FORWARD_SECRECY=YES
_NGINX_SPDY=YES
#_BUILD_FROM_SRC=NO 
_NGINX_NAXSI=NO
_PHP_ZEND_OPCACHE=YES
_PERMISSIONS_FIX=YES
_MODULES_FIX=YES
_MODULES_SKIP=""
_SSL_FROM_SOURCES=NO
_SSH_FROM_SOURCES=NO
_RESERVED_RAM=0

In the description of this ticket Jim suggests:

Run the BOA BOND.sh script to do the tuning of the server appropriate to the memory requirements.

There isn't a copy of this on the server, so:

cd /usr/local/bin
lynx -dump -source https://raw.githubusercontent.com/omega8cc/boa/master/aegir/tools/BOND.sh.txt > BOND.sh
chmod 750 BOND.sh

Try running it:

Tuner [Mon Mar 31 14:03:52 BST 2014] ==> INFO: This script is ran as a root user
Tuner [Mon Mar 31 14:03:52 BST 2014] ==> ERROR: This script should be used only when the same version of BARRACUDA was used before
Tuner [Mon Mar 31 14:03:52 BST 2014] ==> Your system has to be configured/upgraded by BARRACUDA version BOA-2.2.0 first
Tuner [Mon Mar 31 14:03:52 BST 2014] ==> Bye

So this is something to do after the upgrade.

The time spent on this comment has been recorded on ticket:707#comment:5

comment:16 follow-up: ↓ 17 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 4.36 to 4.41

Hi Chris, a few answers:

  • We do need Chive, it's very useful to have in the background where it costs us nothing, so please leave it enabled.
  • CSF changes are necessary so yes, keep those customisations.
  • The MySQL changes may or may not be needed post update, and their settings should be managed with a view to reducing the overall memory allocation for the entire VM... 8Gb is a lot, I would wager we can get by with 4-6Gb given the enhancements provided by the Zend Opcache and BOA 2.2.0... So for me these settings are certainly candidates for 'back to stock' unless there's a proven reason not to -- what do you think?

comment:17 in reply to: ↑ 16 ; follow-up: ↓ 19 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 4.41 to 4.66

Replying to jim:

  • We do need Chive, it's very useful to have in the background where it costs us nothing, so please leave it enabled.

OK.

  • CSF changes are necessary so yes, keep those customisations.

OK.

  • The MySQL changes may or may not be needed post update, and their settings should be managed with a view to reducing the overall memory allocation for the entire VM... 8Gb is a lot, I would wager we can get by with 4-6Gb given the enhancements provided by the Zend Opcache and BOA 2.2.0... So for me these settings are certainly candidates for 'back to stock' unless there's a proven reason not to -- what do you think?

MySQL is using 2.5GB of RAM at the moment, see:

It has 768M of data in the query cache:

The dumped database size is 221M.

I would expect to see the performance reduce if the amount of RAM available to MySQL is reduced, but I'm happy to test this assumption if needs be.

I'm not sure 8GB of RAM for the server is a lot given the size and complexity of the site and traffic it gets. For reference these are the latest bandwidth stats from Xen:

 puffin  /  monthly

       month        rx      |     tx      |    total    |   avg. rate
    ------------------------+-------------+-------------+---------------
      Apr '13     68.61 GiB |   14.06 GiB |   82.66 GiB |  267.52 kbit/s
      May '13     65.49 GiB |   22.61 GiB |   88.10 GiB |  275.92 kbit/s
      Jun '13     68.12 GiB |   16.18 GiB |   84.31 GiB |  272.85 kbit/s
      Jul '13    113.14 GiB |   21.98 GiB |  135.12 GiB |  423.18 kbit/s
      Aug '13    124.42 GiB |   17.20 GiB |  141.62 GiB |  443.56 kbit/s
      Sep '13    139.33 GiB |   13.78 GiB |  153.10 GiB |  495.49 kbit/s
      Oct '13    143.35 GiB |   13.97 GiB |  157.32 GiB |  492.72 kbit/s
      Nov '13    121.11 GiB |   12.47 GiB |  133.57 GiB |  432.29 kbit/s
      Dec '13    112.36 GiB |   10.83 GiB |  123.19 GiB |  385.82 kbit/s
      Jan '14    133.04 GiB |   15.02 GiB |  148.06 GiB |  463.72 kbit/s
      Feb '14    110.55 GiB |   10.57 GiB |  121.13 GiB |  420.01 kbit/s
      Mar '14    113.76 GiB |   10.79 GiB |  124.54 GiB |  395.56 kbit/s
    ------------------------+-------------+-------------+---------------
    estimated    115.36 GiB |   10.94 GiB |  126.30 GiB |
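
The "avg. rate" column can be reproduced from the monthly totals, which helps when comparing months of different lengths. The sketch below assumes that a GiB is 2^30 bytes and that the kbit/s figure uses a divisor of 1024; that unit convention is inferred from the numbers in the table rather than stated anywhere in this ticket.

```python
def average_kbit_per_s(total_gib: float, days: int) -> float:
    """Average transfer rate for a month, in kbit/s (1 kbit taken as 1024 bits)."""
    bits = total_gib * 2 ** 30 * 8
    seconds = days * 86400
    return bits / seconds / 1024


jan_2014 = average_kbit_per_s(148.06, 31)  # table shows 463.72 kbit/s
oct_2013 = average_kbit_per_s(157.32, 31)  # table shows 492.72 kbit/s

print(f"Jan '14: {jan_2014:.2f} kbit/s")
print(f"Oct '13: {oct_2013:.2f} kbit/s")
```

Both complete 31-day months agree with the table to two decimal places. The Mar '14 row would not, as the month was presumably still in progress when the stats were taken.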

But I guess we could try reducing it by 1 or 2GB, wiki:PenguinServer really could do with more, so it could be moved there:

If the RAM is reduced the time it would be noticed the most would be when there are traffic spikes I expect.

According to the Piwik stats the biggest traffic spike this year was on 14th Feb with 3.2k visitors and 5.5k page views (note this excludes bots, people with JS disabled and people with Do Not Track headers set).

comment:18 Changed 3 years ago by chris

Last night the server was updated to the latest BOA and this morning the server went down, see ticket:707#comment:23 and my first impression is that we are now back to the load spike suicide situation wiki:PuffinServer#LoadSpikes

Last edited 3 years ago by chris

comment:19 in reply to: ↑ 17 Changed 3 years ago by chris

Replying to chris:

I would expect to see the performance reduce if the amount of RAM available to MySQL is reduced

The amount of RAM available to MySQL has been halved by the BOA upgrade, for more detail see ticket:587#comment:11 but I'm not sure how to measure if this has made things faster or slower, it looks like the slow query log is no longer generated -- there are no stats anymore:

comment:20 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 4.66 to 5.16

This is the /root/.barracuda.cnf after the upgrade to BOA 2.2.2:

###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
### NOTE: the group of settings displayed bellow will *not* be overriden
### on upgrade by the Barracuda script nor by this configuration file.
### They can be defined only on initial Barracuda install.
###
_HTTP_WILDCARD=YES
_MY_OWNIP="81.95.52.103"
#_MY_OWNIP=""
_MY_HOSTN="puffin.webarch.net"
#_MY_HOSTN=""
_MY_FRONT="master.puffin.webarch.net"
_THIS_DB_HOST=localhost
#_THIS_DB_HOST=FQDN
_SMTP_RELAY_TEST=YES
_SMTP_RELAY_HOST=""
_LOCAL_NETWORK_IP=""
_LOCAL_NETWORK_HN=""
###
### NOTE: the group of settings displayed bellow
### will *override* all listed settings in the Barracuda script,
### both on initial install and upgrade.
###
_MY_EMAIL="chris@webarchitects.co.uk"
_XTRAS_LIST="PDS CSF CHV"
_AUTOPILOT=NO
_DEBUG_MODE=NO
_DB_SERVER=MariaDB
_SSH_PORT=22
_LOCAL_DEBIAN_MIRROR="ftp.debian.org"
_LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
_FORCE_GIT_MIRROR=""
_DNS_SETUP_TEST=YES
_NGINX_EXTRA_CONF=""
_NGINX_WORKERS=AUTO
_PHP_FPM_WORKERS=AUTO
_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3
_CUSTOM_CONFIG_CSF=YES
_CUSTOM_CONFIG_SQL=NO
#_CUSTOM_CONFIG_SQL=YES
_CUSTOM_CONFIG_REDIS=NO
_CUSTOM_CONFIG_PHP_5_2=NO
_CUSTOM_CONFIG_PHP_5_3=NO
#_CUSTOM_CONFIG_PHP_5_3=YES
_SPEED_VALID_MAX=3600
_NGINX_DOS_LIMIT=300
#_SYSTEM_UPGRADE_ONLY=YES
_SYSTEM_UPGRADE_ONLY=NO
_NEWRELIC_KEY=
_USE_STOCK=NO
###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
_EXTRA_PACKAGES=
_PHP_EXTRA_CONF=""
_STRONG_PASSWORDS=YES
_DB_BINARY_LOG=NO
_DB_ENGINE=InnoDB
_NGINX_LDAP=NO
_PHP_GEOS=NO
_PHP_MONGODB=NO
_AEGIR_UPGRADE_ONLY=NO
### Squeeze to Wheezy upgrade config
### See /trac/ticket/535
#_SQUEEZE_TO_WHEEZY=YES
_SQUEEZE_TO_WHEEZY=NO
_NGINX_FORWARD_SECRECY=YES
_NGINX_SPDY=YES
_NGINX_NAXSI=NO
_PERMISSIONS_FIX=YES
_MODULES_FIX=YES
_MODULES_SKIP=""
_SSL_FROM_SOURCES=NO
_SSH_FROM_SOURCES=NO
_RESERVED_RAM=0
_PHP_MULTI_INSTALL="5.3"
_CUSTOM_CONFIG_LSHELL=NO
_CUSTOM_CONFIG_PHP55=NO
_CUSTOM_CONFIG_PHP54=NO
_CUSTOM_CONFIG_PHP53=NO
_CUSTOM_CONFIG_PHP52=NO
_CPU_SPIDER_RATIO=3
_CPU_MAX_RATIO=6
_CPU_CRIT_RATIO=9
_PHP_FPM_DENY=""
_REDIS_LISTEN_MODE=PORT
_STRICT_BIN_PERMISSIONS=YES

Jim has suggested on the Ttech list:

switch Redis port to 'socket' which is recommended from 'port'.

So this line has been changed:

_REDIS_LISTEN_MODE=SOCKET

Time recorded on this ticket includes time spent looking at Munin stats, responding to the email on the Ttech list and a phone call with Ed.

comment:22 follow-up: ↓ 23 Changed 2 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 5.16 to 6.16

Since we have "Rolled back performance customisations and use stock BOA settings where possible" the server has been having load spikes. Some of these have been so big that they would have triggered a server suicide had the previous xdrago shell scripts been in place. The following lines are the subject lines of the server alert emails from the last week:

May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.29  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 4.80  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.35  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 4.38  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.46  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.33  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 4.54  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 19.42 
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 60.34 
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 29.69 
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 11.17 
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 4.22  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.05  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 5.95  
May 25 lfd on puffin.webarch.net: High 1 minute load average alert - 3.37  

May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 4.11  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 6.55  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 9.45  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.89  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 4.75  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.80  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 4.29  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.21  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.03  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 20.50 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 11.40 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 4.29  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 5.24  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 7.05  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 14.50 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 8.59  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 30.44 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 11.62 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 4.30  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.35  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 14.88 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 5.62  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 25.06 
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 8.90  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.69  
May 26 lfd on puffin.webarch.net: High 1 minute load average alert - 3.60  

May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 3.76  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.73  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 3.77  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.69  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 3.66  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 4.50  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.94  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.69  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 27.10 
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 9.60  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 3.78  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.64  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 4.73  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 8.57
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 43.91
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 20.11 
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 8.14  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 3.28  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.13  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 5.25  
May 27 lfd on puffin.webarch.net: High 1 minute load average alert - 4.33  

May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.03  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 6.34  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.38  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.46  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.09  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 6.03  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 7.16  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 12.05 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 4.72  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.12  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 76.28 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 101.63
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 72.41 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 10.77 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 4.20  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 6.52  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 6.32  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 8.05  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 3.17  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 6.97  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 5.34  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 4.22  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 4.58  
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 66.02 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 92.50 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 35.54 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 13.53 
May 28 lfd on puffin.webarch.net: High 1 minute load average alert - 5.29  

May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 3.62  
May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 6.81  
May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 3.67  
May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 3.37
May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 3.19
May 29 lfd on puffin.webarch.net: High 1 minute load average alert - 3.83  

May 30 lfd on puffin.webarch.net: High 1 minute load average alert - 3.20  
May 30 lfd on puffin.webarch.net: High 1 minute load average alert - 3.24  
May 30 lfd on puffin.webarch.net: High 1 minute load average alert - 4.14  
May 30 lfd on puffin.webarch.net: High 1 minute load average alert - 3.27  
May 30 lfd on puffin.webarch.net: High 1 minute load average alert - 7.16  

May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 3.55  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 5.68  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 4.02  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 3.39  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 3.01  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 4.10  
May 31 lfd on puffin.webarch.net: High 1 minute load average alert - 3.20  

Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 3.19  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 6.67  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 5.98  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 4.01  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 3.66  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 3.92  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 6.25  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 4.72  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 3.82  
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 11.84 
Jun 01 lfd on puffin.webarch.net: High 1 minute load average alert - 5.10  

Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 3.15  
Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 3.36  
Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 4.68  
Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 3.24  
Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 6.86  
Jun 02 lfd on puffin.webarch.net: High 1 minute load average alert - 3.49  

The ones below 14 are not something to worry about, since the server has 14 CPUs. These are the concerning ones:

  • Sun, 25 May 2014 09:54:00 - 60.34
  • Mon, 26 May 2014 06:25:56 - 20.50
  • Mon, 26 May 2014 15:58:24 - 30.44
  • Mon, 26 May 2014 18:45:39 - 25.06
  • Tue, 27 May 2014 06:25:47 - 27.10
  • Tue, 27 May 2014 08:47:04 - 43.91
  • Wed, 28 May 2014 11:05:00 - 101.63
  • Wed, 28 May 2014 13:52:26 - 92.50

I suspect that if we reinstate the memory allocation that was removed from MySQL by the "Roll back performance customisations and use stock BOA settings where possible" policy (MySQL memory was reduced by 50%, see ticket:587#comment:11 and ticket:707#comment:39) then there is a chance that these spikes would be dramatically reduced. Ed / Jim -- are you willing to give this a try?

In trac:ticket/707#comment:5 it was suggested that:

After the upgrade has been done this should be run: /usr/local/bin/BOND.sh

This hadn't been done, so:

/usr/local/bin/BOND.sh
Tuner [Mon Jun  2 11:38:14 BST 2014] ==> INFO: This script is ran as a root user
Tuner [Mon Jun  2 11:38:14 BST 2014] ==> ERROR: This script should be used only when the same version of BARRACUDA was used before
Tuner [Mon Jun  2 11:38:14 BST 2014] ==> Your system has to be configured/upgraded by BARRACUDA version BOA-2.2.0 first
Tuner [Mon Jun  2 11:38:14 BST 2014] ==> Bye

Not sure what that means exactly...

The old load spike suicide documentation has been archived to wiki:PuffinServerBoaLoadSpikes.


Last edited 2 years ago by chris

comment:23 in reply to: ↑ 22 Changed 2 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 6.16 to 6.41

Replying to chris:

these are the concerning ones:

  • Sun, 25 May 2014 09:54:00 - 60.34
  • Mon, 26 May 2014 06:25:56 - 20.50
  • Mon, 26 May 2014 15:58:24 - 30.44
  • Mon, 26 May 2014 18:45:39 - 25.06
  • Tue, 27 May 2014 06:25:47 - 27.10
  • Tue, 27 May 2014 08:47:04 - 43.91
  • Wed, 28 May 2014 11:05:00 - 101.63
  • Wed, 28 May 2014 13:52:26 - 92.50

It's worth noting that these coincide with increases in 50x errors. These are the subject lines from the wiki:ErrorCodeCheck email for the same period:

May 25 - 6211 403, 3013 404, 0 502, 4 503 and 0 504 errors from puffin.webarch.net
May 26 - 6267 403, 4386 404, 0 502, 16 503 and 0 504 errors from puffin.webarch.net
May 27 - 5825 403, 3593 404, 0 502, 19 503 and 0 504 errors from puffin.webarch.net
May 28 - 5544 403, 3296 404, 0 502, 20 503 and 0 504 errors from puffin.webarch.net
May 29 - 5866 403, 2685 404, 0 502, 100 503 and 7 504 errors from puffin.webarch.net
May 30 - 5155 403, 2619 404, 0 502, 0 503 and 0 504 errors from puffin.webarch.net
May 31 - 5148 403, 2487 404, 0 502, 0 503 and 0 504 errors from puffin.webarch.net
Jun 01 - 5197 403, 2380 404, 0 502, 0 503 and 0 504 errors from puffin.webarch.net
Jun 02 - 4953 403, 2503 404, 0 502, 0 503 and 0 504 errors from puffin.webarch.net

The 403s are mostly blocked bots and the 404s are mostly links to the old wiki pages; it's the 503s and 504s which are probably related to the spikes.

Note that the script that greps for the errors in the Nginx logs runs with logrotate, so the error numbers above for 29th May relate to the load spike on 28th May.
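
To make the one-day offset explicit, here is a small Python sketch that parses subject lines in the format shown above and attributes the 502/503/504 total to the previous day. The function name is hypothetical, the year default comes from the dated list of spikes quoted above, and the fixed one-day shift is an assumption based on the logrotate note. It also assumes English month abbreviations, since strptime with %b depends on the locale.

```python
import re
from datetime import datetime, timedelta

# Matches the ErrorCodeCheck subject lines quoted in this ticket, e.g.
# "May 29 - 5866 403, 2685 404, 0 502, 100 503 and 7 504 errors from ..."
SUBJECT_RE = re.compile(
    r"^(?P<date>\w{3} \d{2}) - (?P<c403>\d+) 403, (?P<c404>\d+) 404, "
    r"(?P<c502>\d+) 502, (?P<c503>\d+) 503 and (?P<c504>\d+) 504 errors"
)


def gateway_errors_by_day(lines, year=2014):
    """Map the day the errors occurred (ISO date) to the 502+503+504 total."""
    totals = {}
    for line in lines:
        m = SUBJECT_RE.match(line)
        if not m:
            continue  # skip anything that is not a subject line
        reported = datetime.strptime(f"{m.group('date')} {year}", "%b %d %Y").date()
        # The email is sent at logrotate time, so it covers the previous day.
        occurred = reported - timedelta(days=1)
        totals[occurred.isoformat()] = sum(
            int(m.group(key)) for key in ("c502", "c503", "c504")
        )
    return totals


subjects = [
    "May 28 - 5544 403, 3296 404, 0 502, 20 503 and 0 504 errors from puffin.webarch.net",
    "May 29 - 5866 403, 2685 404, 0 502, 100 503 and 7 504 errors from puffin.webarch.net",
]
print(gateway_errors_by_day(subjects))
# {'2014-05-27': 20, '2014-05-28': 107}
```

With the shift applied, the 107 gateway errors reported on 29th May line up with the 101.63 and 92.50 load spikes on 28th May.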

The number of people visiting the site, as recorded by PiwikServer, wasn't significantly higher than usual.


comment:24 follow-up: ↓ 25 Changed 2 years ago by ed

Re: removing the standard settings to a customised set up: Sam is going to arrange a Ttech skype to discuss Paul's success with Aegir publishing where we will take stock of how it was, how to do it next etc. I suggest that we talk about this then - and come up with a clear proposal. How about that?

comment:25 in reply to: ↑ 24 Changed 2 years ago by chris

Replying to ed:

Re: removing the standard settings to a customised set up

All I'm basically suggesting is that we change the MySQL settings so that the number of connections isn't maxed out all the time (see https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/mysql_connections.html - there is no slack any more) and so that the query cache is bigger, see these comments regarding the memory use being reduced by 50% with the default MySQL settings: ticket:587#comment:11 and ticket:707#comment:39.

Would you like me to list all lines in my.cnf that I'm suggesting are changed?

I suggest that we talk about this then - and come up with a clear proposal. How about that?

OK.

comment:26 Changed 2 years ago by ed

No point showing me lines of code, Chris; the best place for this is in the BOA/Aegir meet, I reckon, so that everyone there hears and discusses it.

comment:27 in reply to: ↑ description Changed 11 months ago by chris

  • Status changed from assigned to closed
  • Resolution set to fixed

Replying to jim:

Given so much has changed since the initial issues on the server, I now strongly recommend reverting all settings changes that do not add features back to stock BOA settings after the next BOA release.

With hindsight this was a terrible suggestion; we should have ditched BOA many years ago. Commenting out all the BOA root cron jobs appears to have solved all the problems we have had over the years with load spikes (see wiki:PuffinServer#LoadSpikes), so I'm closing this ticket.
