Ticket #707 (closed maintenance: fixed)

Opened 3 years ago

Last modified 3 years ago

Upgrade to BOA-2.2.2

Reported by: chris
Owned by: chris
Priority: critical
Milestone: Maintenance
Component: Live server
Keywords:
Cc: ed, jim
Estimated Number of Hours: 0.0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 10.87

Description

I have created a new ticket for this as I have found having one ticket (see ticket:629) for all BOA upgrades makes it really hard to review past upgrades.

Upgrades from BOA-2.0.7 to BOA-2.1.1 did have their own tickets (see wiki:PuffinServer#Upgradetickets), and unless there is a convincing reason not to have one ticket per upgrade, I'd rather do it like this.

Jim has pointed out on the Ttech list that:

the v2.2.0 changelog is up as of a few days ago:
http://drupalcode.org/project/barracuda.git/blob/HEAD:/CHANGELOG.txt

The Changelog starts:

  • Stable BOA-2.2.0 Release - Full Edition
  • Date: TBD
  • Includes Aegir 2.x-boa-custom version.
  • Release Notes:

There are many important changes and improvements in this release you should be aware of *before* running your BOA system upgrade.

Even if you are on a hosted BOA system with upgrades managed for you, it is very important to read at least this extensive release notes.

And if you are more curious, read also the big changelog further below, which covers only a small number of over 530 commits since BOA-2.1.3

I have yet to read the rest of the Changelog.

There is also a task to copy the proposed changes to the BOA configuration in ticket:629 over to this ticket.

Should people other than chris and ed be CC'd on this ticket?

Attachments

puffin_2014-04-03_redis_dbs-day.png (22.5 KB) - added by chris 3 years ago.
puffin_2014-04-16_load-week.png (24.0 KB) - added by chris 3 years ago.

Change History

comment:1 Changed 3 years ago by ed

Jim most definitely. This is important. I'm adding him now. And probably Paul; tbc; I have an email out with him asking if he'll take more of a lead on code publishing as it's not working for Sam - so let's see if Paul will also go cc on this.

comment:2 Changed 3 years ago by ed

  • Cc jim added

comment:3 Changed 3 years ago by chris

Email from wiki:PuffinServer:

There is new BOA-2.2.0 Stable Edition available.

Please review the changelog and upgrade as soon as possible
to receive all security updates and new features.

Changelog: http://bit.ly/newboa

I'll do the upgrade tonight.

comment:4 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

Reading through the Changelog, these are the issues which affect us:


Custom php.ini protection has changed and will not honor old settings

If you have custom settings in any of your php.ini files protected with
old variable in the /root/.barracuda.cnf, make a backup of your ini files
before running this upgrade. While these files will not get overwritten,
they will no longer be used, because we have introduced new, standardized
directory structure to properly support multi-PHP-versions systems.

Respective php.ini files are now located in /opt/phpXX/etc/phpXX.ini
for FPM and /opt/phpXX/lib/php.ini for CLI, where XX is 55, 54, 53 or 52,
depending on the versions listed via _PHP_MULTI_INSTALL variable in the
/root/.barracuda.cnf file. Also the variables used to protect ini files
from being overwritten have changed to _CUSTOM_CONFIG_PHPXX.

If you need any non-standard settings in any of active ini files, don't
overwrite them with the old files, but rather carefully review and apply
only the differences you need.


All PHP FPM workers in 5.5, 5.4 and 5.3 now use the 'ondemand' mode

This change will help to better manage memory use, especially on systems with
multiple PHP versions running in parallel. This will also free resources
and allocate them dynamically only when requests are coming and only to
the active FPM pools. Note that the 'ondemand' mode doesn't affect Zend
OPcache, because it is managed by the parent process(es) which stay(s) active.

The net result is that on a vanilla BOA install, without non-hostmaster sites
running, the complete stack consumes just ~200 MB of RAM (in total, so with
MariaDB, Redis and Nginx etc. included) with all three PHP-FPM versions
running in parallel: 5.5, 5.4 and 5.3:


But I don't think these will require any action on our part; they just address things we were fixing manually. Our documentation at wiki:PuffinServer will need updating after the upgrade.
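
The changelog's advice to "carefully review and apply only the differences you need" could be approached with a plain diff of the old ini file against the new one. This is a minimal sketch, not something from the ticket: the helper names and the backup path are assumptions, and the new path simply follows the /opt/phpXX/etc/phpXX.ini pattern quoted above with XX=53.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective settings of an ini file,
# dropping blank lines and ';' or '#' comments, sorted for a stable diff.
ini_settings() {
    grep -Ev '^[[:space:]]*([;#]|$)' "$1" | sort
}

# Hypothetical helper: show only the settings that differ between an
# old ini file and a new one.
ini_diff() {
    local old="$1" new="$2"
    # diff exits 1 when the inputs differ, which is the expected case here
    diff <(ini_settings "$old") <(ini_settings "$new") || true
}

# Example (the backup path is an assumption, not taken from the ticket):
# ini_diff /root/php-ini-backup/php53.ini /opt/php53/etc/php53.ini
```

Lines prefixed with `<` come from the old file and lines prefixed with `>` from the new one, so only the `<` lines need a decision about whether to carry them over.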


comment:5 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 0.25 to 1.25

Reviewing the discussion on ticket:629 these are the issues we need to be aware of when doing the BOA upgrade tonight:

php-fpm status

See ticket:629#comment:32, this addresses:

Will change to:

And /etc/munin/plugin-conf.d/munin-node will need updating to:

[phpfpm*]
env.url http://127.0.0.1/php-status
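
Before pointing Munin at that URL it may help to check by hand that the status page answers. The sketch below assumes the page returns php-fpm's plain-text "name: value" lines; the function name `fpm_status_field` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read php-fpm status text on stdin and print the
# value of one named field, e.g. "active processes".
fpm_status_field() {
    # Split each line on the first ": " run and match the field name exactly
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Example (needs the status page to be reachable on this URL):
# curl -s http://127.0.0.1/php-status | fpm_status_field "process manager"
```

If the curl call prints nothing, the Munin plugin is unlikely to get data from the same URL either.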

barracuda config

Reviewing the discussion on ticket:670, this is the current /root/.barracuda.cnf:

###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
### NOTE: the group of settings displayed bellow will *not* be overriden
### on upgrade by the Barracuda script nor by this configuration file.
### They can be defined only on initial Barracuda install.
###
_HTTP_WILDCARD=YES
_MY_OWNIP="81.95.52.103"
#_MY_OWNIP=""
_MY_HOSTN="puffin.webarch.net"
#_MY_HOSTN=""
_MY_FRONT="master.puffin.webarch.net"
_THIS_DB_HOST=localhost
#_THIS_DB_HOST=FQDN
_SMTP_RELAY_TEST=YES
_SMTP_RELAY_HOST=""
_LOCAL_NETWORK_IP=""
_LOCAL_NETWORK_HN=""
###
### NOTE: the group of settings displayed bellow
### will *override* all listed settings in the Barracuda script,
### both on initial install and upgrade.
###
_MY_EMAIL="chris@webarchitects.co.uk"
_XTRAS_LIST="PDS CSF CHV"
_AUTOPILOT=NO
_DEBUG_MODE=NO
_DB_SERVER=MariaDB
_SSH_PORT=22
_LOCAL_DEBIAN_MIRROR="ftp.debian.org"
_LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
_FORCE_GIT_MIRROR=""
_DNS_SETUP_TEST=YES
_NGINX_EXTRA_CONF=""
_NGINX_WORKERS=AUTO
_PHP_FPM_WORKERS=AUTO
_BUILD_FROM_SRC=YES
_PHP_MODERN_ONLY=YES
_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
_LOAD_LIMIT_ONE=8664
_LOAD_LIMIT_TWO=5328
_CUSTOM_CONFIG_CSF=YES
#_CUSTOM_CONFIG_SQL=NO
_CUSTOM_CONFIG_SQL=YES
_CUSTOM_CONFIG_REDIS=NO
_CUSTOM_CONFIG_PHP_5_2=NO
#_CUSTOM_CONFIG_PHP_5_3=NO
_CUSTOM_CONFIG_PHP_5_3=YES
_SPEED_VALID_MAX=3600
_NGINX_DOS_LIMIT=300
_SYSTEM_UPGRADE_ONLY=YES
_USE_MEMCACHED=NO
_NEWRELIC_KEY=
_USE_STOCK=NO
###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
_EXTRA_PACKAGES=
_PHP_EXTRA_CONF=""
_STRONG_PASSWORDS=NO
_DB_BINARY_LOG=NO
_DB_ENGINE=InnoDB
_NGINX_LDAP=NO
_PHP_GEOS=NO
_PHP_MONGODB=NO
_AEGIR_UPGRADE_ONLY=NO
### Squeeze to Wheezy upgrade config
### See /trac/ticket/535
_SQUEEZE_TO_WHEEZY=YES
_NGINX_FORWARD_SECRECY=YES
_NGINX_SPDY=YES
#_BUILD_FROM_SRC=NO 
_NGINX_NAXSI=NO
_PHP_ZEND_OPCACHE=YES
_PERMISSIONS_FIX=YES
_MODULES_FIX=YES
_MODULES_SKIP=""
_SSL_FROM_SOURCES=NO
_SSH_FROM_SOURCES=NO
_RESERVED_RAM=0

See ticket:670#comment:15 for the notes about the changes to this file, this is what it has now been updated to:

###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
### NOTE: the group of settings displayed bellow will *not* be overriden
### on upgrade by the Barracuda script nor by this configuration file.
### They can be defined only on initial Barracuda install.
###
_HTTP_WILDCARD=YES
_MY_OWNIP="81.95.52.103"
#_MY_OWNIP=""
_MY_HOSTN="puffin.webarch.net"
#_MY_HOSTN=""
_MY_FRONT="master.puffin.webarch.net"
_THIS_DB_HOST=localhost
#_THIS_DB_HOST=FQDN
_SMTP_RELAY_TEST=YES
_SMTP_RELAY_HOST=""
_LOCAL_NETWORK_IP=""
_LOCAL_NETWORK_HN=""
###
### NOTE: the group of settings displayed bellow
### will *override* all listed settings in the Barracuda script,
### both on initial install and upgrade.
###
_MY_EMAIL="chris@webarchitects.co.uk"
_XTRAS_LIST="PDS CSF CHV"
_AUTOPILOT=NO
_DEBUG_MODE=NO
_DB_SERVER=MariaDB
_SSH_PORT=22
_LOCAL_DEBIAN_MIRROR="ftp.debian.org"
_LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
_FORCE_GIT_MIRROR=""
_DNS_SETUP_TEST=YES
_NGINX_EXTRA_CONF=""
_NGINX_WORKERS=AUTO
_PHP_FPM_WORKERS=AUTO
#_BUILD_FROM_SRC=YES
_BUILD_FROM_SRC=NO
_PHP_MODERN_ONLY=YES
_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
_LOAD_LIMIT_ONE=8664
_LOAD_LIMIT_TWO=5328
_CUSTOM_CONFIG_CSF=YES
_CUSTOM_CONFIG_SQL=NO
#_CUSTOM_CONFIG_SQL=YES
_CUSTOM_CONFIG_REDIS=NO
_CUSTOM_CONFIG_PHP_5_2=NO
_CUSTOM_CONFIG_PHP_5_3=NO
#_CUSTOM_CONFIG_PHP_5_3=YES
_SPEED_VALID_MAX=3600
_NGINX_DOS_LIMIT=300
#_SYSTEM_UPGRADE_ONLY=YES
_SYSTEM_UPGRADE_ONLY=NO
_USE_MEMCACHED=NO
_NEWRELIC_KEY=
_USE_STOCK=NO
###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
_EXTRA_PACKAGES=
_PHP_EXTRA_CONF=""
_STRONG_PASSWORDS=NO
_DB_BINARY_LOG=NO
_DB_ENGINE=InnoDB
_NGINX_LDAP=NO
_PHP_GEOS=NO
_PHP_MONGODB=NO
_AEGIR_UPGRADE_ONLY=NO
### Squeeze to Wheezy upgrade config
### See /trac/ticket/535
#_SQUEEZE_TO_WHEEZY=YES
_SQUEEZE_TO_WHEEZY=NO
_NGINX_FORWARD_SECRECY=YES
_NGINX_SPDY=YES
#_BUILD_FROM_SRC=NO 
_NGINX_NAXSI=NO
_PHP_ZEND_OPCACHE=YES
_PERMISSIONS_FIX=YES
_MODULES_FIX=YES
_MODULES_SKIP=""
_SSL_FROM_SOURCES=NO
_SSH_FROM_SOURCES=NO
_RESERVED_RAM=0

After the upgrade has been done this should be run: /usr/local/bin/BOND.sh
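
Comparing the two listings above, five active settings flip from YES to NO: _BUILD_FROM_SRC, _CUSTOM_CONFIG_SQL, _CUSTOM_CONFIG_PHP_5_3, _SYSTEM_UPGRADE_ONLY and _SQUEEZE_TO_WHEEZY. A comparison like that can be done mechanically by diffing only the uncommented lines. This is a rough sketch; the helper names and the backup file name are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the active (uncommented) _VARIABLE=value
# lines of a Barracuda config file, sorted by name.
active_settings() {
    grep -E '^_[A-Z0-9_]+=' "$1" | sort -u
}

# Hypothetical helper: print the settings whose active value differs
# between two copies of the config.
cnf_diff() {
    # diff exits 1 when the inputs differ, which is the expected case here
    diff <(active_settings "$1") <(active_settings "$2") || true
}

# Example (the backup file name is an assumption):
# cnf_diff /root/.barracuda.cnf.bak /root/.barracuda.cnf
```

Commented-out alternatives such as `#_BUILD_FROM_SRC=YES` are ignored, so only real changes show up.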

comment:6 follow-up: ↓ 7 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.05
  • Total Hours changed from 1.25 to 1.3

I notice this tweet by @omega8cc:

We are working on some Known Issues affecting systems upgraded to BOA-2.2.0 release: http://bit.ly/1rXl2ND #Drupal #Aegir

Which points to this section of the change log:

# Known Issues on systems upgraded to BOA-2.2.0 release (work in progress)

==> Updated on Mon Mar 31 19:37:24 SGT 2014.

  • Compass Tools don't use correct paths to Ruby 2.1.1
  • Chive Authentication via SSH session doesn't work on some older instances.
  • PHP: Disabled 'create_function' may break some contrib modules or code.
  • The drush @foo.com generate-makefile command may not work on some systems.

So I know you're keen to get PHP and NginX updated ASAP, but I think it'd pay to wait until later this week to do the update -- more issues/tweaks will almost certainly crop up.

FWIW I tend to do my system BOA update between 1 and 2 weeks after the release as in the past I've only had to do it again a few days later... And I like to spend <1h per month dicking around with my server ideally!

Your call, obvs, but it might generate extra faff by going 'early'.

comment:7 in reply to: ↑ 6 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.3 to 1.4

Replying to jim:

  • PHP: Disabled 'create_function' may break some contrib modules or code.

Is the above an issue for us?

It would be nice to get the update done in this financial year...

comment:8 follow-up: ↓ 9 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 1.4 to 1.55

It could be... Without testing I don't know. Unfortunately we're in the PHP5.2-based world of Drupal 6, so the risk is much higher...

This search for create_function within drupalcontrib.org brings back 91 results including references to Views Bulk Operations and Pathologic on the first 2 pages, both of which we use.

So I'd rate this risk as 'high' on this one, unfortunately...
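
One way to narrow this down without a full test run would be to search the installed modules for calls to create_function. This is a minimal sketch; the function name and the example path are placeholders, and a match only means the module needs a closer look, not that it will break.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list PHP source files under a directory that
# call create_function(), so the affected modules can be reviewed by hand.
find_create_function() {
    # --include is a GNU grep option; only Drupal-style PHP files are searched
    grep -rlE 'create_function[[:space:]]*\(' \
        --include='*.module' --include='*.inc' --include='*.php' \
        "$1" | sort
}

# Example (placeholder path, not taken from the ticket):
# find_create_function /path/to/platform/sites/all/modules
```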

comment:9 in reply to: ↑ 8 Changed 3 years ago by chris

Replying to jim:

It could be... Without testing I don't know. Unfortunately we're in the PHP5.2-based world of Drupal 6, so the risk is much higher...

That includes 5.3?

So I'd rate this risk as 'high' on this one, unfortunately...

OK, let's leave it till next month sometime.

comment:10 Changed 3 years ago by chris

FWIW they have just tweeted:

We have fixed 3 Known Issues in BOA-2.2.0 http://t.co/LjUlFkl8q7 #Aegir #Drupal

comment:11 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 1.55 to 1.7

The outstanding issues are not ones which affect us AFAIK:

==> Updated on Mon Mar 31 12:39:35 EDT 2014

  @=> Issues hot-fixed in stable (run 'barracuda up-stable system' to apply):

  * Compass Tools don't use correct paths to Ruby 2.1.1
  * Chive Authentication via SSH session doesn't work on some older instances.
  * PHP: Disabled 'create_function' may break some contrib modules or code.

  @=> Issues waiting for a fix:

  * The 'git pull' command is broken in limited shell.
  * The drush @foo.com generate-makefile command may not work on some systems.

comment:12 Changed 3 years ago by chris

  • Summary changed from Upgrade to BOA-2.2.0 to Upgrade to BOA-2.2.1

BOA-2.2.1 is now out:

We are happy to release BOA-2.2.1 Full Edition, which includes only bug fixes to address a few issues discovered after recent major BOA-2.2.0 Release.

### Stable BOA-2.2.1 Release - Full Edition
### Date: Tue Apr 1 10:28:45 SGT 2014
### Includes Aegir 2.x-boa-custom version.

# Release Notes:

This is a bug-fix only release to address issues discovered after recent
major BOA-2.2.0 Release.

# Fixes in this release:

  • Chive Authentication via SSH session doesn't work on some older instances.
  • Compass Tools don't use correct paths to Ruby 2.1.1
  • Cron for sites doesn't work on old instances without Nginx wildcard vhost.
  • FTPS (FTP over SSL) connections may experience TLS problems.
  • PHP: Disabled 'assert' may cause warnings on features revert.
  • PHP: Disabled 'create_function' may break some contrib modules or code.
  • The 'git pull' command is broken in limited shell.
  • The 'rsync' command is broken in limited shell.
  • The 'drush dl foo' command can't be run outside of site directory.

You can read the full changelog as always at: http://bit.ly/newboa

https://omega8.cc/boa-221-full-edition-305

Changed 3 years ago by chris

comment:13 Changed 3 years ago by chris

It's been noted on ticket:604 that the site is always slow first thing in the morning and that this is probably due to the Redis cache being "reset" at midnight each night:


According to the BOA maintainers this "feature" should be fixed in the new BOA version.

The BOA-2.2.1 CHANGELOG.txt contains:

* Redis: Integration module (the modern variant) upgrade to 7.x-2.x-o8-2.6-A
* Redis: Use modern version with enabled fast lock and aggressive flush mode.

And:

* Redis: Auto-Restart if socket is missing only when socket mode is enabled.
* Redis: Exclude cache_form bin or it will break modules like ajax_comments.
* Redis: Force clean restart daily, with long enough sleep time.
* Redis: Restore pwd protection.
* Redis: The cache_metatag bin needs aggressive flush mode -- see #2062379

comment:14 Changed 3 years ago by jim

FWIW I did the update last night (barracuda up-stable followed by octopus up-stable), all on Babylon, and it all went very well. It took about half an hour.

comment:15 follow-up: ↓ 17 Changed 3 years ago by jim

And the chart Chris posted 2 comments up is actually showing Drupal clearing its page caches every 12 hours, rather than the 3-4am system tasks. The latter is represented by a small dip in stored data, but the big drops are the 12-hourly Drupal 'system' cron tasks...

These were once an hour, now 12 hourly as part of work on 590 (part M) in the 'cleanup' Elysia cron job (https://tech.transitionnetwork.org/trac/ticket/590#comment:37).

This remains a limitation with Drupal 6's caching infrastructure, though one I think the Redis module maintainers appear to have attempted to mitigate: https://drupal.org/node/1875584 <-- hopefully this comes along with 2.2.1...

We can open a ticket to follow up this aspect another time if necessary.

comment:16 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 1.7 to 1.8

comment:17 in reply to: ↑ 15 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 1.8 to 1.95

Replying to jim:

the chart Chris posted 2 comments up is actually showing Drupal clearing its page caches every 12 hours

Ah, I had missed that it is every 12 hours, but it is also the case that Redis is killed and restarted at around ten past midnight each night; this can be seen in /var/log/redis/redis-server.log:

[59475] 07 Apr 00:10:04.156 # User requested shutdown...
[59475] 07 Apr 00:10:05.699 # Redis is now ready to exit, bye bye...
[20917] 07 Apr 00:10:06.777 # Server started, Redis version 2.6.16

This is caused by the /var/xdrago/mysql_backup.sh script which contains:

/etc/init.d/redis-server stop
killall -9 redis-server
rm -f /var/run/redis.pid
rm -f /var/lib/redis/*
/etc/init.d/redis-server start
echo "Redis server restarted"

And this is set to run via this root crontab:

08 0 * * * bash /var/xdrago/mysql_backup.sh >/dev/null 2>&1

This isn't how Redis is designed to work:

Redis is designed to be a very long running process in your server.

http://redis.io/topics/admin

But as Jim has pointed out, the effect of Drupal clearing its cache seems to be the main cause of the Redis cache being emptied.
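
To check whether the nightly restart still happens after the upgrade, the shutdown entries can be pulled out of the Redis log. This is a small sketch that assumes the log keeps the line format shown above; the function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the timestamp of every requested shutdown
# recorded in a Redis log (fields 2-4 are day, month and time).
redis_shutdown_times() {
    awk '/User requested shutdown/ { print $2, $3, $4 }' "$1"
}

# Example:
# redis_shutdown_times /var/log/redis/redis-server.log
```

One entry per night at around 00:10 would match the mysql_backup.sh cron job described above.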

Replying to jim:

FWIW I did the update last night (barracuda up-stable followed by octopus up-stable), all on Babylon, and it all went very well.

Do you think it would be safe to update Puffin to the latest BOA, or should we wait some more?

comment:18 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.95 to 2.2
  • Summary changed from Upgrade to BOA-2.2.1 to Upgrade to BOA-2.2.2

I'm tempted to do the upgrade tonight... or should we wait some more... Jim?

There is a new version of BOA; the release notes from the CHANGELOG.txt follow. Note that the Heartbleed issues are being addressed on ticket:692#comment:18.

### Stable BOA-2.2.2 Release - Barracuda Edition
### Date: Tue Apr  8 07:24:18 PDT 2014
### Includes Aegir 2.x-boa-custom version.

# Release Notes:

  This is a bug-fix only release to address issues discovered after recent
  major BOA-2.2.0 Release and subsequent BOA-2.2.1 release.

  The most important problem fixed in this Release is related to known OpenSSL
  security issue, which has been fixed in OpenSSL 1.0.1g

  To learn more please visit: http://heartbleed.com

  @=> Note for those on self-hosted BOA (skip this if you are on a hosted Aegir)

  We recommend that you enable _SSL_FROM_SOURCES=YES option in your system
  /root/.barracuda.cnf file, to always build latest OpenSSL from sources.
  Note that it will also trigger OpenSSH and cURL install from sources, plus
  subsequent PHP rebuild to include latest SSL libraries.

  This Release doesn't include any updates to the Octopus installer, so there is
  no point in running full upgrade. It is enough to run the barracuda only,
  system upgrade in the "silent mode" with:

  $ screen
  $ barracuda up-stable system

  The system will send you an e-mail with results when the upgrade is complete,
  but there will be no upgrade progress displayed in the console. You can watch
  it, if you prefer, with command (DATE/TIME are placeholders for real values):

  $ tail -f /var/backups/reports/up/barracuda/DATE/barracuda-up-DATE-TIME.log

# System upgrades in this release:

  * Nginx 1.5.13
  * OpenSSL 1.0.1g (if installed from sources)
  * PHP 5.4.27
  * PHP 5.5.11

# Fixes in this release:

  * Chive Authentication via SSH session may break Nginx due to race conditions.
  * Drush specific dt() wrapper is required in Provision for custom platforms.
  * Fix Compass Tools support for Omega (gems dependencies via bundle install).
  * Fix default shell for system level cron tasks.
  * Fix for csf firewall compatibility test.
  * Force better health check on protected vhosts on live SSH-auth update.
  * Issue #2229555 - On fresh boa install link missing durring install.
  * Issue #2229715 - Tasks queue doesn't work on the Master Instance.
  * Issue #2231093 - Add new line before 'UseDNS no' in the sshd_config file.
  * Issue #294 - New Relic ext not installed even if _NEWRELIC_KEY is not empty.
  * Nginx: Backup and re-create default wildcard SSL cert/key with rsa:4096
  * Nginx: Generate 4096 bit long DH parameters when _NGINX_FORWARD_SECRECY=YES
  * PHP: Better default workers limits for the ondemand mode.
  * PHP: max_input_time should be set to 180 and not 60, by default.
  * PHP: Zend OPcache directive opcache.enable=1 must be set in all ini files.
  * The 'scp' command is broken in limited shell.
  * Too broad whitelisting breaks commands in limited shell with 'tmp' keyword.
  * Too restrictive open_basedir defaults break access to valid PEAR paths.
  * Too restrictive open_basedir defaults break access to valid Tika paths.
  * Use rsa:4096 by default in self-signed certs for Nginx and FTPS.

I don't think we should do this; I think we are better off using the Debian packages for OpenSSL and OpenSSH:

We recommend that you enable _SSL_FROM_SOURCES=YES option in your system
/root/.barracuda.cnf file, to always build latest OpenSSL from sources.
Note that it will also trigger OpenSSH and cURL install from sources, plus
subsequent PHP rebuild to include latest SSL libraries.
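
The release notes quoted earlier in this comment give the upgrade log path with DATE and TIME placeholders. Rather than working those out by hand, the newest log could be located by modification time. This is a minimal sketch: the function name is an assumption, and it relies on GNU find's -printf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified Barracuda
# upgrade log under the given reports directory.
latest_barracuda_log() {
    local dir="${1:-/var/backups/reports/up/barracuda}"
    # Emit "mtime path" pairs, sort newest first, keep only the path
    find "$dir" -type f -name 'barracuda-up-*.log' -printf '%T@ %p\n' 2>/dev/null \
        | sort -rn | head -n 1 | cut -d' ' -f2-
}

# Example: follow the newest log instead of typing DATE and TIME by hand
# tail -f "$(latest_barracuda_log)"
```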

comment:19 Changed 3 years ago by chris

Going to do the BOA upgrade now.

comment:20 follow-up: ↓ 24 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.64
  • Total Hours changed from 2.2 to 3.84

I have just changed this setting from NO, due to recent events.

_STRONG_PASSWORDS=YES

Here is the /root/.barracuda.cnf:

###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
### NOTE: the group of settings displayed bellow will *not* be overriden
### on upgrade by the Barracuda script nor by this configuration file.
### They can be defined only on initial Barracuda install.
###
_HTTP_WILDCARD=YES
_MY_OWNIP="81.95.52.103"
#_MY_OWNIP=""
_MY_HOSTN="puffin.webarch.net"
#_MY_HOSTN=""
_MY_FRONT="master.puffin.webarch.net"
_THIS_DB_HOST=localhost
#_THIS_DB_HOST=FQDN
_SMTP_RELAY_TEST=YES
_SMTP_RELAY_HOST=""
_LOCAL_NETWORK_IP=""
_LOCAL_NETWORK_HN=""
###
### NOTE: the group of settings displayed bellow
### will *override* all listed settings in the Barracuda script,
### both on initial install and upgrade.
###
_MY_EMAIL="chris@webarchitects.co.uk"
_XTRAS_LIST="PDS CSF CHV"
_AUTOPILOT=NO
_DEBUG_MODE=NO
_DB_SERVER=MariaDB
_SSH_PORT=22
_LOCAL_DEBIAN_MIRROR="ftp.debian.org"
_LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
_FORCE_GIT_MIRROR=""
_DNS_SETUP_TEST=YES
_NGINX_EXTRA_CONF=""
_NGINX_WORKERS=AUTO
_PHP_FPM_WORKERS=AUTO
#_BUILD_FROM_SRC=YES
_BUILD_FROM_SRC=NO
_PHP_MODERN_ONLY=YES
_PHP_FPM_VERSION=5.3
_PHP_CLI_VERSION=5.3
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
_LOAD_LIMIT_ONE=8664
_LOAD_LIMIT_TWO=5328
_CUSTOM_CONFIG_CSF=YES
_CUSTOM_CONFIG_SQL=NO
#_CUSTOM_CONFIG_SQL=YES
_CUSTOM_CONFIG_REDIS=NO
_CUSTOM_CONFIG_PHP_5_2=NO
_CUSTOM_CONFIG_PHP_5_3=NO
#_CUSTOM_CONFIG_PHP_5_3=YES
_SPEED_VALID_MAX=3600
_NGINX_DOS_LIMIT=300
#_SYSTEM_UPGRADE_ONLY=YES
_SYSTEM_UPGRADE_ONLY=NO
_USE_MEMCACHED=NO
_NEWRELIC_KEY=
_USE_STOCK=NO
###
### Configuration created on 121215-1545
### with Barracuda version BOA-2.0.4
###
_EXTRA_PACKAGES=
_PHP_EXTRA_CONF=""
_STRONG_PASSWORDS=YES
_DB_BINARY_LOG=NO
_DB_ENGINE=InnoDB
_NGINX_LDAP=NO
_PHP_GEOS=NO
_PHP_MONGODB=NO
_AEGIR_UPGRADE_ONLY=NO
### Squeeze to Wheezy upgrade config
### See /trac/ticket/535
#_SQUEEZE_TO_WHEEZY=YES
_SQUEEZE_TO_WHEEZY=NO
_NGINX_FORWARD_SECRECY=YES
_NGINX_SPDY=YES
#_BUILD_FROM_SRC=NO 
_NGINX_NAXSI=NO
_PHP_ZEND_OPCACHE=YES
_PERMISSIONS_FIX=YES
_MODULES_FIX=YES
_MODULES_SKIP=""
_SSL_FROM_SOURCES=NO
_SSH_FROM_SOURCES=NO
_RESERVED_RAM=0

Following the notes at wiki:PuffinServer#UpgradingBOA:

sudo -i
screen
cd
wget -q -U iCab http://files.aegir.cc/BOA.sh.txt
bash BOA.sh.txt

  BOA Meta Installer setup completed
  Please check INSTALL.txt and UPGRADE.txt at http://bit.ly/boa-docs for how-to
  Bye

barracuda up-stable

  Another BOA installer is running probably - /var/run/boa_run.pid exists

ls -lah /var/run/boa_run.pid

  -rw-r--r-- 1 root root 0 Mar 31 14:03 /var/run/boa_run.pid
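
The lock file here is empty and dated Mar 31, so it is almost certainly left over from an earlier run. A quick age check before removing a lock like this could look like the sketch below. The function name and the two-hour threshold are assumptions, and age alone does not prove that no installer is running.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the lock file exists and was last
# modified more than MINUTES minutes ago.
lock_older_than() {
    local lockfile="$1" minutes="$2"
    [ -e "$lockfile" ] || return 1
    # find prints the path only when the mtime test matches
    [ -n "$(find "$lockfile" -maxdepth 0 -mmin +"$minutes" 2>/dev/null)" ]
}

if lock_older_than /var/run/boa_run.pid 120; then
    echo "lock is more than two hours old; check for running installers before removing it"
fi
```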

rm /var/run/boa_run.pid
barracuda up-stable
 
  Barracuda [Fri Apr 11 21:55:47 BST 2014] ==> BOA Skynet welcomes you aboard!

  Barracuda [Fri Apr 11 21:55:51 BST 2014] ==> INFO: UPGRADE
  Barracuda [Fri Apr 11 21:55:51 BST 2014] ==> INFO: Reading your /root/.barracuda.cnf config file
  Barracuda [Fri Apr 11 21:55:52 BST 2014] ==> NOTE! Please review all config options displayed below
  Barracuda [Fri Apr 11 21:55:52 BST 2014] ==> NOTE! It will *override* all settings in the Barracuda script
  Barracuda [Fri Apr 11 21:55:53 BST 2014] ==> Legacy PHP-CLI 5.2 is not used on this system
  Barracuda [Fri Apr 11 21:55:53 BST 2014] ==> Legacy PHP-FPM 5.2 is not used on this system
  
  ###
  ### Configuration created on 121215-1545
  ### with Barracuda version BOA-2.0.4
  ###
  ### NOTE: the group of settings displayed bellow will *not* be overriden
  ### on upgrade by the Barracuda script nor by this configuration file.
  ### They can be defined only on initial Barracuda install.
  ###
  _HTTP_WILDCARD=YES
  _MY_OWNIP="81.95.52.103"
  #_MY_OWNIP=""
  _MY_HOSTN="puffin.webarch.net"
  #_MY_HOSTN=""
  _MY_FRONT="master.puffin.webarch.net"
  _THIS_DB_HOST=localhost
  #_THIS_DB_HOST=FQDN
  _SMTP_RELAY_TEST=YES
  _SMTP_RELAY_HOST=""
  _LOCAL_NETWORK_IP=""
  _LOCAL_NETWORK_HN=""
  ###
  ### NOTE: the group of settings displayed bellow
  ### will *override* all listed settings in the Barracuda script,
  ### both on initial install and upgrade.
  ###
  _MY_EMAIL="chris@webarchitects.co.uk"
  _XTRAS_LIST="PDS CSF CHV"
  _AUTOPILOT=NO
  _DEBUG_MODE=NO
  _DB_SERVER=MariaDB
  _SSH_PORT=22
  _LOCAL_DEBIAN_MIRROR="ftp.debian.org"
  _LOCAL_UBUNTU_MIRROR="archive.ubuntu.com"
  _FORCE_GIT_MIRROR=""
  _DNS_SETUP_TEST=YES
  _NGINX_EXTRA_CONF=""
  _NGINX_WORKERS=AUTO
  _PHP_FPM_WORKERS=AUTO
  _PHP_FPM_VERSION=5.3
  _PHP_CLI_VERSION=5.3
  _CUSTOM_CONFIG_CSF=YES
  _CUSTOM_CONFIG_SQL=NO
  #_CUSTOM_CONFIG_SQL=YES
  _CUSTOM_CONFIG_REDIS=NO
  _CUSTOM_CONFIG_PHP_5_2=NO
  _CUSTOM_CONFIG_PHP_5_3=NO
  #_CUSTOM_CONFIG_PHP_5_3=YES
  _SPEED_VALID_MAX=3600
  _NGINX_DOS_LIMIT=300
  #_SYSTEM_UPGRADE_ONLY=YES
  _SYSTEM_UPGRADE_ONLY=NO
  _NEWRELIC_KEY=
  _USE_STOCK=NO
  ###
  ### Configuration created on 121215-1545
  ### with Barracuda version BOA-2.0.4
  ###
  _EXTRA_PACKAGES=
  _PHP_EXTRA_CONF=""
  _STRONG_PASSWORDS=YES
  _DB_BINARY_LOG=NO
  _DB_ENGINE=InnoDB
  _NGINX_LDAP=NO
  _PHP_GEOS=NO
  _PHP_MONGODB=NO
  _AEGIR_UPGRADE_ONLY=NO
  ### Squeeze to Wheezy upgrade config
  ### See /trac/ticket/535
  #_SQUEEZE_TO_WHEEZY=YES
  _SQUEEZE_TO_WHEEZY=NO
  _NGINX_FORWARD_SECRECY=YES
  _NGINX_SPDY=YES
  _NGINX_NAXSI=NO
  _PERMISSIONS_FIX=YES
  _MODULES_FIX=YES
  _MODULES_SKIP=""
  _SSL_FROM_SOURCES=NO
  _SSH_FROM_SOURCES=NO
  _RESERVED_RAM=0
  _PHP_MULTI_INSTALL="5.3"
  _CUSTOM_CONFIG_LSHELL=NO
  _CUSTOM_CONFIG_PHP55=NO
  _CUSTOM_CONFIG_PHP54=NO
  _CUSTOM_CONFIG_PHP53=NO
  _CUSTOM_CONFIG_PHP52=NO
  _CPU_SPIDER_RATIO=3
  _CPU_MAX_RATIO=6
  _CPU_CRIT_RATIO=9
  _PHP_FPM_DENY=""
  _REDIS_LISTEN_MODE=PORT
  _STRICT_BIN_PERMISSIONS=YES
  
  Do you want to proceed with the upgrade? [Y/n] Y

  Barracuda [Fri Apr 11 21:56:48 BST 2014] ==> INFO: Checking your system version...
   
  Barracuda [Fri Apr 11 21:56:49 BST 2014] ==> Aegir on Debian/wheezy - Skynet Agent v.BOA-2.2.2
   
  Barracuda [Fri Apr 11 21:56:49 BST 2014] ==> INFO: Updating packages sources list...
  Barracuda [Fri Apr 11 21:56:49 BST 2014] ==> INFO: We will use Debian mirror ftp.debian.org
  Barracuda [Fri Apr 11 21:57:03 BST 2014] ==> INFO: Downloading little helpers...
  Barracuda [Fri Apr 11 21:57:04 BST 2014] ==> INFO: Checking BARRACUDA version...
  Barracuda [Fri Apr 11 21:57:04 BST 2014] ==> INFO: BARRACUDA version test: OK
   
  Barracuda [Fri Apr 11 21:57:05 BST 2014] ==> UPGRADE START -> checkpoint: 
  
    * Your e-mail address appears to be chris@webarchitects.co.uk - is that correct?
    * Your server hostname is puffin.webarch.net.
    * Your Aegir control panel is/will be available at https://master.puffin.webarch.net.
  
   
  Do you want to proceed with the upgrade? [Y/n] Y

  Barracuda [Fri Apr 11 21:57:45 BST 2014] ==> INFO: Cleaning up temp files in /var/opt/
  Barracuda [Fri Apr 11 21:57:45 BST 2014] ==> INFO: Installing extra Drush versions
  Barracuda [Fri Apr 11 21:57:45 BST 2014] ==> INFO: Drush mini-4-14-03-2014 installation complete
  Barracuda [Fri Apr 11 21:57:46 BST 2014] ==> INFO: Drush mini-6-01-04-2014 installation complete
  Barracuda [Fri Apr 11 21:57:52 BST 2014] ==> INFO: Running aptitude update...
  Barracuda [Fri Apr 11 21:58:39 BST 2014] ==> INFO: Upgrading required libraries and tools
  Barracuda [Fri Apr 11 21:58:39 BST 2014] ==> NOTE! This step may take a few minutes, please wait...
  Barracuda [Fri Apr 11 21:59:39 BST 2014] ==> INFO: Testing Nginx version...
  Barracuda [Fri Apr 11 21:59:39 BST 2014] ==> INFO: Installed Nginx version nginx/1.5.7, upgrade required
  Barracuda [Fri Apr 11 21:59:40 BST 2014] ==> INFO: Upgrading Nginx...
  Barracuda [Fri Apr 11 22:00:54 BST 2014] ==> INFO: Running aptitude full-upgrade, please wait...
  Barracuda [Fri Apr 11 22:01:54 BST 2014] ==> INFO: Testing Nginx version...
  Barracuda [Fri Apr 11 22:01:54 BST 2014] ==> INFO: Installed Nginx version nginx/1.5.13, OK
  Barracuda [Fri Apr 11 22:01:54 BST 2014] ==> INFO: Installing MySecureShell 1.32...
  Barracuda [Fri Apr 11 22:02:22 BST 2014] ==> INFO: Installing /usr/bin/wkhtmltopdf x86_64 version...
  Barracuda [Fri Apr 11 22:02:28 BST 2014] ==> INFO: Installing /usr/bin/wkhtmltoimage x86_64 version...
  Barracuda [Fri Apr 11 22:02:34 BST 2014] ==> INFO: Fix #1 for libs in Debian wheezy
  Barracuda [Fri Apr 11 22:02:35 BST 2014] ==> INFO: Checking SMTP connections...
  Barracuda [Fri Apr 11 22:02:35 BST 2014] ==> INFO: Installing VnStat monitor...
  Barracuda [Fri Apr 11 22:02:44 BST 2014] ==> INFO: Upgrading a few more tools...
  Barracuda [Fri Apr 11 22:02:46 BST 2014] ==> INFO: Checking if PHP upgrade is available
  Barracuda [Fri Apr 11 22:02:53 BST 2014] ==> INFO: PHP EXTRA is --with-ldap --with-gmp
  Barracuda [Fri Apr 11 22:02:53 BST 2014] ==> INFO: PHP 5.3.28 will be installed now
  Barracuda [Fri Apr 11 22:02:53 BST 2014] ==> INFO: Installing PHP-FPM 5.3.28
  Barracuda [Fri Apr 11 22:02:53 BST 2014] ==> NOTE! This step may take longer than 8 minutes, please wait...
  Barracuda [Fri Apr 11 22:03:03 BST 2014] ==> INFO: Installing PHP-FPM 5.3.28 part 1/3
  Barracuda [Fri Apr 11 22:03:04 BST 2014] ==> INFO: Installing PHP-FPM 5.3.28 part 2/3
  Barracuda [Fri Apr 11 22:04:59 BST 2014] ==> INFO: Installing PHP-FPM 5.3.28 part 3/3
  Barracuda [Fri Apr 11 22:17:39 BST 2014] ==> INFO: Installing Zend OPcache for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:18:02 BST 2014] ==> INFO: Installing PhpRedis for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:18:23 BST 2014] ==> INFO: Installing UploadProgress for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:18:34 BST 2014] ==> INFO: Installing JSMin for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:18:46 BST 2014] ==> INFO: Installing Imagick for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:19:09 BST 2014] ==> INFO: Installing MailParse for PHP-FPM 5.3.28...
  Barracuda [Fri Apr 11 22:19:23 BST 2014] ==> INFO: Installing IonCube x86_64 version for PHP-FPM...
  Barracuda [Fri Apr 11 22:19:27 BST 2014] ==> INFO: Upgrading Limited Shell to version 0.9.16.5-om8...
  Barracuda [Fri Apr 11 22:19:30 BST 2014] ==> INFO: Installed Redis version 2.6.16, upgrade required
  Barracuda [Fri Apr 11 22:19:30 BST 2014] ==> INFO: Installing Redis update for Debian/wheezy...
  Barracuda [Fri Apr 11 22:20:41 BST 2014] ==> INFO: Generating random password for Redis server
  Barracuda [Fri Apr 11 22:20:42 BST 2014] ==> INFO: Updating MariaDB and PHP configuration
  Barracuda [Fri Apr 11 22:20:43 BST 2014] ==> INFO: Running MySQLTuner check on all databases...
  Barracuda [Fri Apr 11 22:20:43 BST 2014] ==> NOTE! This step may take a LONG time, please wait...
  Barracuda [Fri Apr 11 22:20:47 BST 2014] ==> INFO: OS and services upgrade completed
   
  Barracuda [Fri Apr 11 22:20:47 BST 2014] ==> INFO: Restarting MariaDB server, please wait...
  Barracuda [Fri Apr 11 22:21:05 BST 2014] ==> INFO: Upgrading MariaDB tables if necessary, please wait a minute...
   
  Do you want to upgrade Aegir Master Instance? [Y/n]  Y
  Barracuda [Fri Apr 11 22:24:01 BST 2014] ==> INFO: Running Aegir Master Instance upgrade
  Barracuda [Fri Apr 11 22:24:02 BST 2014] ==> INFO: Syncing provision backend db_passwd...
  Barracuda [Fri Apr 11 22:24:04 BST 2014] ==> INFO: Running hosting-dispatch (1/3)...
  Barracuda [Fri Apr 11 22:24:17 BST 2014] ==> INFO: Running hosting-dispatch (2/3)...
  Barracuda [Fri Apr 11 22:24:24 BST 2014] ==> INFO: Running hosting-dispatch (3/3)...
  Barracuda [Fri Apr 11 22:24:24 BST 2014] ==> INFO: Syncing hostmaster frontend db_passwd...
  Barracuda [Fri Apr 11 22:24:25 BST 2014] ==> INFO: Testing previous install...
  Barracuda [Fri Apr 11 22:24:25 BST 2014] ==> INFO: Test OK, we can proceed with Hostmaster upgrade
  Barracuda [Fri Apr 11 22:24:25 BST 2014] ==> INFO: Moving old directories
  Barracuda [Fri Apr 11 22:24:25 BST 2014] ==> INFO: Downloading drush...
  Barracuda [Fri Apr 11 22:24:26 BST 2014] ==> INFO: Drush seems to be functioning properly
  Barracuda [Fri Apr 11 22:24:26 BST 2014] ==> INFO: Installing provision backend in /var/aegir/.drush
  Barracuda [Fri Apr 11 22:24:26 BST 2014] ==> INFO: Downloading Drush and Provision extensions...
  Barracuda [Fri Apr 11 22:24:26 BST 2014] ==> INFO: Running hostmaster-migrate, please wait...
  Barracuda [Fri Apr 11 22:24:55 BST 2014] ==> INFO: Syncing hostmaster frontend db_passwd...
  Barracuda [Fri Apr 11 22:25:33 BST 2014] ==> INFO: Aegir Master Instance upgrade completed
   
  Barracuda [Fri Apr 11 22:25:37 BST 2014] ==> INFO: Upgrading Chive MariaDB Manager...
  Barracuda [Fri Apr 11 22:25:42 BST 2014] ==> INFO: Restarting Redis, PHP-FPM and Nginx
  Barracuda [Fri Apr 11 22:25:51 BST 2014] ==> INFO: Restarting MariaDB server
   
  Barracuda [Fri Apr 11 22:26:01 BST 2014] ==> INFO: New secure random password for MariaDB generated and updated
  Barracuda [Fri Apr 11 22:26:01 BST 2014] ==> INFO: New entry added to /var/log/barracuda_log.txt
  Barracuda [Fri Apr 11 22:26:01 BST 2014] ==> INFO: Cleaning up system swap, it may take a moment, please wait...
   
  Barracuda [Fri Apr 11 22:26:40 BST 2014] ==> CARD: Now charging your credit card for this auto-upgrade magic...
  Barracuda [Fri Apr 11 22:26:46 BST 2014] ==> JOKE: Just kidding! Enjoy your Aegir Hosting System :)
   
  Barracuda [Fri Apr 11 22:26:46 BST 2014] ==> Final post-upgrade cleaning, please wait a moment...
  Barracuda [Fri Apr 11 22:33:40 BST 2014] ==> BYE!
  
  BARRACUDA upgrade completed
  Bye

While the update was running I was sent this email:

From: root@puffin.webarch.net
Date: Fri, 11 Apr 2014 21:57:27 +0100 (BST)
To: chris@webarchitects.co.uk
Subject: lfd on puffin.webarch.net: System Integrity checking detected a modified system file

Time:     Fri Apr 11 21:57:27 2014 +0000

The following list of files have FAILED the md5sum comparison test. This means that the file has been changed in some way. This could be a result of an OS update or application upgrade. If the change is unexpected it should be investigated:

/usr/bin/7z: FAILED
/usr/bin/7za: FAILED
/usr/bin/Magick-config: FAILED
/usr/bin/MagickCore-config: FAILED
/usr/bin/MagickWand-config: FAILED
/usr/bin/Wand-config: FAILED
/usr/bin/add-patch: FAILED
/usr/bin/anytopnm: FAILED
/usr/bin/apt-key: FAILED
/usr/bin/aptitude-fast: FAILED
/usr/bin/autoconf2.13: FAILED
/usr/bin/autoconf2.50: FAILED
/usr/bin/autoheader2.13: FAILED
/usr/bin/autopoint: FAILED
/usr/bin/autoreconf2.13: FAILED
/usr/bin/autoupdate2.13: FAILED
/usr/bin/bashbug: FAILED
/usr/bin/batch: FAILED
/usr/bin/bison.yacc: FAILED
/usr/bin/c89: FAILED
/usr/bin/c89-gcc: FAILED
/usr/bin/c99: FAILED
/usr/bin/c99-gcc: FAILED
/usr/bin/catchsegv: FAILED
/usr/bin/checkbashisms: FAILED
/usr/bin/compile_et: FAILED
/usr/bin/conkeror: FAILED
/usr/bin/crypt: FAILED
/usr/bin/curl-config: FAILED
/usr/bin/dcmd: FAILED
/usr/bin/debconf-updatepo: FAILED
/usr/bin/debsign: FAILED
/usr/bin/dehtmldiff: FAILED
/usr/bin/dpkg-maintscript-helper: FAILED
/usr/bin/dscextract: FAILED
/usr/bin/dumphint: FAILED
/usr/bin/dvipdf: FAILED
/usr/bin/edit-patch: FAILED
/usr/bin/eps2eps: FAILED
/usr/bin/fakeroot: FAILED
/usr/bin/fakeroot-sysv: FAILED
/usr/bin/fakeroot-tcp: FAILED
/usr/bin/font2c: FAILED
/usr/bin/freetype-config: FAILED
/usr/bin/gcore: FAILED
/usr/bin/gdbtui: FAILED
/usr/bin/getbuildlog: FAILED
/usr/bin/gettext.sh: FAILED
/usr/bin/gettextize: FAILED
/usr/bin/glib-gettextize: FAILED
/usr/bin/gpg-error-config: FAILED
/usr/bin/gpg-zip: FAILED
/usr/bin/gsbj: FAILED
/usr/bin/gsdj: FAILED
/usr/bin/gsdj500: FAILED
/usr/bin/gslj: FAILED
/usr/bin/gslp: FAILED
/usr/bin/gsnd: FAILED
/usr/bin/ifnames2.13: FAILED
/usr/bin/igawk: FAILED
/usr/bin/install-info: FAILED
/usr/bin/krb5-config: FAILED
/usr/bin/lessfile: FAILED
/usr/bin/lesspipe: FAILED
/usr/bin/lft: FAILED
/usr/bin/lft.db: FAILED
/usr/bin/lftpget: FAILED
/usr/bin/libgcrypt-config: FAILED
/usr/bin/libmcrypt-config: FAILED
/usr/bin/libpng-config: FAILED
/usr/bin/libpng12-config: FAILED
/usr/bin/libtool: FAILED
/usr/bin/libtoolize: FAILED
/usr/bin/libwmf-config: FAILED
/usr/bin/lorder: FAILED
/usr/bin/lsinitramfs: FAILED
/usr/bin/lspgpot: FAILED
/usr/bin/mkfontdir: FAILED
/usr/bin/msql2mysql: FAILED
/usr/bin/mysql_config: FAILED
/usr/bin/mysql_install_db: FAILED
/usr/bin/mysql_secure_installation: FAILED
/usr/bin/mysqlaccess: FAILED
/usr/bin/mysqlbug: FAILED
/usr/bin/ncurses5-config: FAILED
/usr/bin/ncursesw5-config: FAILED
/usr/bin/neqn: FAILED
/usr/bin/net-snmp-config: FAILED
/usr/bin/nroff: FAILED
/usr/bin/on_ac_power: FAILED
/usr/bin/pamstretch-gen: FAILED
/usr/bin/pcre-config: FAILED
/usr/bin/pdf2dsc: FAILED
/usr/bin/pdf2ps: FAILED
/usr/bin/pdfopt: FAILED
/usr/bin/perldoc: FAILED
/usr/bin/pf2afm: FAILED
/usr/bin/pfbtopfa: FAILED
/usr/bin/pnminterp-gen: FAILED
/usr/bin/pnmmargin: FAILED
/usr/bin/po2debconf: FAILED
/usr/bin/pphs: FAILED
/usr/bin/ppmtomap: FAILED
/usr/bin/printafm: FAILED
/usr/bin/ps2ascii: FAILED
/usr/bin/ps2epsi: FAILED
/usr/bin/ps2pdf: FAILED
/usr/bin/ps2pdf12: FAILED
/usr/bin/ps2pdf13: FAILED
/usr/bin/ps2pdf14: FAILED
/usr/bin/ps2pdfwr: FAILED
/usr/bin/ps2ps: FAILED
/usr/bin/ps2ps2: FAILED
/usr/bin/ps2txt: FAILED
/usr/bin/rgrep: FAILED
/usr/bin/routef: FAILED
/usr/bin/routel: FAILED
/usr/bin/savelog: FAILED
/usr/bin/sensible-browser: FAILED
/usr/bin/sensible-editor: FAILED
/usr/bin/sensible-pager: FAILED
/usr/bin/sftp-kill: FAILED
/usr/bin/sftp-user: FAILED
/usr/bin/shtool: FAILED
/usr/bin/shtoolize: FAILED
/usr/bin/smbtar: FAILED
/usr/bin/ssh-argv0: FAILED
/usr/bin/ssh-copy-id: FAILED
/usr/bin/ssl-cert-check: FAILED
/usr/bin/traceproto: FAILED
/usr/bin/traceproto.db: FAILED
/usr/bin/traceroute-nanog: FAILED
/usr/bin/update-mime-database: FAILED
/usr/bin/updatedb: FAILED
/usr/bin/updatedb.findutils: FAILED
/usr/bin/valgrind: FAILED
/usr/bin/vimtutor: FAILED
/usr/bin/wftopfa: FAILED
/usr/bin/which: FAILED
/usr/bin/x-www-browser: FAILED
/usr/bin/xdg-desktop-icon: FAILED
/usr/bin/xdg-desktop-menu: FAILED
/usr/bin/xdg-email: FAILED
/usr/bin/xdg-icon-resource: FAILED
/usr/bin/xdg-mime: FAILED
/usr/bin/xdg-open: FAILED
/usr/bin/xdg-screensaver: FAILED
/usr/bin/xdg-settings: FAILED
/usr/bin/xlsview: FAILED
/usr/bin/xml2-config: FAILED
/usr/bin/xpdf: FAILED
/usr/bin/xslt-config: FAILED
/usr/bin/yacc: FAILED
/usr/bin/zipgrep: FAILED
/usr/bin/zxpdf: FAILED
/usr/sbin/add-shell: FAILED
/usr/sbin/csf: FAILED
/usr/sbin/invoke-rc.d: FAILED
/usr/sbin/locale-gen: FAILED
/usr/sbin/mkinitramfs: FAILED
/usr/sbin/ntpdate-debian: FAILED
/usr/sbin/paperconfig: FAILED
/usr/sbin/remove-shell: FAILED
/usr/sbin/service: FAILED
/usr/sbin/sync-available: FAILED
/usr/sbin/t1libconfig: FAILED
/usr/sbin/tcptraceroute: FAILED
/usr/sbin/tcptraceroute.db: FAILED
/usr/sbin/tzconfig: FAILED
/usr/sbin/update-ca-certificates: FAILED
/usr/sbin/update-fonts-alias: FAILED
/usr/sbin/update-fonts-dir: FAILED
/usr/sbin/update-fonts-scale: FAILED
/usr/sbin/update-gsfontmap: FAILED
/usr/sbin/update-icon-caches: FAILED
/usr/sbin/update-icon-caches.gtk2: FAILED
/usr/sbin/update-initramfs: FAILED
/bin/bzcmp: FAILED
/bin/bzdiff: FAILED
/bin/bzegrep: FAILED
/bin/bzexe: FAILED
/bin/bzfgrep: FAILED
/bin/bzgrep: FAILED
/bin/bzless: FAILED
/bin/bzmore: FAILED
/bin/lessfile: FAILED
/bin/lesspipe: FAILED
/bin/sh: FAILED
/bin/which: FAILED
/sbin/fsck.nfs: FAILED
/sbin/initctl: FAILED
/sbin/installkernel: FAILED
/sbin/on_ac_power: FAILED
/sbin/resolvconf: FAILED
/sbin/shadowconfig: FAILED
/usr/local/bin/barracuda: FAILED
/usr/local/bin/boa: FAILED
/usr/local/bin/octopus: FAILED
/usr/local/bin/syncpass: FAILED
/usr/local/bin/tuning-primer.sh: FAILED
/etc/init.d/README: FAILED
/etc/init.d/atd: FAILED
/etc/init.d/auditd: FAILED
/etc/init.d/bootlogd: FAILED
/etc/init.d/bootlogs: FAILED
/etc/init.d/bootmisc.sh: FAILED
/etc/init.d/checkfs.sh: FAILED
/etc/init.d/checkroot-bootclean.sh: FAILED
/etc/init.d/checkroot.sh: FAILED
/etc/init.d/chrony: FAILED
/etc/init.d/cron: FAILED
/etc/init.d/dbus: FAILED
/etc/init.d/fancontrol: FAILED
/etc/init.d/halt: FAILED
/etc/init.d/hdparm: FAILED
/etc/init.d/hostname.sh: FAILED
/etc/init.d/hwclock.sh: FAILED
/etc/init.d/ipvsadm: FAILED
/etc/init.d/killprocs: FAILED
/etc/init.d/kmod: FAILED
/etc/init.d/lm-sensors: FAILED
/etc/init.d/lvm2: FAILED
/etc/init.d/motd: FAILED
/etc/init.d/mountall-bootclean.sh: FAILED
/etc/init.d/mountall.sh: FAILED
/etc/init.d/mountdevsubfs.sh: FAILED
/etc/init.d/mountkernfs.sh: FAILED
/etc/init.d/mountnfs-bootclean.sh: FAILED
/etc/init.d/mountnfs.sh: FAILED
/etc/init.d/mtab.sh: FAILED
/etc/init.d/networking: FAILED
/etc/init.d/nginx: FAILED
/etc/init.d/ntp: FAILED
/etc/init.d/pdnsd: FAILED
/etc/init.d/php5-fpm: FAILED
/etc/init.d/php53-fpm: FAILED
/etc/init.d/postfix: FAILED
/etc/init.d/procps: FAILED
/etc/init.d/rc: FAILED
/etc/init.d/rc.local: FAILED
/etc/init.d/rcS: FAILED
/etc/init.d/reboot: FAILED
/etc/init.d/redis-server: FAILED
/etc/init.d/resolvconf: FAILED
/etc/init.d/rmnologin: FAILED
/etc/init.d/rsync: FAILED
/etc/init.d/rsyslog: FAILED
/etc/init.d/saned: FAILED
/etc/init.d/screen-cleanup: FAILED
/etc/init.d/sendsigs: FAILED
/etc/init.d/single: FAILED
/etc/init.d/skeleton: FAILED
/etc/init.d/ssh: FAILED
/etc/init.d/stop-bootlogd: FAILED
/etc/init.d/stop-bootlogd-single: FAILED
/etc/init.d/sudo: FAILED
/etc/init.d/sysstat: FAILED
/etc/init.d/udev: FAILED
/etc/init.d/udev-mtab: FAILED
/etc/init.d/umountfs: FAILED
/etc/init.d/umountnfs.sh: FAILED
/etc/init.d/umountroot: FAILED
/etc/init.d/unattended-upgrades: FAILED
/etc/init.d/urandom: FAILED
/etc/init.d/vnstat: FAILED
/etc/init.d/x11-common: FAILED

Why all the above files were changed should be investigated.
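A first step in that investigation could be to summarise the report by directory, so that bulk changes (for example everything under /etc/init.d) stand out. The following is a minimal sketch, assuming the email body has been saved to a plain-text file; the function name `summarise_failed` and the file name in the usage note are hypothetical.

```shell
# Count the FAILED entries in a saved lfd report, grouped by directory.
# Assumes the email body has been saved to a plain-text file ($1).
summarise_failed() {
  awk -F': FAILED' '
    /: FAILED/ {
      n = split($1, parts, "/")
      # Strip the final path component to get the directory.
      dir = substr($1, 1, length($1) - length(parts[n]) - 1)
      count[dir]++
    }
    END { for (d in count) print count[d], d }
  ' "$1" | sort -k1,1nr -k2
}
```

Usage would be something like `summarise_failed lfd-report.txt`. On Debian, `dpkg -S /usr/bin/7z` reports which package owns a file, which may help decide whether a package upgrade explains a given change.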

The upgrade also removed the Gandi.net SSL certs and replaced them with self-signed ones:

www.transitionnetwork.org uses an invalid security certificate.
The certificate is not trusted because it is self-signed.
The certificate is only valid for *.puffin.webarch.net

So following the steps from ticket:466#comment:25 to fix this:

cd /etc/ssl/private/
mv nginx-wild-ssl.crt nginx-wild-ssl.crt.old
mv nginx-wild-ssl.key nginx-wild-ssl.key.old
mv pure-ftpd.pem pure-ftpd.pem.old
ln -s ../transitionnetwork.org/transitionnetwork.org.key nginx-wild-ssl.key
ln -s ../transitionnetwork.org/transitionnetwork.org.crt nginx-wild-ssl.crt
ln -s ../transitionnetwork.org/transitionnetwork.org.pem pure-ftpd.pem
/etc/init.d/nginx restart
  Stopping Nginx Server...:.
  Starting Nginx Server...:nginx: [emerg] SSL_CTX_use_PrivateKey_file("/etc/ssl/private/nginx-wild-ssl.key") failed (SSL: error:0B080074:x509 certificate routines:X509_check_private_key:key values mismatch)
rm nginx-wild-ssl.crt
ln -s ../transitionnetwork.org/transitionnetwork.org.chained.pem nginx-wild-ssl.crt
/etc/init.d/nginx start
  Starting Nginx Server...: failed!
/etc/init.d/nginx status
  Nginx Server... found running with processes: 16141 16140 16139 16138 16137 16136 16135 16134 16133 16132 16131 16129 16127 16125 16124 16122 16120 16119 16117 16116 16114 16108 16105 16103 16102 16101 16099 16098 16096 16095 16093 ... (warning).

We still have the wrong cert.
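The "key values mismatch" error above means nginx was given a certificate and a private key that do not belong together. A minimal sketch of checking a pair before restarting nginx follows. It assumes RSA keys and that the server certificate is the first one in the file; the function name `cert_matches_key` is hypothetical.

```shell
# Return 0 if the certificate in $1 and the RSA private key in $2 share
# the same modulus, 1 if they differ, 2 if either file cannot be read.
# Assumes RSA; only the first certificate in a chained file is examined.
cert_matches_key() {
  cert_mod=$(openssl x509 -noout -modulus -in "$1" 2>/dev/null) || return 2
  key_mod=$(openssl rsa -noout -modulus -in "$2" 2>/dev/null) || return 2
  [ "$cert_mod" = "$key_mod" ]
}
```

For example, `cert_matches_key nginx-wild-ssl.crt nginx-wild-ssl.key && echo match` run in /etc/ssl/private/ would show whether the symlinked pair is consistent.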

/etc/init.d/nginx stop
ps -lA | grep -i nginx
  1 S     0 17720     1  1  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17722 17720  1  80   0 - 18620 -      ?        00:00:00 nginx
  5 S    33 17723 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17725 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17726 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17728 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17729 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17730 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17732 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17734 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17739 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17742 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17743 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17745 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17747 17720  0  80   0 - 18654 -      ?        00:00:00 nginx
  5 S    33 17748 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17750 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17751 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17753 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17755 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17756 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17758 17720  0  80   0 - 18622 -      ?        00:00:00 nginx
  5 S    33 17759 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17760 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17761 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17762 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17763 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17764 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17765 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17766 17720  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 17767 17720  0  80   0 - 18559 -      ?        00:00:00 nginx

So basically the BOA self-rolled nginx doesn't have working init scripts?!

killall -9 nginx
ps -lA | grep -i nginx
  5 S     0 18335     1  1  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18337 18335  0  80   0 - 18622 -      ?        00:00:00 nginx
  5 S    33 18339 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18340 18335  1  80   0 - 18635 -      ?        00:00:00 nginx
  5 S    33 18341 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18343 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18344 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18346 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18348 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18354 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18356 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18358 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18359 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18361 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18363 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18365 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18367 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18368 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18370 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18372 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18373 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18374 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18375 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18376 18335  1  80   0 - 18635 -      ?        00:00:00 nginx
  5 S    33 18377 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18378 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18379 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18380 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18381 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18382 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
  5 S    33 18383 18335  0  80   0 - 18559 -      ?        00:00:00 nginx
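In the two listings above the master process changed from PID 17720 to PID 18335, which suggests that nginx was being started again after each stop or kill rather than surviving it; what restarted it is not established here. A small sketch for pulling out just the master PID from `ps -lA` output, so that it can be compared before and after, might look like this (the function name `nginx_masters` is hypothetical):

```shell
# Read `ps -lA` output on stdin and print the PID of every nginx process
# whose parent is PID 1, i.e. the master process.
# Column layout assumed: F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
nginx_masters() {
  awk '$NF == "nginx" && $5 == 1 { print $4 }'
}
```

Usage: `ps -lA | nginx_masters`, run once before and once after the stop attempt.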

I'm going to reboot the server. This is also needed in case some processes have been running since before the OpenSSL update.

reboot

  The system is going down for reboot NOW!ch.net (pts/0) (Fri Apr 11 22:45:05 2

uptime

 22:45:25 up 79 days, 21:33,  2 users,  load average: 10.95, 2.63, 1.28

uptime

 22:47:08 up 79 days, 21:35,  1 user,  load average: 45.11, 17.28, 6.68

uptime

 22:47:53 up 79 days, 21:36,  1 user,  load average: 42.35, 20.55, 8.29

uptime

 22:48:08 up 79 days, 21:36,  1 user,  load average: 41.35, 21.41, 8.77

uptime

 22:48:37 up 79 days, 21:37,  1 user,  load average: 38.32, 22.59, 9.56

uptime

 22:50:11 up 79 days, 21:38,  1 user,  load average: 33.89, 25.46, 11.86

Wow, that took ages...

Looking at the console from Xen, it was down to the firewall -- csf/lfd generates such a huge number of iptables rules that it seems to take 5 minutes to unload or load them.
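To put a number on that, the rules could be counted per chain. This is a sketch only, assuming the rules are listed with `iptables -S` (which prints one rule specification per line); the function name `rules_per_chain` is hypothetical.

```shell
# Read `iptables -S` output on stdin and print the number of appended
# rules per chain, largest first. Policy (-P) and chain-creation (-N)
# lines are ignored.
rules_per_chain() {
  awk '$1 == "-A" { count[$2]++ }
       END { for (chain in count) print count[chain], chain }' |
  sort -k1,1nr -k2
}
```

Usage (as root): `iptables -S | rules_per_chain`.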

Another email about the things that have been updated:

From: root@puffin.webarch.net
Date: Fri, 11 Apr 2014 22:33:42 +0100 (BST)
To: chris@webarchitects.co.uk
Subject: lfd on puffin.webarch.net: System Integrity checking detected a modified system file

Time:     Fri Apr 11 22:33:42 2014 +0100

The following list of files have FAILED the md5sum comparison test. This means that the file has been changed in some way. This could be a result of an OS update or application upgrade. If the change is unexpected it should be investigated:

/usr/bin/drush: FAILED
/usr/bin/drush4: FAILED
/usr/bin/drush5: FAILED open or read
/usr/bin/drush6: FAILED
/usr/bin/MySecureShell: FAILED
/usr/bin/nginx: FAILED
/usr/bin/php-cli: FAILED
/usr/bin/redis-benchmark: FAILED
/usr/bin/redis-check-aof: FAILED
/usr/bin/redis-check-dump: FAILED
/usr/bin/redis-cli: FAILED
/usr/bin/redis-server: FAILED
/usr/bin/sftp-admin: FAILED
/usr/bin/sftp-state: FAILED
/usr/bin/sftp-who: FAILED
/usr/bin/vnstat: FAILED
/usr/sbin/nginx: FAILED
/usr/sbin/nginx.old: FAILED
/usr/sbin/vnstatd: FAILED
/bin/sh: FAILED
/usr/local/bin/php: FAILED open or read
/usr/local/bin/redis-benchmark: FAILED open or read
/usr/local/bin/redis-check-aof: FAILED open or read
/usr/local/bin/redis-check-dump: FAILED open or read
/usr/local/bin/redis-cli: FAILED open or read
/usr/local/bin/redis-server: FAILED open or read
/etc/init.d/clean-boa-env: FAILED
/etc/init.d/nginx: FAILED
/etc/init.d/php53-fpm: FAILED
/etc/init.d/redis-server: FAILED

It's back up:

uptime

  22:57:57 up 7 min,  1 user,  load average: 1.70, 0.59, 0.24

We still have the self signed cert.

Now to try grepping to work out which nginx files contain the cert path.

cd /etc/nginx
   grep -r ssl .
  ./nginx.conf.default:    #    listen       443 ssl;
  ./nginx.conf.default:    #    ssl_certificate      cert.pem;
  ./nginx.conf.default:    #    ssl_certificate_key  cert.key;
  ./nginx.conf.default:    #    ssl_session_cache    shared:SSL:1m;
  ./nginx.conf.default:    #    ssl_session_timeout  5m;
  ./nginx.conf.default:    #    ssl_ciphers  HIGH:!aNULL:!MD5;
  ./nginx.conf.default:    #    ssl_prefer_server_ciphers  on;
  ./sites-available/default.dpkg-dist:#   ssl on;
  ./sites-available/default.dpkg-dist:#   ssl_certificate cert.pem;
  ./sites-available/default.dpkg-dist:#   ssl_certificate_key cert.key;
  ./sites-available/default.dpkg-dist:#   ssl_session_timeout 5m;
  ./sites-available/default.dpkg-dist:#   ssl_protocols SSLv3 TLSv1;
  ./sites-available/default.dpkg-dist:#   ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv3:+EXP;
  ./sites-available/default.dpkg-dist:#   ssl_prefer_server_ciphers on;

So it's none of the files in /etc/nginx so it must be included from somewhere else:

grep -r include *
  nginx.conf:  include /etc/nginx/mime.types;
  nginx.conf:  include /etc/nginx/conf.d/*.conf;
  nginx.conf:  include /etc/nginx/sites-enabled/*;
  nginx.conf.default:    include       mime.types;
  nginx.conf.default:        #    include        fastcgi_params;
  sites-available/default.dpkg-dist:              # include /etc/nginx/naxsi.rules
  sites-available/default.dpkg-dist:      #       include fastcgi_params;

So:

grep -r ssl /etc/nginx/conf.d/*
  ssl_session_cache   shared:SSL:10m;
  ssl_session_timeout            10m;
grep -ri ssl /etc/nginx/sites-enabled/*
  grep: /etc/nginx/sites-enabled/*: No such file or directory

So perhaps this isn't the nginx config at all? WHERE THE FUCK IS IT?

updatedb
locate *.crt

  /data/disk/tn/config/server_master/ssl.d/transitionnetwork.org/openssl.crt
  /data/disk/tn/config/ssl.d/transitionnetwork.org/bak/openssl.crt
  /data/disk/tn/config/ssl.d/transitionnetwork.org/openssl.crt
 

Perhaps it's these...

cd /data/disk/tn/config/ssl.d/transitionnetwork.org/
ls -lah
   openssl.crt -> /etc/ssl/transitionnetwork.org/transitionnetwork.org.chained.pem
   openssl.key -> /etc/ssl/transitionnetwork.org/transitionnetwork.org.key

Nope...

These look like the possible nginx config files:

locate nginx | grep tn | grep -v backup
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/hosting.feature.nginx.inc
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/hosting_nginx.info
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/hosting_nginx.module
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/hosting_nginx.service.inc
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/ssl
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/ssl/hosting.feature.nginx_ssl.inc
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/ssl/hosting_nginx_ssl.info
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/ssl/hosting_nginx_ssl.module
/data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/nginx/ssl/hosting_nginx_ssl.service.inc
/data/disk/tn/aegir/distro/008/profiles/hostmaster/web_server/nginx
/data/disk/tn/aegir/distro/008/profiles/hostmaster/web_server/nginx/ssl
/data/disk/tn/aegir/distro/008/profiles/hostmaster/web_server/nginx/ssl/hosting_nginx_ssl.drush.inc
/data/disk/tn/config/includes/nginx_advanced_include.conf
/data/disk/tn/config/includes/nginx_legacy_include.conf
/data/disk/tn/config/includes/nginx_modern_include.conf
/data/disk/tn/config/includes/nginx_octopus_include.conf
/data/disk/tn/config/includes/nginx_simple_include.conf
/data/disk/tn/config/nginx.conf
/data/disk/tn/config/server_master/nginx
/data/disk/tn/config/server_master/nginx.conf
/data/disk/tn/config/server_master/nginx/platform.d
/data/disk/tn/config/server_master/nginx/post.d
/data/disk/tn/config/server_master/nginx/post.d/nginx_force_include*
/data/disk/tn/config/server_master/nginx/pre.d
/data/disk/tn/config/server_master/nginx/vhost.d
/data/disk/tn/config/server_master/nginx/vhost.d/iirs-test.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/news.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/pb-stage-20130212.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/pb-stage-20140403.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/space.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/stg2.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/stg3.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/stg4.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/stg.transitionnetwork.org
/data/disk/tn/config/server_master/nginx/vhost.d/tn.puffin.webarch.net
/data/disk/tn/config/server_master/nginx/vhost.d/www.transitionnetwork.org
/data/disk/tn/config/tn.nginx.conf
/data/disk/tn/.drush/provision_cdn/Provision/Service/cdn/nginx.php
/data/disk/tn/.drush/provision/http/nginx
/data/disk/tn/.drush/provision/http/nginx/nginx_service.inc
/data/disk/tn/.drush/provision/http/nginx_ssl
/data/disk/tn/.drush/provision/http/nginx_ssl/nginx_ssl_service.inc
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx.conf
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx_legacy_include.conf
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx_modern_include.conf
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx_octopus_include.conf
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx.php
/data/disk/tn/.drush/provision/http/Provision/Service/http/nginx/ssl.php
/data/disk/tn/static/transition-network-d6-p009/sites/news.transitionnetwork.org/nginx_cache_hour.info
/data/disk/tn/static/transition-network-d6-p009/sites/www.transitionnetwork.org/nginx_cache_quarter.info
/data/disk/tn/static/transition-network-d6-s008/sites/pb-stage-20130212.transitionnetwork.org/nginx_cache_quarter.info
/data/disk/tn/static/transition-network-d6-s008/sites/stg2.transitionnetwork.org/nginx_cache_quarter.info
/data/disk/tn/static/transition-network-d6-s008/sites/stg.transitionnetwork.org/nginx_cache_quarter.info
/data/disk/tn/static/transition-network-d6-s011/sites/pb-stage-20140403.transitionnetwork.org/nginx_cache_quarter.info
/var/aegir/config/server_master/nginx/platform.d/tn.conf

So, checking these places:

grep -ri ssl /data/disk/tn/aegir/distro/008/profiles/hostmaster/modules/hosting/web_server/* | grep crt
grep -ri ssl /data/disk/tn/aegir/distro/008/profiles/hostmaster/web_server/* | grep crt
grep -ri ssl /data/disk/tn/config/tn.nginx.conf 
grep -ri ssl /data/disk/tn/config/includes/*
grep -ri ssl /data/disk/tn/config/nginx.conf
grep -ri ssl /data/disk/tn/config/server_master/*
grep -ri ssl /data/disk/tn/config/tn.nginx.conf
grep -ri ssl /data/disk/tn/.drush/provision/http/nginx
grep -ri ssl /data/disk/tn/.drush/provision/http/Provision/Service/http/*
grep -ri ssl /data/disk/tn/static/transition-network-d6-p009/sites/* | grep crt
grep -ri ssl /var/aegir/config/server_master/nginx/platform.d/tn.conf

No joy...

date

  Fri Apr 11 23:24:04 BST 2014

I just don't have a clue where the config files that need fixing are. This is very frustrating; the site is *down*.

Last edited 3 years ago by chris

comment:21 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.26
  • Total Hours changed from 3.84 to 4.1

Starting from the beginning...

In /etc/init.d/nginx we have:

NGINX_CONF_FILE="/etc/nginx/nginx.conf"

That file includes:

  include /etc/nginx/mime.types;
  include /etc/nginx/conf.d/*.conf;
  include /etc/nginx/sites-enabled/*;
ls /etc/nginx/conf.d/*.conf
  /etc/nginx/conf.d/aegir.conf@
grep ssl /etc/nginx/conf.d/aegir.conf
  ssl_session_cache   shared:SSL:10m;
  ssl_session_timeout            10m;
grep include /etc/nginx/conf.d/aegir.conf
  include /var/aegir/config/server_master/nginx/pre.d/*;
  include /var/aegir/config/server_master/nginx/platform.d/*;
  include /var/aegir/config/server_master/nginx/vhost.d/*;
  include /var/aegir/config/server_master/nginx/post.d/*;
grep -ir ssl /var/aegir/config/server_master/nginx/pre.d/*
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:### /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  listen                       *:443 ssl spdy;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl                          on;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_certificate              /etc/ssl/private/nginx-wild-ssl.crt;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_certificate_key          /etc/ssl/private/nginx-wild-ssl.key;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_session_timeout          5m;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_protocols SSLv3 TLSv1 TLSv1.1 TLSv1.2;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_ciphers EECDH+ECDSA+AESGCM:EECDH+aRSA+AESGCM:EECDH+ECDSA+SHA384:EECDH+ECDSA+SHA256:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH+aRSA+RC4:EECDH:EDH+aRSA:RC4:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!SRP:!DSS:+RC4:RC4;
  /var/aegir/config/server_master/nginx/pre.d/nginx_wild_ssl.conf:  ssl_prefer_server_ciphers    on;

BINGO!
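The manual walk above (init script, then nginx.conf, then each include) can be automated. The following is a minimal sketch, not BOA's or nginx's own tooling: it assumes every include path is absolute, as they are in the files shown here, and it does not guard against include cycles. The function name `nginx_includes` is hypothetical.

```shell
# Print a config file and, recursively, every file it pulls in through
# uncommented "include <path>;" directives. Glob patterns are expanded.
# The body runs in a subshell ( ... ) so variables stay local per call.
# Assumes absolute include paths and no include cycles.
nginx_includes() (
  file=$1
  [ -f "$file" ] || exit 0
  printf '%s\n' "$file"
  # Extract the argument of each include directive.
  sed -n 's/^[[:space:]]*include[[:space:]][[:space:]]*\([^;]*\);.*$/\1/p' "$file" |
  while IFS= read -r pattern; do
    # Left unquoted on purpose so the shell expands globs like conf.d/*.conf.
    for match in $pattern; do
      nginx_includes "$match"
    done
  done
)
```

Usage: `nginx_includes /etc/nginx/nginx.conf | xargs grep -n ssl_certificate` would search only the files nginx actually loads, instead of every file that `locate` happens to find.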

The nginx_wild_ssl.conf file was edited to point at the real certificate and key:

  #ssl_certificate              /etc/ssl/private/nginx-wild-ssl.crt;
  #ssl_certificate_key          /etc/ssl/private/nginx-wild-ssl.key;
  ssl_certificate              /etc/ssl/transitionnetwork.org/transitionnetwork.org.chained.pem;
  ssl_certificate_key          /etc/ssl/transitionnetwork.org/transitionnetwork.org.key;

But still:

www.transitionnetwork.org uses an invalid security certificate.

The certificate is not trusted because it is self-signed.

The certificate is only valid for *.puffin.webarch.net

So, copying the files over from wiki:PenguinServer again:

rsync -av penguin:tn/ /root/tn/
bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
receiving incremental file list
./
transitionnetwork.org.chained.pem
transitionnetwork.org.crt
transitionnetwork.org.csr
transitionnetwork.org.key

sent 90 bytes  received 9797 bytes  19774.00 bytes/sec
total size is 9499  speedup is 0.96

And:

cd /etc/ssl/transitionnetwork.org
mv transitionnetwork.org.* old/
mv /root/tn/transitionnetwork.org.* .

And it's fixed!

So the issue was that the right certs were replaced with self-signed ones by BOA...?

What a waste of time that was.
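A check that might have shortened this is to test the certificate files on disk directly, rather than inferring from the browser which file nginx is serving. The sketch below treats a certificate as self-signed when its subject and issuer names hash to the same value (strictly, that means self-issued, which is close enough for this purpose). The function name `is_self_signed` is hypothetical.

```shell
# Return 0 if the first certificate in $1 has identical subject and
# issuer names (self-issued), 1 if they differ, 2 if it cannot be read.
is_self_signed() {
  subject=$(openssl x509 -noout -subject_hash -in "$1" 2>/dev/null) || return 2
  issuer=$(openssl x509 -noout -issuer_hash -in "$1" 2>/dev/null) || return 2
  [ "$subject" = "$issuer" ]
}
```

For example: `is_self_signed /etc/ssl/transitionnetwork.org/transitionnetwork.org.crt && echo "self-signed"`.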

Last edited 3 years ago by chris

comment:22 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 4.1 to 4.3

So now testing other stuff and looking around...

As expected, the default MySQL settings have dramatically reduced the RAM available for the database:

These graphs and lots of others have been broken:

But I'm tired and it can wait till tomorrow -- it looks like a permissions issue:

munin-run phpfpm_average 
  php_average.value /etc/munin/plugins/phpfpm_average: line 40: /bin/ps: Permission denied
  /etc/munin/plugins/phpfpm_average: line 40: /bin/grep: Permission denied
  /etc/munin/plugins/phpfpm_average: line 40: /bin/grep: Permission denied
  /etc/munin/plugins/phpfpm_average: line 40: /bin/grep: Permission denied
  /etc/munin/plugins/phpfpm_average: line 40: /usr/bin/awk: Permission denied
munin-run phpfpm_connections 
  Can't exec "/etc/munin/plugins/phpfpm_connections": Permission denied at /usr/share/perl5/Munin/Node/Service.pm line 263.
  # FATAL: Failed to exec.

munin-run multips_memory 
  /usr/share/munin/plugins/plugin.sh: line 14: /bin/sed: Permission denied
  /etc/munin/plugins/multips_memory: line 140: /bin/ps: Permission denied
  /etc/munin/plugins/multips_memory: line 144: /usr/bin/gawk: Permission denied
  /usr/share/munin/plugins/plugin.sh: line 14: /bin/sed: Permission denied
  /etc/munin/plugins/multips_memory: line 140: /bin/ps: Permission denied
  /etc/munin/plugins/multips_memory: line 144: /usr/bin/gawk: Permission denied
  /usr/share/munin/plugins/plugin.sh: line 14: /bin/sed: Permission denied
  /etc/munin/plugins/multips_memory: line 140: /bin/ps: Permission denied
  /etc/munin/plugins/multips_memory: line 144: /usr/bin/gawk: Permission denied
  /usr/share/munin/plugins/plugin.sh: line 14: /bin/sed: Permission denied
  /etc/munin/plugins/multips_memory: line 140: /bin/ps: Permission denied
  /etc/munin/plugins/multips_memory: line 144: /usr/bin/gawk: Permission denied
  /usr/share/munin/plugins/plugin.sh: line 14: /bin/sed: Permission denied
  /etc/munin/plugins/multips_memory: line 140: /bin/ps: Permission denied
  /etc/munin/plugins/multips_memory: line 144: /usr/bin/gawk: Permission denied

More broken shit than a BOA upgrade usually causes...

Last edited 3 years ago by chris (previous) (diff)

comment:23 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 4.3 to 4.8

It looks like wiki:PuffinServer committed suicide, I got this email:

Date: Sat, 12 Apr 2014 09:11:03 +0100
To: chris@webarchitects.co.uk
Subject: ** PROBLEM Service Alert: puffin/SSH is CRITICAL **

***** Nagios *****

Notification Type: PROBLEM

Service: SSH
Host: puffin
Address: puffin.webarch.net
State: CRITICAL

Date/Time: Sat Apr 12 09:11:03 BST 2014

I couldn't connect via ssh and was about to reboot it at the Xen level when I did get in, and it looks like with the default BOA settings we are back in load spike suicide land:

uptime
 09:24:23 up 10:34,  1 user,  load average: 65.71, 120.18, 85.84
uptime
 09:29:37 up 10:39,  1 user,  load average: 0.52, 42.71, 61.54

I'll look at the logs in a while to see what happened, but the BOA default is to clobber lots of key logs so I might not find a lot of info.

Since the aim is to run with the BOA defaults, ticket:670, I'll start by doing the minimum needed to get the munin graphs working again and stop the log clobbering so we can get a better picture of what is happening when the server commits suicide again.

Last edited 3 years ago by chris (previous) (diff)

comment:24 in reply to: ↑ 20 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 4.8 to 5.05

Replying to chris:

Wow, that took ages...

Looking at the console from xen it was down to the firewall -- there is such a huge number of iptables rules generated by csf/lfd that it takes 5 mins to unload or load them, it seems.

Last night I set an iptables --list running in screen, the file it generated:

ls /root/iptables.2014-04-12 -lah
  -rw-r--r-- 1 root root 247K Apr 12 00:28 /root/iptables.2014-04-12
cat /root/iptables.2014-04-12 | wc -l
  3693

I was expecting it to be bigger.
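
A line count alone doesn't show which chains hold the rules. The sketch below was not run on puffin; it counts rules per chain from `iptables -S` output, and the function name count_rules_per_chain is made up for illustration.

```shell
# Count rules per chain from `iptables -S` output read on stdin.
# Lines of the form "-A CHAIN ..." are rules; "-P" (policy) and "-N"
# (new chain) lines are declarations and are ignored.
count_rules_per_chain() {
    awk '$1 == "-A" { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}

# Intended use on the server (as root), largest chains first:
#   iptables -S | count_rules_per_chain | head
```

Reading a saved dump on stdin, rather than calling iptables inside the function, means a slow listing only has to be generated once.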

Last edited 3 years ago by chris (previous) (diff)

comment:25 Changed 3 years ago by chris

Posting this to record 15 mins spent rereading comments and fixing typos and spelling mistakes

comment:26 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 5.05 to 5.3

Oops, time missed off last comment.

comment:27 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.3
  • Total Hours changed from 5.3 to 5.6

Since users are no longer allowed access to most command line functions because BOA chmodded lots of programs, I think adjusting munin tasks that were run by the munin user to now run as root is probably the easiest way to address this, however this could also have negative security implications.

Testing with two graphs to begin with:

[multips]
env.names nginx php_fpm mysqld redis-server munin-node
user root

[multips_memory]
env.names nginx php-fpm mysqld redis-server munin-node
user root

This should fix this graph:

As it is working again on the command line:

root@puffin:/etc/munin/plugins# munin-run multips_memory
nginx.value 631918592
php_fpm.value 76709888
mysqld.value 1502449664
redis_server.value U
munin_node.value 10309632

comment:28 follow-up: ↓ 32 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.75
  • Total Hours changed from 5.6 to 6.35

Fixing the Munin plugins by making them run as root rather than user munin or nobody or other non-root users with fewer permissions... *sigh*

These ones are not just a matter of a perms fix:

munin-run nginx_request
  request.value U
munin-run nginx_status
  total.value U
  reading.value U
  writing.value U
  waiting.value U
munin-run phpfpm_connections
  accepted.value U
munin-run phpfpm_connections
  accepted.value U
munin-run phpfpm_status
  idle.value U
  active.value U
  total.value U
munin-run redis_127.0.0.1_6379
  Could not connect to Redis at 127.0.0.1:6379: Connection refused
  multigraph redis_commands
  commands.value 
  hits.value 
  misses.value 
  multigraph redis_dbs
  expires.value 
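
Spotting the plugins that still return nothing useful can be scripted. This is a minimal sketch, assuming the `munin-run` output format shown above; the helper name munin_unknown_fields is hypothetical.

```shell
# Print the field names whose value is "U" (unknown) or missing in
# `munin-run <plugin>` output read on stdin.
munin_unknown_fields() {
    awk '/\.value/ && ($2 == "U" || NF == 1) {
        sub(/\.value$/, "", $1)
        print $1
    }'
}

# Intended use, run as root from /etc/munin/plugins:
#   for p in *; do
#       bad=$(munin-run "$p" 2>/dev/null | munin_unknown_fields)
#       [ -n "$bad" ] && echo "$p:" $bad
#   done
```

Lines such as "multigraph redis_commands" are skipped because they carry no ".value" field.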

For everything else the graphs are starting to be drawn again:

comment:29 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.52
  • Total Hours changed from 6.35 to 6.87

There was another load spike suicide this afternoon, that's two in the 24 hours since the upgrade to BOA 2.2.2. I'll look at the logs later and record my findings on ticket:670.

Fixing the broken Munin graphs...

The /var/aegir/config/server_master/nginx.conf file now contains:

server {
  listen       *:80;
  server_name 127.0.0.1;
  location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
  }
}

So trying to work out the URL to get the stats...

lynx -dump http://localhost/nginx_status
                                404 Not Found
     __________________________________________________________________

                                    nginx

lynx -dump http://puffin.webarch.net/nginx_status
                                404 Not Found
     __________________________________________________________________

                                    nginx

lynx -dump http://127.0.0.1/nginx_status
Active connections: 11
server accepts handled requests
 9354 9354 13400
Reading: 0 Writing: 1 Waiting: 10

So /etc/munin/plugin-conf.d/munin-node was updated to:

[nginx_request]
env.url http://127.0.0.1/nginx_status
user root

[nginx_status]
env.url http://127.0.0.1/nginx_status
user root

And testing:

munin-run nginx_status 
  total.value 23
  reading.value 0
  writing.value 1
  waiting.value 21
munin-run nginx_request 
  request.value 13915
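
For checking the status page by hand without going through Munin, the stub_status text can be turned into name/value pairs. A rough sketch, assuming the four-line format returned by http://127.0.0.1/nginx_status above; the function name parse_stub_status and the output labels are my own, not the plugin's.

```shell
# Turn nginx stub_status output (read on stdin) into "name value" lines.
parse_stub_status() {
    awk '
        /^Active connections:/ { print "active", $3 }
        # Third line holds accepts, handled and requests; keep requests.
        /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { print "requests", $3 }
        /^Reading:/ {
            print "reading", $2
            print "writing", $4
            print "waiting", $6
        }
    '
}

# Intended use:
#   lynx -dump http://127.0.0.1/nginx_status | parse_stub_status
```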

The docs at wiki:PuffinServer#nginxconfigchanges will need updating, we once again have Munin Nginx graphs:

Version 0, edited 3 years ago by chris (next)

comment:30 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.75
  • Total Hours changed from 6.87 to 7.62

The old php config file, /opt/local/etc/php53-fpm.conf still contains:

pm.status_path = /status
ping.path = /ping

However that status isn't available at these URLs:

lynx -dump http://127.0.0.1/status
                                404 Not Found
     __________________________________________________________________

                                    nginx
lynx -dump http://localhost/status
                                404 Not Found
     __________________________________________________________________

                                    nginx
lynx -dump http://puffin.webarch.net/status
                                404 Not Found
     __________________________________________________________________

                                    nginx

Also the new URLs which are supposed to work don't:

lynx -dump http://127.0.0.1/fpm-status
                                404 Not Found
     __________________________________________________________________

                                    nginx
lynx -dump http://localhost/fpm-status
                                404 Not Found
     __________________________________________________________________

                                    nginx
lynx -dump http://puffin.webarch.net/fpm-status
                                404 Not Found
     __________________________________________________________________

                                    nginx

So we need to find the new php config file to see if the status is enabled.

It might be one of these files:

updatedb
locate php | grep conf$ 
/etc/php5/fpm/php-fpm.conf
/etc/php5/fpm/pool.d/www.conf
/opt/etc/php-fpm.conf
/opt/local/etc/php53-fpm.conf
/opt/php52/etc/php52-fpm.conf
/opt/php53/etc/pear.conf
/opt/php53/etc/php53-fpm.conf
/opt/php53/etc/pool.d/www53.conf
/opt/php54/etc/php54-fpm.conf
/opt/php54/etc/pool.d/www54.conf
/opt/php55/etc/php55-fpm.conf
/opt/php55/etc/pool.d/www55.conf

The only one with a status line is /opt/local/etc/php53-fpm.conf

So trying to track down the php-fpm config file which is actually being used...

The /opt/php53/etc/php53-fpm.conf file includes /opt/php53/etc/pool.d/*.conf and /opt/php53/etc/pool.d/www53.conf includes /opt/etc/fpm/fpm-pool-common.conf and that file contains:

pm.status_path = /fpm-status
ping.path = /fpm-ping

Looking at /etc/init.d/php53-fpm and /etc/init.d/php5-fpm to try to work out where the php-fpm config files are to be found...

/etc/init.d/php53-fpm contains:

php_fpm_CONF=/opt/php53/etc/php53-fpm.conf

And /etc/init.d/php5-fpm contains:

DAEMON_ARGS="--fpm-config /etc/php5/fpm/php-fpm.conf"

Before the last upgrade the init script was /etc/init.d/php53-fpm, see wiki:PuffinServer#php-fpm

The problem is with Nginx, I tried editing /var/aegir/config/server_master/nginx.conf to add the code we had before:

  location ~ ^/(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    deny all;
  }

But that didn't fix it. I think I'm too tired to solve this mystery tonight.

Looking in the logs, we have lots of entries like this in /var/log/php/error_log_53

[12-Apr-2014 18:00:35 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:35 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:35 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:41 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:41 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:41 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:42 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:00:42 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0
[12-Apr-2014 18:07:47 UTC] PHP Warning:  Zend OPcache can't be temporary enabled (it may be only disabled till the end of request) in Unknown on line 0

In /var/log/php/fpm-www53-slow.log there are lots of entries like this, the time matches the last load spike suicide:

[12-Apr-2014 15:20:57]  [pool www53] pid 56669
script_filename = /data/disk/tn/static/transition-network-d6-p009/index.php
[0x00007ff0205f72b0] _drupal_bootstrap() /data/disk/tn/static/transition-network-d6-p009/includes/bootstrap.inc:1480
[0x00007ff0205f7150] _drupal_bootstrap() /data/disk/tn/static/transition-network-d6-p009/includes/bootstrap.inc:1447
[0x00007ff0205f7050] drupal_bootstrap() /data/disk/tn/static/transition-network-d6-p009/index.php:15

[12-Apr-2014 15:21:13]  [pool www53] pid 56670
script_filename = /data/disk/tn/static/transition-network-d6-p009/index.php
[0x00007ff0205f7760] is_readable() /data/conf/global.inc:476
[0x00007ff0205f7628] +++ dump failed

[12-Apr-2014 15:21:18]  [pool www53] pid 56681
script_filename = /data/disk/tn/static/transition-network-d6-p009/index.php
[0x00007ff0205f7760] connect() /data/conf/global.inc:371
[0x00007ff0205f7628] +++ dump failed

[12-Apr-2014 15:21:23]  [pool www53] pid 56547
script_filename = /data/disk/tn/static/transition-network-d6-p009/index.php
[0x000000000348da30] is_readable() /data/conf/global.inc:427
[0x000000000348d8f8] +++ dump failed

[12-Apr-2014 15:21:54]  [pool www53] pid 56695
script_filename = /data/disk/tn/static/transition-network-d6-p009/index.php
[0x00007ff0205f7760] connect() /data/conf/global.inc:371
[0x00007ff0205f7628] +++ dump failed

And in /var/log/php/php53-fpm-error.log there are lots of lines like this which coincide with the last load spike suicide:

[12-Apr-2014 15:21:54] ERROR: failed to ptrace(PEEKDATA) pid 56695: Input/output error (5)
[12-Apr-2014 15:21:58] WARNING: [pool www53] child 56655, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (194.782226 sec), terminating
[12-Apr-2014 15:21:58] WARNING: [pool www53] child 56654, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (197.135421 sec), terminating
[12-Apr-2014 15:21:58] WARNING: [pool www53] child 56653, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (199.566956 sec), terminating
[12-Apr-2014 15:22:01] WARNING: [pool www53] child 56655 exited on signal 15 (SIGTERM) after 201.299257 seconds from start
[12-Apr-2014 15:22:04] WARNING: [pool www53] child 56653 exited on signal 15 (SIGTERM) after 209.593928 seconds from start
[12-Apr-2014 15:22:10] WARNING: [pool www53] child 56654 exited on signal 15 (SIGTERM) after 211.753491 seconds from start
[12-Apr-2014 15:22:18] WARNING: [pool www53] child 56661, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (193.133020 sec), terminating
[12-Apr-2014 15:22:18] WARNING: [pool www53] child 56657, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (189.783626 sec), terminating
[12-Apr-2014 15:22:18] WARNING: [pool www53] child 56590, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (187.447955 sec), terminating
[12-Apr-2014 15:22:22] WARNING: [pool www53] child 56657 exited on signal 15 (SIGTERM) after 219.567805 seconds from start
[12-Apr-2014 15:22:25] WARNING: [pool www53] child 56590 exited on signal 15 (SIGTERM) after 365.818528 seconds from start
[12-Apr-2014 15:22:28] WARNING: [pool www53] child 56661 exited on signal 15 (SIGTERM) after 208.535602 seconds from start
[12-Apr-2014 15:22:38] WARNING: [pool www53] child 56617, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (196.484809 sec), terminating
[12-Apr-2014 15:22:45] WARNING: [pool www53] child 56617 exited on signal 15 (SIGTERM) after 352.524615 seconds from start
[12-Apr-2014 15:22:58] WARNING: [pool www53] child 56670, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (196.228568 sec), terminating
[12-Apr-2014 15:22:58] WARNING: [pool www53] child 56669, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (198.184145 sec), terminating
[12-Apr-2014 15:23:05] WARNING: [pool www53] child 56669 exited on signal 15 (SIGTERM) after 224.906323 seconds from start
[12-Apr-2014 15:23:08] WARNING: [pool www53] child 56670 exited on signal 15 (SIGTERM) after 227.669041 seconds from start
[12-Apr-2014 15:23:19] WARNING: [pool www53] child 56681, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (192.413551 sec), terminating
[12-Apr-2014 15:23:19] WARNING: [pool www53] child 56547, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "GET /index.php") execution timed out (194.007008 sec), terminating
[12-Apr-2014 15:23:21] WARNING: [pool www53] child 56681 exited on signal 15 (SIGTERM) after 200.979892 seconds from start
[12-Apr-2014 15:23:24] WARNING: [pool www53] child 56547 exited on signal 15 (SIGTERM) after 472.666956 seconds from start
[12-Apr-2014 15:23:39] WARNING: [pool www53] child 56695, script '/data/disk/tn/static/transition-network-d6-p009/index.php' (request: "HEAD /index.php") execution timed out (186.279207 sec), terminating
[12-Apr-2014 15:23:41] WARNING: [pool www53] child 56695 exited on signal 15 (SIGTERM) after 216.840863 seconds from start
[12-Apr-2014 15:27:30] ERROR: unable to bind listening socket for address '127.0.0.1:9090': Address already in use (98)
[12-Apr-2014 15:27:30] ERROR: FPM initialization failed

So it's good that the logs are not being clobbered any more, but there do appear to be some things not quite right...
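
To line the timeouts up against the load graphs it helps to count them per minute. A minimal sketch, assuming the php-fpm error log format shown above; timeouts_per_minute is a hypothetical helper name.

```shell
# Count "execution timed out" warnings per minute in a php-fpm error
# log read on stdin. Output lines look like "12-Apr-2014 15:21 3".
timeouts_per_minute() {
    awk '/execution timed out/ {
        # $1 is "[12-Apr-2014" and $2 is "15:21:58]"; keep date plus HH:MM.
        key = substr($1, 2) " " substr($2, 1, 5)
        n[key]++
    } END { for (k in n) print k, n[k] }' | sort
}

# Intended use:
#   timeouts_per_minute < /var/log/php/php53-fpm-error.log
```

The "exited on signal 15" lines are left out on purpose, as each of them follows a timeout that has already been counted.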

I'll do some more on this tomorrow evening...

comment:31 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.35
  • Total Hours changed from 7.62 to 7.97

Redis isn't running:

ps -lA | grep -i redis

And it won't start:

/etc/init.d/redis-server start
  Starting redis-server: touch: cannot touch `/var/run/redis/redis.pid': No such file or directory

This could explain why the server wasn't coping with load spikes.

Make the directory for the pid file and try to start it:

mkdir /var/run/redis/
chown redis:redis /var/run/redis/
/etc/init.d/redis-server start
  Starting redis-server: failed

I expect the start failed because it had already been started automatically by BOA; it is running now:

ps -lA | grep -i redis
  1 S   106 52733     1  0  80   0 - 13575 -      ?        00:00:00 redis-server

The log is being clobbered:

rotate
[52733] 14 Apr 10:16:33.980 # Server started, Redis version 2.8.8
[52733] 14 Apr 10:16:33.981 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.

The Redis config file, /etc/redis/redis.conf now contains a password, so this has been added to /etc/munin/plugin-conf.d/munin-node:

[redis_*]
env.password XXX
user root

It has been tested on the command line:

cd /etc/munin/plugins
munin-run redis_127.0.0.1_6379 
  multigraph redis_clients
  clients.value 1
  multigraph redis_blocked_clients
  blocked.value 0
  multigraph redis_memory
  memory.value 37383008
  multigraph redis_fragmentation
  frag.value 1.09
  multigraph redis_total_connections
  connections.value 883
  multigraph redis_expired_keys
  expired.value 8
  multigraph redis_evicted_keys
  evicted.value 0
  multigraph redis_pubsub_channels
  channels.value 0
  multigraph redis_commands
  commands.value 34345
  hits.value 17193
  misses.value 5496
  multigraph redis_dbs
  db0keys.value 3893
  db0expires.value 919

Munin has been restarted:

/etc/init.d/munin-node restart
  [ ok ] Stopping Munin-Node: done.
  [ ok ] Starting Munin-Node: done.

So now we should soon start to get Redis munin graphs again:

comment:32 in reply to: ↑ 28 ; follow-up: ↓ 33 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 7.97 to 8.97

Replying to chris:

These ones are not just a matter of a perms fix:

munin-run nginx_request
  request.value U

munin-run nginx_status
  total.value U
  reading.value U
  writing.value U
  waiting.value U

The odd thing here is that this works on the command line:

munin-run nginx_request
  request.value 249155
munin-run nginx_status
  total.value 30
  reading.value 0
  writing.value 4
  waiting.value 26

But we don't have graphs here:

There is nothing in the log files, /var/log/munin/, but since opening this comment they have started to reappear -- the munin-node restart done to fix the Redis graphs must have also fixed these graphs?

These are still not working:

munin-run phpfpm_connections
  accepted.value U
munin-run phpfpm_status
  idle.value U
  active.value U
  total.value U

So, the plugins are written in perl:

cd /etc/munin/plugins
perl -wc phpfpm_connections
  phpfpm_connections syntax OK
perl -wc phpfpm_status
  phpfpm_status syntax OK

The problem is that the status URL is a 404:

lynx -dump http://127.0.0.1/fpm-status
                                404 Not Found
     __________________________________________________________________

                                    nginx
lynx -dump http://127.0.0.1/status
                                404 Not Found
     __________________________________________________________________

                                    nginx

Previously this needed adding to /var/aegir/config/server_master/nginx.conf:

  location ~ ^/(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    deny all;
  }

See wiki:PuffinServer#nginxconfigchanges but when you try to connect using those details:

lynx -dump http://127.0.0.1:9090/status
  
  Looking up 127.0.0.1:9090
  Making HTTP connection to 127.0.0.1:9090
  Sending HTTP request.
  HTTP request sent; waiting for response.
  Retrying as HTTP0 request.
  Looking up 127.0.0.1:9090
  Making HTTP connection to 127.0.0.1:9090
  Sending HTTP request.
  HTTP request sent; waiting for response.
  Alert!: Unexpected network read error; connection aborted.
  Can't Access `http://127.0.0.1:9090/status'
  Alert!: Unable to access document.
  
  lynx: Can't access startfile 

This does appear to be the right port, it is set to 9090 in /opt/php53/etc/pool.d/www53.conf:

listen = 127.0.0.1:9090

And that file includes /opt/etc/fpm/fpm-pool-common.conf which contains:

pm.status_path = /fpm-status
ping.path = /fpm-ping

It is running on this port:

netstat -tulpn | grep 9090
  tcp        0      0 127.0.0.1:9090          0.0.0.0:*               LISTEN      6852/php53-fpm.conf

And the binary:

ls -l /proc/6852/exe 
  lrwxrwxrwx 1 root root 0 Apr 14 00:02 /proc/6852/exe -> /opt/php53/sbin/php-fpm*

And that is the binary referenced in /etc/init.d/php53-fpm:

php_fpm_BIN=/opt/php53/sbin/php-fpm
php_fpm_CONF=/opt/php53/etc/php53-fpm.conf

And /opt/php53/etc/php53-fpm.conf includes /opt/php53/etc/pool.d/*.conf and that includes /opt/etc/fpm/fpm-pool-common.conf.
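
Following the include chain by hand is error-prone, so here is a rough sketch that walks it. It assumes includes are written as "include = <absolute path or glob>" and that paths contain no spaces; the function name fpm_config_tree is made up for illustration.

```shell
# Print a php-fpm config file and, recursively, every file it pulls in
# through "include = <path-or-glob>" lines. Missing files are skipped.
fpm_config_tree() {
    [ -f "$1" ] || return 0
    printf '%s\n' "$1"
    sed -n 's/^[[:space:]]*include[[:space:]]*=[[:space:]]*//p' "$1" |
    while IFS= read -r pattern; do
        # Unquoted on purpose so the shell expands globs like pool.d/*.conf.
        for f in $pattern; do
            fpm_config_tree "$f"
        done
    done
}

# Intended use, to find where the status path is really set:
#   fpm_config_tree /opt/php53/etc/php53-fpm.conf | xargs grep -H status_path
```

Relative include paths are not resolved here, so a file included that way would be missed.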

Still none the wiser as to why we can't get the php-fpm graphs working:

More work is needed on this :-(

comment:33 in reply to: ↑ 32 ; follow-up: ↓ 34 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 8.97 to 9.22

Replying to chris:

Still none the wiser as to why we can't get the php-fpm graphs working:

Adding this to /var/aegir/config/server_master/nginx.conf:

  location ~ ^/fpm-(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    allow 81.95.52.103;
    deny all;
  }

Has resulted in the Munin graphs starting to be generated again.

However by editing /var/aegir/config/server_master/nginx.conf the "use stock BOA settings where possible" directive, ticket:670, has been breached and this change might need doing after each BOA upgrade.

The documentation, wiki:PuffinServer#nginxconfigchanges has been updated.
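
Since the edit to nginx.conf may be overwritten by a later BOA upgrade, a quick post-upgrade check could look like the sketch below. The function name has_fpm_status_location is hypothetical and the check only looks for the opening line of the location block added above.

```shell
# Return 0 if the given nginx config still contains the custom
# php-fpm status location, 1 otherwise. -F matches the text literally.
has_fpm_status_location() {
    grep -qF 'location ~ ^/fpm-(status|ping)$' "$1"
}

# Intended use after a BOA upgrade:
#   has_fpm_status_location /var/aegir/config/server_master/nginx.conf ||
#       echo "fpm-status location missing, re-apply the change" >&2
```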

comment:34 in reply to: ↑ 33 ; follow-ups: ↓ 35 ↓ 36 Changed 3 years ago by jim

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 9.22 to 9.72

Replying to chris:

Adding this to /var/aegir/config/server_master/nginx.conf:

  location ~ ^/fpm-(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    allow 81.95.52.103;
    deny all;
  }

There was an issue about this here: https://drupal.org/node/2167459

We can open a new one with the extra changes if needed. It's not clear from the above which lines were added, Chris? If you can provide a summary of what needed to be changed, I'd be happy to add a ticket in the Barracuda D.o queue tonight.

(Also adding my time for various comments & emails.)

comment:35 in reply to: ↑ 34 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 9.72 to 9.82

Replying to jim:

Replying to chris:

Adding this to /var/aegir/config/server_master/nginx.conf:

  location ~ ^/fpm-(status|ping)$ {
    fastcgi_pass 127.0.0.1:9090;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_intercept_errors on;
    include fastcgi_params;
    access_log off;
    allow 127.0.0.1;
    allow 81.95.52.103;
    deny all;
  }

It's not clear from the above which lines were added, Chris?

All the lines above were added.

comment:36 in reply to: ↑ 34 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 9.82 to 10.02

Replying to jim:

There was an issue about this here: https://drupal.org/node/2167459

According to the diffs linked from that, changes were made to nginx_modern_include.conf and nginx_octopus_include.conf, these are the copies of these files on the server:

locate nginx_modern_include.conf | grep -v backups | grep -v \.drush
  /data/disk/tn/config/includes/nginx_modern_include.conf
  /var/aegir/config/includes/nginx_modern_include.conf
locate nginx_octopus_include.conf | grep -v backups | grep -v \.drush | grep -v root
  /data/disk/tn/config/includes/nginx_octopus_include.conf
  /var/aegir/config/includes/nginx_octopus_include.conf

These files do have the changes:

  • /var/aegir/config/includes/nginx_modern_include.conf
  • /var/aegir/config/includes/nginx_octopus_include.conf

But these don't:

  • /data/disk/tn/config/includes/nginx_modern_include.conf
  • /data/disk/tn/config/includes/nginx_octopus_include.conf

Should they be manually edited or is there a BOA way to update them?

comment:37 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.2
  • Total Hours changed from 10.02 to 10.22

The two load spike suicides on Saturday didn't trigger any alerts from CSF so this config in /etc/csf/csf.conf:

# Check the PT_LOAD_AVG minute Load Average (can be set to 1 5 or 15 and
# defaults to 5 if set otherwise) on the server every PT_LOAD seconds. If the
# load average is greater than or equal to PT_LOAD_LEVEL then an email alert is
# sent. lfd then does not report subsequent high load until PT_LOAD_SKIP
# seconds has passed to prevent email floods.
#
# Set PT_LOAD to "0" to disable this feature
PT_LOAD = "30"
PT_LOAD_AVG = "5"
PT_LOAD_LEVEL = "6"
PT_LOAD_SKIP = "3600"

Has been updated to:

PT_LOAD = "10"
PT_LOAD_AVG = "1"
PT_LOAD_LEVEL = "3"
PT_LOAD_SKIP = "60"

Also this was changed, though I don't know if it'll work:

#PT_APACHESTATUS = "http://127.0.0.1/server-status"
PT_APACHESTATUS = "http://127.0.0.1/nginx_status"

Restarting:

csf -r
  lfd will restart csf within the next 5 seconds
  *WARNING* PT_LOAD_SKIP sanity check. PT_LOAD_SKIP = 60. Recommended range: 1800-86400 (Default: 3600)
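
To sanity-check thresholds against recorded load figures, the comparison described in the csf.conf comment (alert when the chosen average is greater than or equal to PT_LOAD_LEVEL) can be sketched as below. This is not lfd's code, just an illustration, and load_exceeds is a made-up name.

```shell
# Succeed if the chosen load average is at or above the level.
#   $1: a line in /proc/loadavg format, e.g. "65.71 120.18 85.84 3/250 1234"
#   $2: which average to use: 1, 5 or 15 (anything else falls back to 5)
#   $3: threshold, as in PT_LOAD_LEVEL
load_exceeds() {
    printf '%s\n' "$1" | awk -v avg="$2" -v level="$3" '{
        field = 2
        if (avg == 1) field = 1
        if (avg == 15) field = 3
        exit (($field + 0 >= level + 0) ? 0 : 1)
    }'
}

# Intended use with the live figures:
#   load_exceeds "$(cat /proc/loadavg)" 1 3 && echo "load alert"
```

With the figures from Saturday's uptime output (65.71, 120.18, 85.84) both the old and the new thresholds are exceeded under this check, so the threshold values on their own may not be why no alert arrived.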

comment:38 Changed 3 years ago by ed

Have there been any more issues since Saturday?

Changed 3 years ago by chris

comment:39 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.65
  • Total Hours changed from 10.22 to 10.87

There haven't been any load spike suicides since the two on Saturday, but the load is more spiky:


The problems on Saturday could well have been mostly, or perhaps totally, because Redis wasn't running, as BOA didn't create a directory for its process ID file. Or perhaps the suicide thresholds are now set at too low a level? I haven't spent the time to read the updated suicide script to work out what is needed to trigger one.

Using the BOA defaults for MySQL has resulted in MySQL having half the RAM it had before, and this has probably contributed to the changed behaviour. I think we should breach the "use stock BOA settings where possible" policy, ticket:670, and make changes to the MySQL settings, see ticket:587#comment:13 (the time for that comment has been included in the time for this one).

I still haven't had time to have a close look at the logs from Saturday, but I'm also not sure that it's worth spending any time on this now?

The documentation on wiki:PuffinServer needs quite a lot of updating, once that has been done then this ticket and ticket:670 can probably be closed.

This ticket and ticket:670 are going to end up totalling over 16 hours, which means that this BOA upgrade will have taken twice as long as the last one, which took 8 hours, see wiki:PuffinServer#Upgradetickets for the totals.

Last edited 3 years ago by chris (previous) (diff)

comment:40 Changed 3 years ago by chris

Very sorry that when doing this update I forgot to run octopus up-stable all, this might be the cause of the cron tasks stopping, see ticket:724#comment:6

There is also another BOA update that is outstanding, ticket:721.

I'll do that update, and this time not forget to run octopus up-stable all, after midnight tonight, so it comes out of the May maintenance budget, unless I hear otherwise.

Last edited 3 years ago by chris (previous) (diff)

comment:41 Changed 3 years ago by chris

  • Status changed from new to closed
  • Resolution set to fixed

Closing as it's been superseded by ticket:721 for the 2.2.3 update.

Note: See TracTickets for help on using tickets.