Ticket #893 (new defect)

Opened 11 months ago

Last modified 8 months ago

BOA Cron Jobs

Reported by: chris Owned by: chris
Priority: major Milestone: Maintenance
Component: Live server Keywords:
Cc: paul, sam., ade, kate Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 2.43

Description

All the BOA cron jobs were stopped on ticket:846#comment:88. This ticket is for looking at them all and deciding which, if any, are needed.

Attachments

puffin-2016-01-03_multips_memory-week.png (37.5 KB) - added by chris 11 months ago.
puffin_2016-01-03_load-month.png (27.5 KB) - added by chris 11 months ago.
puffin_2016-01-03_http_loadtime-month.png (32.5 KB) - added by chris 11 months ago.

Change History

comment:1 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 1.0
  • Total Hours changed from 0.0 to 1.0

I have updated the wiki:PuffinServer?action=diff&version=191&old_version=189 to reflect the current status of PuffinServer and created a new BoaCronJobs wiki page where I have written up a brief description of each BOA script following a quick read of them. I don't think any of them are worth re-enabling, we probably should have stopped them all years ago, BOA itself looks like the cause of the killing sprees and suicides.

However I think the memory allocation for MySQL is worth tweaking:

mysqltuner.pl 

 >>  MySQLTuner 1.2.0 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
[OK] Logged in using credentials from debian maintenance account.

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.47-MariaDB-1~wheezy-log
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB +Federated +InnoDB -ISAM -NDBCluster 
[--] Data in MyISAM tables: 203M (Tables: 3)
[--] Data in InnoDB tables: 933M (Tables: 1279)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[!!] Total fragmented tables: 139

-------- Security Recommendations  -------------------------------------------
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 19h 59m 12s (7M q [99.977 qps], 81K conn, TX: 20B, RX: 988M)
[--] Reads / Writes: 92% / 8%
[--] Total buffers: 2.8G global + 20.4M per thread (50 max threads)
[OK] Maximum possible memory usage: 3.8G (21% of installed RAM)
[OK] Slow queries: 0% (116/7M)
[OK] Highest usage of available connections: 57% (29/50)
[!!] Key buffer size / total MyISAM indexes: 193.0M/214.7M
[!!] Key buffer hit rate: 89.9% (2K cached / 280 reads)
[OK] Query cache efficiency: 45.3% (5M cached / 11M selects)
[!!] Query cache prunes per day: 681699
[OK] Sorts requiring temporary tables: 0% (176 temp sorts / 158K sorts)
[!!] Temporary tables created on disk: 27% (43K on disk / 157K total)
[OK] Thread cache hit rate: 99% (29 created / 81K connections)
[!!] Table cache hit rate: 0% (128 open / 21K opened)
[OK] Open file limit used: 0% (4/196K)
[OK] Table locks acquired immediately: 99% (1M immediate / 1M locks)
[OK] InnoDB data size / buffer pool: 933.8M/1.5G

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    MySQL started within last 24 hours - recommendations may be inaccurate
    Temporary table size is already large - reduce result set size
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    key_buffer_size (> 214.7M)
    query_cache_size (> 128M)
    table_cache (> 128)

So these variables in /etc/mysql/my.cnf were changed:

#key_buffer_size         = 193M
key_buffer_size         = 256M

#query_cache_size        = 128M
query_cache_size        = 512M

#join_buffer_size        = 4M
join_buffer_size        = 12M

#tmpdir                  = /tmp
tmpdir                  = /dev/shm/mysql

And MySQL was restarted.

Note that the chris crontab alread contained:

# create a tmp dir on the ram disk for mysql
# see /trac/ticket/591
@reboot sudo mkdir /run/shm/mysql ; sudo chown mysql:mysql /run/shm/mysql

comment:2 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.33
  • Total Hours changed from 1.0 to 1.33

I have been keeping an eye on the site and the Munin graphs, although this is a very quite time of year for the site I doubt the bots adhere to holidays, the effect of stopping all the BOA root cron jobs had been quote dramatic, I have posted some graphs here wiki:BoaCronJobs#BOACronJobs


comment:3 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.33 to 1.58

I have just made some tweaks to some server settings, doubling the Redis memory in /etc/redis/redis.conf:

#maxmemory 1024MB
maxmemory 2048MB

As it had hit the 1GB limit.

And in /etc/nginx/nginx.conf adjusting the settings to suite the number of CPUs:

#worker_processes  28;
# this should match cpus
worker_processes  4;

events {
  multi_accept on;
  # https://easyengine.io/tutorials/nginx/optimization/
  worker_connections 1024;
  # http://nginx.2469901.n2.nabble.com/Tuning-workers-and-connections-td3192878.html
  use epoll;
}

Changed 11 months ago by chris

Changed 11 months ago by chris

Changed 11 months ago by chris

comment:4 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 1.58 to 2.08

I have spent some time looking at the Munin graphs and the Nginx changes done on ticket:893#comment:3 reduced the memory usage:


Looking at mySQL we have:

 >>  MySQLTuner 1.2.0 - Major Hayden <major@mhtx.net>
 >>  Bug reports, feature requests, and downloads at http://mysqltuner.com/
 >>  Run with '--help' for additional options and output filtering
[OK] Logged in using credentials from debian maintenance account.

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.47-MariaDB-1~wheezy-log
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: +Archive -BDB +Federated +InnoDB -ISAM -NDBCluster 
[--] Data in MyISAM tables: 203M (Tables: 3)
[--] Data in InnoDB tables: 987M (Tables: 1279)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[!!] Total fragmented tables: 139

-------- Security Recommendations  -------------------------------------------
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 10d 6h 14m 42s (82M q [92.669 qps], 886K conn, TX: 249B, RX: 11B)
[--] Reads / Writes: 93% / 7%
[--] Total buffers: 3.3G global + 28.4M per thread (50 max threads)
[OK] Maximum possible memory usage: 4.6G (25% of installed RAM)
[OK] Slow queries: 0% (629/82M)
[!!] Highest connection usage: 100%  (51/50)
[OK] Key buffer size / total MyISAM indexes: 256.0M/214.7M
[!!] Key buffer hit rate: 93.1% (61K cached / 4K reads)
[OK] Query cache efficiency: 46.9% (66M cached / 140M selects)
[!!] Query cache prunes per day: 133202
[OK] Sorts requiring temporary tables: 0% (2K temp sorts / 963K sorts)
[OK] Temporary tables created on disk: 22% (234K on disk / 1M total)
[OK] Thread cache hit rate: 99% (51 created / 886K connections)
[!!] Table cache hit rate: 0% (128 open / 178K opened)
[OK] Open file limit used: 0% (6/196K)
[OK] Table locks acquired immediately: 99% (16M immediate / 16M locks)
[OK] InnoDB data size / buffer pool: 987.4M/1.5G

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    Reduce or eliminate persistent connections to reduce connection usage
    Increasing the query_cache size over 128M may reduce performance
    Increase table_cache gradually to avoid file descriptor limits
Variables to adjust:
    max_connections (> 50)
    wait_timeout (< 3600)
    interactive_timeout (< 28800)
    query_cache_size (> 512M) [see warning above]
    table_cache (> 128)

Based on the Munin graphs and past experience I think the cache sizes are probably OK, but I have increased the max connections in /etc/mysql/my.cnf:

#max_connections         = 50
#max_user_connections    = 50
max_connections         = 75
max_user_connections    = 75

Since stopping all the BOA root cron jobs we still have a dramatic reduction in the load on the server (highest recorded weekly spike according to Munin was 3.14) and no more load spikes, the last lfd alert was on 23rd Dec 2015:

Date: Wed, 23 Dec 2015 11:47:28 +0000 (GMT)
From: root@puffin.webarch.net
To: chris@webarchitects.co.uk
Subject: lfd on puffin.webarch.net: High 5 minute load average alert - 72.36


And a great improvement in page load times:


So far it appears to be safe to say that the cause of the load spikes was BOA itself rather than any external cause, but when the Xmas holidays are over perhaps things will change.

comment:5 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Total Hours changed from 2.08 to 2.18

I have updated the wiki:PuffinServer#LoadSpikes documentation to reflect what has happened in the last couple of weeks.

comment:6 follow-up: ↓ 8 Changed 10 months ago by chris

A monthly Redis restart has been added to the root crontab, see ticket:900#comment:5

comment:7 Changed 9 months ago by chris

One side-effect from the BOA cronjobs being commented out appears to be a massive growth in the size of various cache tables, see ticket:907

comment:8 in reply to: ↑ 6 Changed 8 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 2.18 to 2.43

Replying to chris:

A monthly Redis restart has been added to the root crontab, see ticket:900#comment:5

Redis ran out of memory yesterday (29th March) so a monthly restart isn't enough, I have changed it to a restart on 1st and 15th of each month. I needed to restart several services today due to high loads caused by Redis running out of memory, see the Munin stats for details.

Note: See TracTickets for help on using tickets.