Changes between Version 72 and Version 73 of PuffinServer
Timestamp: 10/23/13 12:47:44
PuffinServer
The server has been suffering from load spikes which make the site unresponsive for clients; you can see the current status on the [https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/load.html puffin Munin load graph] (note the Max values for the last day, week, month and year).

-'''When the load hits 3.88 robots are served 403 Forbidden responses, when the load hits 18.88 maintenance tasks are killed, and when the load hits 72.2 the web server is killed until the 5 min load average falls below 44.4.'''
+'''When the load hits 23.28 robots are served 403 Forbidden responses, when the load hits 86.64 the web server is killed until the 5 min load average falls below 53.28, and when the load hits 113.28 maintenance tasks are also killed.'''

-The [ticket:563#second.sh default thresholds] have been changed as they were causing [ticket:555 the server to shut down for 15 min at a time] far too often.
+The [ticket:563#second.sh default thresholds] have been changed as they were causing [ticket:555 the server to shut down for 15 min at a time] far too often; the [ticket:555#comment:124 current values] were applied on [ticket:555#comment:126 23rd October 2013].
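The load figures above are the kernel's 1- and 5-minute load averages, the first two fields of {{{/proc/loadavg}}}. The following is a minimal POSIX sh sketch that assumes a Linux host. It scales those two fields by 100 to integers, which is the same scaling the BOA script applies, presumably because shell {{{test}}} only compares integers. It then checks the results against the 23.28 robot threshold. The helper names {{{load_x100}}} and {{{robots_threshold_hit}}} are hypothetical. The check is deliberately rough and ignores the upper bounds and hysteresis of the real rules.

```shell
#!/bin/sh
# Sketch only: load_x100 and robots_threshold_hit are hypothetical helpers,
# not part of BOA. Thresholds are load average x 100, as in the BOA script.
ROBOT_LIMIT_X100=2328   # 23.28, the v73 robot threshold

# load_x100 FIELD LINE: print field FIELD (1 = 1-minute, 2 = 5-minute average)
# of a /proc/loadavg style LINE multiplied by 100, rounded to an integer.
load_x100() {
  printf '%s\n' "$2" | awk -v f="$1" '{ printf "%d\n", $f * 100 + 0.5 }'
}

# robots_threshold_hit LINE: succeed if either average is above 23.28.
robots_threshold_hit() {
  one=$(load_x100 1 "$1")
  five=$(load_x100 2 "$1")
  [ "$one" -gt "$ROBOT_LIMIT_X100" ] || [ "$five" -gt "$ROBOT_LIMIT_X100" ]
}

# Live check, only where /proc/loadavg exists (Linux).
if [ -r /proc/loadavg ]; then
  if robots_threshold_hit "$(cat /proc/loadavg)"; then
    echo "load above 23.28"
  else
    echo "load at or below 23.28"
  fi
fi
```

For example, {{{load_x100 1 '23.28 9.33 1.00 2/345 6789'}}} prints 2328. The {{{+ 0.5}}} before truncation guards against binary floating point results falling just below a whole number.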

The server has 14 CPU cores (see [https://en.wikipedia.org/wiki/Load_average#Unix-style_load_calculation Unix-style load calculation]). The current thresholds are generated from these variables in {{{/root/.barracuda.cnf}}}; the commented-out values are the defaults:
…
#_LOAD_LIMIT_ONE=1444
#_LOAD_LIMIT_TWO=888
-_LOAD_LIMIT_ONE=7220
-_LOAD_LIMIT_TWO=4440
+_LOAD_LIMIT_ONE=8664
+_LOAD_LIMIT_TWO=5328
}}}

…
ONEX_LOAD=`awk '{print $1*100}' /proc/loadavg`
FIVX_LOAD=`awk '{print $2*100}' /proc/loadavg`
-CTL_ONEX_SPIDER_LOAD=388
-CTL_FIVX_SPIDER_LOAD=388
-CTL_ONEX_LOAD=7220
-CTL_FIVX_LOAD=4440
-CTL_ONEX_LOAD_CRIT=1888
-CTL_FIVX_LOAD_CRIT=1555
+CTL_ONEX_SPIDER_LOAD=2328
+CTL_FIVX_SPIDER_LOAD=2328
+CTL_ONEX_LOAD=8664
+CTL_FIVX_LOAD=5328
+CTL_ONEX_LOAD_CRIT=11328
+CTL_FIVX_LOAD_CRIT=9330
}}}

…
 * ONEX_LOAD: load average over the last minute times 100
 * FIVX_LOAD: load average over the last 5 minutes times 100
- * CTL_ONEX_SPIDER_LOAD: 3.88
- * CTL_FIVX_SPIDER_LOAD: 3.88
- * CTL_ONEX_LOAD: 72.20
- * CTL_FIVX_LOAD: 44.40
- * CTL_ONEX_LOAD_CRIT: 18.88
- * CTL_FIVX_LOAD_CRIT: 15.55
+ * CTL_ONEX_SPIDER_LOAD: 23.28
+ * CTL_FIVX_SPIDER_LOAD: 23.28
+ * CTL_ONEX_LOAD: 86.64
+ * CTL_FIVX_LOAD: 53.28
+ * CTL_ONEX_LOAD_CRIT: 113.28
+ * CTL_FIVX_LOAD_CRIT: 93.30

And the logic, translated into English, is:

- 1. If the load average over the last minute is greater than 3.88 and less than 72.20 and the nginx high load config isn't in use, start to use it.
- 2. Else if the load average over the last 5 mins is greater than 3.88 and less than 44.40 and the nginx high load config isn't in use, start to use it.
- 3. Else if the load average over the last minute is less than 3.88 and the load average over the last 5 mins is less than 3.88 and the nginx high load config is in use, stop using it.
+ 1. If the load average over the last minute is greater than 23.28 and less than 86.64 and the nginx high load config isn't in use, start to use it.
+ 2. Else if the load average over the last 5 mins is greater than 23.28 and less than 53.28 and the nginx high load config isn't in use, start to use it.
+ 3. Else if the load average over the last minute is less than 23.28 and the load average over the last 5 mins is less than 23.28 and the nginx high load config is in use, stop using it.

- 1. If the load average over the last minute is greater than 18.88, then if {{{/var/run/boa_run.pid}}} exists wait a second, otherwise kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
- 2. Else if the load average over the last 5 mins is greater than 15.55, then if {{{/var/run/boa_run.pid}}} exists wait a second, otherwise kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
+ 1. If the load average over the last minute is greater than 113.28, then if {{{/var/run/boa_run.pid}}} exists wait a second, otherwise kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
+ 2. Else if the load average over the last 5 mins is greater than 93.30, then if {{{/var/run/boa_run.pid}}} exists wait a second, otherwise kill some maintenance jobs: {{{killall -9 php drush.php wget}}}

- 1. If the load average over the last minute is greater than 72.20, kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
- 2. Else if the load average over the last 5 mins is greater than 44.40, kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
+ 1. If the load average over the last minute is greater than 86.64, kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
+ 2. Else if the load average over the last 5 mins is greater than 53.28, kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
 3. Else restart all the services via {{{/var/xdrago/proc_num_ctrl.cgi}}}
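The three groups of rules above can be traced with a small dry-run sketch. The POSIX sh function {{{decide}}} below is hypothetical, as are its arguments. It takes the 1- and 5-minute loads already multiplied by 100. It also takes two 0/1 flags: one stands in for "the nginx high load config is in use" and the other for "{{{/var/run/boa_run.pid}}} exists". Instead of running each group's action, it prints it. The thresholds are the v73 {{{CTL_*}}} values from the script excerpt. "Greater than" and "less than" are taken literally, which may differ from BOA's own comparisons at the exact boundary.

```shell
#!/bin/sh
# Dry-run sketch of the three rule groups with the v73 thresholds
# (load average x 100). It only prints what would happen; nothing is killed.
# Hypothetical usage: decide ONE_X100 FIVE_X100 HIGH_LOAD_IN_USE(0|1) BOA_PID_EXISTS(0|1)

CTL_ONEX_SPIDER_LOAD=2328
CTL_FIVX_SPIDER_LOAD=2328
CTL_ONEX_LOAD=8664
CTL_FIVX_LOAD=5328
CTL_ONEX_LOAD_CRIT=11328
CTL_FIVX_LOAD_CRIT=9330

decide() {
  one=$1 five=$2 high=$3 boa=$4

  # Group 1: switch the nginx high load config (robots get 403 Forbidden).
  if [ "$one" -gt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$one" -lt "$CTL_ONEX_LOAD" ] && [ "$high" -eq 0 ]; then
    echo "enable high load config"
  elif [ "$five" -gt "$CTL_FIVX_SPIDER_LOAD" ] && [ "$five" -lt "$CTL_FIVX_LOAD" ] && [ "$high" -eq 0 ]; then
    echo "enable high load config"
  elif [ "$one" -lt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$five" -lt "$CTL_FIVX_SPIDER_LOAD" ] && [ "$high" -eq 1 ]; then
    echo "disable high load config"
  fi

  # Group 2: both branches do the same thing, so "if / else if" becomes "||".
  # Maintenance jobs are killed unless a BOA task holds /var/run/boa_run.pid.
  if [ "$one" -gt "$CTL_ONEX_LOAD_CRIT" ] || [ "$five" -gt "$CTL_FIVX_LOAD_CRIT" ]; then
    if [ "$boa" -eq 1 ]; then echo "sleep 1"; else echo "killall -9 php drush.php wget"; fi
  fi

  # Group 3: kill the web server, otherwise restart all the services.
  if [ "$one" -gt "$CTL_ONEX_LOAD" ] || [ "$five" -gt "$CTL_FIVX_LOAD" ]; then
    echo "killall -9 nginx; killall -9 php-fpm php-cgi"
  else
    echo "/var/xdrago/proc_num_ctrl.cgi"
  fi
}
```

For example, {{{decide 3000 1000 0 0}}} prints {{{enable high load config}}} followed by {{{/var/xdrago/proc_num_ctrl.cgi}}}. With the v73 values the web server limits (86.64 and 53.28) are lower than the maintenance limits (113.28 and 93.30). As the load rises, the web server is therefore killed before any maintenance jobs are.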