Changes between Version 72 and Version 73 of PuffinServer


Timestamp:
10/23/13 12:47:44 (3 years ago)
Author:
chris
Comment:

--

  • PuffinServer

    v72 v73  
     21  21 The server has been suffering from load spikes that make the site unresponsive for clients. The current status is shown in the [https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/load.html puffin Munin load graph]; note the Max values for the last day, week, month and year.
     22  22
     23     '''When the load hits 3.88, robots are served 403 Forbidden responses; when it hits 18.88, maintenance tasks are killed; and when it hits 72.2, the server terminates until the 5 min load average falls below 44.4.'''
     24
     25     The [ticket:563#second.sh default thresholds] have been changed as they were causing [ticket:555 the site to shut down for 15 min at a time] far too often.
         23 '''When the load hits 23.28, robots are served 403 Forbidden responses; when it hits 86.64, maintenance tasks are killed; and when it hits 113.28, the server terminates until the 5 min load average falls below 93.30.'''
         24
         25 The [ticket:563#second.sh default thresholds] have been changed as they were causing [ticket:555 the site to shut down for 15 min at a time] far too often; the [ticket:555#comment:124 current values] were applied on [ticket:555#comment:126 23rd October 2013].
     26  26
     27  27 The server has 14 CPU cores (see [https://en.wikipedia.org/wiki/Load_average#Unix-style_load_calculation Unix-style load calculation]). The current thresholds are generated from these variables in {{{/root/.barracuda.cnf}}}; the commented-out values are the defaults:
     
     30  30 #_LOAD_LIMIT_ONE=1444
     31  31 #_LOAD_LIMIT_TWO=888
     32     _LOAD_LIMIT_ONE=7220
     33     _LOAD_LIMIT_TWO=4440
         32 _LOAD_LIMIT_ONE=8664
         33 _LOAD_LIMIT_TWO=5328
     34  34 }}}
     35  35
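As a quick check on the limits above, the following sketch divides each configured limit by its commented-out default. The `ratio` helper is a hypothetical name introduced here for illustration; it is not part of the server's scripts.

```shell
#!/bin/sh
# Hypothetical helper: print configured/default to two decimal places.
ratio() {
  awk -v configured="$1" -v default="$2" \
    'BEGIN { printf "%.2f\n", configured / default }'
}

# v72 limits relative to the defaults
ratio 7220 1444   # _LOAD_LIMIT_ONE
ratio 4440 888    # _LOAD_LIMIT_TWO

# v73 limits relative to the defaults
ratio 8664 1444   # _LOAD_LIMIT_ONE
ratio 5328 888    # _LOAD_LIMIT_TWO
```

Under these figures the v72 limits come out at 5.00 times the defaults and the v73 limits at 6.00 times.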
     
     39  39 ONEX_LOAD=`awk '{print $1*100}' /proc/loadavg`
     40  40 FIVX_LOAD=`awk '{print $2*100}' /proc/loadavg`
     41     CTL_ONEX_SPIDER_LOAD=388
     42     CTL_FIVX_SPIDER_LOAD=388
     43     CTL_ONEX_LOAD=7220
     44     CTL_FIVX_LOAD=4440
     45     CTL_ONEX_LOAD_CRIT=1888
     46     CTL_FIVX_LOAD_CRIT=1555
         41 CTL_ONEX_SPIDER_LOAD=2328
         42 CTL_FIVX_SPIDER_LOAD=2328
         43 CTL_ONEX_LOAD=8664
         44 CTL_FIVX_LOAD=5328
         45 CTL_ONEX_LOAD_CRIT=11328
         46 CTL_FIVX_LOAD_CRIT=9330
     47  47 }}}
     48  48
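The two `awk` lines above scale the first and second fields of `/proc/loadavg` by 100 so that the thresholds can be compared as integers. A minimal sketch of the same scaling follows, assuming a sample line in place of the real file so it can be run anywhere; the `scaled_load` helper and the sample values are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: scale field N of a loadavg-style line by 100.
#   $1 = field number (1 = one-minute average, 2 = five-minute average)
#   $2 = a line in /proc/loadavg format
scaled_load() {
  printf '%s\n' "$2" | awk -v f="$1" '{ printf "%.0f\n", $f * 100 }'
}

# Sample line in /proc/loadavg format (invented values, not real measurements).
sample="23.28 12.50 8.01 3/512 12345"

ONEX_LOAD=$(scaled_load 1 "$sample")
FIVX_LOAD=$(scaled_load 2 "$sample")
echo "ONEX_LOAD=$ONEX_LOAD FIVX_LOAD=$FIVX_LOAD"
```

On a Linux host the sample line could be replaced with `$(cat /proc/loadavg)`.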
     
     51  51 * ONEX_LOAD: load average over the last minute times 100
     52  52 * FIVX_LOAD: load average over the last 5 minutes times 100
     53     * CTL_ONEX_SPIDER_LOAD: 3.88
     54     * CTL_FIVX_SPIDER_LOAD: 3.88
     55     * CTL_ONEX_LOAD: 72.20
     56     * CTL_FIVX_LOAD: 44.40
     57     * CTL_ONEX_LOAD_CRIT: 18.88
     58     * CTL_FIVX_LOAD_CRIT: 15.55
         53 * CTL_ONEX_SPIDER_LOAD: 23.28
         54 * CTL_FIVX_SPIDER_LOAD: 23.28
         55 * CTL_ONEX_LOAD: 86.64
         56 * CTL_FIVX_LOAD: 53.28
         57 * CTL_ONEX_LOAD_CRIT: 113.28
         58 * CTL_FIVX_LOAD_CRIT: 93.30
     59  59
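The list above gives each integer threshold divided by 100. The sketch below performs that conversion and, as an additional assumption not made in the original page, also divides by the 14 CPU cores mentioned earlier to give a rough per-core figure. Both helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a scaled threshold back to a load average.
threshold_to_load() {
  awk -v t="$1" 'BEGIN { printf "%.2f\n", t / 100 }'
}

# Hypothetical helper: load average per core.
#   $1 = scaled threshold, $2 = number of CPU cores
per_core_load() {
  awk -v t="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", t / 100 / cores }'
}

threshold_to_load 2328    # CTL_ONEX_SPIDER_LOAD
threshold_to_load 11328   # CTL_ONEX_LOAD_CRIT
per_core_load 2328 14     # spider threshold spread over 14 cores
```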
     60  60 And the logic, translated into English, is:
     61  61
     62     1. If the load average over the last minute is greater than 3.88 and less than 72.20 and the nginx high load config isn't in use, then start to use it.
     63     2. Else if the load average over the last 5 mins is greater than 3.88 and less than 44.40 and the nginx high load config isn't in use, then start to use it.
     64     3. Else if the load average over the last minute is less than 3.88 and the load average over the last 5 mins is less than 3.88 and the nginx high load config is in use, then stop using it.
     65
     66     1. If the load average over the last minute is greater than 18.88: if {{{/var/run/boa_run.pid}}} exists, wait a second; if not, kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
     67     2. Else if the load average over the last 5 mins is greater than 15.55: if {{{/var/run/boa_run.pid}}} exists, wait a second; if not, kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
     68
     69     1. If the load average over the last minute is greater than 72.20, then kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
     70     2. Else if the load average over the last 5 mins is greater than 44.40, then kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
         62 1. If the load average over the last minute is greater than 23.28 and less than 86.64 and the nginx high load config isn't in use, then start to use it.
         63 2. Else if the load average over the last 5 mins is greater than 23.28 and less than 53.28 and the nginx high load config isn't in use, then start to use it.
         64 3. Else if the load average over the last minute is less than 23.28 and the load average over the last 5 mins is less than 23.28 and the nginx high load config is in use, then stop using it.
         65
         66 1. If the load average over the last minute is greater than 132.16: if {{{/var/run/boa_run.pid}}} exists, wait a second; if not, kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
         67 2. Else if the load average over the last 5 mins is greater than 108.85: if {{{/var/run/boa_run.pid}}} exists, wait a second; if not, kill some maintenance jobs: {{{killall -9 php drush.php wget}}}
         68
         69 1. If the load average over the last minute is greater than 101.08, then kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
         70 2. Else if the load average over the last 5 mins is greater than 62.16, then kill the web server: {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}}
     71  71 3. Else restart all the services via {{{/var/xdrago/proc_num_ctrl.cgi}}}
     72  72
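The three rule groups above can be sketched as a dry-run shell function that only prints which action each group would select and kills nothing. This is an illustration under stated assumptions, not the actual monitoring script: the `decide` function, its arguments and the action labels are hypothetical, and the thresholds are copied from the v73 variable block. The numbered list quotes different figures for some steps, so treat the numbers as placeholders.

```shell
#!/bin/sh
# Thresholds from the v73 variable block (load average times 100).
CTL_ONEX_SPIDER_LOAD=2328
CTL_FIVX_SPIDER_LOAD=2328
CTL_ONEX_LOAD=8664
CTL_FIVX_LOAD=5328
CTL_ONEX_LOAD_CRIT=11328
CTL_FIVX_LOAD_CRIT=9330

# Hypothetical dry-run helper.
#   $1 = ONEX_LOAD, $2 = FIVX_LOAD (both scaled by 100)
#   $3 = "yes" if the nginx high load config is already in use
#   $4 = "yes" if /var/run/boa_run.pid exists
decide() {
  onex=$1; fivx=$2; high_load=$3; boa_running=$4

  # Group 1: switch the nginx high load config on or off.
  if [ "$onex" -gt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$onex" -lt "$CTL_ONEX_LOAD" ] \
     && [ "$high_load" != yes ]; then
    echo "enable-high-load-config"
  elif [ "$fivx" -gt "$CTL_FIVX_SPIDER_LOAD" ] && [ "$fivx" -lt "$CTL_FIVX_LOAD" ] \
     && [ "$high_load" != yes ]; then
    echo "enable-high-load-config"
  elif [ "$onex" -lt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$fivx" -lt "$CTL_FIVX_SPIDER_LOAD" ] \
     && [ "$high_load" = yes ]; then
    echo "disable-high-load-config"
  fi

  # Group 2: at critical load, wait if a BOA run is active, else kill jobs.
  if [ "$onex" -gt "$CTL_ONEX_LOAD_CRIT" ] || [ "$fivx" -gt "$CTL_FIVX_LOAD_CRIT" ]; then
    if [ "$boa_running" = yes ]; then
      echo "wait-one-second"
    else
      echo "kill-maintenance-jobs"   # would be: killall -9 php drush.php wget
    fi
  fi

  # Group 3: kill the web server, or else run the service check.
  if [ "$onex" -gt "$CTL_ONEX_LOAD" ]; then
    echo "kill-web-server"           # would be: killall -9 nginx; killall -9 php-fpm php-cgi
  elif [ "$fivx" -gt "$CTL_FIVX_LOAD" ]; then
    echo "kill-web-server"
  else
    echo "run-proc-num-ctrl"         # would be: /var/xdrago/proc_num_ctrl.cgi
  fi
}

# Example: one-minute load 30.00, five-minute load 10.00, nothing else active.
decide 3000 1000 no no
```

Because the function only echoes labels, it can be fed recorded load values to see how a given set of thresholds would have behaved before applying them to the live server.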