Changes between Version 61 and Version 62 of PuffinServer


Timestamp:
10/14/13 12:47:14 (3 years ago)
Author:
chris
Comment:

--

  • PuffinServer

    v61 v62  
    1616 
    1717See ticket:555#comment:13 for the notes regarding the installation of the mysql munin stats package. 
     18 
     19== Load Spikes == 
     20 
     21The server has been suffering from load spikes which make the site unresponsive for clients. The current status can be seen on the [https://penguin.transitionnetwork.org/munin/transitionnetwork.org/puffin.transitionnetwork.org/load.html puffin Munin load graph].
     22 
     23When the load hits 3.88, robots are served 403 Forbidden responses, and when the load hits 72.2 the server shuts down until the 5 minute load average falls below 44.4.
     24 
     25The [ticket:563#second.sh default thresholds] have been changed as they were causing [trac:ticket/555 the site to shut down for 15 min at a time] far too often.
     26 
     27The current thresholds are generated from the following variables in {{{/root/.barracuda.cnf}}}; the default values are shown commented out:
     28 
     29{{{ 
     30#_LOAD_LIMIT_ONE=1444 
     31#_LOAD_LIMIT_TWO=888 
     32_LOAD_LIMIT_ONE=7220
     33_LOAD_LIMIT_TWO=4440 
     34}}} 
     35 
     36These variables are used by the {{{/var/xdrago/second.sh}}} script, which is run every minute via cron and contains the following variables:
     37 
     38{{{ 
     39ONEX_LOAD=`awk '{print $1*100}' /proc/loadavg` 
     40FIVX_LOAD=`awk '{print $2*100}' /proc/loadavg` 
     41CTL_ONEX_SPIDER_LOAD=388 
     42CTL_FIVX_SPIDER_LOAD=388 
     43CTL_ONEX_LOAD=7220 
     44CTL_FIVX_LOAD=4440 
     45CTL_ONEX_LOAD_CRIT=1888 
     46CTL_FIVX_LOAD_CRIT=1555 
     47}}} 
     48 
     49These values translate to the following loads for comparison to the Munin graphs: 
     50 
     51* ONEX_LOAD: load average over the last minute times 100 
     52* FIVX_LOAD: load average over the last 5 minutes times 100 
     53* CTL_ONEX_SPIDER_LOAD: 3.88 
     54* CTL_FIVX_SPIDER_LOAD: 3.88 
     55* CTL_ONEX_LOAD: 72.20 
     56* CTL_FIVX_LOAD: 44.40 
     57* CTL_ONEX_LOAD_CRIT: 18.88 
     58* CTL_FIVX_LOAD_CRIT: 15.55 
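
The conversion between the scaled values in the script and the loads shown on the Munin graphs can be sketched as follows. This is only an illustration: {{{to_load}}} and {{{to_scaled}}} are hypothetical helper names that do not exist in {{{second.sh}}}, and the only thing taken from the script is the "times 100" scaling done with awk.

```shell
#!/bin/sh
# Hypothetical helpers for illustration, not part of second.sh.

# Convert a scaled threshold (load average times 100) back to a load average.
to_load() {
  awk -v v="$1" 'BEGIN { printf "%.2f\n", v / 100 }'
}

# Scale a load average the same way second.sh does (multiply by 100).
to_scaled() {
  echo "$1" | awk '{ print $1 * 100 }'
}
```

For example, {{{to_load 7220}}} gives the 72.20 listed above for CTL_ONEX_LOAD, and {{{to_scaled 3.88}}} gives the 388 used for the spider thresholds.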
     59 
     60And the logic, translated into English, is:
     61 
     621. If the load average over the last minute is greater than 3.88 and less than 72.20 and the nginx high load config isn't in use then start to use it. 
     632. Else if the load average over the last 5 mins is greater than 3.88 and less than 44.40 and the nginx high load config isn't in use then start to use it. 
     643. Else if the load average over the last minute is less than 3.88 and the load average over the last 5 mins is less than 3.88 and the nginx high load config is in use then stop using it.
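
The three rules above can be sketched as a small shell function. This is a paraphrase of the English description, not the code from {{{second.sh}}}: the function name {{{spider_action}}}, its arguments and the "enable"/"disable"/"none" outputs are assumptions made for illustration, and only the threshold values are taken from the script.

```shell
#!/bin/sh
# Thresholds as listed in second.sh (load average times 100).
CTL_ONEX_SPIDER_LOAD=388
CTL_FIVX_SPIDER_LOAD=388
CTL_ONEX_LOAD=7220
CTL_FIVX_LOAD=4440

# Hypothetical helper: decide what to do with the nginx high load config.
# $1 = one minute load x100, $2 = five minute load x100,
# $3 = "yes" if the high load config is currently in use, otherwise "no".
spider_action() {
  sa_one=$1; sa_five=$2; sa_in_use=$3
  if [ "$sa_one" -gt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$sa_one" -lt "$CTL_ONEX_LOAD" ] \
      && [ "$sa_in_use" = "no" ]; then
    echo enable
  elif [ "$sa_five" -gt "$CTL_FIVX_SPIDER_LOAD" ] && [ "$sa_five" -lt "$CTL_FIVX_LOAD" ] \
      && [ "$sa_in_use" = "no" ]; then
    echo enable
  elif [ "$sa_one" -lt "$CTL_ONEX_SPIDER_LOAD" ] && [ "$sa_five" -lt "$CTL_FIVX_SPIDER_LOAD" ] \
      && [ "$sa_in_use" = "yes" ]; then
    echo disable
  else
    echo none
  fi
}
```

With a one minute load of 5.00 and the high load config not in use, {{{spider_action 500 100 no}}} prints "enable"; once both averages are back under 3.88, {{{spider_action 100 100 yes}}} prints "disable".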
     65 
     661. If the load average over the last minute is greater than 18.88 then, if {{{/var/run/boa_run.pid}}} exists, wait a second; otherwise kill some jobs: {{{killall -9 php drush.php wget}}}
     672. Else if the load average over the last 5 mins is greater than 15.55 then, if {{{/var/run/boa_run.pid}}} exists, wait a second; otherwise kill some jobs: {{{killall -9 php drush.php wget}}}
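
A dry-run sketch of the critical load rules above, which prints the decision instead of killing anything. The function name {{{job_action}}} and its outputs are hypothetical; the two branches are merged with "or" because the description gives them the same action, which may not match how {{{second.sh}}} is actually structured.

```shell
#!/bin/sh
# Critical thresholds as listed in second.sh (load average times 100).
CTL_ONEX_LOAD_CRIT=1888
CTL_FIVX_LOAD_CRIT=1555

# Hypothetical helper: $1 = one minute load x100, $2 = five minute load x100,
# $3 = path of the pid file (/var/run/boa_run.pid on the server).
# Prints "wait" when the pid file exists, "kill" when it does not,
# and "none" when neither critical threshold is exceeded.
job_action() {
  ja_one=$1; ja_five=$2; ja_pidfile=$3
  if [ "$ja_one" -gt "$CTL_ONEX_LOAD_CRIT" ] || [ "$ja_five" -gt "$CTL_FIVX_LOAD_CRIT" ]; then
    if [ -e "$ja_pidfile" ]; then
      echo wait    # the script would sleep for a second here
    else
      echo kill    # the script would run: killall -9 php drush.php wget
    fi
  else
    echo none
  fi
}
```

For example, {{{job_action 2000 0 /var/run/boa_run.pid}}} prints "wait" or "kill" depending on whether the pid file is present.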
     68 
     691. If the load average over the last minute is greater than 72.20 then kill the web server, {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}} 
     702. Else if the load average over the last 5 mins is greater than 44.40 then kill the web server, {{{killall -9 nginx}}} and {{{killall -9 php-fpm php-cgi}}} 
     713. Else restart all the services via {{{/var/xdrago/proc_num_ctrl.cgi}}} 
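
The final set of rules can be sketched the same way, again as a dry run that only prints the decision. {{{web_action}}} and its output strings are assumptions for illustration; the thresholds are the ones listed for {{{second.sh}}} above.

```shell
#!/bin/sh
# Shutdown thresholds as listed in second.sh (load average times 100).
CTL_ONEX_LOAD=7220
CTL_FIVX_LOAD=4440

# Hypothetical helper: $1 = one minute load x100, $2 = five minute load x100.
web_action() {
  wa_one=$1; wa_five=$2
  if [ "$wa_one" -gt "$CTL_ONEX_LOAD" ]; then
    echo kill-web          # killall -9 nginx; killall -9 php-fpm php-cgi
  elif [ "$wa_five" -gt "$CTL_FIVX_LOAD" ]; then
    echo kill-web
  else
    echo restart-services  # /var/xdrago/proc_num_ctrl.cgi
  fi
}
```

Note that the comparisons are strict, so a load of exactly 72.20 ({{{web_action 7220 0}}}) still falls through to "restart-services".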
     72 
     73 
     74 
    1875 
    1976== Tickets ==