These are the recent summaries of results from the /usr/local/bin/50x-errors script, which is run every day just before the nginx logs are rotated:
Date: Wed, 19 Jun 2013 14:53:27 +0100
Subject: 7 502, 173 503 and 0 504 errors from puffin.webarch.net
Date: Thu, 20 Jun 2013 06:25:29 +0100
Subject: 12 502, 1570 503 and 0 504 errors from puffin.webarch.net
Date: Fri, 21 Jun 2013 06:25:20 +0100
Subject: 464 502, 65 503 and 1 504 errors from puffin.webarch.net
Date: Sat, 22 Jun 2013 06:25:24 +0100
Subject: 0 502, 149 503 and 1 504 errors from puffin.webarch.net
Date: Sun, 23 Jun 2013 06:25:47 +0100
Subject: 1 502, 212 503 and 0 504 errors from puffin.webarch.net
Date: Mon, 24 Jun 2013 06:25:16 +0100
Subject: 2 502, 103 503 and 1 504 errors from puffin.webarch.net
It's worth noting that since the downtime on 20th June (which appears under 21st June in the stats above, as they are generated at 6:25am) there have been very few 502 or 504 errors. The 503 errors over the last few days have been generated while the site is in high load mode, which has been happening quite often; see the spikes on the graphs at ticket:555#comment:54.
The thresholds in second.sh were initially multiplied by four (see ticket:555#comment:43) and then by five (see ticket:555#comment:52), and the RAM was doubled from 4GB to 8GB after the 20th June downtime.
I think it would be worth doing a bit more work on the 50x-errors script: if it counted, sorted and listed the user agents that trigger the 503 errors, that would be a good check that the error is only being served to bots. A sketch of the sort of thing I mean is below.
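Something along these lines could be added to the script. This is a minimal sketch that assumes the standard nginx combined log format (splitting on double quotes puts the status code and bytes in the third field and the user agent in the sixth) and assumes the log lives at /var/log/nginx/access.log; both the path and the format would need checking against the live config:

{{{
#!/bin/sh
# Count, sort and list the user agents behind 503 responses.
# Assumes the default nginx combined log format: field 3 (when
# splitting on double quotes) is " status bytes " and field 6 is
# the user agent string. The log path is an assumption.
awk -F'"' '$3 ~ /^ 503 / {print $6}' /var/log/nginx/access.log \
  | sort | uniq -c | sort -rn | head -20
}}}

If the top entries are all crawlers then we can be fairly confident the 503s are only being served to bots rather than to real visitors.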
Apart from that, this ticket is probably ready to be closed.