Ticket #890 (new defect)
Site offline.
Reported by: | sam | Owned by: | ade |
---|---|---|---|
Priority: | major | Milestone: | Maintenance |
Component: | Unassigned | Keywords: | |
Cc: | paul, chris | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 1.8 |
Description
It's serving a page, so this may be a Drupal-level problem rather than a server-level one?
Change History
comment:2 Changed 12 months ago by paul
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info]
I think this will be something best left to Chris.
comment:3 Changed 12 months ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.25 to 0.5
I don't know exactly what the problem was with MySQL, but I had to force stop it before restarting it:
/etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .
/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
ps -lA | grep mysql
4 S 105 539 32182 3 80 0 - 916634 - ? 01:19:47 mysqld
0 S 0 21863 21862 0 80 0 - 10334 - ? 00:00:00 mysqldump
0 S 0 32182 1 0 80 0 - 2712 - ? 00:00:00 mysqld_safe
killall -9 mysqld
/etc/init.d/mysql start
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..
MySQL was updated a couple of days ago; perhaps this is related to that, see ticket:692#comment:228
The site is back up now. I'll look at the logs later to see if I can find a cause.
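A gentler variant of the force stop above, as a minimal sketch rather than what was actually run here: it sends SIGTERM first and only falls back to SIGKILL (the equivalent of killall -9) if the process is still alive after a grace period. The function name stop_escalating and the 10 second default are assumptions for illustration; pgrep and pkill are assumed to be available from procps.

```shell
#!/bin/sh
# Sketch: stop a process by exact name, escalating from SIGTERM to SIGKILL.
# Usage: stop_escalating NAME [WAIT_SECONDS]
# (stop_escalating is a hypothetical helper, not part of any package.)
stop_escalating() {
    name=$1
    wait_secs=${2:-10}

    # Nothing to do if no process with this exact name is running.
    pgrep -x "$name" >/dev/null || return 0

    # Ask politely first so the process gets a chance to shut down cleanly.
    pkill -TERM -x "$name"

    # Poll once per second until the process is gone or the grace period ends.
    i=0
    while [ "$i" -lt "$wait_secs" ]; do
        pgrep -x "$name" >/dev/null || return 0
        sleep 1
        i=$((i + 1))
    done

    # Last resort: SIGKILL cannot be caught, so no cleanup is possible.
    pkill -KILL -x "$name"
    sleep 1

    # Succeed only if nothing with that name is left.
    ! pgrep -x "$name" >/dev/null
}
```

It could be called as stop_escalating mysqld 30 before /etc/init.d/mysql start. Whether mysqld would have responded to SIGTERM in this state is unknown.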
comment:4 Changed 11 months ago by chris
- Add Hours to Ticket changed from 0.0 to 0.5
- Total Hours changed from 0.5 to 1.0
There is nothing in /var/log/syslog or /var/log/messages, but in /var/log/daemon.log there is this:
Dec 11 13:09:15 puffin mysqld: 151211 13:09:15 [Warning] Aborted connection 180657 to db: 'transitionnetw_0' user: 'transitionnetw_0' host: 'localhost' (Unknown error)
Dec 12 01:08:17 puffin mysqld: 151212 1:08:17 [Warning] Aborted connection 297180 to db: 'transitionnetw_0' user: 'transitionnetw_0' host: 'localhost' (Unknown error)
But these errors are not new; there are lots of them from previous days. There is nothing in auth.log or any of the MySQL logs. I'm afraid I have no idea why MySQL stopped accepting connections or why it could not be stopped and restarted without force.
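To check whether the "Aborted connection" warnings really are no more frequent than on previous days, they can be counted per day. This is a minimal sketch that assumes the syslog line format shown above; the function name count_aborted_per_day is made up for illustration.

```shell
#!/bin/sh
# Sketch: count mysqld "Aborted connection" warnings per day.
# Reads syslog-style lines on stdin and prints "Mon DD COUNT",
# one line per day, in the order the days first appear.
count_aborted_per_day() {
    awk '/mysqld:.*\[Warning\] Aborted connection/ {
             day = $1 " " $2                      # e.g. "Dec 11"
             if (!(day in n)) order[++k] = day    # remember first-seen order
             n[day]++
         }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}
```

For example: count_aborted_per_day < /var/log/daemon.log. A sharp rise on the day of the outage would point at the database clients; a flat count would support the view that these warnings are background noise.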
comment:5 Changed 11 months ago by paul
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 1.0 to 1.25
Nothing further to add really.
The initial problem was that the MySQL server was no longer accepting connections; that could be connected with the recent memory problems.
comment:6 Changed 11 months ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 1.25 to 1.5
The same thing has happened again, with very high load and:
/etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .
So:
/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
killall -9 mysqld
/etc/init.d/mysql stop
[ ok ] Stopping MariaDB database server: mysqld.
/etc/init.d/mysql start
[ ok ] Starting MariaDB database server: mysqld already running.
Looking at the munin graphs, MySQL died at about 1am.
The load has gone up to over 60 again:
top - 11:42:50 up 16 days, 19:42, 4 users, load average: 62.87, 39.18, 32.36
Tasks: 367 total, 75 running, 292 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.6 us, 11.7 sy, 0.4 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 87.0 st
KiB Mem: 9420636 total, 7673756 used, 1746880 free, 507672 buffers
KiB Swap: 0 total, 0 used, 0 free, 5222208 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30793 tn.web 30 10 1261m 80m 45m R 98 0.9 0:44.09 php-fpm
3735 aegir 20 0 174m 33m 8348 R 98 0.4 0:19.10 php
4925 root 20 0 18208 892 732 R 69 0.0 0:02.26 id
4906 root 20 0 0 0 0 R 68 0.0 0:04.54 mysqladmin
4535 aegir 20 0 212m 21m 8860 R 66 0.2 0:10.63 php
4917 root 20 0 16988 1256 976 R 64 0.0 0:02.34 ps
3849 www-data 20 0 0 0 0 R 60 0.0 0:02.80 nginx
4150 www-data 20 0 79476 15m 1164 R 53 0.2 0:02.02 nginx
4922 root 20 0 12628 1100 888 R 49 0.0 0:01.61 ps
4918 root 20 0 8348 964 780 R 43 0.0 0:01.60 ps
4923 root 20 0 66932 22m 656 R 37 0.2 0:01.23 lfd - child clo
4808 aegir 20 0 203m 16m 8384 R 37 0.2 0:13.78 php
4926 root 20 0 5552 588 500 S 32 0.0 0:01.05 sleep
4920 root 20 0 5552 588 500 S 29 0.0 0:00.97 sleep
4907 aegir 20 0 0 0 0 R 26 0.0 0:01.13 sh
10350 root 20 0 66932 23m 2032 R 25 0.3 13:48.00 lfd - processin
4896 root 20 0 0 0 0 R 21 0.0 0:02.07 awk
4197 www-data 20 0 79476 16m 2012 S 19 0.2 0:03.16 nginx
3923 tn.web 30 10 1234m 15m 6984 R 19 0.2 0:08.18 php-fpm
3176 tn.web 30 10 1235m 26m 16m R 17 0.3 0:17.40 php-fpm
3573 tn.web 30 10 1235m 23m 13m R 15 0.3 0:08.92 php-fpm
3627 tn.web 30 10 1235m 17m 7264 R 13 0.2 0:09.64 php-fpm
3207 tn.web 30 10 1235m 26m 16m R 12 0.3 0:13.99 php-fpm
3122 tn.web 30 10 1234m 21m 10m R 11 0.2 0:26.89 php-fpm
3519 tn.web 30 10 1235m 26m 16m R 11 0.3 0:09.64 php-fpm
4921 root 20 0 11456 1392 432 S 10 0.0 0:00.34 bash
3159 tn.web 30 10 1235m 26m 16m R 10 0.3 0:18.61 php-fpm
4126 www-data 20 0 79476 15m 1164 S 10 0.2 0:00.56 nginx
3569 tn.web 30 10 1235m 27m 16m R 9 0.3 0:05.88 php-fpm
1980 root 20 0 6656 628 504 S 9 0.0 168:48.62 vnstatd
3496 tn.web 30 10 1235m 16m 6276 R 9 0.2 0:06.09 php-fpm
4911 root 20 0 16940 1376 976 D 9 0.0 0:01.96 ps
3128 tn.web 30 10 1233m 14m 6484 R 8 0.2 0:22.64 php-fpm
3604 tn.web 30 10 1235m 17m 6984 R 8 0.2 0:06.36 php-fpm
3636 tn.web 30 10 1234m 15m 6432 R 8 0.2 0:09.42 php-fpm
3162 tn.web 30 10 1235m 23m 12m R 8 0.3 0:18.04 php-fpm
3187 tn.web 30 10 1234m 19m 9988 R 8 0.2 0:18.06 php-fpm
3098 tn.web 30 10 1244m 44m 25m R 8 0.5 0:38.66 php-fpm
3156 tn.web 30 10 1239m 37m 23m R 7 0.4 0:18.25 php-fpm
3467 tn.web 30 10 1234m 15m 6432 R 7 0.2 0:09.88 php-fpm
3590 tn.web 30 10 1233m 14m 6484 R 7 0.2 0:08.35 php-fpm
3142 tn.web 30 10 1235m 26m 16m R 7 0.3 0:52.05 php-fpm
3174 tn.web 30 10 1244m 42m 23m R 7 0.5 0:17.13 php-fpm
I'm going to reboot it with some more RAM for the day.
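The %Cpu(s) line in the top output above shows 87.0 st. That field is steal time, the share of CPU time the hypervisor did not give to this guest, so it may be worth watching alongside the load average. The following is a minimal sketch for pulling that figure out of a top summary line; the function name cpu_steal is made up for illustration, and the parsing assumes the procps top header format shown above.

```shell
#!/bin/sh
# Sketch: print the steal percentage ("st") from a top(1) summary line.
# Reads top output on stdin and looks only at the "%Cpu(s):" line.
cpu_steal() {
    awk '/^%Cpu\(s\):/ {
             # Fields alternate between a value and its label ("us,", "sy,", ... "st").
             for (i = 2; i < NF; i++)
                 if ($(i + 1) ~ /^st,?$/) { print $i; exit }
         }'
}
```

For example: top -bn1 | cpu_steal. A persistently high value would suggest the spikes are at least partly caused by contention on the host, which adding RAM to the guest would not address.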
comment:7 Changed 11 months ago by chris
- Add Hours to Ticket changed from 0.0 to 0.15
- Total Hours changed from 1.5 to 1.65
Things look fine now:
top - 11:58:45 up 9 min, 1 user, load average: 0.34, 0.72, 0.46
I'll reboot it back to 9GB of RAM tonight.
comment:8 Changed 11 months ago by chris
- Add Hours to Ticket changed from 0.0 to 0.15
- Total Hours changed from 1.65 to 1.8
PuffinServer is again unable even to return a value for uptime. It must have had another massive load spike and locked up, so I'm rebooting it again.
comment:9 Changed 11 months ago by chris
Going to follow this up on the load spike ticket, ticket:846