Ticket #890 (new defect)

Opened 12 months ago

Last modified 11 months ago

Site offline.

Reported by: sam Owned by: ade
Priority: major Milestone: Maintenance
Component: Unassigned Keywords:
Cc: paul, chris Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 1.8

Description

It's serving a page, so could this be a Drupal-level problem rather than a server-level one?

https://www.transitionnetwork.org/

Change History

comment:1 Changed 11 months ago by paul

Investigating...

On Sat, Dec 12, 2015 at 10:54 AM, Transition Technology Trac <trac@tech.transitionnetwork.org> wrote:

> #890: Site offline.
> Ticket URL: <https://tech.transitionnetwork.org/trac/ticket/890>

-- 
Paul Booker
Drupal Support for Websites and Linux Servers
Website: http://www.paulbooker.co.uk
Tel: +44 01922 861636

comment:2 Changed 11 months ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info]

I think this will be something best left to Chris.
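When `mysqladmin` reports `Too many connections`, a useful first check is how close the server is to its `max_connections` limit and what the peak has been. The sketch below is hypothetical: the function name `connection_usage` and the `MYSQL_CLIENT` override are illustrative, and it assumes the `mysql` client can log in non-interactively (for example with credentials in `/root/.my.cnf`). MariaDB normally allows one extra connection for an account with the SUPER privilege, so this may still get through while the site is locked out, unless that slot is already in use.

```shell
# Hypothetical helper: report current and peak connections against the limit.
# Assumes the mysql client can log in without prompting (e.g. /root/.my.cnf).
# Usage: connection_usage
MYSQL_CLIENT=${MYSQL_CLIENT:-mysql}

connection_usage() {
  # -N: no column names, -B: tab-separated batch output, -e: run one statement
  connected=$($MYSQL_CLIENT -N -B -e "SHOW GLOBAL STATUS LIKE 'Threads_connected'" | awk '{print $2}')
  peak=$($MYSQL_CLIENT -N -B -e "SHOW GLOBAL STATUS LIKE 'Max_used_connections'" | awk '{print $2}')
  limit=$($MYSQL_CLIENT -N -B -e "SHOW GLOBAL VARIABLES LIKE 'max_connections'" | awk '{print $2}')
  echo "connected=${connected} peak=${peak} max_connections=${limit}"
}
```

If `connected` sits at `max_connections`, the next question is what is holding the connections open, which `SHOW FULL PROCESSLIST` can answer from the same session.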

Last edited 11 months ago by paul

comment:3 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.25 to 0.5

I don't know exactly what the problem was with MySQL, but I had to force stop it before restarting it:

/etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .
/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
ps -lA | grep mysql
4 S   105   539 32182  3  80   0 - 916634 -     ?        01:19:47 mysqld
0 S     0 21863 21862  0  80   0 - 10334 -      ?        00:00:00 mysqldump
0 S     0 32182     1  0  80   0 -  2712 -      ?        00:00:00 mysqld_safe
killall -9 mysqld
/etc/init.d/mysql start
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..
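The force-stop sequence above can be wrapped up as a sketch that only falls back to SIGKILL when a clean stop has not worked. The function name, the `INIT_SCRIPT` and `GRACE_SECONDS` settings and the 30-second grace period are assumptions for illustration, not values from this server. The `ps` listing shows `mysqld_safe` as the parent of `mysqld`; because `mysqld_safe` restarts `mysqld` when it exits unexpectedly, the sketch kills the wrapper first so it cannot respawn the server.

```shell
# Hypothetical helper mirroring the manual recovery in this comment.
# Usage: force_restart_mysql
INIT_SCRIPT=${INIT_SCRIPT:-/etc/init.d/mysql}
GRACE_SECONDS=${GRACE_SECONDS:-30}

force_restart_mysql() {
  $INIT_SCRIPT stop
  # Wait up to GRACE_SECONDS for mysqld to exit after the clean stop request.
  i=0
  while pgrep -x mysqld >/dev/null && [ "$i" -lt "$GRACE_SECONDS" ]; do
    sleep 1
    i=$((i + 1))
  done
  if pgrep -x mysqld >/dev/null; then
    echo "mysqld still running after ${GRACE_SECONDS}s; sending SIGKILL" >&2
    # Kill the mysqld_safe wrapper first so it cannot respawn mysqld.
    pkill -9 -x mysqld_safe
    pkill -9 -x mysqld
    sleep 2
  fi
  $INIT_SCRIPT start
}
```

A SIGKILL skips a clean shutdown, so the table check that the init script runs on start (as in the output above) is worth keeping an eye on afterwards.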

MySQL was updated a couple of days ago, so perhaps the problem is related to that; see ticket:692#comment:228.

The site is back up now; I'll look at the logs later to see if I can find a cause.

comment:4 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.5 to 1.0

There is nothing in /var/log/syslog or /var/log/messages; in /var/log/daemon.log there is this:

Dec 11 13:09:15 puffin mysqld: 151211 13:09:15 [Warning] Aborted connection 180657 to db: 'transitionnetw_0' user: 'transitionnetw_0' host: 'localhost' (Unknown error)


Dec 12 01:08:17 puffin mysqld: 151212  1:08:17 [Warning] Aborted connection 297180 to db: 'transitionnetw_0' user: 'transitionnetw_0' host: 'localhost' (Unknown error)

But these errors are not new: there are lots of them from previous days. There is nothing in auth.log or any of the MySQL logs, so I'm afraid I have no idea why MySQL stopped, or why it couldn't be restarted without force.
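To put numbers on the observation that these warnings are not new, a short sketch can count `Aborted connection` warnings per day. The function name is hypothetical, and it assumes the `Mon DD HH:MM:SS` syslog timestamp seen in the lines above; days are reported in the order they first appear.

```shell
# Hypothetical helper: count "Aborted connection" warnings per day.
# Reads the files given as arguments, or standard input if there are none.
aborted_per_day() {
  awk '/Aborted connection/ {
         day = $1 " " $2              # e.g. "Dec 11" from the syslog timestamp
         if (!(day in count)) order[++n] = day
         count[day]++
       }
       END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }' "$@"
}
```

For example, `aborted_per_day /var/log/daemon.log.1 /var/log/daemon.log` covers the current and previous log, and compressed rotations can be piped in with `zcat`. A sudden jump on the day of the outage would point one way; a steady rate like the one described here suggests these warnings are background noise.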

comment:5 Changed 11 months ago by paul

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.0 to 1.25

Nothing further to add really.

The initial problem was that the MySQL server was no longer accepting connections; that could be connected with the recent memory problems.

comment:6 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 1.25 to 1.5

The same thing has happened again, with very high load and:

/etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .

So:

/etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
killall -9 mysqld
/etc/init.d/mysql stop
[ ok ] Stopping MariaDB database server: mysqld.
/etc/init.d/mysql start
[ ok ] Starting MariaDB database server: mysqld already running.

Looking at the Munin graphs, MySQL died at about 1am.

The load has gone up to over 60 again:

top - 11:42:50 up 16 days, 19:42,  4 users,  load average: 62.87, 39.18, 32.36
Tasks: 367 total,  75 running, 292 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.6 us, 11.7 sy,  0.4 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si, 87.0 st
KiB Mem:   9420636 total,  7673756 used,  1746880 free,   507672 buffers
KiB Swap:        0 total,        0 used,        0 free,  5222208 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
30793 tn.web    30  10 1261m  80m  45m R    98  0.9   0:44.09 php-fpm
 3735 aegir     20   0  174m  33m 8348 R    98  0.4   0:19.10 php
 4925 root      20   0 18208  892  732 R    69  0.0   0:02.26 id
 4906 root      20   0     0    0    0 R    68  0.0   0:04.54 mysqladmin
 4535 aegir     20   0  212m  21m 8860 R    66  0.2   0:10.63 php
 4917 root      20   0 16988 1256  976 R    64  0.0   0:02.34 ps
 3849 www-data  20   0     0    0    0 R    60  0.0   0:02.80 nginx
 4150 www-data  20   0 79476  15m 1164 R    53  0.2   0:02.02 nginx
 4922 root      20   0 12628 1100  888 R    49  0.0   0:01.61 ps
 4918 root      20   0  8348  964  780 R    43  0.0   0:01.60 ps
 4923 root      20   0 66932  22m  656 R    37  0.2   0:01.23 lfd - child clo
 4808 aegir     20   0  203m  16m 8384 R    37  0.2   0:13.78 php
 4926 root      20   0  5552  588  500 S    32  0.0   0:01.05 sleep
 4920 root      20   0  5552  588  500 S    29  0.0   0:00.97 sleep
 4907 aegir     20   0     0    0    0 R    26  0.0   0:01.13 sh
10350 root      20   0 66932  23m 2032 R    25  0.3  13:48.00 lfd - processin
 4896 root      20   0     0    0    0 R    21  0.0   0:02.07 awk
 4197 www-data  20   0 79476  16m 2012 S    19  0.2   0:03.16 nginx
 3923 tn.web    30  10 1234m  15m 6984 R    19  0.2   0:08.18 php-fpm
 3176 tn.web    30  10 1235m  26m  16m R    17  0.3   0:17.40 php-fpm
 3573 tn.web    30  10 1235m  23m  13m R    15  0.3   0:08.92 php-fpm
 3627 tn.web    30  10 1235m  17m 7264 R    13  0.2   0:09.64 php-fpm
 3207 tn.web    30  10 1235m  26m  16m R    12  0.3   0:13.99 php-fpm
 3122 tn.web    30  10 1234m  21m  10m R    11  0.2   0:26.89 php-fpm
 3519 tn.web    30  10 1235m  26m  16m R    11  0.3   0:09.64 php-fpm
 4921 root      20   0 11456 1392  432 S    10  0.0   0:00.34 bash
 3159 tn.web    30  10 1235m  26m  16m R    10  0.3   0:18.61 php-fpm
 4126 www-data  20   0 79476  15m 1164 S    10  0.2   0:00.56 nginx
 3569 tn.web    30  10 1235m  27m  16m R     9  0.3   0:05.88 php-fpm
 1980 root      20   0  6656  628  504 S     9  0.0 168:48.62 vnstatd
 3496 tn.web    30  10 1235m  16m 6276 R     9  0.2   0:06.09 php-fpm
 4911 root      20   0 16940 1376  976 D     9  0.0   0:01.96 ps
 3128 tn.web    30  10 1233m  14m 6484 R     8  0.2   0:22.64 php-fpm
 3604 tn.web    30  10 1235m  17m 6984 R     8  0.2   0:06.36 php-fpm
 3636 tn.web    30  10 1234m  15m 6432 R     8  0.2   0:09.42 php-fpm
 3162 tn.web    30  10 1235m  23m  12m R     8  0.3   0:18.04 php-fpm
 3187 tn.web    30  10 1234m  19m 9988 R     8  0.2   0:18.06 php-fpm
 3098 tn.web    30  10 1244m  44m  25m R     8  0.5   0:38.66 php-fpm
 3156 tn.web    30  10 1239m  37m  23m R     7  0.4   0:18.25 php-fpm
 3467 tn.web    30  10 1234m  15m 6432 R     7  0.2   0:09.88 php-fpm
 3590 tn.web    30  10 1233m  14m 6484 R     7  0.2   0:08.35 php-fpm
 3142 tn.web    30  10 1235m  26m  16m R     7  0.3   0:52.05 php-fpm
 3174 tn.web    30  10 1244m  42m  23m R     7  0.5   0:17.13 php-fpm

I'm going to reboot it with some more RAM for the day.
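To watch for a repeat, the two figures that stand out in the `top` header above, the 1-minute load average (62.87) and the CPU `st` (steal) share (87.0%), can be pulled out of `top` batch output and logged. Steal is time the virtual machine was ready to run but the hypervisor gave the physical CPU to something else. This is a minimal sketch assuming the procps-ng `top` layout shown above; the function name is hypothetical.

```shell
# Hypothetical helper: print "<1-min load> <steal %>" from top batch output.
# Assumes the procps-ng layout: "load average: a, b, c" and
# "%Cpu(s): ... 87.0 st". If several frames are given, the last one wins.
top_load_and_steal() {
  awk '
    /load average:/ {
      line = $0
      sub(/.*load average: */, "", line)   # leaves e.g. "62.87, 39.18, 32.36"
      split(line, avg, ",")
      load1 = avg[1]
    }
    /Cpu\(s\):/ {
      # The steal figure is the number just before the "st" label.
      for (i = 2; i <= NF; i++) if ($i == "st") steal = $(i - 1)
    }
    END { print load1, steal }'
}
```

For example, `top -b -n 2 -d 1 | top_load_and_steal` takes two frames one second apart and reports the second, since the CPU figures in the first batch-mode frame are not measured over a sampling interval and can be misleading. Run from cron every minute with a timestamp, this gives finer detail around a spike than the Munin graphs.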

comment:7 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 1.5 to 1.65

Things look fine now:

top - 11:58:45 up 9 min,  1 user,  load average: 0.34, 0.72, 0.46

I'll reboot it back to 9GB of RAM tonight.

comment:8 Changed 11 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 1.65 to 1.8

PuffinServer is again unable even to return a value for uptime. It must have had another massive load spike and locked up, so I'm rebooting it again.

comment:9 Changed 11 months ago by chris

I'm going to follow this up on the load spike ticket, ticket:846.
