Ticket #905 (new maintenance)

Opened 9 months ago

Last modified 9 months ago

TN site down due to redis not running

Reported by: chris Owned by: chris
Priority: blocker Milestone: Maintenance
Component: Live server Keywords:
Cc: ade, sam, paul Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 0.55

Description

I'm working on this...

Change History

comment:1 Changed 9 months ago by ade

Many thanks Chris,
See site is back

Ade
On 25 Feb 2016 10:28 a.m., "Transition Technology Trac" <
trac@tech.transitionnetwork.org> wrote:

> #905: TN site down due to redis not running
> -------------------------------------+-------------------------------------
>                  Reporter:  chris    |                Owner:  chris
>                      Type:           |               Status:  new
>   maintenance                        |            Milestone:  Maintenance
>                  Priority:  blocker  |             Keywords:
>                 Component:  Live     |  Add Hours to Ticket:  0
>   server                             |          Total Hours:  0
> Estimated Number of Hours:  0        |
>                 Billable?:  1        |
> -------------------------------------+-------------------------------------
>  I'm working on this...
>
> --
> Ticket URL: <https://tech.transitionnetwork.org/trac/ticket/905>
> Transition Technology <https://tech.transitionnetwork.org/trac>
> Support and issues tracking for the Transition Network Web Project.
>

comment:2 Changed 9 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.4
  • Total Hours changed from 0.0 to 0.4

The site is back up.

Here is what I did (restarting MySQL and Redis):

chris@puffin:~$ sudo -i
puffin:~# /etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .
puffin:~# /etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
puffin:~# /etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
puffin:~# /etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
puffin:~# ps -lA | grep mysql
4 S     0  7292     1  0  80   0 -  2329 -      ?        00:00:00 mysqld_safe
4 S   105  8232  7292  7  80   0 - 1243829 -    ?        14:42:58 mysqld
0 S     0 32619 32618  0  80   0 - 10336 -      ?        00:00:00 mysqldump
puffin:~# /etc/init.d/mysql stop
[FAIL] Stopping MariaDB database server: mysqld failed!
puffin:~# /etc/init.d/mysql restart
[FAIL] Stopping MariaDB database server: mysqld failed!
[ ok ] Starting MariaDB database server: mysqld already running.
puffin:~# /etc/init.d/mysql status
/usr/bin/mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
[info] .
puffin:~# vi /etc/mysql/my.cnf
puffin:~# /etc/init.d/redis-server status
redis-server is not running
puffin:~# /etc/init.d/redis-server start
Starting redis-server: touch: cannot touch `/var/run/redis/redis.pid': No such file or directory
puffin:~# mkdir -p /var/run/redis
puffin:~# /etc/init.d/redis-server start
Starting redis-server: redis-server.
puffin:~# /etc/init.d/mysql restart
[FAIL] Stopping MariaDB database server: mysqld failed!
[ ok ] Starting MariaDB database server: mysqld already running.
puffin:~# /etc/init.d/redis-server status
redis-server is running
puffin:~# /etc/init.d/redis-server status
redis-server is running
puffin:~# /etc/init.d/mysql restart
[FAIL] Stopping MariaDB database server: mysqld failed!
[ ok ] Starting MariaDB database server: mysqld already running.
puffin:~# killall mysqld
puffin:~# killall mysqld_safe
puffin:~# killall mysqld^C
puffin:~# ps -lA | grep mysql
4 S     0  7292     1  0  80   0 -  2330 -      ?        00:00:00 mysqld_safe
4 R   105 32510  7292 10  80   0 - 681761 -     ?        00:00:00 mysqld
puffin:~# ps -lA | grep mysql
4 S     0  7292     1  0  80   0 -  2330 -      ?        00:00:00 mysqld_safe
4 S   105 32510  7292 11  80   0 - 694055 -     ?        00:00:01 mysqld
puffin:~# killall -9 mysqld_safe
puffin:~# killall -9 mysqld
puffin:~# ps -lA | grep mysql
puffin:~# /etc/init.d/mysql start
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..

It looks like redis stopped, I'm not sure when, the Munin stats stop about a week ago.

Also at 1am there was an issue with MySQL, these are some extracts from the /var/log/syslog.1:

Feb 25 01:03:57 puffin mysqld: 160225  1:03:57 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
Feb 25 01:03:57 puffin mysqld: 160225  1:03:57 [ERROR] mysqld: Deadlock found when trying to get lock; try restarting transaction
...

Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: a long semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140248992335616 has waited at buf0buf.c line 4099 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: a long semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244815951616 has waited at buf0buf.ic line 376 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8dabb070d0 '&block->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: a long semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244814354176 has waited at buf0buf.c line 1836 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: a long semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244810893056 has waited at buf0buf.c line 4082 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: a long semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140246175950592 has waited at buf0lru.c line 1106 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140248992335616 has waited at buf0buf.c line 4099 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244815951616 has waited at buf0buf.ic line 376 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8dabb070d0 '&block->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244814354176 has waited at buf0buf.c line 1836 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140244810893056 has waited at buf0buf.c line 4082 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: semaphore wait:
Feb 25 01:15:23 puffin mysqld: --Thread 140246175950592 has waited at buf0lru.c line 1106 for 241.00 seconds the semaphore:
Feb 25 01:15:23 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:23 puffin mysqld: waiters flag 1
...
Feb 25 01:15:23 puffin mysqld: InnoDB: Warning: Writer thread is waiting this semaphore:
Feb 25 01:15:23 puffin mysqld: --Thread 140248993400576 has waited at buf0lru.c line 1141 for 240.00 seconds the semaphore:
Feb 25 01:15:23 puffin rsyslogd-2177: imuxsock begins to drop messages from pid 8233 due to rate-limiting
Feb 25 01:15:32 puffin rsyslogd-2177: imuxsock lost 585 messages from pid 8233 due to rate-limiting
Feb 25 01:15:32 puffin mysqld: 
Feb 25 01:15:32 puffin mysqld: =====================================
Feb 25 01:15:32 puffin mysqld: 160225  1:15:32 INNODB MONITOR OUTPUT
Feb 25 01:15:32 puffin mysqld: =====================================
Feb 25 01:15:32 puffin mysqld: Per second averages calculated from the last 60 seconds
Feb 25 01:15:32 puffin mysqld: -----------------
Feb 25 01:15:32 puffin mysqld: BACKGROUND THREAD
Feb 25 01:15:32 puffin mysqld: -----------------
Feb 25 01:15:32 puffin mysqld: srv_master_thread loops: 787246 1_second, 786418 sleeps, 75744 10_second, 35237 background, 35237 flush
Feb 25 01:15:32 puffin mysqld: srv_master_thread log flush and writes: 674339
Feb 25 01:15:32 puffin mysqld: ----------
Feb 25 01:15:32 puffin mysqld: SEMAPHORES
Feb 25 01:15:32 puffin mysqld: ----------
Feb 25 01:15:32 puffin mysqld: OS WAIT ARRAY INFO: reservation count 199312, signal count 191399
Feb 25 01:15:32 puffin mysqld: --Thread 140248992335616 has waited at buf0buf.c line 4099 for 250.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140244815951616 has waited at buf0buf.ic line 376 for 250.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8dabb070d0 '&block->mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140244814354176 has waited at buf0buf.c line 1836 for 250.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140244810893056 has waited at buf0buf.c line 4082 for 250.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140246175950592 has waited at buf0lru.c line 1106 for 250.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f478 '&buf_pool->mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140246184343296 has waited at buf0lru.c line 1058 for 249.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140248993400576 has waited at buf0lru.c line 1141 for 249.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: Mutex at 0x7f8e0d09f4f8 '&buf_pool->LRU_list_mutex', lock var 1
Feb 25 01:15:32 puffin mysqld: waiters flag 1
Feb 25 01:15:32 puffin mysqld: --Thread 140244813821696 has waited at btr0cur.c line 671 for 246.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: S-lock on RW-latch at 0x7f8da009d588 '&new_index->lock'
Feb 25 01:15:32 puffin mysqld: a writer (thread id 140248993400576) has reserved it in mode  exclusive
Feb 25 01:15:32 puffin mysqld: number of readers 0, waiters flag 1, lock_word: 0
Feb 25 01:15:32 puffin mysqld: Last time read locked in file btr0cur.c line 671
Feb 25 01:15:32 puffin mysqld: Last time write locked in file /home/buildbot/buildbot/build/mariadb-5.5.48/storage/xtradb/btr/btr0cur.c line 664
Feb 25 01:15:32 puffin mysqld: --Thread 140244812756736 has waited at btr0cur.c line 671 for 242.00 seconds the semaphore:
Feb 25 01:15:32 puffin mysqld: S-lock on RW-latch at 0x7f8da009d588 '&new_index->lock'
Feb 25 01:15:32 puffin mysqld: a writer (thread id 140248993400576) has reserved it in mode  exclusive
Feb 25 01:15:32 puffin mysqld: number of readers 0, waiters flag 1, lock_word: 0
...
Feb 25 01:15:32 puffin mysqld: a writer (thread id 140248993400576) has reserved it in mode  exclusive
Feb 25 01:15:32 puffin rsyslogd-2177: imuxsock begins to drop messages from pid 8233 due to rate-limiting
Feb 25 01:15:46 puffin kernel: [687843.174807] Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=00:16:3e:19:68:02:00:00:24:d1:8a:9a:08:00 SRC=74.91.21.2 DST=81.95.52.103 LEN=443 TOS=0x00 PREC=0x00 TTL=50 ID=58757 DF PROTO=UDP SPT=5543 DPT=5060 LEN=423
Feb 25 01:15:53 puffin rsyslogd-2177: imuxsock lost 297 messages from pid 8233 due to rate-limiting
Feb 25 01:15:53 puffin mysqld: InnoDB: ###### Diagnostic info printed to the standard error stream

I must admit I don't know what caused this.

comment:3 Changed 9 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.4 to 0.55

Here is some info on what can cause a MySQL DEADLOCK:

Since this is the first time I'm aware that this this has happened and this server is due to be de-commissioned by the end of April due to the switch to WordPress I suggest we don't spend much time trying to find and fix the cause unless it happens again.

comment:4 Changed 9 months ago by paul

It'll probably be difficult to find the query responsible for the deadlock,
 if this doesn't happen very often. If this problem starts to happen to
more frequently maybe we could turn on the mysql query log and try to
associate the deadlock error with  a particular mysql query; we can then
put try / catch around it or something.

On Thu, Feb 25, 2016 at 10:59 AM, Transition Technology Trac <
trac@tech.transitionnetwork.org> wrote:

> #905: TN site down due to redis not running
> -------------------------------------+-------------------------------------
>            Reporter:  chris          |                      Owner:  chris
>                Type:  maintenance    |                     Status:  new
>            Priority:  blocker        |                  Milestone:
>           Component:  Live server    |  Maintenance
>            Keywords:                 |                 Resolution:
> Add Hours to Ticket:  0.15           |  Estimated Number of Hours:  0.0
>         Total Hours:  0.4            |                  Billable?:  1
> -------------------------------------+-------------------------------------
> Changes (by chris):
>
>  * hours:  0.0 => 0.15
>  * totalhours:  0.4 => 0.55
>
>
> Comment:
>
>  Here is some info on what can cause a MySQL DEADLOCK:
>
>  * https://stackoverflow.com/questions/7782120/which-query-is-causing-
>  deadlock-found-when-trying-to-get-lock-try-restarting-tr
>
>  Since this is the first time I'm aware that this this has happened and
>  this server is due to be [ticket:846#comment:86 de-commissioned by the end
>  of April] due to the [ticket:904 switch to WordPress] I suggest we don't
>  spend much time trying to find and fix the cause unless it happens again.
>
> --
> Ticket URL: <https://tech.transitionnetwork.org/trac/ticket/905#comment:3>
> Transition Technology <https://tech.transitionnetwork.org/trac>
> Support and issues tracking for the Transition Network Web Project.
>



-- 
Paul Booker
Drupal Support for Websites and Linux Servers
Website: http://www.paulbooker.co.uk
Tel: +44 01922 861636

Note: See TracTickets for help on using tickets.