Ticket #666 (closed maintenance: fixed)
Parrot lockups
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | critical | Milestone: | Maintenance |
Component: | Dev server | Keywords: | |
Cc: | ed, aland | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 0.32 |
Description
We have this on the console; I'm going to reboot it:
[28800.164426] [<ffffffffa001c1ba>] ? do_get_write_access+0x22c/0x452 [jbd2]
[28800.164435] [<ffffffff81066360>] ? wake_bit_function+0x0/0x23
[28800.164443] [<ffffffff8104b51c>] ? try_to_wake_up+0x289/0x29b
[28800.164453] [<ffffffffa001c402>] ? jbd2_journal_get_write_access+0x22/0x33 [jbd2]
[28800.164475] [<ffffffffa006289e>] ? __ext4_journal_get_write_access+0x4e/0x56 [ext4]
[28800.164492] [<ffffffffa0042b8e>] ? ext4_reserve_inode_write+0x37/0x73 [ext4]
[28800.164508] [<ffffffffa0042c05>] ? ext4_mark_inode_dirty+0x3b/0x1c4 [ext4]
[28800.164528] [<ffffffffa005bdc7>] ? ext4_journal_start_sb+0xd4/0x10e [ext4]
[28800.164543] [<ffffffffa0042eb0>] ? ext4_dirty_inode+0x30/0x46 [ext4]
[28800.164553] [<ffffffff81109ead>] ? __mark_inode_dirty+0x25/0x14a
[28800.164560] [<ffffffff8110138b>] ? file_update_time+0x101/0x130
[28800.164569] [<ffffffff810b6835>] ? __generic_file_aio_write+0x16e/0x293
[28800.164578] [<ffffffff810b69b3>] ? generic_file_aio_write+0x59/0x9f
[28800.164588] [<ffffffff810f0316>] ? do_sync_write+0xce/0x113
[28800.164596] [<ffffffff810fcd0c>] ? filldir+0x0/0xb7
[28800.164605] [<ffffffff810549b1>] ? _local_bh_enable_ip+0x22/0x8f
[28800.164613] [<ffffffff81066332>] ? autoremove_wake_function+0x0/0x2e
[28800.164626] [<ffffffff8130f1a1>] ? _spin_lock_bh+0x9/0x25
[28800.164626] [<ffffffff810f0c68>] ? vfs_write+0xa9/0x102
[28800.164632] [<ffffffff810f0d18>] ? sys_pwrite64+0x57/0x77
[28800.164639] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[28800.164657] INFO: task apache2:31559 blocked for more than 120 seconds.
[28800.164665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28800.164675] apache2 D 0000000000000000 0 31559 28011 0x00000000
[28800.164691] ffffffff8149f1f0 0000000000000286 0000000000000000 ffffffff81274f4f
[28800.164711] ffff880002dc99d8 ffff8800bd1ca000 000000000000f9e0 ffff880002dc9fd8
[28800.164730] 00000000000157c0 00000000000157c0 ffff8800bd9746a0 ffff8800bd974998
[28800.164752] Call Trace:
[28800.164762] [<ffffffff81274f4f>] ? sch_direct_xmit+0x7f/0x14c
[28800.164773] [<ffffffff81066253>] ? bit_waitqueue+0x10/0xa0
[28800.164787] [<ffffffffa001c1ba>] ? do_get_write_access+0x22c/0x452 [jbd2]
[28800.164798] [<ffffffff81066360>] ? wake_bit_function+0x0/0x23
[28800.164812] [<ffffffffa001c402>] ? jbd2_journal_get_write_access+0x22/0x33 [jbd2]
[28800.164833] [<ffffffffa006289e>] ? __ext4_journal_get_write_access+0x4e/0x56 [ext4]
[28800.164852] [<ffffffffa0042b8e>] ? ext4_reserve_inode_write+0x37/0x73 [ext4]
[28800.164871] [<ffffffffa0042c05>] ? ext4_mark_inode_dirty+0x3b/0x1c4 [ext4]
[28800.164890] [<ffffffffa005bdc7>] ? ext4_journal_start_sb+0xd4/0x10e [ext4]
[28800.164908] [<ffffffffa0042eb0>] ? ext4_dirty_inode+0x30/0x46 [ext4]
[28800.164921] [<ffffffff81109ead>] ? __mark_inode_dirty+0x25/0x14a
[28800.164932] [<ffffffff8110138b>] ? file_update_time+0x101/0x130
[28800.164943] [<ffffffff810b6835>] ? __generic_file_aio_write+0x16e/0x293
[28800.164958] [<ffffffff8125227b>] ? sock_aio_write+0x0/0xbc
[28800.164969] [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83
[28800.164980] [<ffffffff810b69b3>] ? generic_file_aio_write+0x59/0x9f
[28800.164992] [<ffffffff810f0316>] ? do_sync_write+0xce/0x113
[28800.165003] [<ffffffff81066332>] ? autoremove_wake_function+0x0/0x2e
[28800.165015] [<ffffffff810ce24c>] ? handle_mm_fault+0x3b8/0x80f
[28800.165027] [<ffffffff810f0c68>] ? vfs_write+0xa9/0x102
[28800.165038] [<ffffffff810f0d7d>] ? sys_write+0x45/0x6e
[28800.165049] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[125024.867759] hrtimer: interrupt took 38561246 ns
[1412520.196163] INFO: task mysqld:7928 blocked for more than 120 seconds.
[1412520.196183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1412520.196191] mysqld D 0000000000000000 0 7928 18454 0x00000000
[1412520.196203] ffff8800bfa69530 0000000000000286 0000000000000000 0000000000000000
[1412520.196215] 0007ffffffffffff 0000000000000001 000000000000f9e0 ffff880074cb5fd8
[1412520.196226] 00000000000157c0 00000000000157c0 ffff880002ce1530 ffff880002ce1828
[1412520.196236] Call Trace:
[1412520.196259] [<ffffffffa00232bf>] ? jbd2_log_wait_commit+0xbf/0x112 [jbd2]
[1412520.196273] [<ffffffff81066332>] ? autoremove_wake_function+0x0/0x2e
[1412520.196293] [<ffffffffa003fb41>] ? ext4_sync_file+0x199/0x25c [ext4]
[1412520.196304] [<ffffffff8110d6e0>] ? vfs_fsync_range+0x73/0x9e
[1412520.196319] [<ffffffff8110d78a>] ? do_fsync+0x28/0x39
[1412520.196325] [<ffffffff8110d7b9>] ? sys_fsync+0xb/0x10
[1412520.196333] [<ffffffff81011b63>] ? sysret_check+0x17/0x5a
[1412520.196341] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Change History
comment:1 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.32
- Total Hours changed from 0.0 to 0.32
comment:3 Changed 3 years ago by chris
- Cc aland added
- Status changed from closed to reopened
- Resolution fixed deleted
- Summary changed from Parrot isn't responding to Parrot lockups
wiki:ParrotServer locked up again today, and again there is nothing in the logs. I stopped and restarted it at the Xen level.
I have reopened this ticket to keep an eye on this issue and also added Alan as a CC.
comment:4 Changed 3 years ago by chris
The alert I got about the problem from munin was as before:
From: munin@penguin.webarch.net
Date: Wed, 15 Jan 2014 13:55:22 +0000
Subject: parrot.transitionnetwork.org Munin Alert

transitionnetwork.org :: parrot.transitionnetwork.org :: eth0 errors
UNKNOWNs: errors is unknown, errors is unknown.
comment:5 Changed 3 years ago by chris
- Status changed from reopened to closed
- Resolution set to fixed
Closing this, hoping the fix for the NFZ/ZFS server has resolved this, see ticket:618#comment:5
The server is back up.
The console errors were not recent; they are from /var/log/kern.log.2.gz:
I can't see anything in the logs to indicate why it was not responding today.
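For future checks, here is a minimal sketch of how rotated kernel logs could be scanned for hung-task reports like the ones seen on the console. It is not something that was run for this ticket. The names `HUNG_TASK`, `find_hung_tasks` and `scan_log` are hypothetical, and the sample lines are copied from the console output in the description.

```python
import gzip
import re
from datetime import timedelta

# Matches kernel hung-task reports such as:
# [28800.164657] INFO: task apache2:31559 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<name>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(text):
    """Return (uptime, task name, pid, threshold in seconds) per report."""
    return [
        (
            timedelta(seconds=float(m["uptime"])),  # offset since boot
            m["name"],
            int(m["pid"]),
            int(m["secs"]),
        )
        for m in HUNG_TASK.finditer(text)
    ]


def scan_log(path):
    """Scan a plain or gzip-compressed log file (hypothetical helper)."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as handle:
        return find_hung_tasks(handle.read())


# Sample lines taken from the console output in this ticket.
sample = (
    "[28800.164657] INFO: task apache2:31559 blocked for more than 120 seconds.\n"
    "[125024.867759] hrtimer: interrupt took 38561246 ns\n"
    "[1412520.196163] INFO: task mysqld:7928 blocked for more than 120 seconds.\n"
)

for uptime, name, pid, secs in find_hung_tasks(sample):
    print(f"{name} (pid {pid}) blocked > {secs}s at uptime {uptime}")
```

Calling `scan_log("/var/log/kern.log.2.gz")` would apply the same pattern to the rotated log mentioned above. The bracketed kernel timestamp is seconds since boot, so it has to be compared with the boot time to judge how recent a report is.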
There is also nothing I can see in the munin logs, https://penguin.transitionnetwork.org/munin/transitionnetwork.org/parrot.transitionnetwork.org/
I was alerted to the lack of response from the server by this email:
And I couldn't connect via SSH.
It's possible that it would have recovered without intervention.
Closing this ticket as I can't think of anything else to do on it and the server is up and running now.