Ticket #552 (closed maintenance: fixed)
Puffin Downtime 23rd May 2013
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | major | Milestone: | Maintenance |
Component: | Live server | Keywords: | |
Cc: | ed, jim | Estimated Number of Hours: | 0.5 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 1.0 |
Description (last modified by chris) (diff)
There was a massive load spike on wiki:PuffinServer this afternoon, the load average peaked at over 60 and at exactly the same time there was also one on the virtual server serving http://webarch.net/.
Pingdom reported:
- www.transitionnetwork.org down since 23/05/2013 12:11:57
- www.transitionnetwork.org is UP again at 23/05/2013 12:19:57, after 8m of downtime
- www.transitionnetwork.org is down since 23/05/2013 12:26:57
- www.transitionnetwork.org is UP again at 23/05/2013 12:58:57, after 32m of downtime
When I did managed to get back into the server php-fpm and nginx were not running and it took several attempt to get them to start up.
Attached is the munin graps of the load from today, munin didn't record the start of the spike but it did start recording it as it came down.
This ticket has been created to report the issue and also to record anything that might be found in the logs to explain what happened.
Attachments
Change History
comment:1 Changed 4 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
Changed 4 years ago by chris
- Attachment puffin-load-spike-2013-05-23.png added
Puffin Load Spike 2013-05-23
comment:2 Changed 4 years ago by chris
- Status changed from new to accepted
- Description modified (diff)
comment:3 Changed 4 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.15
- Total Hours changed from 0.25 to 0.4
With hindsight I could have posted about this issue today on ticket:543 as that was a similar event.
Following is what we have in /var/log/messages:
May 23 12:07:46 puffin kernel: [650765.583969] vnstatd D ffff8800fe5f9c40 0 707 1 0x00000000 May 23 12:07:46 puffin kernel: [650765.583980] ffff8800fe5f9c40 0000000000000286 ffff8800ffc15000 ffff880000009680 May 23 12:07:46 puffin kernel: [650765.583998] ffff8800fd335d58 000000000000000a 000000000000f9e0 ffff8800fd335fd8 May 23 12:07:46 puffin kernel: [650765.604533] 00000000000157c0 00000000000157c0 ffff8800023a0710 ffff8800023a0a08 May 23 12:07:46 puffin kernel: [650765.604544] Call Trace: May 23 12:07:46 puffin kernel: [650765.604556] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604566] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 May 23 12:07:46 puffin kernel: [650765.604574] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604582] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604589] [<ffffffff810f869e>] ? __link_path_walk+0x323/0x6f5 May 23 12:07:46 puffin kernel: [650765.604596] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.604603] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.604610] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.604618] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 May 23 12:07:46 puffin kernel: [650765.604626] [<ffffffff810e7491>] ? virt_to_head_page+0x9/0x2a May 23 12:07:46 puffin kernel: [650765.604633] [<ffffffff810e8829>] ? kmem_cache_free+0x72/0xa3 May 23 12:07:46 puffin kernel: [650765.604640] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.604648] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.604655] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.604695] ps D ffff880001f73880 0 17208 17202 0x00000080 May 23 12:07:46 puffin kernel: [650765.604704] ffff880001f73880 0000000000000286 0000000000000000 00000000000000d0 May 23 12:07:46 puffin kernel: [650765.604714] ffff8800ffc15000 ffffffff81012cdb 000000000000f9e0 ffff8800fdc5bfd8 May 23 12:07:46 puffin kernel: [650765.604725] 00000000000157c0 00000000000157c0 ffff8800fe220000 ffff8800fe2202f8 May 23 12:07:46 puffin kernel: [650765.604735] Call Trace: May 23 12:07:46 puffin kernel: [650765.604741] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604751] [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3 May 23 12:07:46 puffin kernel: [650765.604759] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604769] [<ffffffff810c8d3c>] ? zone_statistics+0x3c/0x5d May 23 12:07:46 puffin kernel: [650765.604777] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604784] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604791] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.604799] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.604805] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.604813] [<ffffffff810fb5ea>] ? user_path_at+0x48/0x79 May 23 12:07:46 puffin kernel: [650765.604820] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604828] [<ffffffff8100d729>] ? __xen_write_cr3+0x49/0xc4 May 23 12:07:46 puffin kernel: [650765.604836] [<ffffffff81049359>] ? finish_task_switch+0x44/0xaf May 23 12:07:46 puffin kernel: [650765.604844] [<ffffffff810f3952>] ? vfs_fstatat+0x2c/0x57 May 23 12:07:46 puffin kernel: [650765.604851] [<ffffffff810f3a45>] ? sys_newstat+0x11/0x30 May 23 12:07:46 puffin kernel: [650765.604860] [<ffffffff81093ae1>] ? audit_syscall_entry+0x10c/0x137 May 23 12:07:46 puffin kernel: [650765.604867] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.604894] awk D ffff8800fc6862e0 0 17417 17339 0x00000080 May 23 12:07:46 puffin kernel: [650765.604903] ffff8800fc6862e0 0000000000000282 ffff8800ffb19480 ffffffff810bb257 May 23 12:07:46 puffin kernel: [650765.604913] 0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800cd629fd8 May 23 12:07:46 puffin kernel: [650765.604924] 00000000000157c0 00000000000157c0 ffff8800fc558000 ffff8800fc5582f8 May 23 12:07:46 puffin kernel: [650765.604935] Call Trace: May 23 12:07:46 puffin kernel: [650765.604942] [<ffffffff810bb257>] ? get_page_from_freelist+0x56b/0x760 May 23 12:07:46 puffin kernel: [650765.604949] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604958] [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3 May 23 12:07:46 puffin kernel: [650765.604966] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604973] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604980] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604987] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.604995] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605002] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605011] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605019] [<ffffffff810d1817>] ? get_unmapped_area+0xd7/0x139 May 23 12:07:46 puffin kernel: [650765.605027] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605033] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605040] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605060] awk D ffff880001f71c40 0 17445 17397 0x00000080 May 23 12:07:46 puffin kernel: [650765.605069] ffff880001f71c40 0000000000000286 0000000000000000 0000000000000000 May 23 12:07:46 puffin kernel: [650765.605079] 0000000000000000 0000000000000000 000000000000f9e0 ffff8800fdc07fd8 May 23 12:07:46 puffin kernel: [650765.605089] 00000000000157c0 00000000000157c0 ffff8800fc55c6a0 ffff8800fc55c998 May 23 12:07:46 puffin kernel: [650765.605099] Call Trace: May 23 12:07:46 puffin kernel: [650765.605106] [<ffffffff81197696>] ? vsnprintf+0x40a/0x449 May 23 12:07:46 puffin kernel: [650765.605114] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605123] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605130] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605137] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605144] [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317 May 23 12:07:46 puffin kernel: [650765.605152] [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5 May 23 12:07:46 puffin kernel: [650765.605159] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605166] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605174] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605181] [<ffffffff8130f475>] ? page_fault+0x25/0x30 May 23 12:07:46 puffin kernel: [650765.605188] [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60 May 23 12:07:46 puffin kernel: [650765.605195] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 May 23 12:07:46 puffin kernel: [650765.605203] [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d May 23 12:07:46 puffin kernel: [650765.605211] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605218] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605224] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605245] sshd D ffff8800fc8f4db0 0 17454 2020 0x00000080 May 23 12:07:46 puffin kernel: [650765.605254] ffff8800fc8f4db0 0000000000000282 ffff8800ffa4a580 ffffffff810bb1a8 May 23 12:07:46 puffin kernel: [650765.605264] 00000000000009e5 ffff880000009680 000000000000f9e0 ffff880055c35fd8 May 23 12:07:46 puffin kernel: [650765.605276] 00000000000157c0 00000000000157c0 ffff8800fcfe1530 ffff8800fcfe1828 May 23 12:07:46 puffin kernel: [650765.605287] Call Trace: May 23 12:07:46 puffin kernel: [650765.605292] [<ffffffff810bb1a8>] ? get_page_from_freelist+0x4bc/0x760 May 23 12:07:46 puffin kernel: [650765.605300] [<ffffffff810b9b7f>] ? zone_watermark_ok+0x20/0xb1 May 23 12:07:46 puffin kernel: [650765.605308] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605316] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605323] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605330] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605337] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605344] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605352] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605360] [<ffffffff810d2172>] ? vma_link+0x74/0x9a May 23 12:07:46 puffin kernel: [650765.605367] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605374] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605381] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605401] awk D ffff880069382a60 0 17457 17396 0x00000080 May 23 12:07:46 puffin kernel: [650765.605409] ffff880069382a60 0000000000000282 0000000000000000 0000000000000000 May 23 12:07:46 puffin kernel: [650765.605420] 0000000000000000 0000000000000000 000000000000f9e0 ffff880055c97fd8 May 23 12:07:46 puffin kernel: [650765.605430] 00000000000157c0 00000000000157c0 ffff8800fc4069f0 ffff8800fc406ce8 May 23 12:07:46 puffin kernel: [650765.605440] Call Trace: May 23 12:07:46 puffin kernel: [650765.605446] [<ffffffff81197696>] ? vsnprintf+0x40a/0x449 May 23 12:07:46 puffin kernel: [650765.605453] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605461] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605468] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605475] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605482] [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317 May 23 12:07:46 puffin kernel: [650765.605490] [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5 May 23 12:07:46 puffin kernel: [650765.605497] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605504] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605512] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605518] [<ffffffff8130f475>] ? page_fault+0x25/0x30 May 23 12:07:46 puffin kernel: [650765.605525] [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60 May 23 12:07:46 puffin kernel: [650765.605531] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 May 23 12:07:46 puffin kernel: [650765.605539] [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d May 23 12:07:46 puffin kernel: [650765.605546] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605553] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605559] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
And this is from /var/log/kern.log:
May 23 12:07:46 puffin kernel: [650765.583944] INFO: task vnstatd:707 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.583960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.583969] vnstatd D ffff8800fe5f9c40 0 707 1 0x00000000 May 23 12:07:46 puffin kernel: [650765.583980] ffff8800fe5f9c40 0000000000000286 ffff8800ffc15000 ffff880000009680 May 23 12:07:46 puffin kernel: [650765.583998] ffff8800fd335d58 000000000000000a 000000000000f9e0 ffff8800fd335fd8 May 23 12:07:46 puffin kernel: [650765.604533] 00000000000157c0 00000000000157c0 ffff8800023a0710 ffff8800023a0a08 May 23 12:07:46 puffin kernel: [650765.604544] Call Trace: May 23 12:07:46 puffin kernel: [650765.604556] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604566] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 May 23 12:07:46 puffin kernel: [650765.604574] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604582] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604589] [<ffffffff810f869e>] ? __link_path_walk+0x323/0x6f5 May 23 12:07:46 puffin kernel: [650765.604596] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.604603] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.604610] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.604618] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 May 23 12:07:46 puffin kernel: [650765.604626] [<ffffffff810e7491>] ? virt_to_head_page+0x9/0x2a May 23 12:07:46 puffin kernel: [650765.604633] [<ffffffff810e8829>] ? kmem_cache_free+0x72/0xa3 May 23 12:07:46 puffin kernel: [650765.604640] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.604648] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.604655] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.604680] INFO: task ps:17208 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.604687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.604695] ps D ffff880001f73880 0 17208 17202 0x00000080 May 23 12:07:46 puffin kernel: [650765.604704] ffff880001f73880 0000000000000286 0000000000000000 00000000000000d0 May 23 12:07:46 puffin kernel: [650765.604714] ffff8800ffc15000 ffffffff81012cdb 000000000000f9e0 ffff8800fdc5bfd8 May 23 12:07:46 puffin kernel: [650765.604725] 00000000000157c0 00000000000157c0 ffff8800fe220000 ffff8800fe2202f8 May 23 12:07:46 puffin kernel: [650765.604735] Call Trace: May 23 12:07:46 puffin kernel: [650765.604741] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604751] [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3 May 23 12:07:46 puffin kernel: [650765.604759] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604769] [<ffffffff810c8d3c>] ? zone_statistics+0x3c/0x5d May 23 12:07:46 puffin kernel: [650765.604777] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604784] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604791] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.604799] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.604805] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.604813] [<ffffffff810fb5ea>] ? user_path_at+0x48/0x79 May 23 12:07:46 puffin kernel: [650765.604820] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604828] [<ffffffff8100d729>] ? __xen_write_cr3+0x49/0xc4 May 23 12:07:46 puffin kernel: [650765.604836] [<ffffffff81049359>] ? finish_task_switch+0x44/0xaf May 23 12:07:46 puffin kernel: [650765.604844] [<ffffffff810f3952>] ? vfs_fstatat+0x2c/0x57 May 23 12:07:46 puffin kernel: [650765.604851] [<ffffffff810f3a45>] ? sys_newstat+0x11/0x30 May 23 12:07:46 puffin kernel: [650765.604860] [<ffffffff81093ae1>] ? audit_syscall_entry+0x10c/0x137 May 23 12:07:46 puffin kernel: [650765.604867] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.604881] INFO: task awk:17417 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.604887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.604894] awk D ffff8800fc6862e0 0 17417 17339 0x00000080 May 23 12:07:46 puffin kernel: [650765.604903] ffff8800fc6862e0 0000000000000282 ffff8800ffb19480 ffffffff810bb257 May 23 12:07:46 puffin kernel: [650765.604913] 0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800cd629fd8 May 23 12:07:46 puffin kernel: [650765.604924] 00000000000157c0 00000000000157c0 ffff8800fc558000 ffff8800fc5582f8 May 23 12:07:46 puffin kernel: [650765.604935] Call Trace: May 23 12:07:46 puffin kernel: [650765.604942] [<ffffffff810bb257>] ? get_page_from_freelist+0x56b/0x760 May 23 12:07:46 puffin kernel: [650765.604949] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 May 23 12:07:46 puffin kernel: [650765.604958] [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3 May 23 12:07:46 puffin kernel: [650765.604966] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.604973] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.604980] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.604987] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.604995] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605002] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605011] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605019] [<ffffffff810d1817>] ? get_unmapped_area+0xd7/0x139 May 23 12:07:46 puffin kernel: [650765.605027] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605033] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605040] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605047] INFO: task awk:17445 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.605053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.605060] awk D ffff880001f71c40 0 17445 17397 0x00000080 May 23 12:07:46 puffin kernel: [650765.605069] ffff880001f71c40 0000000000000286 0000000000000000 0000000000000000 May 23 12:07:46 puffin kernel: [650765.605079] 0000000000000000 0000000000000000 000000000000f9e0 ffff8800fdc07fd8 May 23 12:07:46 puffin kernel: [650765.605089] 00000000000157c0 00000000000157c0 ffff8800fc55c6a0 ffff8800fc55c998 May 23 12:07:46 puffin kernel: [650765.605099] Call Trace: May 23 12:07:46 puffin kernel: [650765.605106] [<ffffffff81197696>] ? vsnprintf+0x40a/0x449 May 23 12:07:46 puffin kernel: [650765.605114] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605123] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605130] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605137] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605144] [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317 May 23 12:07:46 puffin kernel: [650765.605152] [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5 May 23 12:07:46 puffin kernel: [650765.605159] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605166] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605174] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605181] [<ffffffff8130f475>] ? page_fault+0x25/0x30 May 23 12:07:46 puffin kernel: [650765.605188] [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60 May 23 12:07:46 puffin kernel: [650765.605195] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 May 23 12:07:46 puffin kernel: [650765.605203] [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d May 23 12:07:46 puffin kernel: [650765.605211] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605218] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605224] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605232] INFO: task sshd:17454 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.605238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.605245] sshd D ffff8800fc8f4db0 0 17454 2020 0x00000080 May 23 12:07:46 puffin kernel: [650765.605254] ffff8800fc8f4db0 0000000000000282 ffff8800ffa4a580 ffffffff810bb1a8 May 23 12:07:46 puffin kernel: [650765.605264] 00000000000009e5 ffff880000009680 000000000000f9e0 ffff880055c35fd8 May 23 12:07:46 puffin kernel: [650765.605276] 00000000000157c0 00000000000157c0 ffff8800fcfe1530 ffff8800fcfe1828 May 23 12:07:46 puffin kernel: [650765.605287] Call Trace: May 23 12:07:46 puffin kernel: [650765.605292] [<ffffffff810bb1a8>] ? get_page_from_freelist+0x4bc/0x760 May 23 12:07:46 puffin kernel: [650765.605300] [<ffffffff810b9b7f>] ? zone_watermark_ok+0x20/0xb1 May 23 12:07:46 puffin kernel: [650765.605308] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605316] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605323] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605330] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605337] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605344] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605352] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605360] [<ffffffff810d2172>] ? vma_link+0x74/0x9a May 23 12:07:46 puffin kernel: [650765.605367] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605374] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605381] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 23 12:07:46 puffin kernel: [650765.605388] INFO: task awk:17457 blocked for more than 120 seconds. May 23 12:07:46 puffin kernel: [650765.605394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 23 12:07:46 puffin kernel: [650765.605401] awk D ffff880069382a60 0 17457 17396 0x00000080 May 23 12:07:46 puffin kernel: [650765.605409] ffff880069382a60 0000000000000282 0000000000000000 0000000000000000 May 23 12:07:46 puffin kernel: [650765.605420] 0000000000000000 0000000000000000 000000000000f9e0 ffff880055c97fd8 May 23 12:07:46 puffin kernel: [650765.605430] 00000000000157c0 00000000000157c0 ffff8800fc4069f0 ffff8800fc406ce8 May 23 12:07:46 puffin kernel: [650765.605440] Call Trace: May 23 12:07:46 puffin kernel: [650765.605446] [<ffffffff81197696>] ? vsnprintf+0x40a/0x449 May 23 12:07:46 puffin kernel: [650765.605453] [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192 May 23 12:07:46 puffin kernel: [650765.605461] [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31 May 23 12:07:46 puffin kernel: [650765.605468] [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d May 23 12:07:46 puffin kernel: [650765.605475] [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5 May 23 12:07:46 puffin kernel: [650765.605482] [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317 May 23 12:07:46 puffin kernel: [650765.605490] [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5 May 23 12:07:46 puffin kernel: [650765.605497] [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9 May 23 12:07:46 puffin kernel: [650765.605504] [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77 May 23 12:07:46 puffin kernel: [650765.605512] [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b May 23 12:07:46 puffin kernel: [650765.605518] [<ffffffff8130f475>] ? page_fault+0x25/0x30 May 23 12:07:46 puffin kernel: [650765.605525] [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60 May 23 12:07:46 puffin kernel: [650765.605531] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 May 23 12:07:46 puffin kernel: [650765.605539] [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d May 23 12:07:46 puffin kernel: [650765.605546] [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c May 23 12:07:46 puffin kernel: [650765.605553] [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc May 23 12:07:46 puffin kernel: [650765.605559] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
And from /var/log/daemon.log:
May 23 12:05:19 puffin mysqld: 130523 12:05:18 [Warning] Aborted connection 1102537 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102532 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102533 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102531 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102538 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102535 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102534 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102530 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102536 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102542 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:10:41 puffin mysqld: 130523 12:10:41 [Warning] Aborted connection 1102549 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:01 puffin mysqld: 130523 12:11:01 [Warning] Aborted connection 1102547 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:12 puffin mysqld: 130523 12:11:12 [Warning] Aborted connection 1102548 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:32 puffin mysqld: 130523 12:11:32 [Warning] Aborted connection 1102550 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:43 puffin mysqld: 130523 12:11:42 [Warning] Aborted connection 1102555 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102553 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102554 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102551 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102552 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:11:54 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102556 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:17:05 puffin mysqld: 130523 12:17:05 [Warning] Aborted connection 1102558 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:21:17 puffin mysqld: 130523 12:21:17 [Warning] Aborted connection 1102571 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:21:27 puffin mysqld: 130523 12:21:27 [Warning] Aborted connection 1102574 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:21:49 puffin mysqld: 130523 12:21:48 [Warning] Aborted connection 1102577 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:21:58 puffin mysqld: 130523 12:21:58 [Warning] Aborted connection 1102570 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:09 puffin mysqld: 130523 12:22:08 [Warning] Aborted connection 1102573 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102578 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102579 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102575 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:38 puffin mysqld: 130523 12:22:38 [Warning] Aborted connection 1102583 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:22:38 puffin mysqld: 130523 12:22:38 [Warning] Aborted connection 1102580 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:26:23 puffin mysqld: 130523 12:26:22 [Warning] Aborted connection 1102584 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:26:32 puffin mysqld: 130523 12:26:32 [Warning] Aborted connection 1102585 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:27:07 puffin mysqld: 130523 12:27:07 [Warning] Aborted connection 1102587 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:27:36 puffin mysqld: 130523 12:27:35 [Warning] Aborted connection 1102591 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:27:47 puffin mysqld: 130523 12:27:46 [Warning] Aborted connection 1102590 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:27:47 puffin mysqld: 130523 12:27:47 [Warning] Aborted connection 1102592 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:13 puffin mysqld: 130523 12:42:13 [Warning] Aborted connection 1102608 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:13 puffin mysqld: 130523 12:42:13 [Warning] Aborted connection 1102609 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102610 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102622 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102617 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102618 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102582 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102595 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:35 puffin mysqld: 130523 12:42:35 [Warning] Aborted connection 1102569 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:35 puffin mysqld: 130523 12:42:35 [Warning] Aborted connection 1102568 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102614 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102625 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102624 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102627 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102620 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102613 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102621 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102611 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102623 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102612 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 12:42:43 puffin mysqld: 130523 12:42:43 [Warning] Aborted connection 1102576 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error) May 23 12:48:12 puffin mysqld: 130523 12:48:12 [Warning] Aborted connection 1104718 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 12:48:12 puffin mysqld: 130523 12:48:12 [Warning] Aborted connection 1104714 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 12:49:12 puffin mysqld: 130523 12:49:12 [Warning] Aborted connection 1104760 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 12:49:12 puffin mysqld: 130523 12:49:12 [Warning] Aborted connection 1104756 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 12:52:12 puffin mysqld: 130523 12:52:12 [Warning] Aborted connection 1104879 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 12:52:12 puffin mysqld: 130523 12:52:12 [Warning] Aborted connection 1104877 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132391 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132382 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132380 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132383 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132373 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132378 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132372 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132405 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132384 to db: 'newstransitionne' user: 'newstransitionne' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132399 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error) May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132374 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
There is nothing I can see in the nginx logs and the php-fpm logs get clobbered by a BOA cron job so they have no data.
I'm afraid don't have a conclusion to this other than there was a load spike that took the server down for around 50 mins.
comment:4 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.5
- Total Hours changed from 0.4 to 0.9
Some things found following searches for other people having problems like this follow, the searches done were ones like this https://www.google.co.uk/search?q=task+blocked+for+more+than+120+seconds
This is a know bug. By default Linux uses up to 40% of the available memory for
file system caching. After this mark has been reached the file system flushes
all outstanding data to disk causing all following IOs going synchronous. For
flushing out this data to disk this there is a time limit of 120 seconds by
default. In the case here the IO subsystem is not fast enough to flush the data
withing 120 seconds. This especially happens on systems with a lof of memory.
The problem is solved in later kernels and there is not “fix” from Oracle. I
fixed this by lowering the mark for flushing the cache from 40% to 10% by
setting “vm.dirty_ratio=10″ in /etc/sysctl.conf. This setting does not
influence overall database performance since you hopefully use Direct IO and
bypass the file system cache completely.
http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/
And related:
what is the output of
# cat /proc/sys/vm/dirty_ratio
and
# free -mt
default on debian is 10: 10% of memory for caching
It is a known bug about data caching into memory, when the dirty ratio is too
high, IO process can take a huge time (up to 120 sec) to flush cached memory
to disk. You can try to decrease this dirty_ratio if you have a lot of memory
To change this value for testing:
echo 10 > /proc/sys/vm/dirty_ratio
To change permanently, add this to your /etc/sysctl.conf:
vm.dirty_ratio=10
We have:
cat /proc/sys/vm/dirty_ratio 20 free -mt total used free shared buffers cached Mem: 4079 3736 342 0 311 1883 -/+ buffers/cache: 1541 2538 Swap: 1023 0 1023 Total: 5103 3737 1366
For reference:
dirty_ratio
Contains, as a percentage of total system memory, the number of pages at which
a process which is generating disk writes will itself start writing out dirty
data.
There are a couple of old debian bugs that appears to be for the same issue:
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=582605
- http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=638316
And there is a Redhat article you have to pay for which says is has the answer for Redhat servers:
Looking at the /var/log/kern.log the same thing happened a few days ago:
May 22 01:21:43 puffin kernel: [525602.828162] INFO: task mysqld:5746 blocked for more than 120 seconds. May 22 01:21:43 puffin kernel: [525602.828197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 22 01:21:43 puffin kernel: [525602.828218] mysqld D ffff880003c78c40 0 5746 1734 0x00000080 May 22 01:21:43 puffin kernel: [525602.828244] ffff8800ffa6b880 0000000000000286 ffff8800fd311e38 ffff8800fd311e34 May 22 01:21:43 puffin kernel: [525602.828273] ffff8800dccc0b68 0000000000000001 000000000000f9e0 ffff8800fd311fd8 May 22 01:21:43 puffin kernel: [525602.828299] 00000000000157c0 00000000000157c0 ffff8800fc687810 ffff8800fc687b08 May 22 01:21:43 puffin kernel: [525602.828325] Call Trace: May 22 01:21:43 puffin kernel: [525602.828367] [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd] May 22 01:21:43 puffin kernel: [525602.828393] [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e May 22 01:21:43 puffin kernel: [525602.828418] [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe May 22 01:21:43 puffin kernel: [525602.828478] [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3] May 22 01:21:43 puffin kernel: [525602.828504] [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e May 22 01:21:43 puffin kernel: [525602.828525] [<ffffffff8110d77e>] ? do_fsync+0x28/0x39 May 22 01:21:43 puffin kernel: [525602.828547] [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10 May 22 01:21:43 puffin kernel: [525602.828571] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 22 01:21:43 puffin kernel: [525602.828596] INFO: task mysqld:7330 blocked for more than 120 seconds. May 22 01:21:43 puffin kernel: [525602.828615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 22 01:21:43 puffin kernel: [525602.828635] mysqld D 0000000000000000 0 7330 1734 0x00000080 May 22 01:21:43 puffin kernel: [525602.828658] ffffffff814a31f0 0000000000000286 0000000000000000 0000000000004000 May 22 01:21:43 puffin kernel: [525602.828689] ffff8800fc803e00 ffff8800fe647f48 000000000000f9e0 ffff8800fe647fd8 May 22 01:21:43 puffin kernel: [525602.828757] 00000000000157c0 00000000000157c0 ffff8800fd357100 ffff8800fd3573f8 May 22 01:21:43 puffin kernel: [525602.828794] Call Trace: May 22 01:21:43 puffin kernel: [525602.828819] [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd] May 22 01:21:43 puffin kernel: [525602.828842] [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e May 22 01:21:43 puffin kernel: [525602.828868] [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe May 22 01:21:43 puffin kernel: [525602.828899] [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3] May 22 01:21:43 puffin kernel: [525602.828923] [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e May 22 01:21:43 puffin kernel: [525602.828947] [<ffffffff8110d77e>] ? do_fsync+0x28/0x39 May 22 01:21:43 puffin kernel: [525602.828966] [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10 May 22 01:21:43 puffin kernel: [525602.828988] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b May 22 01:21:43 puffin kernel: [525602.829027] INFO: task redis-server:51205 blocked for more than 120 seconds. May 22 01:21:43 puffin kernel: [525602.829046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. May 22 01:21:43 puffin kernel: [525602.829069] redis-server D 0000000000000000 0 51205 61000 0x00000080 May 22 01:21:43 puffin kernel: [525602.829109] ffffffff814a31f0 0000000000000282 0000000000000000 000000010000000e May 22 01:21:43 puffin kernel: [525602.829150] ffff880089aed1c0 ffff8800652d7e18 000000000000f9e0 ffff8800652d7fd8 May 22 01:21:43 puffin kernel: [525602.829189] 00000000000157c0 00000000000157c0 ffff880001e52a60 ffff880001e52d58 May 22 01:21:43 puffin kernel: [525602.829225] Call Trace: May 22 01:21:43 puffin kernel: [525602.829245] [<ffffffff810b61d5>] ? wait_on_page_writeback_range+0xce/0x11b May 22 01:21:43 puffin kernel: [525602.829272] [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd] May 22 01:21:43 puffin kernel: [525602.829292] [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e May 22 01:21:43 puffin kernel: [525602.829313] [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe May 22 01:21:43 puffin kernel: [525602.829339] [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3] May 22 01:21:43 puffin kernel: [525602.829359] [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e May 22 01:21:43 puffin kernel: [525602.829377] [<ffffffff8110d77e>] ? do_fsync+0x28/0x39 May 22 01:21:43 puffin kernel: [525602.829395] [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10 May 22 01:21:43 puffin kernel: [525602.829412] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
I have run this, as suggested above, to see if it helps:
echo 10 > /proc/sys/vm/dirty_ratio cat /proc/sys/vm/dirty_ratio 10
comment:5 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Status changed from accepted to closed
- Resolution set to fixed
- Total Hours changed from 0.9 to 1.0
We are no longer getting such massive load spikes and we also have changed the kill thresholds, see wiki:PuffinServer#LoadSpikes so now closing this ticket.