Ticket #552 (closed maintenance: fixed)

Opened 4 years ago

Last modified 3 years ago

Puffin Downtime 23rd May 2013

Reported by: chris Owned by: chris
Priority: major Milestone: Maintenance
Component: Live server Keywords:
Cc: ed, jim Estimated Number of Hours: 0.5
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 1.0

Description (last modified by chris) (diff)

There was a massive load spike on wiki:PuffinServer this afternoon, the load average peaked at over 60 and at exactly the same time there was also one on the virtual server serving http://webarch.net/.

Pingdom reported:

  • www.transitionnetwork.org down since 23/05/2013 12:11:57
  • www.transitionnetwork.org is UP again at 23/05/2013 12:19:57, after 8m of downtime
  • www.transitionnetwork.org is down since 23/05/2013 12:26:57
  • www.transitionnetwork.org is UP again at 23/05/2013 12:58:57, after 32m of downtime

When I did managed to get back into the server php-fpm and nginx were not running and it took several attempt to get them to start up.

Attached is the munin graps of the load from today, munin didn't record the start of the spike but it did start recording it as it came down.

This ticket has been created to report the issue and also to record anything that might be found in the logs to explain what happened.

Attachments

puffin-load-spike-2013-05-23.png (16.5 KB) - added by chris 4 years ago.
Puffin Load Spike 2013-05-23

Change History

comment:1 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

Changed 4 years ago by chris

Puffin Load Spike 2013-05-23

comment:2 Changed 4 years ago by chris

  • Status changed from new to accepted
  • Description modified (diff)

comment:3 Changed 4 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.15
  • Total Hours changed from 0.25 to 0.4

With hindsight I could have posted about this issue today on ticket:543 as that was a similar event.

Following is what we have in /var/log/messages:

May 23 12:07:46 puffin kernel: [650765.583969] vnstatd       D ffff8800fe5f9c40     0   707      1 0x00000000
May 23 12:07:46 puffin kernel: [650765.583980]  ffff8800fe5f9c40 0000000000000286 ffff8800ffc15000 ffff880000009680
May 23 12:07:46 puffin kernel: [650765.583998]  ffff8800fd335d58 000000000000000a 000000000000f9e0 ffff8800fd335fd8
May 23 12:07:46 puffin kernel: [650765.604533]  00000000000157c0 00000000000157c0 ffff8800023a0710 ffff8800023a0a08
May 23 12:07:46 puffin kernel: [650765.604544] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604556]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604566]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
May 23 12:07:46 puffin kernel: [650765.604574]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604582]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604589]  [<ffffffff810f869e>] ? __link_path_walk+0x323/0x6f5
May 23 12:07:46 puffin kernel: [650765.604596]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.604603]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.604610]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.604618]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
May 23 12:07:46 puffin kernel: [650765.604626]  [<ffffffff810e7491>] ? virt_to_head_page+0x9/0x2a
May 23 12:07:46 puffin kernel: [650765.604633]  [<ffffffff810e8829>] ? kmem_cache_free+0x72/0xa3
May 23 12:07:46 puffin kernel: [650765.604640]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.604648]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.604655]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.604695] ps            D ffff880001f73880     0 17208  17202 0x00000080
May 23 12:07:46 puffin kernel: [650765.604704]  ffff880001f73880 0000000000000286 0000000000000000 00000000000000d0
May 23 12:07:46 puffin kernel: [650765.604714]  ffff8800ffc15000 ffffffff81012cdb 000000000000f9e0 ffff8800fdc5bfd8
May 23 12:07:46 puffin kernel: [650765.604725]  00000000000157c0 00000000000157c0 ffff8800fe220000 ffff8800fe2202f8
May 23 12:07:46 puffin kernel: [650765.604735] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604741]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604751]  [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3
May 23 12:07:46 puffin kernel: [650765.604759]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604769]  [<ffffffff810c8d3c>] ? zone_statistics+0x3c/0x5d
May 23 12:07:46 puffin kernel: [650765.604777]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604784]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604791]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.604799]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.604805]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.604813]  [<ffffffff810fb5ea>] ? user_path_at+0x48/0x79
May 23 12:07:46 puffin kernel: [650765.604820]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604828]  [<ffffffff8100d729>] ? __xen_write_cr3+0x49/0xc4
May 23 12:07:46 puffin kernel: [650765.604836]  [<ffffffff81049359>] ? finish_task_switch+0x44/0xaf
May 23 12:07:46 puffin kernel: [650765.604844]  [<ffffffff810f3952>] ? vfs_fstatat+0x2c/0x57
May 23 12:07:46 puffin kernel: [650765.604851]  [<ffffffff810f3a45>] ? sys_newstat+0x11/0x30
May 23 12:07:46 puffin kernel: [650765.604860]  [<ffffffff81093ae1>] ? audit_syscall_entry+0x10c/0x137
May 23 12:07:46 puffin kernel: [650765.604867]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.604894] awk           D ffff8800fc6862e0     0 17417  17339 0x00000080
May 23 12:07:46 puffin kernel: [650765.604903]  ffff8800fc6862e0 0000000000000282 ffff8800ffb19480 ffffffff810bb257
May 23 12:07:46 puffin kernel: [650765.604913]  0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800cd629fd8
May 23 12:07:46 puffin kernel: [650765.604924]  00000000000157c0 00000000000157c0 ffff8800fc558000 ffff8800fc5582f8
May 23 12:07:46 puffin kernel: [650765.604935] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604942]  [<ffffffff810bb257>] ? get_page_from_freelist+0x56b/0x760
May 23 12:07:46 puffin kernel: [650765.604949]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604958]  [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3
May 23 12:07:46 puffin kernel: [650765.604966]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604973]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604980]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604987]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.604995]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605002]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605011]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605019]  [<ffffffff810d1817>] ? get_unmapped_area+0xd7/0x139
May 23 12:07:46 puffin kernel: [650765.605027]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605033]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605040]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605060] awk           D ffff880001f71c40     0 17445  17397 0x00000080
May 23 12:07:46 puffin kernel: [650765.605069]  ffff880001f71c40 0000000000000286 0000000000000000 0000000000000000
May 23 12:07:46 puffin kernel: [650765.605079]  0000000000000000 0000000000000000 000000000000f9e0 ffff8800fdc07fd8
May 23 12:07:46 puffin kernel: [650765.605089]  00000000000157c0 00000000000157c0 ffff8800fc55c6a0 ffff8800fc55c998
May 23 12:07:46 puffin kernel: [650765.605099] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605106]  [<ffffffff81197696>] ? vsnprintf+0x40a/0x449
May 23 12:07:46 puffin kernel: [650765.605114]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605123]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605130]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605137]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605144]  [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317
May 23 12:07:46 puffin kernel: [650765.605152]  [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5
May 23 12:07:46 puffin kernel: [650765.605159]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605166]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605174]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605181]  [<ffffffff8130f475>] ? page_fault+0x25/0x30
May 23 12:07:46 puffin kernel: [650765.605188]  [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60
May 23 12:07:46 puffin kernel: [650765.605195]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
May 23 12:07:46 puffin kernel: [650765.605203]  [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d
May 23 12:07:46 puffin kernel: [650765.605211]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605218]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605224]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605245] sshd          D ffff8800fc8f4db0     0 17454   2020 0x00000080
May 23 12:07:46 puffin kernel: [650765.605254]  ffff8800fc8f4db0 0000000000000282 ffff8800ffa4a580 ffffffff810bb1a8
May 23 12:07:46 puffin kernel: [650765.605264]  00000000000009e5 ffff880000009680 000000000000f9e0 ffff880055c35fd8
May 23 12:07:46 puffin kernel: [650765.605276]  00000000000157c0 00000000000157c0 ffff8800fcfe1530 ffff8800fcfe1828
May 23 12:07:46 puffin kernel: [650765.605287] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605292]  [<ffffffff810bb1a8>] ? get_page_from_freelist+0x4bc/0x760
May 23 12:07:46 puffin kernel: [650765.605300]  [<ffffffff810b9b7f>] ? zone_watermark_ok+0x20/0xb1
May 23 12:07:46 puffin kernel: [650765.605308]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605316]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605323]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605330]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605337]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605344]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605352]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605360]  [<ffffffff810d2172>] ? vma_link+0x74/0x9a
May 23 12:07:46 puffin kernel: [650765.605367]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605374]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605381]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605401] awk           D ffff880069382a60     0 17457  17396 0x00000080
May 23 12:07:46 puffin kernel: [650765.605409]  ffff880069382a60 0000000000000282 0000000000000000 0000000000000000
May 23 12:07:46 puffin kernel: [650765.605420]  0000000000000000 0000000000000000 000000000000f9e0 ffff880055c97fd8
May 23 12:07:46 puffin kernel: [650765.605430]  00000000000157c0 00000000000157c0 ffff8800fc4069f0 ffff8800fc406ce8
May 23 12:07:46 puffin kernel: [650765.605440] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605446]  [<ffffffff81197696>] ? vsnprintf+0x40a/0x449
May 23 12:07:46 puffin kernel: [650765.605453]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605461]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605468]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605475]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605482]  [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317
May 23 12:07:46 puffin kernel: [650765.605490]  [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5
May 23 12:07:46 puffin kernel: [650765.605497]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605504]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605512]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605518]  [<ffffffff8130f475>] ? page_fault+0x25/0x30
May 23 12:07:46 puffin kernel: [650765.605525]  [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60
May 23 12:07:46 puffin kernel: [650765.605531]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
May 23 12:07:46 puffin kernel: [650765.605539]  [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d
May 23 12:07:46 puffin kernel: [650765.605546]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605553]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605559]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

And this is from /var/log/kern.log:

May 23 12:07:46 puffin kernel: [650765.583944] INFO: task vnstatd:707 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.583960] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.583969] vnstatd       D ffff8800fe5f9c40     0   707      1 0x00000000
May 23 12:07:46 puffin kernel: [650765.583980]  ffff8800fe5f9c40 0000000000000286 ffff8800ffc15000 ffff880000009680
May 23 12:07:46 puffin kernel: [650765.583998]  ffff8800fd335d58 000000000000000a 000000000000f9e0 ffff8800fd335fd8
May 23 12:07:46 puffin kernel: [650765.604533]  00000000000157c0 00000000000157c0 ffff8800023a0710 ffff8800023a0a08
May 23 12:07:46 puffin kernel: [650765.604544] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604556]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604566]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
May 23 12:07:46 puffin kernel: [650765.604574]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604582]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604589]  [<ffffffff810f869e>] ? __link_path_walk+0x323/0x6f5
May 23 12:07:46 puffin kernel: [650765.604596]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.604603]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.604610]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.604618]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
May 23 12:07:46 puffin kernel: [650765.604626]  [<ffffffff810e7491>] ? virt_to_head_page+0x9/0x2a
May 23 12:07:46 puffin kernel: [650765.604633]  [<ffffffff810e8829>] ? kmem_cache_free+0x72/0xa3
May 23 12:07:46 puffin kernel: [650765.604640]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.604648]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.604655]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.604680] INFO: task ps:17208 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.604687] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.604695] ps            D ffff880001f73880     0 17208  17202 0x00000080
May 23 12:07:46 puffin kernel: [650765.604704]  ffff880001f73880 0000000000000286 0000000000000000 00000000000000d0
May 23 12:07:46 puffin kernel: [650765.604714]  ffff8800ffc15000 ffffffff81012cdb 000000000000f9e0 ffff8800fdc5bfd8
May 23 12:07:46 puffin kernel: [650765.604725]  00000000000157c0 00000000000157c0 ffff8800fe220000 ffff8800fe2202f8
May 23 12:07:46 puffin kernel: [650765.604735] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604741]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604751]  [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3
May 23 12:07:46 puffin kernel: [650765.604759]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604769]  [<ffffffff810c8d3c>] ? zone_statistics+0x3c/0x5d
May 23 12:07:46 puffin kernel: [650765.604777]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604784]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604791]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.604799]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.604805]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.604813]  [<ffffffff810fb5ea>] ? user_path_at+0x48/0x79
May 23 12:07:46 puffin kernel: [650765.604820]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604828]  [<ffffffff8100d729>] ? __xen_write_cr3+0x49/0xc4
May 23 12:07:46 puffin kernel: [650765.604836]  [<ffffffff81049359>] ? finish_task_switch+0x44/0xaf
May 23 12:07:46 puffin kernel: [650765.604844]  [<ffffffff810f3952>] ? vfs_fstatat+0x2c/0x57
May 23 12:07:46 puffin kernel: [650765.604851]  [<ffffffff810f3a45>] ? sys_newstat+0x11/0x30
May 23 12:07:46 puffin kernel: [650765.604860]  [<ffffffff81093ae1>] ? audit_syscall_entry+0x10c/0x137
May 23 12:07:46 puffin kernel: [650765.604867]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.604881] INFO: task awk:17417 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.604887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.604894] awk           D ffff8800fc6862e0     0 17417  17339 0x00000080
May 23 12:07:46 puffin kernel: [650765.604903]  ffff8800fc6862e0 0000000000000282 ffff8800ffb19480 ffffffff810bb257
May 23 12:07:46 puffin kernel: [650765.604913]  0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800cd629fd8
May 23 12:07:46 puffin kernel: [650765.604924]  00000000000157c0 00000000000157c0 ffff8800fc558000 ffff8800fc5582f8
May 23 12:07:46 puffin kernel: [650765.604935] Call Trace:
May 23 12:07:46 puffin kernel: [650765.604942]  [<ffffffff810bb257>] ? get_page_from_freelist+0x56b/0x760
May 23 12:07:46 puffin kernel: [650765.604949]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
May 23 12:07:46 puffin kernel: [650765.604958]  [<ffffffff81155186>] ? cap_inode_permission+0x0/0x3
May 23 12:07:46 puffin kernel: [650765.604966]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.604973]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.604980]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.604987]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.604995]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605002]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605011]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605019]  [<ffffffff810d1817>] ? get_unmapped_area+0xd7/0x139
May 23 12:07:46 puffin kernel: [650765.605027]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605033]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605040]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605047] INFO: task awk:17445 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.605053] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.605060] awk           D ffff880001f71c40     0 17445  17397 0x00000080
May 23 12:07:46 puffin kernel: [650765.605069]  ffff880001f71c40 0000000000000286 0000000000000000 0000000000000000
May 23 12:07:46 puffin kernel: [650765.605079]  0000000000000000 0000000000000000 000000000000f9e0 ffff8800fdc07fd8
May 23 12:07:46 puffin kernel: [650765.605089]  00000000000157c0 00000000000157c0 ffff8800fc55c6a0 ffff8800fc55c998
May 23 12:07:46 puffin kernel: [650765.605099] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605106]  [<ffffffff81197696>] ? vsnprintf+0x40a/0x449
May 23 12:07:46 puffin kernel: [650765.605114]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605123]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605130]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605137]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605144]  [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317
May 23 12:07:46 puffin kernel: [650765.605152]  [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5
May 23 12:07:46 puffin kernel: [650765.605159]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605166]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605174]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605181]  [<ffffffff8130f475>] ? page_fault+0x25/0x30
May 23 12:07:46 puffin kernel: [650765.605188]  [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60
May 23 12:07:46 puffin kernel: [650765.605195]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
May 23 12:07:46 puffin kernel: [650765.605203]  [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d
May 23 12:07:46 puffin kernel: [650765.605211]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605218]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605224]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605232] INFO: task sshd:17454 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.605238] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.605245] sshd          D ffff8800fc8f4db0     0 17454   2020 0x00000080
May 23 12:07:46 puffin kernel: [650765.605254]  ffff8800fc8f4db0 0000000000000282 ffff8800ffa4a580 ffffffff810bb1a8
May 23 12:07:46 puffin kernel: [650765.605264]  00000000000009e5 ffff880000009680 000000000000f9e0 ffff880055c35fd8
May 23 12:07:46 puffin kernel: [650765.605276]  00000000000157c0 00000000000157c0 ffff8800fcfe1530 ffff8800fcfe1828
May 23 12:07:46 puffin kernel: [650765.605287] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605292]  [<ffffffff810bb1a8>] ? get_page_from_freelist+0x4bc/0x760
May 23 12:07:46 puffin kernel: [650765.605300]  [<ffffffff810b9b7f>] ? zone_watermark_ok+0x20/0xb1
May 23 12:07:46 puffin kernel: [650765.605308]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605316]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605323]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605330]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605337]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605344]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605352]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605360]  [<ffffffff810d2172>] ? vma_link+0x74/0x9a
May 23 12:07:46 puffin kernel: [650765.605367]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605374]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605381]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 23 12:07:46 puffin kernel: [650765.605388] INFO: task awk:17457 blocked for more than 120 seconds.
May 23 12:07:46 puffin kernel: [650765.605394] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 23 12:07:46 puffin kernel: [650765.605401] awk           D ffff880069382a60     0 17457  17396 0x00000080
May 23 12:07:46 puffin kernel: [650765.605409]  ffff880069382a60 0000000000000282 0000000000000000 0000000000000000
May 23 12:07:46 puffin kernel: [650765.605420]  0000000000000000 0000000000000000 000000000000f9e0 ffff880055c97fd8
May 23 12:07:46 puffin kernel: [650765.605430]  00000000000157c0 00000000000157c0 ffff8800fc4069f0 ffff8800fc406ce8
May 23 12:07:46 puffin kernel: [650765.605440] Call Trace:
May 23 12:07:46 puffin kernel: [650765.605446]  [<ffffffff81197696>] ? vsnprintf+0x40a/0x449
May 23 12:07:46 puffin kernel: [650765.605453]  [<ffffffff8130e573>] ? __mutex_lock_common+0x122/0x192
May 23 12:07:46 puffin kernel: [650765.605461]  [<ffffffff8130e69b>] ? mutex_lock+0x1a/0x31
May 23 12:07:46 puffin kernel: [650765.605468]  [<ffffffff810f7ea0>] ? do_lookup+0x80/0x15d
May 23 12:07:46 puffin kernel: [650765.605475]  [<ffffffff810f8920>] ? __link_path_walk+0x5a5/0x6f5
May 23 12:07:46 puffin kernel: [650765.605482]  [<ffffffff810f825e>] ? do_follow_link+0x1fa/0x317
May 23 12:07:46 puffin kernel: [650765.605490]  [<ffffffff810f86d4>] ? __link_path_walk+0x359/0x6f5
May 23 12:07:46 puffin kernel: [650765.605497]  [<ffffffff810f8c9e>] ? path_walk+0x66/0xc9
May 23 12:07:46 puffin kernel: [650765.605504]  [<ffffffff810fa108>] ? do_path_lookup+0x20/0x77
May 23 12:07:46 puffin kernel: [650765.605512]  [<ffffffff810fa294>] ? do_filp_open+0xe5/0x94b
May 23 12:07:46 puffin kernel: [650765.605518]  [<ffffffff8130f475>] ? page_fault+0x25/0x30
May 23 12:07:46 puffin kernel: [650765.605525]  [<ffffffff8130f6aa>] ? error_exit+0x2a/0x60
May 23 12:07:46 puffin kernel: [650765.605531]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
May 23 12:07:46 puffin kernel: [650765.605539]  [<ffffffff8119844c>] ? strncpy_from_user+0x51/0x6d
May 23 12:07:46 puffin kernel: [650765.605546]  [<ffffffff811036f9>] ? alloc_fd+0x67/0x10c
May 23 12:07:46 puffin kernel: [650765.605553]  [<ffffffff810eeac3>] ? do_sys_open+0x55/0xfc
May 23 12:07:46 puffin kernel: [650765.605559]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

And from /var/log/daemon.log:

May 23 12:05:19 puffin mysqld: 130523 12:05:18 [Warning] Aborted connection 1102537 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102532 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102533 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:05:19 puffin mysqld: 130523 12:05:19 [Warning] Aborted connection 1102531 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102538 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102535 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:20 puffin mysqld: 130523 12:06:20 [Warning] Aborted connection 1102534 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102530 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102536 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:06:21 puffin mysqld: 130523 12:06:21 [Warning] Aborted connection 1102542 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:10:41 puffin mysqld: 130523 12:10:41 [Warning] Aborted connection 1102549 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:01 puffin mysqld: 130523 12:11:01 [Warning] Aborted connection 1102547 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:12 puffin mysqld: 130523 12:11:12 [Warning] Aborted connection 1102548 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:32 puffin mysqld: 130523 12:11:32 [Warning] Aborted connection 1102550 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:43 puffin mysqld: 130523 12:11:42 [Warning] Aborted connection 1102555 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102553 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102554 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102551 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:53 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102552 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:11:54 puffin mysqld: 130523 12:11:53 [Warning] Aborted connection 1102556 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:17:05 puffin mysqld: 130523 12:17:05 [Warning] Aborted connection 1102558 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:21:17 puffin mysqld: 130523 12:21:17 [Warning] Aborted connection 1102571 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:21:27 puffin mysqld: 130523 12:21:27 [Warning] Aborted connection 1102574 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:21:49 puffin mysqld: 130523 12:21:48 [Warning] Aborted connection 1102577 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:21:58 puffin mysqld: 130523 12:21:58 [Warning] Aborted connection 1102570 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:09 puffin mysqld: 130523 12:22:08 [Warning] Aborted connection 1102573 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102578 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102579 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:29 puffin mysqld: 130523 12:22:28 [Warning] Aborted connection 1102575 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:38 puffin mysqld: 130523 12:22:38 [Warning] Aborted connection 1102583 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:22:38 puffin mysqld: 130523 12:22:38 [Warning] Aborted connection 1102580 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:26:23 puffin mysqld: 130523 12:26:22 [Warning] Aborted connection 1102584 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:26:32 puffin mysqld: 130523 12:26:32 [Warning] Aborted connection 1102585 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:27:07 puffin mysqld: 130523 12:27:07 [Warning] Aborted connection 1102587 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:27:36 puffin mysqld: 130523 12:27:35 [Warning] Aborted connection 1102591 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:27:47 puffin mysqld: 130523 12:27:46 [Warning] Aborted connection 1102590 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:27:47 puffin mysqld: 130523 12:27:47 [Warning] Aborted connection 1102592 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:13 puffin mysqld: 130523 12:42:13 [Warning] Aborted connection 1102608 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:13 puffin mysqld: 130523 12:42:13 [Warning] Aborted connection 1102609 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102610 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102622 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102617 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102618 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102582 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:25 puffin mysqld: 130523 12:42:25 [Warning] Aborted connection 1102595 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:35 puffin mysqld: 130523 12:42:35 [Warning] Aborted connection 1102569 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:35 puffin mysqld: 130523 12:42:35 [Warning] Aborted connection 1102568 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102614 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102625 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102624 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:36 puffin mysqld: 130523 12:42:36 [Warning] Aborted connection 1102627 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102620 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102613 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102621 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102611 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102623 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:37 puffin mysqld: 130523 12:42:37 [Warning] Aborted connection 1102612 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 12:42:43 puffin mysqld: 130523 12:42:43 [Warning] Aborted connection 1102576 to db: 'masterpuffinwe_0' user: 'masterpuffinwe_0' host: 'localhost' (Unknown error)
May 23 12:48:12 puffin mysqld: 130523 12:48:12 [Warning] Aborted connection 1104718 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 12:48:12 puffin mysqld: 130523 12:48:12 [Warning] Aborted connection 1104714 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 12:49:12 puffin mysqld: 130523 12:49:12 [Warning] Aborted connection 1104760 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 12:49:12 puffin mysqld: 130523 12:49:12 [Warning] Aborted connection 1104756 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 12:52:12 puffin mysqld: 130523 12:52:12 [Warning] Aborted connection 1104879 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 12:52:12 puffin mysqld: 130523 12:52:12 [Warning] Aborted connection 1104877 to db: 'tnpuffinwebarchn' user: 'tnpuffinwebarchn' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132391 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132382 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132380 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132383 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132373 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132378 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132372 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132405 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132384 to db: 'newstransitionne' user: 'newstransitionne' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132399 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)
May 23 17:03:12 puffin mysqld: 130523 17:03:12 [Warning] Aborted connection 1132374 to db: 'transitionnetwor' user: 'transitionnetwor' host: 'localhost' (Unknown error)

There is nothing I can see in the nginx logs and the php-fpm logs get clobbered by a BOA cron job so they have no data.

I'm afraid don't have a conclusion to this other than there was a load spike that took the server down for around 50 mins.

comment:4 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.4 to 0.9

Some things found following searches for other people having problems like this follow, the searches done were ones like this https://www.google.co.uk/search?q=task+blocked+for+more+than+120+seconds

This is a know bug. By default Linux uses up to 40% of the available memory for
file system caching. After this mark has been reached the file system flushes
all outstanding data to disk causing all following IOs going synchronous. For
flushing out this data to disk this there is a time limit of 120 seconds by
default. In the case here the IO subsystem is not fast enough to flush the data
withing 120 seconds. This especially happens on systems with a lof of memory.

The problem is solved in later kernels and there is not “fix” from Oracle. I
fixed this by lowering the mark for flushing the cache from 40% to 10% by
setting “vm.dirty_ratio=10″ in /etc/sysctl.conf. This setting does not
influence overall database performance since you hopefully use Direct IO and
bypass the file system cache completely.

http://blog.ronnyegner-consulting.de/2011/10/13/info-task-blocked-for-more-than-120-seconds/

And related:

what is the output of

# cat /proc/sys/vm/dirty_ratio

and

# free -mt

default on debian is 10: 10% of memory for caching

It is a known bug about data caching into memory, when the dirty ratio is too
high, IO process can take a huge time (up to 120 sec) to flush cached memory
to disk. You can try to decrease this dirty_ratio if you have a lot of memory

To change this value for testing:

echo 10 > /proc/sys/vm/dirty_ratio

To change permanently, add this to your /etc/sysctl.conf:

vm.dirty_ratio=10

http://serverfault.com/questions/500222/kernel-3-8-apache2-with-wsgi-info-task-apache2-blocked-for-more-than-120-sec

We have:

cat /proc/sys/vm/dirty_ratio
  20

free -mt
               total       used       free     shared    buffers     cached
  Mem:          4079       3736        342          0        311       1883
  -/+ buffers/cache:       1541       2538
  Swap:         1023          0       1023
  Total:        5103       3737       1366

For reference:

dirty_ratio

Contains, as a percentage of total system memory, the number of pages at which
a process which is generating disk writes will itself start writing out dirty
data.

https://www.kernel.org/doc/Documentation/sysctl/vm.txt

There are a couple of old debian bugs that appears to be for the same issue:

And there is a Redhat article you have to pay for which says is has the answer for Redhat servers:

Looking at the /var/log/kern.log the same thing happened a few days ago:

May 22 01:21:43 puffin kernel: [525602.828162] INFO: task mysqld:5746 blocked for more than 120 seconds.
May 22 01:21:43 puffin kernel: [525602.828197] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 01:21:43 puffin kernel: [525602.828218] mysqld        D ffff880003c78c40     0  5746   1734 0x00000080
May 22 01:21:43 puffin kernel: [525602.828244]  ffff8800ffa6b880 0000000000000286 ffff8800fd311e38 ffff8800fd311e34
May 22 01:21:43 puffin kernel: [525602.828273]  ffff8800dccc0b68 0000000000000001 000000000000f9e0 ffff8800fd311fd8
May 22 01:21:43 puffin kernel: [525602.828299]  00000000000157c0 00000000000157c0 ffff8800fc687810 ffff8800fc687b08
May 22 01:21:43 puffin kernel: [525602.828325] Call Trace:
May 22 01:21:43 puffin kernel: [525602.828367]  [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd]
May 22 01:21:43 puffin kernel: [525602.828393]  [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e
May 22 01:21:43 puffin kernel: [525602.828418]  [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe
May 22 01:21:43 puffin kernel: [525602.828478]  [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3]
May 22 01:21:43 puffin kernel: [525602.828504]  [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e
May 22 01:21:43 puffin kernel: [525602.828525]  [<ffffffff8110d77e>] ? do_fsync+0x28/0x39
May 22 01:21:43 puffin kernel: [525602.828547]  [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10
May 22 01:21:43 puffin kernel: [525602.828571]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 22 01:21:43 puffin kernel: [525602.828596] INFO: task mysqld:7330 blocked for more than 120 seconds.
May 22 01:21:43 puffin kernel: [525602.828615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 01:21:43 puffin kernel: [525602.828635] mysqld        D 0000000000000000     0  7330   1734 0x00000080
May 22 01:21:43 puffin kernel: [525602.828658]  ffffffff814a31f0 0000000000000286 0000000000000000 0000000000004000
May 22 01:21:43 puffin kernel: [525602.828689]  ffff8800fc803e00 ffff8800fe647f48 000000000000f9e0 ffff8800fe647fd8
May 22 01:21:43 puffin kernel: [525602.828757]  00000000000157c0 00000000000157c0 ffff8800fd357100 ffff8800fd3573f8
May 22 01:21:43 puffin kernel: [525602.828794] Call Trace:
May 22 01:21:43 puffin kernel: [525602.828819]  [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd]
May 22 01:21:43 puffin kernel: [525602.828842]  [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e
May 22 01:21:43 puffin kernel: [525602.828868]  [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe
May 22 01:21:43 puffin kernel: [525602.828899]  [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3]
May 22 01:21:43 puffin kernel: [525602.828923]  [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e
May 22 01:21:43 puffin kernel: [525602.828947]  [<ffffffff8110d77e>] ? do_fsync+0x28/0x39
May 22 01:21:43 puffin kernel: [525602.828966]  [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10
May 22 01:21:43 puffin kernel: [525602.828988]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
May 22 01:21:43 puffin kernel: [525602.829027] INFO: task redis-server:51205 blocked for more than 120 seconds.
May 22 01:21:43 puffin kernel: [525602.829046] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 01:21:43 puffin kernel: [525602.829069] redis-server  D 0000000000000000     0 51205  61000 0x00000080
May 22 01:21:43 puffin kernel: [525602.829109]  ffffffff814a31f0 0000000000000282 0000000000000000 000000010000000e
May 22 01:21:43 puffin kernel: [525602.829150]  ffff880089aed1c0 ffff8800652d7e18 000000000000f9e0 ffff8800652d7fd8
May 22 01:21:43 puffin kernel: [525602.829189]  00000000000157c0 00000000000157c0 ffff880001e52a60 ffff880001e52d58
May 22 01:21:43 puffin kernel: [525602.829225] Call Trace:
May 22 01:21:43 puffin kernel: [525602.829245]  [<ffffffff810b61d5>] ? wait_on_page_writeback_range+0xce/0x11b
May 22 01:21:43 puffin kernel: [525602.829272]  [<ffffffffa0038e20>] ? log_wait_commit+0xbf/0x112 [jbd]
May 22 01:21:43 puffin kernel: [525602.829292]  [<ffffffff81066326>] ? autoremove_wake_function+0x0/0x2e
May 22 01:21:43 puffin kernel: [525602.829313]  [<ffffffff8130f0f2>] ? _spin_unlock_irqrestore+0xd/0xe
May 22 01:21:43 puffin kernel: [525602.829339]  [<ffffffffa00493f8>] ? ext3_sync_file+0x94/0xc8 [ext3]
May 22 01:21:43 puffin kernel: [525602.829359]  [<ffffffff8110d6d4>] ? vfs_fsync_range+0x73/0x9e
May 22 01:21:43 puffin kernel: [525602.829377]  [<ffffffff8110d77e>] ? do_fsync+0x28/0x39
May 22 01:21:43 puffin kernel: [525602.829395]  [<ffffffff8110d7ad>] ? sys_fsync+0xb/0x10
May 22 01:21:43 puffin kernel: [525602.829412]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

I have run this, as suggested above, to see if it helps:

echo 10 > /proc/sys/vm/dirty_ratio

cat /proc/sys/vm/dirty_ratio
10

comment:5 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.1
  • Status changed from accepted to closed
  • Resolution set to fixed
  • Total Hours changed from 0.9 to 1.0

We are no longer getting such massive load spikes and we also have changed the kill thresholds, see wiki:PuffinServer#LoadSpikes so now closing this ticket.

Note: See TracTickets for help on using tickets.