Ticket #674 (closed maintenance: fixed)

Opened 3 years ago

Last modified 3 years ago

Puffin locked up

Reported by: chris Owned by: chris
Priority: blocker Milestone: Maintenance
Component: Live server Keywords:
Cc: sam, ed, jim Estimated Number of Hours: 0.0
Add Hours to Ticket: 0 Billable?: yes
Total Hours: 0.91

Description

This is on the console; there is no response via web or SSH:

[4478428.894563]  [<ffffffff810f3634>] ? sys_readlinkat+0x25/0x8d
[4478428.894571]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[4478428.894581] INFO: task cron:46009 blocked for more than 120 seconds.
[4478428.894587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4478428.894595] cron          D ffff8801fb270e20     0 46009   8829 0x00000080
[4478428.894606]  ffff8801fb270e20 0000000000000286 0000000000000001 00000001810423b0
[4478428.894617]  ffff88000bcf87c0 000000000000000b 000000000000f9e0 ffff8800cbb9dfd8
[4478428.894628]  00000000000157c0 00000000000157c0 ffff8801fa458000 ffff8801fa4582f8
[4478428.894640] Call Trace:
[4478428.894648]  [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449
[4478428.894656]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
[4478428.894666]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
[4478428.894673]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
[4478428.894681]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
[4478428.894689]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
[4478428.894697]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
[4478428.894705]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
[4478428.894713]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
[4478428.894721]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
[4478428.894729]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
[4478428.894738]  [<ffffffff8130dda8>] ? thread_return+0x79/0xe0
[4478428.894746]  [<ffffffff811901fb>] ? _atomic_dec_and_lock+0x33/0x50
[4478428.894755]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
[4478428.894763]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
[4478428.894770]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[4478669.197427] INFO: task vnstatd:1920 blocked for more than 120 seconds.
[4478669.197448] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4478669.197458] vnstatd       D 0000000000000000     0  1920      1 0x00000000
[4478669.197469]  ffff8801ff18bf90 0000000000000282 ffff8801fa651e10 ffff880000009680
[4478669.197482]  0000000000000000 000000000000000a 000000000000f9e0 ffff8801fa651fd8
[4478669.197494]  00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8
[4478669.197505] Call Trace:
[4478669.197520]  [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60
[4478669.197531]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[4478669.197542]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
[4478669.197551]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
[4478669.197561]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
[4478669.197569]  [<ffffffff810f86aa>] ? __link_path_walk+0x323/0x6f5
[4478669.197577]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
[4478669.197585]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
[4478669.197593]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
[4478669.197602]  [<ffffffff8130f054>] ? _spin_lock_irqsave+0x15/0x34
[4478669.197612]  [<ffffffff81068fa6>] ? hrtimer_try_to_cancel+0x3a/0x43
[4478669.197622]  [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b
[4478669.197631]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
[4478669.197641]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
[4478669.197649]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

Rebooting...

Change History

comment:1 Changed 3 years ago by chris

It's recovered.

comment:2 Changed 3 years ago by chris

But the reboot is happening anyway...

comment:3 Changed 3 years ago by chris

Booting puffin takes ages due to the massive number of firewall rules which need to be loaded.

It's back up now.

comment:4 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Total Hours changed from 0.0 to 0.25

This looks like the cause, from /var/log/messages:

Jan 14 09:44:25 puffin kernel: [4478428.647813] vnstatd       D ffff8800de4ebf90     0  1920      1 0x00000000
Jan 14 09:44:25 puffin kernel: [4478428.647825]  ffff8800de4ebf90 0000000000000282 ffff8801ffc15000 ffff880000009680
Jan 14 09:44:25 puffin kernel: [4478428.647838]  ffff8801fa651d58 000000000000000a 000000000000f9e0 ffff8801fa651fd8
Jan 14 09:44:25 puffin kernel: [4478428.647849]  00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8
Jan 14 09:44:25 puffin kernel: [4478428.647861] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.647877]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.647888]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
Jan 14 09:44:25 puffin kernel: [4478428.647897]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.647907]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.647915]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.647924]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.647932]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:25 puffin kernel: [4478428.647940]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.647947]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
Jan 14 09:44:25 puffin kernel: [4478428.647957]  [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b
Jan 14 09:44:25 puffin kernel: [4478428.647967]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.647976]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.647985]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648069] awk           D ffff8801eb1f8e20     0 45972  45967 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648079]  ffff8801eb1f8e20 0000000000000286 0000000000000000 0000000000000000
Jan 14 09:44:25 puffin kernel: [4478428.648090]  0000000000000000 0000000000000000 000000000000f9e0 ffff8800bd109fd8
Jan 14 09:44:25 puffin kernel: [4478428.648102]  00000000000157c0 00000000000157c0 ffff8801fa845bd0 ffff8801fa845ec8
Jan 14 09:44:25 puffin kernel: [4478428.648113] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.648121]  [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449
Jan 14 09:44:25 puffin kernel: [4478428.648130]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.648138]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.648146]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.648154]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648162]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
Jan 14 09:44:25 puffin kernel: [4478428.648171]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648179]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.648187]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:25 puffin kernel: [4478428.648195]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.648202]  [<ffffffff8100c4e8>] ? pte_pfn_to_mfn+0x21/0x30
Jan 14 09:44:25 puffin kernel: [4478428.648211]  [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83
Jan 14 09:44:25 puffin kernel: [4478428.648219]  [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2
Jan 14 09:44:25 puffin kernel: [4478428.648229]  [<ffffffff8105cd8e>] ? do_sigaction+0x159/0x171
Jan 14 09:44:25 puffin kernel: [4478428.648236]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.648244]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.648252]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648275] vmstat        D ffff8801c73c7810     0 45976      1 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648284]  ffff8801c73c7810 0000000000000286 ffff8800d95e5e10 ffff880000009680
Jan 14 09:44:25 puffin kernel: [4478428.648303]  00000000000006fe 000000000000000a 000000000000f9e0 ffff8800d95e5fd8
Jan 14 09:44:25 puffin kernel: [4478428.648314]  00000000000157c0 00000000000157c0 ffff8801fc68b880 ffff8801fc68bb78
Jan 14 09:44:25 puffin kernel: [4478428.648326] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.648333]  [<ffffffff810414a4>] ? set_next_entity+0x34/0x56
Jan 14 09:44:25 puffin kernel: [4478428.648342]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.648350]  [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2
Jan 14 09:44:25 puffin kernel: [4478428.648359]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.648366]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.648373]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648382]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.648390]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:25 puffin kernel: [4478428.648404]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.648404]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.648404]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.648404]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648417] id            D ffff8801eb1f8e20     0 45979  45975 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648427]  ffff8801eb1f8e20 0000000000000286 ffff8801ff0c7d80 0000000000000001
Jan 14 09:44:25 puffin kernel: [4478428.648439]  0000000000000000 0000000000000098 000000000000f9e0 ffff880029119fd8
Jan 14 09:44:25 puffin kernel: [4478428.648449]  00000000000157c0 00000000000157c0 ffff8800de4eb880 ffff8800de4ebb78
Jan 14 09:44:25 puffin kernel: [4478428.648461] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.648469]  [<ffffffff810b9b8b>] ? zone_watermark_ok+0x20/0xb1
Jan 14 09:44:25 puffin kernel: [4478428.648477]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.648486]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.648493]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.648501]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648509]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.648516]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:25 puffin kernel: [4478428.648524]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.648534]  [<ffffffff810d217e>] ? vma_link+0x74/0x9a
Jan 14 09:44:25 puffin kernel: [4478428.648541]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.648549]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.648556]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648580] sshd          D ffff8801fa8446a0     0 45990   6977 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648590]  ffff8801fa8446a0 0000000000000282 000000000100007f 0000000000000000
Jan 14 09:44:25 puffin kernel: [4478428.648601]  0100007f0002bcee 0000000000000000 000000000000f9e0 ffff8800dd239fd8
Jan 14 09:44:25 puffin kernel: [4478428.648613]  00000000000157c0 00000000000157c0 ffff8801fc68d4c0 ffff8801fc68d7b8
Jan 14 09:44:25 puffin kernel: [4478428.648623] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.648629]  [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449
Jan 14 09:44:25 puffin kernel: [4478428.648650]  [<ffffffffa01393c6>] ? death_by_timeout+0x0/0x4a [nf_conntrack]
Jan 14 09:44:25 puffin kernel: [4478428.648660]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.648669]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.648677]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.648684]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648692]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
Jan 14 09:44:25 puffin kernel: [4478428.648701]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648708]  [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83
Jan 14 09:44:25 puffin kernel: [4478428.648716]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.648724]  [<ffffffff810fa318>] ? do_filp_open+0x15d/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.648732]  [<ffffffff8130f4c5>] ? page_fault+0x25/0x30
Jan 14 09:44:25 puffin kernel: [4478428.648739]  [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60
Jan 14 09:44:25 puffin kernel: [4478428.648747]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Jan 14 09:44:25 puffin kernel: [4478428.648756]  [<ffffffff8119845c>] ? strncpy_from_user+0x51/0x6d
Jan 14 09:44:25 puffin kernel: [4478428.648765]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.648772]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.648779]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648801] ps            D ffff8801fa844db0     0 45994  45948 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648811]  ffff8801fa844db0 0000000000000282 ffff8801ff0c7480 ffffffff810bb263
Jan 14 09:44:25 puffin kernel: [4478428.648822]  ffff88000000a6d0 ffff880014ffa000 000000000000f9e0 ffff880014ffbfd8
Jan 14 09:44:25 puffin kernel: [4478428.648833]  00000000000157c0 00000000000157c0 ffff8801faab8710 ffff8801faab8a08
Jan 14 09:44:25 puffin kernel: [4478428.648844] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.648850]  [<ffffffff810bb263>] ? get_page_from_freelist+0x56b/0x760
Jan 14 09:44:25 puffin kernel: [4478428.648858]  [<ffffffff810b9b8b>] ? zone_watermark_ok+0x20/0xb1
Jan 14 09:44:25 puffin kernel: [4478428.648867]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:25 puffin kernel: [4478428.648875]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:25 puffin kernel: [4478428.648882]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:25 puffin kernel: [4478428.648890]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:25 puffin kernel: [4478428.648898]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:25 puffin kernel: [4478428.648905]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:25 puffin kernel: [4478428.648913]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:25 puffin kernel: [4478428.648920]  [<ffffffff8130f4c5>] ? page_fault+0x25/0x30
Jan 14 09:44:25 puffin kernel: [4478428.648927]  [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60
Jan 14 09:44:25 puffin kernel: [4478428.648934]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Jan 14 09:44:25 puffin kernel: [4478428.648941]  [<ffffffff8119845c>] ? strncpy_from_user+0x51/0x6d
Jan 14 09:44:25 puffin kernel: [4478428.648949]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:25 puffin kernel: [4478428.648956]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:25 puffin kernel: [4478428.648964]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:25 puffin kernel: [4478428.648987] awk           D ffff8801eb1fe2e0     0 46000  45997 0x00000080
Jan 14 09:44:25 puffin kernel: [4478428.648996]  ffff8801eb1fe2e0 0000000000000286 00000002000280da ffff88000000da08
Jan 14 09:44:25 puffin kernel: [4478428.649008]  0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800164d9fd8
Jan 14 09:44:25 puffin kernel: [4478428.649019]  00000000000157c0 00000000000157c0 ffff8801c73c3170 ffff8801c73c3468
Jan 14 09:44:25 puffin kernel: [4478428.649031] Call Trace:
Jan 14 09:44:25 puffin kernel: [4478428.649037]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
Jan 14 09:44:26 puffin kernel: [4478428.894222]  [<ffffffff81155196>] ? cap_inode_permission+0x0/0x3
Jan 14 09:44:26 puffin kernel: [4478428.894237]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:26 puffin kernel: [4478428.894248]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:26 puffin kernel: [4478428.894258]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:26 puffin kernel: [4478428.894265]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894275]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
Jan 14 09:44:26 puffin kernel: [4478428.894284]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894292]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:26 puffin kernel: [4478428.894300]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:26 puffin kernel: [4478428.894309]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:26 puffin kernel: [4478428.894319]  [<ffffffff8100c4e8>] ? pte_pfn_to_mfn+0x21/0x30
Jan 14 09:44:26 puffin kernel: [4478428.894327]  [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83
Jan 14 09:44:26 puffin kernel: [4478428.894337]  [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2
Jan 14 09:44:26 puffin kernel: [4478428.894347]  [<ffffffff8105cd8e>] ? do_sigaction+0x159/0x171
Jan 14 09:44:26 puffin kernel: [4478428.894356]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:26 puffin kernel: [4478428.894365]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:26 puffin kernel: [4478428.894374]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:26 puffin kernel: [4478428.894399] perl          D ffff8800de4ee2e0     0 46003  45996 0x00000080
Jan 14 09:44:26 puffin kernel: [4478428.894410]  ffff8800de4ee2e0 0000000000000286 0000000000000041 0000000000000000
Jan 14 09:44:26 puffin kernel: [4478428.894422]  00000000000006fe ffffffff81012cdb 000000000000f9e0 ffff880016527fd8
Jan 14 09:44:26 puffin kernel: [4478428.894434]  00000000000157c0 00000000000157c0 ffff8800161d2350 ffff8800161d2648
Jan 14 09:44:26 puffin kernel: [4478428.894445] Call Trace:
Jan 14 09:44:26 puffin kernel: [4478428.894454]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
Jan 14 09:44:26 puffin kernel: [4478428.894464]  [<ffffffff81155196>] ? cap_inode_permission+0x0/0x3
Jan 14 09:44:26 puffin kernel: [4478428.894472]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:26 puffin kernel: [4478428.894481]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:26 puffin kernel: [4478428.894488]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:26 puffin kernel: [4478428.894496]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894504]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
Jan 14 09:44:26 puffin kernel: [4478428.894512]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894521]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:26 puffin kernel: [4478428.894528]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:26 puffin kernel: [4478428.894536]  [<ffffffff810fb5f6>] ? user_path_at+0x48/0x79
Jan 14 09:44:26 puffin kernel: [4478428.894546]  [<ffffffff81099076>] ? check_for_new_grace_period+0x98/0xa5
Jan 14 09:44:26 puffin kernel: [4478428.894556]  [<ffffffff81311626>] ? do_page_fault+0x2e0/0x2fc
Jan 14 09:44:26 puffin kernel: [4478428.894563]  [<ffffffff810f3634>] ? sys_readlinkat+0x25/0x8d
Jan 14 09:44:26 puffin kernel: [4478428.894571]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:44:26 puffin kernel: [4478428.894595] cron          D ffff8801fb270e20     0 46009   8829 0x00000080
Jan 14 09:44:26 puffin kernel: [4478428.894606]  ffff8801fb270e20 0000000000000286 0000000000000001 00000001810423b0
Jan 14 09:44:26 puffin kernel: [4478428.894617]  ffff88000bcf87c0 000000000000000b 000000000000f9e0 ffff8800cbb9dfd8
Jan 14 09:44:26 puffin kernel: [4478428.894628]  00000000000157c0 00000000000157c0 ffff8801fa458000 ffff8801fa4582f8
Jan 14 09:44:26 puffin kernel: [4478428.894640] Call Trace:
Jan 14 09:44:26 puffin kernel: [4478428.894648]  [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449
Jan 14 09:44:26 puffin kernel: [4478428.894656]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:44:26 puffin kernel: [4478428.894666]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:44:26 puffin kernel: [4478428.894673]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:44:26 puffin kernel: [4478428.894681]  [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894689]  [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
Jan 14 09:44:26 puffin kernel: [4478428.894697]  [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
Jan 14 09:44:26 puffin kernel: [4478428.894705]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:44:26 puffin kernel: [4478428.894713]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:44:26 puffin kernel: [4478428.894721]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:44:26 puffin kernel: [4478428.894729]  [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
Jan 14 09:44:26 puffin kernel: [4478428.894738]  [<ffffffff8130dda8>] ? thread_return+0x79/0xe0
Jan 14 09:44:26 puffin kernel: [4478428.894746]  [<ffffffff811901fb>] ? _atomic_dec_and_lock+0x33/0x50
Jan 14 09:44:26 puffin kernel: [4478428.894755]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:44:26 puffin kernel: [4478428.894763]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:44:26 puffin kernel: [4478428.894770]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Jan 14 09:48:22 puffin pure-ftpd: (?@192.95.55.77) [INFO] New connection from 192.95.55.77
Jan 14 09:48:26 puffin kernel: [4478669.197458] vnstatd       D 0000000000000000     0  1920      1 0x00000000
Jan 14 09:48:26 puffin kernel: [4478669.197469]  ffff8801ff18bf90 0000000000000282 ffff8801fa651e10 ffff880000009680
Jan 14 09:48:26 puffin kernel: [4478669.197482]  0000000000000000 000000000000000a 000000000000f9e0 ffff8801fa651fd8
Jan 14 09:48:26 puffin kernel: [4478669.197494]  00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8
Jan 14 09:48:26 puffin kernel: [4478669.197505] Call Trace:
Jan 14 09:48:26 puffin kernel: [4478669.197520]  [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60
Jan 14 09:48:26 puffin kernel: [4478669.197531]  [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
Jan 14 09:48:26 puffin kernel: [4478669.197542]  [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
Jan 14 09:48:26 puffin kernel: [4478669.197551]  [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
Jan 14 09:48:26 puffin kernel: [4478669.197561]  [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
Jan 14 09:48:26 puffin kernel: [4478669.197569]  [<ffffffff810f86aa>] ? __link_path_walk+0x323/0x6f5
Jan 14 09:48:26 puffin kernel: [4478669.197577]  [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
Jan 14 09:48:26 puffin kernel: [4478669.197585]  [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
Jan 14 09:48:26 puffin kernel: [4478669.197593]  [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
Jan 14 09:48:26 puffin kernel: [4478669.197602]  [<ffffffff8130f054>] ? _spin_lock_irqsave+0x15/0x34
Jan 14 09:48:26 puffin kernel: [4478669.197612]  [<ffffffff81068fa6>] ? hrtimer_try_to_cancel+0x3a/0x43
Jan 14 09:48:26 puffin kernel: [4478669.197622]  [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b
Jan 14 09:48:26 puffin kernel: [4478669.197631]  [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
Jan 14 09:48:26 puffin kernel: [4478669.197641]  [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
Jan 14 09:48:26 puffin kernel: [4478669.197649]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b

Things I noticed during the boot:

  • We have lots of attempts to connect to the FTP server, pure-ftpd, but none of our developers use it. Can we uninstall it, or does BOA require it?
  • We have collectd installed but not configured. Can we uninstall it, or does BOA require it?

Counting FTP connection attempts logged in /var/log/messages:

cat messages | grep pure-ftp | grep "New connection from" | wc -l
583
cat messages.1 | grep pure-ftp | grep "New connection from" | wc -l
2367
zcat messages.*.gz | grep pure-ftp | grep "New connection from" | wc -l
7203

That is 10,153 FTP connection attempts in total, so over 10k in the period covered by these logs (roughly the last month).
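
A minimal sketch of doing the same count in one pass and listing the busiest source addresses. It assumes the log lines look like the pure-ftpd entry quoted above ("[INFO] New connection from <ip>"); the function names are hypothetical, and these lines record connections, not necessarily failed logins.

```shell
#!/bin/sh
# Hypothetical helpers; the log format is assumed from the pure-ftpd line above.

# Print every matching log line from the given files, decompressing .gz files.
ftp_connection_lines() {
    for f in "$@"; do
        case "$f" in
            *.gz) zcat -- "$f" ;;
            *)    cat -- "$f" ;;
        esac
    done | grep 'pure-ftpd' | grep 'New connection from'
}

# Total number of connection lines across all given files.
count_ftp_connections() {
    ftp_connection_lines "$@" | wc -l | tr -d ' '
}

# Connections per source address (last field of the line), busiest first.
top_ftp_sources() {
    ftp_connection_lines "$@" | awk '{ print $NF }' | sort | uniq -c | sort -rn
}
```

Possible usage, run from /var/log: count_ftp_connections messages messages.1 messages.*.gz, then top_ftp_sources with the same arguments to see whether a few addresses account for most of the traffic.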


comment:5 follow-up: ↓ 6 Changed 3 years ago by jim

MySQL is offline -- are you still running updates?

comment:6 in reply to: ↑ 5 Changed 3 years ago by chris

Replying to jim:

MySQL is offline -- are you still running updates?

Crap, I assumed it would start on reboot. I have tried starting it:

/etc/init.d/mysql restart
[ ok ] Stopping MariaDB database server: mysqld.
[FAIL] Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!
/etc/init.d/mysql start
[FAIL] Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!

Gonna look at the logs now...

comment:7 Changed 3 years ago by chris

Fixed:

chown mysql:mysql /run/shm/mysql
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..

I don't know why that was needed, as this is in my crontab:

crontab -e -u chris

@reboot sudo mkdir /run/shm/mysql
@reboot sudo chown mysql:mysql /run/shm/mysql

It's in my crontab as the root one is clobbered by BOA updates.

comment:8 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.5
  • Total Hours changed from 0.25 to 0.75

I have removed collectd-core:

dpkg -r collectd-core
(Reading database ... 58041 files and directories currently installed.)
Removing collectd-core ...
[ ok ] Stopping statistics collection and monitoring daemon: collectd.
Processing triggers for man-db ...

But there is an issue here:

aptitude search collectd
...
pB  collectd                                                  - statistics collection and monitoring daemon                        
c   collectd-core                                             - statistics collection and monitoring daemon (core system)   
...

The flags mean:

State:

c - removed, but config-files still present (ie, not purged)

Action:

B - Broken
p - purge

So:

aptitude purge collectd
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.

aptitude purge collectd-core
The following packages will be REMOVED:  
  collectd-core{p} 
0 packages upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
Do you want to continue? [Y/n/?] y
(Reading database ... 57834 files and directories currently installed.)
Removing collectd-core ...
Purging configuration files for collectd-core ...
/var/lib/dpkg/info/collectd-core.postrm: line 23: db_input: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 24: db_go: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 25: db_get: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 23: db_input: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 24: db_go: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 25: db_get: command not found
dpkg: warning: while removing collectd-core, directory '/var/lib/collectd' not empty so not removed

And now it's gone:

aptitude search collectd
p   collectd                                                                   - statistics collection and monitoring daemon                                          
p   collectd-core                                                              - statistics collection and monitoring daemon (core system)                            
p   collectd-dbg                                                               - statistics collection and monitoring daemon (debugging symbols)                      
p   collectd-dev                                                               - statistics collection and monitoring daemon (development files)                      
p   collectd-utils                                                             - statistics collection and monitoring daemon (utilities)                              
p   kcollectd                                                                  - simple collectd graphing frontend for KDE                                            
p   libcollectdclient-dev                                                      - client library for collectd's control interface (development files)                  
p   libcollectdclient0                                                         - client library for collectd's control interface                         
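
The "c" state shown by aptitude corresponds to the "rc" status in dpkg -l output (removed, configuration files remain). A small sketch for finding any other packages left in that state; the function name is hypothetical and it only filters text in the dpkg -l format.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print the names of
# packages whose status is "rc" (removed, but config files still present).
list_residual_config() {
    awk '$1 == "rc" { print $2 }'
}
```

Possible usage: dpkg -l | list_residual_config. Anything it lists could then be reviewed and purged in the same way as collectd-core above.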

Regarding pure-ftpd, it looks like it is installed from source by BOA, as it's not listed as installed by aptitude:

aptitude search pure-ftp
p   mysqmail-pure-ftpd-logger                                                  - real-time logging system in MySQL - Pure-FTPd traffic-logger                         
p   pure-ftpd                                                                  - Secure and efficient FTP server                                                      
p   pure-ftpd-common                                                           - Pure-FTPd FTP server (Common Files)                                                  
p   pure-ftpd-ldap                                                             - Secure and efficient FTP server with LDAP user authentication                        
p   pure-ftpd-mysql                                                            - Secure and efficient FTP server with MySQL user authentication                       
p   pure-ftpd-postgresql                                                       - Secure and efficient FTP server with PostgreSQL user authentication         

In terms of the kernel lockup/crash that caused the server to be unresponsive, I don't know what to suggest other than keeping an eye out for things like this in the future.
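
One way of keeping an eye out would be to summarise the hung-task warnings in the kernel log. This is a sketch only: the function name is hypothetical, and it assumes the message format seen on the console above ("INFO: task cron:46009 blocked for more than 120 seconds.").

```shell
# Hypothetical helper: count hung-task warnings per task name in the given
# log files, most frequent first.
hung_task_summary() {
    grep -h 'blocked for more than' "$@" |
        sed -n 's/.*INFO: task \(.*\):[0-9]* blocked for more than.*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Possible usage: hung_task_summary /var/log/messages /var/log/messages.1. Empty output means no such warnings were logged in those files.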

comment:9 Changed 3 years ago by chris

For reference, this is what I found in /var/log/syslog, which alerted me to the reason why MySQL wouldn't start:

Jan 14 10:50:34 puffin mysqld: 140114 10:50:34 [ERROR] mysqld: Can't create/write to file '/run/shm/mysql/ibw6lZ2z' (Errcode: 13)
Jan 14 10:50:34 puffin mysqld: 140114 10:50:34  InnoDB: Error: unable to create temporary file; errno: 13
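
Errcode 13 is EACCES (permission denied) on Linux, which fits the directory not being owned by mysql. A minimal sketch of a check that could be run before starting the server; the function name is hypothetical and it relies on GNU stat as shipped with Debian.

```shell
# Hypothetical helper: succeed only if the directory exists and is owned by
# the expected user. Uses GNU stat (-c %U prints the owner's user name).
dir_owned_by() {
    dir=$1
    owner=$2
    [ -d "$dir" ] && [ "$(stat -c %U "$dir")" = "$owner" ]
}
```

Possible usage: dir_owned_by /run/shm/mysql mysql || echo "fix ownership before starting mysqld".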

comment:10 Changed 3 years ago by chris

This is the first email alert I got around the time of the kernel crash:

From: root@puffin.webarch.net
Date: Tue, 14 Jan 2014 09:44:08 +0000 (GMT)
To: chris@webarchitects.co.uk
Subject: lfd on puffin.webarch.net: High 5 minute load average alert - 8.79

[-- Attachment #1 --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.2K --]

Time:                    Tue Jan 14 09:42:47 2014 +0000
1 Min Load Avg:          29.58
5 Min Load Avg:          8.79
15 Min Load Avg:         3.33
Running/Total Processes: 44/414

[-- Attachment #2: ps.txt --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.1K --]

Output from ps:
Unable to obtain process output within 15 seconds - Timed out

[-- Attachment #3: vmstat.txt --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.1K --]

Output from vmstat:
Unable to obtain vmstat output within 10 seconds - Timed out

[-- Attachment #4: apachestatus.html --]
[-- Type: text/html, Encoding: 7bit, Size: 0.1K --]

   Unable to retrieve Apache Server Status [http://127.0.0.1/server-status] -
   Unable to download: 404 - Not Found

comment:11 Changed 3 years ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.16
  • Total Hours changed from 0.75 to 0.91

Perhaps the items in my crontab were run in the wrong order, with the chown running before the directory had been created, so I have updated it to this:

@reboot sudo mkdir /run/shm/mysql ; sudo chown mysql:mysql /run/shm/mysql

And I have documented the crontab here: wiki:PuffinServer#Cron
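
With ";" the chown still runs if the mkdir fails, and a plain mkdir fails if the directory already exists. A sketch of a more defensive variant, using a hypothetical helper name and only standard mkdir and chown options:

```shell
# Hypothetical helper: create a directory if needed, then set its ownership.
# chown only runs if mkdir succeeded; mkdir -p does not fail when the
# directory already exists.
ensure_owned_dir() {
    dir=$1
    owner=$2   # user:group, for example mysql:mysql
    mkdir -p "$dir" && chown "$owner" "$dir"
}
```

The equivalent crontab line would be something like: @reboot sudo mkdir -p /run/shm/mysql && sudo chown mysql:mysql /run/shm/mysql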

comment:12 Changed 3 years ago by sam

Hi Chris

transitionnetwork.org is currently unavailable. I created a new ticket, but maybe it should just be a comment here?

Not responding to a ping at the moment.

Could you have a look?

Thanks

Sam

comment:13 Changed 3 years ago by chris

  • Status changed from new to closed
  • Resolution set to fixed

Closing as this is resolved.
