Ticket #674 (closed maintenance: fixed)
Puffin locked up
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | blocker | Milestone: | Maintenance |
Component: | Live server | Keywords: | |
Cc: | sam, ed, jim | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 0.91 |
Description
This is on the console; there is no response via web or SSH:
[4478428.894563] [<ffffffff810f3634>] ? sys_readlinkat+0x25/0x8d
[4478428.894571] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[4478428.894581] INFO: task cron:46009 blocked for more than 120 seconds.
[4478428.894587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4478428.894595] cron D ffff8801fb270e20 0 46009 8829 0x00000080
[4478428.894606] ffff8801fb270e20 0000000000000286 0000000000000001 00000001810423b0
[4478428.894617] ffff88000bcf87c0 000000000000000b 000000000000f9e0 ffff8800cbb9dfd8
[4478428.894628] 00000000000157c0 00000000000157c0 ffff8801fa458000 ffff8801fa4582f8
[4478428.894640] Call Trace:
[4478428.894648] [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449
[4478428.894656] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
[4478428.894666] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
[4478428.894673] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
[4478428.894681] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5
[4478428.894689] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317
[4478428.894697] [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5
[4478428.894705] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
[4478428.894713] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
[4478428.894721] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
[4478428.894729] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20
[4478428.894738] [<ffffffff8130dda8>] ? thread_return+0x79/0xe0
[4478428.894746] [<ffffffff811901fb>] ? _atomic_dec_and_lock+0x33/0x50
[4478428.894755] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
[4478428.894763] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
[4478428.894770] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[4478669.197427] INFO: task vnstatd:1920 blocked for more than 120 seconds.
[4478669.197448] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4478669.197458] vnstatd D 0000000000000000 0 1920 1 0x00000000
[4478669.197469] ffff8801ff18bf90 0000000000000282 ffff8801fa651e10 ffff880000009680
[4478669.197482] 0000000000000000 000000000000000a 000000000000f9e0 ffff8801fa651fd8
[4478669.197494] 00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8
[4478669.197505] Call Trace:
[4478669.197520] [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60
[4478669.197531] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6
[4478669.197542] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192
[4478669.197551] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31
[4478669.197561] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d
[4478669.197569] [<ffffffff810f86aa>] ? __link_path_walk+0x323/0x6f5
[4478669.197577] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9
[4478669.197585] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77
[4478669.197593] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b
[4478669.197602] [<ffffffff8130f054>] ? _spin_lock_irqsave+0x15/0x34
[4478669.197612] [<ffffffff81068fa6>] ? hrtimer_try_to_cancel+0x3a/0x43
[4478669.197622] [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b
[4478669.197631] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c
[4478669.197641] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc
[4478669.197649] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Rebooting...
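For future lockups it may help to see at a glance which tasks the kernel's hung-task detector flagged. The following is only a rough sketch: the function name blocked_tasks is made up for illustration, and it assumes the "INFO: task NAME:PID blocked for more than" wording seen in the console output above, plus a grep that supports -o (GNU and BSD grep do).

```shell
# List the unique "name:pid" pairs reported by the hung-task detector.
# Reads the files given as arguments, or standard input if there are none.
# blocked_tasks is a hypothetical helper name, not an existing tool.
blocked_tasks() {
    grep -ho 'task [^ ]*:[0-9]* blocked for more than' "$@" |
        awk '{ print $2 }' |
        sort -u
}
```

It could be fed from dmesg (dmesg | blocked_tasks) or pointed at a saved copy of the console log.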
Change History
comment:3 Changed 3 years ago by chris
Booting puffin takes ages due to the massive number of firewall rules which need to be loaded.
It's back up now.
comment:4 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
This looks like the cause, from /var/log/messages:
Jan 14 09:44:25 puffin kernel: [4478428.647813] vnstatd D ffff8800de4ebf90 0 1920 1 0x00000000 Jan 14 09:44:25 puffin kernel: [4478428.647825] ffff8800de4ebf90 0000000000000282 ffff8801ffc15000 ffff880000009680 Jan 14 09:44:25 puffin kernel: [4478428.647838] ffff8801fa651d58 000000000000000a 000000000000f9e0 ffff8801fa651fd8 Jan 14 09:44:25 puffin kernel: [4478428.647849] 00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8 Jan 14 09:44:25 puffin kernel: [4478428.647861] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.647877] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.647888] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 Jan 14 09:44:25 puffin kernel: [4478428.647897] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.647907] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.647915] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.647924] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.647932] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:25 puffin kernel: [4478428.647940] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:25 puffin kernel: [4478428.647947] [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185 Jan 14 09:44:25 puffin kernel: [4478428.647957] [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b Jan 14 09:44:25 puffin kernel: [4478428.647967] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.647976] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.647985] [<ffffffff81011b42>] ? 
system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648069] awk D ffff8801eb1f8e20 0 45972 45967 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648079] ffff8801eb1f8e20 0000000000000286 0000000000000000 0000000000000000 Jan 14 09:44:25 puffin kernel: [4478428.648090] 0000000000000000 0000000000000000 000000000000f9e0 ffff8800bd109fd8 Jan 14 09:44:25 puffin kernel: [4478428.648102] 00000000000157c0 00000000000157c0 ffff8801fa845bd0 ffff8801fa845ec8 Jan 14 09:44:25 puffin kernel: [4478428.648113] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.648121] [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449 Jan 14 09:44:25 puffin kernel: [4478428.648130] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.648138] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.648146] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.648154] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648162] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317 Jan 14 09:44:25 puffin kernel: [4478428.648171] [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648179] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.648187] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:25 puffin kernel: [4478428.648195] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:25 puffin kernel: [4478428.648202] [<ffffffff8100c4e8>] ? pte_pfn_to_mfn+0x21/0x30 Jan 14 09:44:25 puffin kernel: [4478428.648211] [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83 Jan 14 09:44:25 puffin kernel: [4478428.648219] [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2 Jan 14 09:44:25 puffin kernel: [4478428.648229] [<ffffffff8105cd8e>] ? do_sigaction+0x159/0x171 Jan 14 09:44:25 puffin kernel: [4478428.648236] [<ffffffff81103705>] ? 
alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.648244] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.648252] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648275] vmstat D ffff8801c73c7810 0 45976 1 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648284] ffff8801c73c7810 0000000000000286 ffff8800d95e5e10 ffff880000009680 Jan 14 09:44:25 puffin kernel: [4478428.648303] 00000000000006fe 000000000000000a 000000000000f9e0 ffff8800d95e5fd8 Jan 14 09:44:25 puffin kernel: [4478428.648314] 00000000000157c0 00000000000157c0 ffff8801fc68b880 ffff8801fc68bb78 Jan 14 09:44:25 puffin kernel: [4478428.648326] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.648333] [<ffffffff810414a4>] ? set_next_entity+0x34/0x56 Jan 14 09:44:25 puffin kernel: [4478428.648342] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.648350] [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2 Jan 14 09:44:25 puffin kernel: [4478428.648359] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.648366] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.648373] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648382] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.648390] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:25 puffin kernel: [4478428.648404] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:25 puffin kernel: [4478428.648404] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.648404] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.648404] [<ffffffff81011b42>] ? 
system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648417] id D ffff8801eb1f8e20 0 45979 45975 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648427] ffff8801eb1f8e20 0000000000000286 ffff8801ff0c7d80 0000000000000001 Jan 14 09:44:25 puffin kernel: [4478428.648439] 0000000000000000 0000000000000098 000000000000f9e0 ffff880029119fd8 Jan 14 09:44:25 puffin kernel: [4478428.648449] 00000000000157c0 00000000000157c0 ffff8800de4eb880 ffff8800de4ebb78 Jan 14 09:44:25 puffin kernel: [4478428.648461] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.648469] [<ffffffff810b9b8b>] ? zone_watermark_ok+0x20/0xb1 Jan 14 09:44:25 puffin kernel: [4478428.648477] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.648486] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.648493] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.648501] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648509] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.648516] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:25 puffin kernel: [4478428.648524] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:25 puffin kernel: [4478428.648534] [<ffffffff810d217e>] ? vma_link+0x74/0x9a Jan 14 09:44:25 puffin kernel: [4478428.648541] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.648549] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.648556] [<ffffffff81011b42>] ? 
system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648580] sshd D ffff8801fa8446a0 0 45990 6977 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648590] ffff8801fa8446a0 0000000000000282 000000000100007f 0000000000000000 Jan 14 09:44:25 puffin kernel: [4478428.648601] 0100007f0002bcee 0000000000000000 000000000000f9e0 ffff8800dd239fd8 Jan 14 09:44:25 puffin kernel: [4478428.648613] 00000000000157c0 00000000000157c0 ffff8801fc68d4c0 ffff8801fc68d7b8 Jan 14 09:44:25 puffin kernel: [4478428.648623] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.648629] [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449 Jan 14 09:44:25 puffin kernel: [4478428.648650] [<ffffffffa01393c6>] ? death_by_timeout+0x0/0x4a [nf_conntrack] Jan 14 09:44:25 puffin kernel: [4478428.648660] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.648669] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.648677] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.648684] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648692] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317 Jan 14 09:44:25 puffin kernel: [4478428.648701] [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648708] [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83 Jan 14 09:44:25 puffin kernel: [4478428.648716] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.648724] [<ffffffff810fa318>] ? do_filp_open+0x15d/0x94b Jan 14 09:44:25 puffin kernel: [4478428.648732] [<ffffffff8130f4c5>] ? page_fault+0x25/0x30 Jan 14 09:44:25 puffin kernel: [4478428.648739] [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60 Jan 14 09:44:25 puffin kernel: [4478428.648747] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 Jan 14 09:44:25 puffin kernel: [4478428.648756] [<ffffffff8119845c>] ? 
strncpy_from_user+0x51/0x6d Jan 14 09:44:25 puffin kernel: [4478428.648765] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.648772] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.648779] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648801] ps D ffff8801fa844db0 0 45994 45948 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648811] ffff8801fa844db0 0000000000000282 ffff8801ff0c7480 ffffffff810bb263 Jan 14 09:44:25 puffin kernel: [4478428.648822] ffff88000000a6d0 ffff880014ffa000 000000000000f9e0 ffff880014ffbfd8 Jan 14 09:44:25 puffin kernel: [4478428.648833] 00000000000157c0 00000000000157c0 ffff8801faab8710 ffff8801faab8a08 Jan 14 09:44:25 puffin kernel: [4478428.648844] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.648850] [<ffffffff810bb263>] ? get_page_from_freelist+0x56b/0x760 Jan 14 09:44:25 puffin kernel: [4478428.648858] [<ffffffff810b9b8b>] ? zone_watermark_ok+0x20/0xb1 Jan 14 09:44:25 puffin kernel: [4478428.648867] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:25 puffin kernel: [4478428.648875] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:25 puffin kernel: [4478428.648882] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:25 puffin kernel: [4478428.648890] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:25 puffin kernel: [4478428.648898] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:25 puffin kernel: [4478428.648905] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:25 puffin kernel: [4478428.648913] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:25 puffin kernel: [4478428.648920] [<ffffffff8130f4c5>] ? page_fault+0x25/0x30 Jan 14 09:44:25 puffin kernel: [4478428.648927] [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60 Jan 14 09:44:25 puffin kernel: [4478428.648934] [<ffffffff8101251d>] ? 
retint_restore_args+0x5/0x6 Jan 14 09:44:25 puffin kernel: [4478428.648941] [<ffffffff8119845c>] ? strncpy_from_user+0x51/0x6d Jan 14 09:44:25 puffin kernel: [4478428.648949] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:25 puffin kernel: [4478428.648956] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:25 puffin kernel: [4478428.648964] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:44:25 puffin kernel: [4478428.648987] awk D ffff8801eb1fe2e0 0 46000 45997 0x00000080 Jan 14 09:44:25 puffin kernel: [4478428.648996] ffff8801eb1fe2e0 0000000000000286 00000002000280da ffff88000000da08 Jan 14 09:44:25 puffin kernel: [4478428.649008] 0000000000000000 ffffffff81012cdb 000000000000f9e0 ffff8800164d9fd8 Jan 14 09:44:25 puffin kernel: [4478428.649019] 00000000000157c0 00000000000157c0 ffff8801c73c3170 ffff8801c73c3468 Jan 14 09:44:25 puffin kernel: [4478428.649031] Call Trace: Jan 14 09:44:25 puffin kernel: [4478428.649037] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 Jan 14 09:44:26 puffin kernel: [4478428.894222] [<ffffffff81155196>] ? cap_inode_permission+0x0/0x3 Jan 14 09:44:26 puffin kernel: [4478428.894237] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:26 puffin kernel: [4478428.894248] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:26 puffin kernel: [4478428.894258] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:26 puffin kernel: [4478428.894265] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894275] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317 Jan 14 09:44:26 puffin kernel: [4478428.894284] [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894292] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:26 puffin kernel: [4478428.894300] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:26 puffin kernel: [4478428.894309] [<ffffffff810fa2a0>] ? 
do_filp_open+0xe5/0x94b Jan 14 09:44:26 puffin kernel: [4478428.894319] [<ffffffff8100c4e8>] ? pte_pfn_to_mfn+0x21/0x30 Jan 14 09:44:26 puffin kernel: [4478428.894327] [<ffffffff8100cc43>] ? xen_make_pte+0x7b/0x83 Jan 14 09:44:26 puffin kernel: [4478428.894337] [<ffffffff8100e585>] ? xen_set_pte_at+0xa2/0xc2 Jan 14 09:44:26 puffin kernel: [4478428.894347] [<ffffffff8105cd8e>] ? do_sigaction+0x159/0x171 Jan 14 09:44:26 puffin kernel: [4478428.894356] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:26 puffin kernel: [4478428.894365] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:26 puffin kernel: [4478428.894374] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:44:26 puffin kernel: [4478428.894399] perl D ffff8800de4ee2e0 0 46003 45996 0x00000080 Jan 14 09:44:26 puffin kernel: [4478428.894410] ffff8800de4ee2e0 0000000000000286 0000000000000041 0000000000000000 Jan 14 09:44:26 puffin kernel: [4478428.894422] 00000000000006fe ffffffff81012cdb 000000000000f9e0 ffff880016527fd8 Jan 14 09:44:26 puffin kernel: [4478428.894434] 00000000000157c0 00000000000157c0 ffff8800161d2350 ffff8800161d2648 Jan 14 09:44:26 puffin kernel: [4478428.894445] Call Trace: Jan 14 09:44:26 puffin kernel: [4478428.894454] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 Jan 14 09:44:26 puffin kernel: [4478428.894464] [<ffffffff81155196>] ? cap_inode_permission+0x0/0x3 Jan 14 09:44:26 puffin kernel: [4478428.894472] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:26 puffin kernel: [4478428.894481] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:26 puffin kernel: [4478428.894488] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:26 puffin kernel: [4478428.894496] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894504] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317 Jan 14 09:44:26 puffin kernel: [4478428.894512] [<ffffffff810f86e0>] ? 
__link_path_walk+0x359/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894521] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:44:26 puffin kernel: [4478428.894528] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:26 puffin kernel: [4478428.894536] [<ffffffff810fb5f6>] ? user_path_at+0x48/0x79 Jan 14 09:44:26 puffin kernel: [4478428.894546] [<ffffffff81099076>] ? check_for_new_grace_period+0x98/0xa5 Jan 14 09:44:26 puffin kernel: [4478428.894556] [<ffffffff81311626>] ? do_page_fault+0x2e0/0x2fc Jan 14 09:44:26 puffin kernel: [4478428.894563] [<ffffffff810f3634>] ? sys_readlinkat+0x25/0x8d Jan 14 09:44:26 puffin kernel: [4478428.894571] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:44:26 puffin kernel: [4478428.894595] cron D ffff8801fb270e20 0 46009 8829 0x00000080 Jan 14 09:44:26 puffin kernel: [4478428.894606] ffff8801fb270e20 0000000000000286 0000000000000001 00000001810423b0 Jan 14 09:44:26 puffin kernel: [4478428.894617] ffff88000bcf87c0 000000000000000b 000000000000f9e0 ffff8800cbb9dfd8 Jan 14 09:44:26 puffin kernel: [4478428.894628] 00000000000157c0 00000000000157c0 ffff8801fa458000 ffff8801fa4582f8 Jan 14 09:44:26 puffin kernel: [4478428.894640] Call Trace: Jan 14 09:44:26 puffin kernel: [4478428.894648] [<ffffffff811976a6>] ? vsnprintf+0x40a/0x449 Jan 14 09:44:26 puffin kernel: [4478428.894656] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:44:26 puffin kernel: [4478428.894666] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:44:26 puffin kernel: [4478428.894673] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:44:26 puffin kernel: [4478428.894681] [<ffffffff810f892c>] ? __link_path_walk+0x5a5/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894689] [<ffffffff810f826a>] ? do_follow_link+0x1fa/0x317 Jan 14 09:44:26 puffin kernel: [4478428.894697] [<ffffffff810f86e0>] ? __link_path_walk+0x359/0x6f5 Jan 14 09:44:26 puffin kernel: [4478428.894705] [<ffffffff810f8caa>] ? 
path_walk+0x66/0xc9 Jan 14 09:44:26 puffin kernel: [4478428.894713] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:44:26 puffin kernel: [4478428.894721] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:44:26 puffin kernel: [4478428.894729] [<ffffffff81012cdb>] ? xen_hypervisor_callback+0x1b/0x20 Jan 14 09:44:26 puffin kernel: [4478428.894738] [<ffffffff8130dda8>] ? thread_return+0x79/0xe0 Jan 14 09:44:26 puffin kernel: [4478428.894746] [<ffffffff811901fb>] ? _atomic_dec_and_lock+0x33/0x50 Jan 14 09:44:26 puffin kernel: [4478428.894755] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:44:26 puffin kernel: [4478428.894763] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:44:26 puffin kernel: [4478428.894770] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b Jan 14 09:48:22 puffin pure-ftpd: (?@192.95.55.77) [INFO] New connection from 192.95.55.77 Jan 14 09:48:26 puffin kernel: [4478669.197458] vnstatd D 0000000000000000 0 1920 1 0x00000000 Jan 14 09:48:26 puffin kernel: [4478669.197469] ffff8801ff18bf90 0000000000000282 ffff8801fa651e10 ffff880000009680 Jan 14 09:48:26 puffin kernel: [4478669.197482] 0000000000000000 000000000000000a 000000000000f9e0 ffff8801fa651fd8 Jan 14 09:48:26 puffin kernel: [4478669.197494] 00000000000157c0 00000000000157c0 ffff8801fe335bd0 ffff8801fe335ec8 Jan 14 09:48:26 puffin kernel: [4478669.197505] Call Trace: Jan 14 09:48:26 puffin kernel: [4478669.197520] [<ffffffff8130f6fa>] ? error_exit+0x2a/0x60 Jan 14 09:48:26 puffin kernel: [4478669.197531] [<ffffffff8101251d>] ? retint_restore_args+0x5/0x6 Jan 14 09:48:26 puffin kernel: [4478669.197542] [<ffffffff8130e5c3>] ? __mutex_lock_common+0x122/0x192 Jan 14 09:48:26 puffin kernel: [4478669.197551] [<ffffffff8130e6eb>] ? mutex_lock+0x1a/0x31 Jan 14 09:48:26 puffin kernel: [4478669.197561] [<ffffffff810f7eac>] ? do_lookup+0x80/0x15d Jan 14 09:48:26 puffin kernel: [4478669.197569] [<ffffffff810f86aa>] ? 
__link_path_walk+0x323/0x6f5 Jan 14 09:48:26 puffin kernel: [4478669.197577] [<ffffffff810f8caa>] ? path_walk+0x66/0xc9 Jan 14 09:48:26 puffin kernel: [4478669.197585] [<ffffffff810fa114>] ? do_path_lookup+0x20/0x77 Jan 14 09:48:26 puffin kernel: [4478669.197593] [<ffffffff810fa2a0>] ? do_filp_open+0xe5/0x94b Jan 14 09:48:26 puffin kernel: [4478669.197602] [<ffffffff8130f054>] ? _spin_lock_irqsave+0x15/0x34 Jan 14 09:48:26 puffin kernel: [4478669.197612] [<ffffffff81068fa6>] ? hrtimer_try_to_cancel+0x3a/0x43 Jan 14 09:48:26 puffin kernel: [4478669.197622] [<ffffffff8102de30>] ? pvclock_clocksource_read+0x3a/0x8b Jan 14 09:48:26 puffin kernel: [4478669.197631] [<ffffffff81103705>] ? alloc_fd+0x67/0x10c Jan 14 09:48:26 puffin kernel: [4478669.197641] [<ffffffff810eeacf>] ? do_sys_open+0x55/0xfc Jan 14 09:48:26 puffin kernel: [4478669.197649] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
Things I noticed during the boot:
- There are lots of attempts to connect to the FTP server, pure-ftpd, but none of our developers use it. Can we uninstall it, or does BOA require it?
- We have collectd installed but not configured. Can we uninstall it, or does BOA require it?
Counting FTP connection attempts from /var/log/messages (these are "New connection from" lines, so they count connections rather than confirmed login failures):
cat messages | grep pure-ftp | grep "New connection from" | wc -l
583
cat messages.1 | grep pure-ftp | grep "New connection from" | wc -l
2367
zcat messages.*.gz | grep pure-ftp | grep "New connection from" | wc -l
7203
That is over 10k FTP connection attempts (10,153 in total) in the last month.
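The three counts above could be collected in one pass. This is a minimal sketch rather than anything BOA provides: count_ftp_connections is a made-up helper name, and it assumes that rotated logs ending in .gz are gzip-compressed and everything else is plain text.

```shell
# Count pure-ftpd "New connection from" lines across plain and gzipped logs.
# count_ftp_connections is a hypothetical helper; pass it the log files.
count_ftp_connections() {
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc -- "$f" ;;   # rotated, compressed log
            *)    cat -- "$f" ;;        # current or uncompressed log
        esac
    done | grep -c 'pure-ftp.*New connection from'
}
```

From /var/log it would be run as: count_ftp_connections messages messages.1 messages.*.gz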
comment:5 follow-up: ↓ 6 Changed 3 years ago by jim
MySQL is offline -- are you still running updates?
comment:6 in reply to: ↑ 5 Changed 3 years ago by chris
Replying to jim:
MySQL is offline -- are you still running updates?
Crap, I assumed it would start on a reboot. I have tried starting it:
/etc/init.d/mysql restart
[ ok ] Stopping MariaDB database server: mysqld.
[FAIL] Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!
/etc/init.d/mysql start
[FAIL] Starting MariaDB database server: mysqld . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . failed!
Gonna look at the logs now...
comment:7 Changed 3 years ago by chris
Fixed:
chown mysql:mysql /run/shm/mysql
[ ok ] Starting MariaDB database server: mysqld . . . ..
[info] Checking for corrupt, not cleanly closed and upgrade needing tables..
I don't know why that was needed, as this is already in my crontab:
crontab -e -u chris
@reboot sudo mkdir /run/shm/mysql
@reboot sudo chown mysql:mysql /run/shm/mysql
It's in my own crontab because the root crontab is clobbered by BOA updates.
comment:8 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.5
- Total Hours changed from 0.25 to 0.75
I have removed collectd-core:
dpkg -r collectd-core
(Reading database ... 58041 files and directories currently installed.)
Removing collectd-core ...
[ ok ] Stopping statistics collection and monitoring daemon: collectd.
Processing triggers for man-db ...
But there is an issue here:
aptitude search collectd
...
pB collectd - statistics collection and monitoring daemon
c collectd-core - statistics collection and monitoring daemon (core system)
...
The flags mean:
State:
c - removed, but config-files still present (i.e., not purged)
Action:
B - Broken
p - purge
So:
aptitude purge collectd
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
aptitude purge collectd-core
The following packages will be REMOVED:
collectd-core{p}
0 packages upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
Need to get 0 B of archives. After unpacking 0 B will be used.
Do you want to continue? [Y/n/?] y
(Reading database ... 57834 files and directories currently installed.)
Removing collectd-core ...
Purging configuration files for collectd-core ...
/var/lib/dpkg/info/collectd-core.postrm: line 23: db_input: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 24: db_go: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 25: db_get: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 23: db_input: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 24: db_go: command not found
/var/lib/dpkg/info/collectd-core.postrm: line 25: db_get: command not found
dpkg: warning: while removing collectd-core, directory '/var/lib/collectd' not empty so not removed
And now it's gone:
aptitude search collectd
p collectd - statistics collection and monitoring daemon
p collectd-core - statistics collection and monitoring daemon (core system)
p collectd-dbg - statistics collection and monitoring daemon (debugging symbols)
p collectd-dev - statistics collection and monitoring daemon (development files)
p collectd-utils - statistics collection and monitoring daemon (utilities)
p kcollectd - simple collectd graphing frontend for KDE
p libcollectdclient-dev - client library for collectd's control interface (development files)
p libcollectdclient0 - client library for collectd's control interface
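The "removed, but config-files still present" state that aptitude shows as c appears as rc in the first column of dpkg -l output. A small sketch for spotting other packages left in that state follows; residual_config_packages is a made-up helper name, and it expects dpkg -l output on standard input.

```shell
# Print the names of packages whose dpkg status is "rc"
# (removed, configuration files remain).
# residual_config_packages is a hypothetical helper name.
residual_config_packages() {
    awk '$1 == "rc" { print $2 }'
}
```

Usage would be: dpkg -l | residual_config_packages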
Regarding pure-ftpd, it looks like it is installed from source by BOA, as it's not listed as installed by aptitude:
aptitude search pure-ftp
p mysqmail-pure-ftpd-logger - real-time logging system in MySQL - Pure-FTPd traffic-logger
p pure-ftpd - Secure and efficient FTP server
p pure-ftpd-common - Pure-FTPd FTP server (Common Files)
p pure-ftpd-ldap - Secure and efficient FTP server with LDAP user authentication
p pure-ftpd-mysql - Secure and efficient FTP server with MySQL user authentication
p pure-ftpd-postgresql - Secure and efficient FTP server with PostgreSQL user authentication
In terms of the kernel lockup/crash that caused the server to be unresponsive, I don't know what to suggest other than keeping an eye out for things like this in the future.
comment:9 Changed 3 years ago by chris
For reference, this is what I found in /var/log/syslog which alerted me to the reason why MySQL wouldn't start:
Jan 14 10:50:34 puffin mysqld: 140114 10:50:34 [ERROR] mysqld: Can't create/write to file '/run/shm/mysql/ibw6lZ2z' (Errcode: 13)
Jan 14 10:50:34 puffin mysqld: 140114 10:50:34 InnoDB: Error: unable to create temporary file; errno: 13
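On Linux, errno 13 is EACCES (permission denied), which fits the chown fix above. A quick way to confirm this kind of problem is to test whether a directory is usable by the user in question. The sketch below uses a made-up function name, check_writable_dir, and it only tests the user running it, so it would need to be run from a shell started as the mysql user (for example via sudo -u mysql sh).

```shell
# Report whether the invoking user can create files in directory $1.
# check_writable_dir is a hypothetical helper name.
check_writable_dir() {
    if [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]; then
        echo "writable: $1"
    else
        echo "not writable: $1"
        return 1
    fi
}
```

For this ticket the call would be: check_writable_dir /run/shm/mysql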
comment:10 Changed 3 years ago by chris
This is the first email alert I got around the time of the kernel crash:
From: root@puffin.webarch.net
Date: Tue, 14 Jan 2014 09:44:08 +0000 (GMT)
To: chris@webarchitects.co.uk
Subject: lfd on puffin.webarch.net: High 5 minute load average alert - 8.79
[-- Attachment #1 --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.2K --]
Time: Tue Jan 14 09:42:47 2014 +0000
1 Min Load Avg: 29.58
5 Min Load Avg: 8.79
15 Min Load Avg: 3.33
Running/Total Processes: 44/414
[-- Attachment #2: ps.txt --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.1K --]
Output from ps:
Unable to obtain process output within 15 seconds - Timed out
[-- Attachment #3: vmstat.txt --]
[-- Type: text/plain, Encoding: 7bit, Size: 0.1K --]
Output from vmstat:
Unable to obtain vmstat output within 10 seconds - Timed out
[-- Attachment #4: apachestatus.html --]
[-- Type: text/html, Encoding: 7bit, Size: 0.1K --]
Unable to retrieve Apache Server Status [http://127.0.0.1/server-status] - Unable to download: 404 - Not Found
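The alert above is based on the 5 minute load average. As a rough illustration of that kind of check (not how lfd itself implements it, and the threshold is whatever you pass in, since lfd's configured value isn't shown in this ticket), the second field of /proc/loadavg can be compared against a limit. load5_exceeds is a made-up function name.

```shell
# Succeed and print a message if the 5 minute load average exceeds $1.
# $2 is the file to read; it defaults to /proc/loadavg on Linux.
# load5_exceeds is a hypothetical helper name.
load5_exceeds() {
    awk -v limit="$1" '
        { if ($2 + 0 > limit + 0) { print "5 min load " $2 " exceeds " limit; found = 1 } }
        END { exit !found }
    ' "${2:-/proc/loadavg}"
}
```

For example, load5_exceeds 6 would have fired with the 8.79 reported in the email.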
comment:11 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.16
- Total Hours changed from 0.75 to 0.91
Perhaps the items in my crontab were run in the wrong order -- chowning a directory before it was created, so I have updated it to this:
@reboot sudo mkdir /run/shm/mysql ; sudo chown mysql:mysql /run/shm/mysql
And I have documented the crontab here: wiki:PuffinServer#Cron
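If the two steps were indeed racing, a slightly more defensive variant is to create the directory only if it is missing and to run chown only when that succeeded. This is a sketch under that assumption, with ensure_owned_dir as a made-up helper name; chown to another user still needs root.

```shell
# Create directory $1 if it is missing, then set its owner to $2
# (user:group). chown only runs if mkdir succeeded, and mkdir -p
# makes the whole thing safe to run repeatedly.
# ensure_owned_dir is a hypothetical helper name.
ensure_owned_dir() {
    mkdir -p -- "$1" && chown -- "$2" "$1"
}
```

The equivalent single crontab entry would be: @reboot sudo sh -c 'mkdir -p /run/shm/mysql && chown mysql:mysql /run/shm/mysql'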
comment:12 Changed 3 years ago by sam
Hi Chris
transitionnetwork.org is currently unavailable. I created a new ticket, but maybe it should just be a comment here?
Not responding to a ping at the moment.
Could you have a look?
Thanks
Sam
comment:13 Changed 3 years ago by chris
- Status changed from new to closed
- Resolution set to fixed
Closing as this is resolved.
It's recovered.