Ticket #543 (closed maintenance: fixed)
Puffin Load Spike
Reported by: | chris | Owned by: | chris |
---|---|---|---|
Priority: | minor | Milestone: | Maintenance |
Component: | Live server | Keywords: | |
Cc: | ed, jim | Estimated Number of Hours: | 0.0 |
Add Hours to Ticket: | 0 | Billable?: | yes |
Total Hours: | 0.55 |
Description
Puffin just had a massive load spike up to 65 and then it calmed down and it's up in the 70's again:
uptime 21:46:43 up 91 days, 3:01, 4 users, load average: 73.45, 30.76, 28.09
I'm trying to find the cause, it's making the TN site unresponsive.
Change History
comment:2 in reply to: ↑ 1 Changed 4 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.25
- Total Hours changed from 0.0 to 0.25
Replying to mark:
I did a development platform build earlier on Aegir, but haven't done
anything since (for the last 4 hours). Guessing it's *not* this, but
thought I'd mention it anyway...
4 hours ago -- that rules it out, thanks for confirming that this wasn't the cause.
The load peaked almost in the 80s:
uptime 21:46:37 up 91 days, 3:01, 3 users, load average: 79.76, 31.26, 28.24
It looks to me like there was some kind of kernel issue, this from /var/log/kern.log there were a lot of these for different CPUs:
May 3 21:46:35 puffin kernel: [7873260.245957] BUG: soft lockup - CPU#0 stuck for 61s! [munin-node:40006] May 3 21:46:35 puffin kernel: [7873260.245973] Modules linked in: nf_nat_ftp ip6table_filter ip6_tables joydev sg st sd_mod crc_t10dif sr_mod scsi_mod ide_gd_mod ide_cd_mod ide_core cdrom xt_tcpudp ipt_REDIRECT xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT ipt_LOG xt_limit xt_multiport xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext3 jbd mbcache dm_mod xen_blkfront xen_netfront May 3 21:46:35 puffin kernel: [7873260.246078] CPU 0: May 3 21:46:35 puffin kernel: [7873260.246084] Modules linked in: nf_nat_ftp ip6table_filter ip6_tables joydev sg st sd_mod crc_t10dif sr_mod scsi_mod ide_gd_mod ide_cd_mod ide_core cdrom xt_tcpudp ipt_REDIRECT xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT ipt_LOG xt_limit xt_multiport xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables snd_pcm snd_timer snd soundcore snd_page_alloc pcspkr evdev ext3 jbd mbcache dm_mod xen_blkfront xen_netfront May 3 21:46:35 puffin kernel: [7873260.246181] Pid: 40006, comm: munin-node Not tainted 2.6.32-5-xen-amd64 #1 May 3 21:46:35 puffin kernel: [7873260.246188] RIP: e030:[<ffffffff8130e0a8>] [<ffffffff8130e0a8>] _spin_lock+0x15/0x1b May 3 21:46:35 puffin kernel: [7873260.246204] RSP: e02b:ffff8800ef5f1ec0 EFLAGS: 00000293 May 3 21:46:35 puffin kernel: [7873260.246210] RAX: 0000000000000c27 RBX: ffff8800b99accc0 RCX: 0000000000000000 May 3 21:46:35 puffin kernel: [7873260.246218] RDX: 0000000000000c24 RSI: ffffffff813e3d79 RDI: ffffffff814a1000 May 3 21:46:35 puffin kernel: [7873260.246226] RBP: ffff8800ef5f1ef8 R08: 000000000097c1f0 R09: 0000000000000000 May 3 21:46:35 puffin kernel: [7873260.246234] R10: 0000000000a3a6d8 R11: ffffffff81154882 R12: ffff8800ff44f240 May 3 21:46:35 puffin kernel: [7873260.246241] R13: 0000000000c9b268 R14: ffff8800ef5f1f58 R15: 0000000000c9b058 May 3 21:46:35 puffin kernel: [7873260.246255] FS: 00007f54ce992700(0000) GS:ffff880003b85000(0000) knlGS:0000000000000000 May 3 21:46:35 puffin kernel: [7873260.246264] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b May 3 21:46:35 puffin kernel: [7873260.246270] CR2: 0000000000a33948 CR3: 000000007cc21000 CR4: 0000000000000660 May 3 21:46:35 puffin kernel: [7873260.246278] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 May 3 21:46:35 puffin kernel: [7873260.247305] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 May 3 21:46:35 puffin kernel: [7873260.247305] Call Trace: May 3 21:46:35 puffin kernel: [7873260.247305] [<ffffffff811001f6>] ? d_alloc+0x14b/0x186 May 3 21:46:35 puffin kernel: [7873260.247305] [<ffffffff810f6836>] ? create_write_pipe+0xe9/0x1aa May 3 21:46:35 puffin kernel: [7873260.247305] [<ffffffff810f691c>] ? do_pipe_flags+0x25/0xdd May 3 21:46:35 puffin kernel: [7873260.247305] [<ffffffff810f69e7>] ? sys_pipe2+0x13/0x4f May 3 21:46:35 puffin kernel: [7873260.247305] [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
This is the result from grepping for BUG in kern.log from today
May 3 21:46:35 puffin kernel: [7873260.245957] BUG: soft lockup - CPU#0 stuck for 61s! [munin-node:40006] May 3 21:46:35 puffin kernel: [7873260.273706] BUG: soft lockup - CPU#10 stuck for 61s! [nginx:39158] May 3 21:46:35 puffin kernel: [7873260.251798] BUG: soft lockup - CPU#2 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873260.246490] BUG: soft lockup - CPU#4 stuck for 61s! [apt_all:39947] May 3 21:46:35 puffin kernel: [7873260.309621] BUG: soft lockup - CPU#1 stuck for 61s! [bzip2:40041] May 3 21:46:35 puffin kernel: [7873260.251808] BUG: soft lockup - CPU#13 stuck for 61s! [bash:40060] May 3 21:46:35 puffin kernel: [7873260.309639] BUG: soft lockup - CPU#12 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873260.309629] BUG: soft lockup - CPU#9 stuck for 61s! [php-fpm:39175] May 3 21:46:35 puffin kernel: [7873260.309652] BUG: soft lockup - CPU#6 stuck for 61s! [drush.php:39942] May 3 21:46:35 puffin kernel: [7873260.323249] BUG: soft lockup - CPU#7 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873260.323247] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873260.309637] BUG: soft lockup - CPU#8 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873260.323252] BUG: soft lockup - CPU#11 stuck for 61s! [swapper:0] May 3 21:46:35 puffin kernel: [7873261.417057] BUG: soft lockup - CPU#3 stuck for 61s! [swapper:0]
The same thing happened on 29th April:
Apr 29 16:04:00 puffin kernel: [7506959.700011] BUG: soft lockup - CPU#0 stuck for 61s! [munin-node:16305] Apr 29 16:04:00 puffin kernel: [7506959.728797] BUG: soft lockup - CPU#4 stuck for 61s! [bash:2248] Apr 29 16:04:00 puffin kernel: [7506959.734715] BUG: soft lockup - CPU#11 stuck for 61s! [bash:2243] Apr 29 16:04:00 puffin kernel: [7506959.779966] BUG: soft lockup - CPU#9 stuck for 61s! [bash:2247] Apr 29 16:04:00 puffin kernel: [7506959.807022] BUG: soft lockup - CPU#3 stuck for 61s! [drush.php:2227] Apr 29 16:04:00 puffin kernel: [7506959.841325] BUG: soft lockup - CPU#6 stuck for 61s! [metche:2240] Apr 29 16:04:00 puffin kernel: [7506959.839965] BUG: soft lockup - CPU#1 stuck for 61s! [bash:2244] Apr 29 16:04:00 puffin kernel: [7506959.882711] BUG: soft lockup - CPU#7 stuck for 61s! [bash:2246] Apr 29 16:04:00 puffin kernel: [7506959.859438] BUG: soft lockup - CPU#13 stuck for 61s! [perl:2241] Apr 29 16:04:00 puffin kernel: [7506959.908581] BUG: soft lockup - CPU#8 stuck for 61s! [backupninja:2239] Apr 29 16:04:00 puffin kernel: [7506959.908581] BUG: soft lockup - CPU#12 stuck for 61s! [apt_all:2237] Apr 29 16:04:00 puffin kernel: [7506960.812723] BUG: soft lockup - CPU#10 stuck for 61s! [bash:2249] Apr 29 16:04:00 puffin kernel: [7506960.844747] BUG: soft lockup - CPU#2 stuck for 61s! [php-fpm:1139] Apr 29 16:04:00 puffin kernel: [7506961.894745] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0] Apr 29 16:04:00 puffin kernel: [7507025.168019] BUG: soft lockup - CPU#0 stuck for 61s! [munin-node:16305] Apr 29 16:04:00 puffin kernel: [7507025.302077] BUG: soft lockup - CPU#13 stuck for 61s! [perl:2241] Apr 29 16:04:00 puffin kernel: [7507025.170185] BUG: soft lockup - CPU#11 stuck for 61s! [bash:2243] Apr 29 16:04:00 puffin kernel: [7507025.170188] BUG: soft lockup - CPU#9 stuck for 61s! [bash:2247] Apr 29 16:04:00 puffin kernel: [7507025.246231] BUG: soft lockup - CPU#6 stuck for 61s! [metche:2240] Apr 29 16:04:00 puffin kernel: [7507025.305361] BUG: soft lockup - CPU#7 stuck for 61s! [bash:2246] Apr 29 16:04:00 puffin kernel: [7507025.305648] BUG: soft lockup - CPU#8 stuck for 61s! [backupninja:2239] Apr 29 16:04:00 puffin kernel: [7507025.306237] BUG: soft lockup - CPU#4 stuck for 61s! [bash:2248] Apr 29 16:04:00 puffin kernel: [7507025.306234] BUG: soft lockup - CPU#3 stuck for 61s! [drush.php:2227] Apr 29 16:04:00 puffin kernel: [7507025.308802] BUG: soft lockup - CPU#1 stuck for 61s! [bash:2244] Apr 29 16:04:00 puffin kernel: [7507025.315487] BUG: soft lockup - CPU#12 stuck for 61s! [apt_all:2237] Apr 29 16:04:00 puffin kernel: [7507026.280892] BUG: soft lockup - CPU#2 stuck for 61s! [php-fpm:1139] Apr 29 16:04:00 puffin kernel: [7507026.502369] BUG: soft lockup - CPU#10 stuck for 61s! [bash:2249] Apr 29 16:04:00 puffin kernel: [7507027.370539] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0] Apr 29 16:04:00 puffin kernel: [7507090.700009] BUG: soft lockup - CPU#0 stuck for 61s! [munin-node:16305] Apr 29 16:04:00 puffin kernel: [7507090.673472] BUG: soft lockup - CPU#3 stuck for 61s! [drush.php:2227] Apr 29 16:04:00 puffin kernel: [7507090.764482] BUG: soft lockup - CPU#6 stuck for 61s! [metche:2240] Apr 29 16:04:00 puffin kernel: [7507090.820913] BUG: soft lockup - CPU#11 stuck for 61s! [bash:2243] Apr 29 16:04:00 puffin kernel: [7507090.764491] BUG: soft lockup - CPU#9 stuck for 61s! [bash:2247] Apr 29 16:04:00 puffin kernel: [7507090.764500] BUG: soft lockup - CPU#1 stuck for 61s! [bash:2244] Apr 29 16:04:00 puffin kernel: [7507090.764492] BUG: soft lockup - CPU#13 stuck for 61s! [perl:2241] Apr 29 16:04:00 puffin kernel: [7507090.764500] BUG: soft lockup - CPU#4 stuck for 61s! [bash:2248] Apr 29 16:04:00 puffin kernel: [7507090.820931] BUG: soft lockup - CPU#8 stuck for 61s! [backupninja:2239] Apr 29 16:04:00 puffin kernel: [7507090.858239] BUG: soft lockup - CPU#7 stuck for 61s! [bash:2246] Apr 29 16:04:00 puffin kernel: [7507090.942543] BUG: soft lockup - CPU#12 stuck for 61s! [apt_all:2237] Apr 29 16:04:00 puffin kernel: [7507092.017604] BUG: soft lockup - CPU#2 stuck for 61s! [php-fpm:1139] Apr 29 16:04:00 puffin kernel: [7507092.054278] BUG: soft lockup - CPU#10 stuck for 61s! [bash:2249] Apr 29 16:04:00 puffin kernel: [7507092.983020] BUG: soft lockup - CPU#5 stuck for 61s! [swapper:0]
I don't know the cause.
comment:3 Changed 4 years ago by chris
There are some old Xen bug reports of this problem, related to heavy load, here: http://bugzilla.xensource.com/bugzilla/buglist.cgi?quicksearch=BUG%3A+soft+lockup+detected
comment:5 Changed 4 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.1
- Priority changed from critical to minor
- Total Hours changed from 0.25 to 0.35
Dropping this to minor but not closing because the cause of the kernel problems haven't been tracked down however this issue hasn't occurred again.
Since then the host machines kernel and version of Xen has been downgraded due to an unresolved bug, see ticket:535#comment:6
comment:7 Changed 4 years ago by chris
See also ticket:552 for another load spike issue.
comment:8 Changed 3 years ago by mark
- Cc mark removed
Just removing myself from this ticket, as I don't believe I can offer any help.
comment:9 Changed 3 years ago by chris
- Add Hours to Ticket changed from 0.0 to 0.2
- Status changed from accepted to closed
- Resolution set to fixed
- Total Hours changed from 0.35 to 0.55
This issue has been followed up on ticket:555 and documented at wiki:PuffinServer#LoadSpikes which now included a list of all related tickets.