Ticket #862 (closed defect: fixed)

Opened 17 months ago

Last modified 17 months ago

Puffin locked

Reported by: chris
Owned by: ade
Priority: critical
Milestone: Maintenance
Component: Live server
Keywords:
Cc: ade, paul
Estimated Number of Hours: 0.0
Add Hours to Ticket: 0
Billable?: yes
Total Hours: 0.25

Description

PuffinServer is not responding. I got a Munin email alert, and the Xen console shows:

[2008077.910371] BUG: soft lockup - CPU#1 stuck for 61s! [munin-node [::f:25444]
[2008077.910371] Modules linked in: joydev sg st sd_mod crc_t10dif sr_mod scsi_mod ide_gd_mod ide_cd_mod ide_core cdrom xt_recent xt_tcpudp xt_connlimit nf_nat_ftp ipt_REDIRECT xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT ipt_LOG xt_limit xt_multiport iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables snd_pcm snd_timer snd soundcore snd_page_alloc evdev pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[2008077.910371] CPU 1:
[2008077.910371] Modules linked in: joydev sg st sd_mod crc_t10dif sr_mod scsi_mod ide_gd_mod ide_cd_mod ide_core cdrom xt_recent xt_tcpudp xt_connlimit nf_nat_ftp ipt_REDIRECT xt_conntrack iptable_mangle nf_conntrack_ftp ipt_REJECT ipt_LOG xt_limit xt_multiport iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables x_tables snd_pcm snd_timer snd soundcore snd_page_alloc evdev pcspkr ext4 crc16 jbd2 mbcache dm_mod xen_netfront xen_blkfront
[2008077.910371] Pid: 25444, comm: munin-node [::f Not tainted 2.6.32-5-xen-amd64 #1 
[2008077.910371] RIP: e030:[<ffffffff8100922a>]  [<ffffffff8100922a>] hypercall_page+0x22a/0x1001
[2008077.910371] RSP: e02b:ffff8800988f7ba8  EFLAGS: 00000246
[2008077.910371] RAX: 0000000000040000 RBX: ffffea0006c2eb88 RCX: ffffffff8100922a
[2008077.910371] RDX: 00000000ffffff00 RSI: 0000000000000000 RDI: 0000000000000000
[2008077.910371] RBP: 0000000000000002 R08: 0000000000000002 R09: ffff8801ffc1dd00
[2008077.910371] R10: 0000000000000002 R11: 0000000000000246 R12: ffff88000000ad00
[2008077.910371] R13: ffff880000008000 R14: 0000000000000200 R15: 000000000000000e
[2008077.910371] FS:  00007fefd0bde700(0000) GS:ffff88000bb20000(0000) knlGS:0000000000000000
[2008077.910371] CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[2008077.910371] CR2: 00007fefce124380 CR3: 0000000001001000 CR4: 0000000000000660
[2008077.910371] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[2008077.910371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[2008077.910371] Call Trace:
[2008077.910371]  [<ffffffff810baa62>] ? free_hot_cold_page+0x1a2/0x1af
[2008077.910371]  [<ffffffff8100e635>] ? xen_force_evtchn_callback+0x9/0xa
[2008077.910371]  [<ffffffff8100ecf2>] ? check_events+0x12/0x20
[2008077.910371]  [<ffffffff8100ecdf>] ? xen_restore_fl_direct_end+0x0/0x1
[2008077.910371]  [<ffffffff8130f142>] ? _spin_unlock_irqrestore+0xd/0xe
[2008077.910371]  [<ffffffff810bd9ca>] ? release_pages+0x16a/0x18d
[2008077.910371]  [<ffffffff8100c1a7>] ? xen_mc_flush+0x159/0x185
[2008077.910371]  [<ffffffff810da555>] ? free_pages_and_swap_cache+0x57/0x73
[2008077.910371]  [<ffffffff810cd5bf>] ? unmap_vmas+0x6cb/0x959
[2008077.910371]  [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1001
[2008077.910371]  [<ffffffff8100922a>] ? hypercall_page+0x22a/0x1001
[2008077.910371]  [<ffffffff810d1bca>] ? exit_mmap+0xc4/0x148
[2008077.910371]  [<ffffffff8104cd95>] ? mmput+0x3c/0xdf
[2008077.910371]  [<ffffffff81050a2e>] ? exit_mm+0x102/0x10d
[2008077.910371]  [<ffffffff81052453>] ? do_exit+0x1f8/0x6c9
[2008077.910371]  [<ffffffff8105299a>] ? do_group_exit+0x76/0x9d
[2008077.910371]  [<ffffffff810529d3>] ? sys_exit_group+0x12/0x16
[2008077.910371]  [<ffffffff81011b42>] ? system_call_fastpath+0x16/0x1b
[2008381.335662] Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:16:3e:19:68:02:00:12:1e:13:6c:db:08:00 SRC=122.172.30.208 DST=81.95.52.103 LEN=60 TOS=0x08 PREC=0x20 TTL=55 ID=30880 DF PROTO=TCP SPT=60400 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 
[2008384.465057] Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:16:3e:19:68:02:00:12:1e:13:6c:db:08:00 SRC=122.172.30.208 DST=81.95.52.103 LEN=60 TOS=0x08 PREC=0x20 TTL=55 ID=30881 DF PROTO=TCP SPT=60400 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 
[2008390.255448] Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=00:16:3e:19:68:02:00:12:1e:13:6c:db:08:00 SRC=122.172.30.208 DST=81.95.52.103 LEN=60 TOS=0x08 PREC=0x20 TTL=55 ID=30882 DF PROTO=TCP SPT=60400 DPT=23 WINDOW=5808 RES=0x00 SYN URGP=0 

Change History

comment:1 Changed 17 months ago by chris

  • Add Hours to Ticket changed from 0.0 to 0.25
  • Status changed from new to closed
  • Resolution set to fixed
  • Total Hours changed from 0.0 to 0.25

Shutting down via Xen didn't work, so I had to destroy the VM and recreate it:

xm shutdown puffin.webarch.net
xm destroy puffin.webarch.net
xm create puffin.webarch.net.cfg

It's now back up and seems fine.
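
For future lockups, the recovery steps above could be wrapped in a small script. This is only a rough sketch, not what was run on the day: the function name `restart_domain` and the variables `XM` and `WAIT_SECS` are made up for illustration, and it assumes `xm list <domain>` exits non-zero once the domain no longer exists.

```shell
#!/bin/sh
# Sketch: try a clean shutdown first, then fall back to destroy and recreate.
# XM, WAIT_SECS and restart_domain are illustrative names, not from the ticket.

XM="${XM:-xm}"                 # toolstack command; overridable for testing
WAIT_SECS="${WAIT_SECS:-60}"   # seconds to wait for a clean shutdown

restart_domain() {
    domain="$1"
    config="$2"

    # Ask the guest to shut down cleanly; ignore failure, state is checked below.
    "$XM" shutdown "$domain" || true

    # Poll until the domain disappears from the list or the timeout expires.
    # Assumption: `xm list <domain>` fails when the domain does not exist.
    waited=0
    while "$XM" list "$domain" >/dev/null 2>&1; do
        if [ "$waited" -ge "$WAIT_SECS" ]; then
            echo "clean shutdown timed out, destroying $domain" >&2
            "$XM" destroy "$domain" || return 1
            break
        fi
        sleep 1
        waited=$((waited + 1))
    done

    # Recreate the domain from its config file.
    "$XM" create "$config"
}
```

It would be called on the dom0 as `restart_domain puffin.webarch.net puffin.webarch.net.cfg`. A guest stuck in a soft lockup will probably never respond to the shutdown request, so the destroy branch is the one likely to be taken in a case like this.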
