  OpenVZ / OVZ-7080

Crash kernels 2.6.32-042stab133.2 and 2.6.32-042stab134.8


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: OpenVZ-legacy
    • Component/s: Containers::Kernel
    • Security Level: Public
    • Environment:
      os - CentOS 6
      kernels - 2.6.32-042stab133.2 and 2.6.32-042stab134.8
      vzctl version - 4.10
      ploop version - 1.15

      Description

      Denis Maksimov (elky):
      Hello!

      We had a node crash on *2.6.32-042stab133.2* with:

      <6>[13186478.967870] EXT4-fs (ploop10386p1): mounted filesystem with ordered data mode. Opts:
      <6>[13186478.968572] EXT4-fs (ploop10386p1): loaded balloon from 12 (12886024 blocks)
      <6>[13186478.995499] CT: 104516: started
      <6>[13186486.979269] Fatal resource shortage: numiptent, UB 104516.
      <4>[13186699.786306] fv_heartbeat
      <1>[13187720.004916] BUG: unable to handle kernel paging request at ffff87fe15e1b078
      <1>[13187720.005259] IP: [<ffffffff81468b73>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[13187720.005446] Kernel PGD 0
      <4>[13187720.005616] User PGD 0
      <4>[13187720.005786] Oops: 0000 [#1] SMP
      <4>[13187720.005962] last sysfs file: /sys/devices/virtual/block/ploop41233/ploop41233p1/uevent
      <4>[13187720.006300] CPU 28
      <4>[13187720.006310] Modules linked in: xt_TCPMSS kcare(U) xt_set ip_set nfnetlink xt_hl ip6t_rt ipt_addrtype coretemp sch_sfq cls_u32 sch_htb ip6t_LOG xt_limit ipt_REJECT xt_conntrack xt_multiport vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT ip6table_mangle vzevent 8021q garp stp llc vznetdev bonding xt_comment iptable_filter ip6table_filter ip6_tables ipip tunnel4 nf_conntrack_ftp xt_recent vzrst vzcpt vzmon vzdev nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_MASQUERADE iptable_raw iptable_mangle xt_connlimit ipt_REDIRECT xt_owner nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ipt_LOG xfrm_ipcomp xfrm4_mode_transport pppol2tp pppox xfrm6_mode_tunnel xfrm4_mode_tunnel esp6 ipv6 esp4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse iptable_nat nf_nat ip_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun acpi_cpufreq freq_table mperf iTCO_wdt iTCO_vendor_support power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core i2c_i801 lpc_ich mfd_core shpchp ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core sg joydev ext4 jbd2 mbcache sd_mod crc_t10dif xhci_hcd megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>[13187720.009672]
      <4>[13187720.009837] Pid: 103785, comm: python3 veid: 135244 Tainted: G C -- ------------ 2.6.32-042stab133.2 #1 042stab133_2 Supermicro SYS-F618R2-RT+/X10DRFR
      <4>[13187720.010346] RIP: 0010:[<ffffffff81468b73>] [<ffffffff81468b73>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[13187720.010688] RSP: 0018:ffff88213f5e39b8 EFLAGS: 00010293
      <4>[13187720.010860] RAX: ffff88014b7c0020 RBX: ffff88213f5e3e68 RCX: ffffea00068f3680
      <4>[13187720.011189] RDX: ffffffff994cb60b RSI: ffff88014b7c0000 RDI: ffff8840701b5418
      <4>[13187720.011519] RBP: ffff88213f5e3a38 R08: 000000000000012d R09: 0000000000000281
      <4>[13187720.011847] R10: 0000000000000000 R11: 000000000000012d R12: 0000000000000281
      <4>[13187720.012176] R13: 0000000000000acb R14: 0000000000000281 R15: 0000000000000281
      <4>[13187720.012506] FS: 00007f3829ac2700(0000) GS:ffff882100f00000(0000) knlGS:0000000000000000
      <4>[13187720.012838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[13187720.013011] CR2: ffff87fe15e1b078 CR3: 000000015c4e2000 CR4: 00000000001607e0
      <4>[13187720.013341] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[13187720.017787] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[13187720.018121] Process python3 (pid: 103785, veid: 135244, threadinfo ffff88213f5e0000, task ffff88240401e040)
      <4>[13187720.018458] Stack:
      <4>[13187720.018621] ffff88213f5e3a68 ffffffff81565ac5 000000000401e040 ffff88213f5e3e68
      <4>[13187720.018809] <d> ffff88014b7c0000 ffff88014b7c0010 ffffea00068f3680 ffff8840701b5418
      <4>[13187720.019154] <d> ffff8821994cb60b 0000000000000246 ffff88213f5e3a68 0000000000000000
      <4>[13187720.019657] Call Trace:
      <4>[13187720.019836] [<ffffffff81565ac5>] ? schedule_timeout+0x215/0x2f0
      <4>[13187720.020021] [<ffffffff814b59d7>] dma_skb_copy_datagram_iovec+0x117/0x2d0
      <4>[13187720.020208] [<ffffffff814e75a5>] tcp_recvmsg+0x5d5/0x11b0
      <4>[13187720.020387] [<ffffffff815643eb>] ? schedule+0x5ab/0xe60
      <4>[13187720.020577] [<ffffffff8150b4e0>] inet_recvmsg+0x60/0xa0
      <4>[13187720.020757] [<ffffffff81484687>] sock_recvmsg+0x127/0x160
      <4>[13187720.020940] [<ffffffff810b4dc0>] ? autoremove_wake_function+0x0/0x40
      <4>[13187720.021121] [<ffffffff810d6733>] ? futex_wake+0x93/0x150
      <4>[13187720.021304] [<ffffffff81072e5e>] ? perf_event_task_sched_out+0x2e/0x70
      <4>[13187720.021484] [<ffffffff8106c4c0>] ? __dequeue_entity+0x30/0x50
      <4>[13187720.021661] [<ffffffff8148480e>] sys_recvfrom+0xee/0x180
      <4>[13187720.021842] [<ffffffff8107a9fe>] ? finish_task_switch+0xce/0x120
      <4>[13187720.022020] [<ffffffff815643eb>] ? schedule+0x5ab/0xe60
      <4>[13187720.022198] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[13187720.022375] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[13187720.022552] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[13187720.022732] [<ffffffff8110b177>] ? audit_syscall_entry+0x1d7/0x200
      <4>[13187720.022907] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[13187720.023083] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[13187720.023258] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[13187720.023434] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[13187720.023611] [<ffffffff815703e7>] system_call_fastpath+0x35/0x3a
      <4>[13187720.023788] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[13187720.023966] Code: 00 10 00 00 45 0f 4f f7 44 29 e8 48 63 55 c0 41 39 c6 48 8b 7d b8 45 89 d8 44 0f 4f f0 48 8b 41 08 48 8b 4d b0 4d 63 e6 4d 89 e1 <48> 8b 34 d0 44 89 ea 44 89 5d 88 e8 4d e8 ff ff 85 c0 44 8b 5d
      <1>[13187720.024728] RIP [<ffffffff81468b73>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[13187720.024912] RSP <ffff88213f5e39b8>
      <4>[13187720.025079] CR2: ffff87fe15e1b078


      and on *2.6.32-042stab134.8* with the same error:

      {code}
      <6>[110168.283001] Fatal resource shortage: numiptent, UB 143645.
      <6>[111972.208897] Fatal resource shortage: numiptent, UB 143645.
      <1>[112204.943692] BUG: unable to handle kernel paging request at ffff87fec4ae7da8
      <1>[112204.943875] IP: [<ffffffff81468fd3>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[112204.944055] Kernel PGD 0
      <4>[112204.944219] User PGD 0
      <4>[112204.944384] Oops: 0000 [#1] SMP
      <4>[112204.944552] last sysfs file: /sys/devices/system/node/node1/numastat
      <4>[112204.944722] CPU 10
      <4>[112204.944730] Modules linked in: kcare(U) sch_sfq cls_u32 sch_htb coretemp ip6t_LOG xt_limit xt_TCPMSS xt_conntrack ipt_REJECT xt_multiport vzethdev pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop simfs vziolimit vzdquota ip6t_REJECT ip6table_mangle vzevent 8021q garp stp llc vznetdev bonding xt_comment iptable_filter ip6table_filter ip6_tables ipip tunnel4 nf_conntrack_ftp xt_recent vzrst vzcpt vzmon vzdev nfs lockd fscache auth_rpcgss nfs_acl sunrpc ipt_MASQUERADE iptable_raw iptable_mangle xt_connlimit ipt_REDIRECT xt_owner nf_conntrack_ipv6 nf_defrag_ipv6 xt_state ipt_LOG xfrm_ipcomp xfrm4_mode_transport pppol2tp pppox xfrm6_mode_tunnel xfrm4_mode_tunnel esp6 ipv6 esp4 af_key arc4 ecb ppp_mppe ppp_deflate zlib_deflate ppp_async ppp_generic slhc crc_ccitt fuse iptable_nat nf_nat ip_tables nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 tun acpi_cpufreq freq_table mperf iTCO_wdt iTCO_vendor_support power_meter acpi_ipmi ipmi_si ipmi_msghandler sb_edac edac_core i2c_i801 lpc_ich mfd_core shpchp ioatdma igb dca i2c_algo_bit i2c_core ptp pps_core sg joydev ext4 jbd2 mbcache sd_mod crc_t10dif xhci_hcd megaraid_sas wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      <4>[112204.947988]
      <4>[112204.948150] Pid: 699573, comm: python3 veid: 135244 Not tainted 2.6.32-042stab134.8 #1 042stab134_8 Supermicro SYS-F618R2-RT+/X10DRFR
      <4>[112204.948492] RIP: 0010:[<ffffffff81468fd3>] [<ffffffff81468fd3>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[112204.948828] RSP: 0018:ffff8818567af9b8 EFLAGS: 00010283
      <4>[112204.948995] RAX: ffff8801b3f78020 RBX: ffff8818567afe68 RCX: ffffea0100d84b40
      <4>[112204.949316] RDX: ffffffffa216dfb1 RSI: ffff8801b3f78000 RDI: ffff8840701c5218
      <4>[112204.949638] RBP: ffff8818567afa38 R08: 0000000000000958 R09: 0000000000000004
      <4>[112204.949961] R10: 0000000000000000 R11: 0000000000000958 R12: 0000000000000004
      <4>[112204.950284] R13: 00000000000005f0 R14: 0000000000000004 R15: 0000000000000004
      <4>[112204.950608] FS: 00007f9fce5f6700(0000) GS:ffff882100c80000(0000) knlGS:0000000000000000
      <4>[112204.950934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[112204.951102] CR2: ffff87fec4ae7da8 CR3: 0000001746822000 CR4: 00000000001607e0
      <4>[112204.951424] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      <4>[112204.951747] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      <4>[112204.952072] Process python3 (pid: 699573, veid: 135244, threadinfo ffff8818567ac000, task ffff882029626740)
      <4>[112204.952400] Stack:
      <4>[112204.952559] ffff8818567afa68 ffffffff81566175 00000000567afa08 ffff8818567afe68
      <4>[112204.952744] <d> ffff8801b3f78000 ffff8801b3f78010 ffffea0100d84b40 ffff8840701c5218
      <4>[112204.953082] <d> ffff8818a216dfb1 0000000000000246 ffff8818567afa68 0000000000000000
      <4>[112204.953574] Call Trace:
      <4>[112204.953745] [<ffffffff81566175>] ? schedule_timeout+0x215/0x2f0
      <4>[112204.953920] [<ffffffff814b5f27>] dma_skb_copy_datagram_iovec+0x117/0x2d0
      <4>[112204.954101] [<ffffffff814e7c35>] tcp_recvmsg+0x5d5/0x11b0
      <4>[112204.954281] [<ffffffff8150bb70>] inet_recvmsg+0x60/0xa0
      <4>[112204.954453] [<ffffffff81484ae7>] sock_recvmsg+0x127/0x160
      <4>[112204.954621] [<ffffffff8148572d>] ? sock_sendmsg+0x11d/0x150
      <4>[112204.954796] [<ffffffff810b4dc0>] ? autoremove_wake_function+0x0/0x40
      <4>[112204.954970] [<ffffffff810d6733>] ? futex_wake+0x93/0x150
      <4>[112204.955144] [<ffffffff81184abb>] ? handle_mm_fault+0x27b/0x360
      <4>[112204.955318] [<ffffffff8106c4c0>] ? __dequeue_entity+0x30/0x50
      <4>[112204.955489] [<ffffffff81484c6e>] sys_recvfrom+0xee/0x180
      <4>[112204.955662] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[112204.955835] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[112204.956017] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[112204.956191] [<ffffffff8110b177>] ? audit_syscall_entry+0x1d7/0x200
      <4>[112204.956361] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[112204.956531] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[112204.956702] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[112204.956874] [<ffffffff81570302>] ? system_call_after_swapgs+0xa2/0x152
      <4>[112204.957055] [<ffffffff815703e7>] system_call_fastpath+0x35/0x3a
      <4>[112204.957227] [<ffffffff8157030e>] ? system_call_after_swapgs+0xae/0x152
      <4>[112204.957398] Code: 00 10 00 00 45 0f 4f f7 44 29 e8 48 63 55 c0 41 39 c6 48 8b 7d b8 45 89 d8 44 0f 4f f0 48 8b 41 08 48 8b 4d b0 4d 63 e6 4d 89 e1 <48> 8b 34 d0 44 89 ea 44 89 5d 88 e8 4d e8 ff ff 85 c0 44 8b 5d
      <1>[112204.958133] RIP [<ffffffff81468fd3>] dma_memcpy_pg_to_iovec+0x103/0x1d0
      <4>[112204.958309] RSP <ffff8818567af9b8>
      <4>[112204.958472] CR2: ffff87fec4ae7da8
      {code}
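      One observation from the two register dumps, offered as a hedged reading and not a diagnosis. The faulting bytes marked in the Code: line, `48 8b 34 d0`, decode as `mov rsi, [rax+rdx*8]`. In both oopses RDX holds what looks like a sign-extended negative 32-bit index (0x994cb60b and 0xa216dfb1). The sketch below assumes ordinary x86-64 two's-complement address arithmetic and checks that RAX + 8*RDX reproduces CR2 exactly in both reports. The resulting addresses lie just below 0xffff880000000000, the start of the kernel direct mapping on x86-64, which would fit "Kernel PGD 0". Register values are copied from the logs; the helper names are illustrative only.

```python
# Register values copied from the two oops reports (042stab133.2 and 042stab134.8).
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Interpret a 64-bit register value as a signed two's-complement integer."""
    return x - (1 << 64) if x & (1 << 63) else x


def effective_address(base: int, index: int, scale: int) -> int:
    """Compute [base + index*scale] as x86-64 does: signed index, wrap at 64 bits."""
    return (base + to_signed64(index) * scale) & MASK64


oopses = {
    "042stab133.2": {"rax": 0xFFFF88014B7C0020, "rdx": 0xFFFFFFFF994CB60B, "cr2": 0xFFFF87FE15E1B078},
    "042stab134.8": {"rax": 0xFFFF8801B3F78020, "rdx": 0xFFFFFFFFA216DFB1, "cr2": 0xFFFF87FEC4AE7DA8},
}

for kernel, r in oopses.items():
    addr = effective_address(r["rax"], r["rdx"], 8)
    print(f"{kernel}: rax+rdx*8 = {addr:#018x}, CR2 = {r['cr2']:#018x}, match = {addr == r['cr2']}")
```

      Running it prints `match = True` for both kernels. This suggests the bad access comes from a corrupted index rather than a corrupted base pointer, though the dumps alone do not prove it.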

      The interval between the two node crashes was about 30 hours.

      Perhaps this is a coincidence, but both oopses were triggered by the same process, *python3* (veid 135244). It may be useful to know that this process belonged to a *fail2ban* service running inside a container in which the *numiptent* limit was exceeded.
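      To see which containers are hitting the *numiptent* barrier, one option is to read the failcnt column of /proc/user_beancounters on the hardware node (typically as root). The sketch below assumes the usual legacy OpenVZ layout of that file: a Version line, a column header, and per-container blocks whose first row starts with the container ID followed by a colon. The SAMPLE text is invented for illustration, and the function names are hypothetical.

```python
from typing import Dict, Optional

# Invented sample in the legacy /proc/user_beancounters layout (numbers are made up).
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2514040    2863360   14372700   14790164          0
            numiptent            20         24        128        128          0
      102:  kmemsize        3514040    3863360   14372700   14790164          0
            numiptent           128        128        128        128         42
"""


def parse_failcnts(text: str) -> Dict[str, Dict[str, int]]:
    """Map container ID -> {resource name: failcnt}."""
    result: Dict[str, Dict[str, int]] = {}
    ctid: Optional[str] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] in ("Version:", "uid"):
            continue  # skip blank lines, the version line and the column header
        if fields[0].endswith(":"):
            ctid = fields[0][:-1]  # first row of a new container block
            fields = fields[1:]
        if ctid is None or len(fields) != 6:
            continue  # expect: resource held maxheld barrier limit failcnt
        resource, failcnt = fields[0], int(fields[5])
        result.setdefault(ctid, {})[resource] = failcnt
    return result


def containers_over_limit(text: str, resource: str = "numiptent") -> Dict[str, int]:
    """Return {container ID: failcnt} for containers with a nonzero failcnt."""
    return {
        ctid: counters[resource]
        for ctid, counters in parse_failcnts(text).items()
        if counters.get(resource, 0) > 0
    }


if __name__ == "__main__":
    # On a real node one could pass open("/proc/user_beancounters").read() instead.
    print(containers_over_limit(SAMPLE))
```

      For the sample it prints `{'102': 42}`. A growing failcnt for numiptent would presumably accompany the "Fatal resource shortage: numiptent" messages quoted here.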

      In both cases, the following errors were logged shortly before the crash:

      {code:title=dmesg|borderStyle=solid}
      <6>[110168.283001] Fatal resource shortage: numiptent, UB 143645.
      <6>[111972.208897] Fatal resource shortage: numiptent, UB 143645.
      {code}
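      The bracketed dmesg timestamps are seconds since boot, so "shortly before" can be quantified by subtracting them. Below is a minimal sketch that uses only the lines quoted from the 042stab134.8 report. The regular expression assumes the `<level>[seconds] message` prefix seen in these logs, and the helper names are illustrative.

```python
import re

# dmesg lines quoted from the 042stab134.8 report.
LOG = """\
<6>[110168.283001] Fatal resource shortage: numiptent, UB 143645.
<6>[111972.208897] Fatal resource shortage: numiptent, UB 143645.
<1>[112204.943692] BUG: unable to handle kernel paging request at ffff87fec4ae7da8
"""

# <level>[seconds.micro] message
STAMP = re.compile(r"^<\d+>\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse(log: str):
    """Return a list of (seconds_since_boot, message) tuples."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            events.append((float(m.group(1)), m.group(2)))
    return events


def seconds_before_oops(events):
    """Seconds between each numiptent message and the first 'BUG:' line."""
    oops = next(t for t, msg in events if msg.startswith("BUG:"))
    return [round(oops - t, 3) for t, msg in events if "numiptent" in msg and t < oops]


print(seconds_before_oops(parse(LOG)))
```

      For this log it returns `[2036.661, 232.735]`, which means the messages appeared roughly 34 and 4 minutes before the oops. The same subtraction on the 042stab133.2 report gives about 1233 s (roughly 20.5 minutes) between the numiptent message for UB 104516 and the oops.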
        
      We can upload the core dumps if needed.


            People

            Assignee:
            vvs Vasily Averin
            Reporter:
            elky Denis Maksimov
            Votes:
            1
            Watchers:
            4

              Dates

              Created:
              Updated:
              Resolved: