Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Fix Version/s: Vz7.0-Update-next
Component/s: Containers::Kernel
Security Level: Public
Environment: Dell 2950
CPU: E5450 @ 3.00GHz
Virtuozzo Linux release 7.9 - 3.10.0-1160.53.1.vz7.185.3
Description
Multiple servers have been rebooting at random times since they were upgraded from CentOS 7 with OpenVZ 6 to Virtuozzo Linux 7.9 with OpenVZ 7.
In the vmcore-dmesg.txt file I'm seeing a large number of messages like [689921.681782] SLUB: Unable to allocate memory on node -1 (gfp=0x20), even though there appears to be more than enough available memory.
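For reference, the gfp=0x20 value in that message can be decoded into flag names. This is only a sketch: it assumes the bit values from mainline Linux 3.10 include/linux/gfp.h, which the vz7 kernel may or may not share, and decode_gfp is a hypothetical helper, not a kernel or Virtuozzo tool.

```python
# Low GFP bits as defined in mainline Linux 3.10 include/linux/gfp.h.
# ASSUMPTION: the vz7 kernel uses the same values (not verified here).
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}


def decode_gfp(mask):
    """Return the names of the known flag bits set in mask.

    Bits outside the table are reported together as one hex string.
    """
    names = [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]
    unknown = mask & ~sum(GFP_BITS)  # sum of the keys is 0xff
    if unknown:
        names.append(hex(unknown))
    return names


print(decode_gfp(0x20))
```

Under that assumption, 0x20 is __GFP_HIGH on its own with __GFP_WAIT clear, so these would be allocations that are not allowed to sleep and wait for reclaim. That may be why they can fail while top still reports plenty of buff/cache.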
Monitoring is in place that captures and logs the top output every minute. Here is the head of top from just before the reboot:
top - 06:03:43 up 13 days, 18:44, 0 users, load average: 3.76, 2.86, 2.17
Tasks: 732 total, 2 running, 730 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.0 us, 6.3 sy, 0.0 ni, 46.8 id, 27.2 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 32774128 total, 3153448 free, 7168432 used, 22452248 buff/cache
KiB Swap: 67108860 total, 66478156 free, 630704 used. 25131852 avail Mem
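The figures in the top header above can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the KiB values printed by top; the dictionary keys and the gib helper are illustrative names.

```python
KIB_PER_GIB = 1024 * 1024

# Values copied from the "KiB Mem" / "KiB Swap" lines of the top header.
mem_kib = {
    "total": 32774128,
    "free": 3153448,
    "used": 7168432,
    "buff_cache": 22452248,
    "avail": 25131852,
}
swap_kib = {"total": 67108860, "free": 66478156, "used": 630704}


def gib(kib):
    """Convert KiB to GiB."""
    return kib / KIB_PER_GIB


# free + used + buff/cache should add up to the reported total.
accounted = mem_kib["free"] + mem_kib["used"] + mem_kib["buff_cache"]
print("accounted == total:", accounted == mem_kib["total"])

print("available: %.2f GiB" % gib(mem_kib["avail"]))
print("buff/cache share: %.1f%%" % (100 * mem_kib["buff_cache"] / mem_kib["total"]))
print("swap used: %.2f GiB" % gib(swap_kib["used"]))
```

The three memory columns add up exactly to the total, and roughly 24 GiB is reported as available, most of it as buff/cache rather than free pages.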
Memory on the server was upgraded from 32G to 64G, and the initial suspicion for these errors (same as above) was a problem with the new memory, so the old memory, which had been running fine on OpenVZ 6, was put back in. After a few days (13 days of uptime according to the top output above) the server rebooted again, and the only remaining change is the upgrade to OpenVZ 7.
Here is the end of the vmcore-dmesg.txt file:
[689921.681693] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681702] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681706] node 0: slabs: 9, objs: 576, free: 0
[689921.681744] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681746] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681748] node 0: slabs: 9, objs: 576, free: 0
[689921.681763] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681766] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681767] node 0: slabs: 9, objs: 576, free: 0
[689921.681782] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681784] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681786] node 0: slabs: 9, objs: 576, free: 0
[689921.681800] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681802] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681804] node 0: slabs: 9, objs: 576, free: 0
[787524.521853] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.231:30068 ulen 11
[788228.484728] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.231:30068 ulen 11
[788346.308456] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.231:30068 ulen 11
[788366.582781] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.231:30068 ulen 11
[808605.027142] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.195:24721 ulen 11
[808610.424453] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.195:24721 ulen 11
[808636.354702] UDP: bad checksum. From 135.148.159.207:7777 to 107.161.148.195:24721 ulen 11
[810821.403799] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[810821.403808] cache: anon_vma_chain(9430977:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[810821.403811] node 0: slabs: 9, objs: 576, free: 0
[847289.569000] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[847289.569002] cache: anon_vma_chain(1408:crond.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[847289.569002] node 0: slabs: 32, objs: 2048, free: 0
[869770.534745] UDP: bad checksum. From 135.148.159.205:7777 to 107.161.148.195:64550 ulen 11
[957216.931056] UDP: bad checksum. From 135.148.159.205:7777 to 107.161.148.232:42388 ulen 11
[967127.413515] UDP: bad checksum. From 135.148.159.205:7777 to 23.94.71.176:51288 ulen 11
[981863.784401] UDP: bad checksum. From 135.148.159.205:7777 to 23.94.71.177:34119 ulen 11
[981869.144446] UDP: bad checksum. From 135.148.159.205:7777 to 23.94.71.177:34119 ulen 11
[982338.578965] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[982338.578972] cache: anon_vma_chain(11612644:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[982338.578974] node 0: slabs: 6, objs: 384, free: 0
[1006409.230101] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[1006409.230111] cache: anon_vma_chain(1408:crond.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[1006409.230113] node 0: slabs: 35, objs: 2240, free: 0
[1030883.849052] UDP: bad checksum. From 135.148.159.205:7777 to 107.161.148.231:50152 ulen 11
[1030887.461019] UDP: bad checksum. From 135.148.159.205:7777 to 107.161.148.231:50152 ulen 11
[1030887.604256] UDP: bad checksum. From 135.148.159.205:7777 to 107.161.148.231:50152 ulen 11
[1190695.900880] httpd: Corrupted page table at address 7f62d5b48e68
[1190695.901070] PGD 80000002e92bf067 PUD 1c99c5067 PMD 195015067 PTE 7fffffffb78b680
[1190695.901336] Bad pagetable: 000c [#1] SMP
[1190695.901486] Modules linked in: mpt2sas raid_class mptctl mptbase 8021q garp mrp devlink xt_CHECKSUM tun xt_DSCP xt_TCPMSS xt_state xt_recent xt_owner xt_multiport xt_mac xt_LOG xt_limit xt_length xt_ecn nf_conntrack_irc nf_conntrack_ftp ipt_ULOG ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat iptable_mangle iptable_security iptable_raw ip_set ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dell_rbu sunrpc ipmi_ssif iTCO_wdt coretemp iTCO_vendor_support gpio_ich kvm_intel dcdbas kvm ipmi_si ses enclosure ipmi_devintf scsi_transport_sas ipmi_msghandler irqbypass pcspkr sg
[1190695.901549] i5000_edac i5k_amb lpc_ich ip_vs nf_conntrack libcrc32c br_netfilter veth overlay ip6_vzprivnet ip6_vznetstat ip_vznetstat ip_vzprivnet vziolimit vzevent vzlist vzstat vznetstat vznetdev vzmon vzdev fuse bridge stp llc binfmt_misc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic sr_mod cdrom crct10dif_common ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix libata megaraid_sas serio_raw bnx2 drm_panel_orientation_quirks dm_mirror dm_region_hash dm_log dm_mod pio_kaio pio_nfs pio_direct pfmt_raw pfmt_ploop1 ploop
[1190695.901549] CPU: 5 PID: 627609 Comm: httpd ve: 053a3e76-fa58-4116-9567-97028be293c5 Kdump: loaded Not tainted 3.10.0-1160.53.1.vz7.185.3 #1 185.3
[1190695.901549] Hardware name: Dell Inc. PowerEdge 2950/0H268G, BIOS 2.7.0 10/30/2010
[1190695.901549] task: ffff8bfd19c72000 ti: ffff8bfcbc26c000 task.ti: ffff8bfcbc26c000
[1190695.901549] RIP: 0033:[<00007f62d5888d28>] [<00007f62d5888d28>] 0x7f62d5888d28
[1190695.901549] RSP: 002b:00007f62c6eb2c68 EFLAGS: 00010206
[1190695.901549] RAX: fffffffffffffff5 RBX: 00005575a0197080 RCX: 00007f62d5888d1b
[1190695.901549] RDX: 0000000000000001 RSI: 00007f62d61ac106 RDI: 0000000000008029
[1190695.901549] RBP: 00007f62d61ac106 R08: 00007f62c6eb2c60 R09: 0000000000000000
[1190695.901549] R10: 0000000000000000 R11: 0000000000000202 R12: 00007f62c6eb2cac
[1190695.901549] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
[1190695.901549] FS: 00007f62c6eb3700(0000) GS:ffff8c03ffd40000(0000) knlGS:0000000000000000
[1190695.901549] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1190695.901549] CR2: 00007f62d5b48e68 CR3: 0000000199880000 CR4: 00000000000407e0
[1190695.901549]
[1190695.901549] RIP [<00007f62d5888d28>] 0x7f62d5888d28
[1190695.901549] RSP <00007f62c6eb2c68>
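To get an overview of the SLUB failures across the whole vmcore-dmesg.txt, the reports could be tallied with a short script. This is a sketch written against the line format in the excerpt above; the regular expressions and the summarize_slub_failures name are assumptions, and the text in parentheses after the cache name is treated simply as an opaque owner label.

```python
import re
from collections import Counter

# Patterns match the "cache:" and "node N:" lines as they appear in the excerpt.
CACHE_RE = re.compile(r"cache: (\w+)\((\d+):([^)]+)\), object size: (\d+)")
NODE_RE = re.compile(r"node (\d+): slabs: (\d+), objs: (\d+), free: (\d+)")


def summarize_slub_failures(lines):
    """Count failure reports per (cache, owner) and collect per-report slab stats."""
    counts = Counter()
    stats = []
    current = None
    for line in lines:
        cache = CACHE_RE.search(line)
        if cache:
            current = (cache.group(1), cache.group(3))
            counts[current] += 1
            continue
        node = NODE_RE.search(line)
        if node and current is not None:
            slabs, objs, free = (int(node.group(i)) for i in (2, 3, 4))
            stats.append((current, slabs, objs, free))
            current = None
    return counts, stats


# Three reports copied from the log excerpt; a real run would read the file instead.
SAMPLE = """\
[689921.681693] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[689921.681702] cache: anon_vma_chain(7771198:dnf-makecache.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[689921.681706] node 0: slabs: 9, objs: 576, free: 0
[847289.569000] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[847289.569002] cache: anon_vma_chain(1408:crond.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[847289.569002] node 0: slabs: 32, objs: 2048, free: 0
[1006409.230101] SLUB: Unable to allocate memory on node -1 (gfp=0x20)
[1006409.230111] cache: anon_vma_chain(1408:crond.service), object size: 64, buffer size: 64, default order: 0, min order: 0
[1006409.230113] node 0: slabs: 35, objs: 2240, free: 0
""".splitlines()

counts, stats = summarize_slub_failures(SAMPLE)
for (cache, owner), n in counts.items():
    print(cache, owner, n)
for (cache, owner), slabs, objs, free in stats:
    print(owner, "objects per slab:", objs // slabs, "free:", free)
```

In the sample, every report works out to 64 objects per slab, and 64 objects of 64 bytes is 4096 bytes, which is consistent with the "default order: 0" (one 4 KiB page per slab) shown in the log. The free count is 0 in each report.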
There are 5 containers running, with a total of 27G of RAM allocated out of the 32G on the physical machine.
Container 1:
        total    used    free  shared  buff/cache  available
Mem:     2048     379    1020       0         648       1667
Swap:    2048       0    2048
Container 2:
        total    used    free  shared  buff/cache  available
Mem:     4096     284    1525       0        2285       3810
Swap:    2048       0    2048
Container 3:
        total    used    free  shared  buff/cache  available
Mem:    15360    3135    6870       0        5354      12224
Swap:   15360       0   15360
Container 4:
        total    used    free  shared  buff/cache  available
Mem:     2048     236    1589       0         222       1811
Swap:    2048       0    2048
Container 5:
        total    used    free  shared  buff/cache  available
Mem:     4096     305    3509       0         281       3789
Swap:    4096       0    4096
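The 27G figure can be reproduced from the per-container totals above. The sketch assumes the free output is in MiB (as with free -m), which the report does not state explicitly; the variable names are illustrative.

```python
# "total" column of the Mem: and Swap: rows for containers 1-5.
# ASSUMPTION: values are in MiB.
mem_total_mib = [2048, 4096, 15360, 2048, 4096]
swap_total_mib = [2048, 2048, 15360, 2048, 4096]
mem_used_mib = [379, 284, 3135, 236, 305]

# Host total taken from the top header (KiB), converted to MiB.
host_total_mib = 32774128 / 1024

allocated_mib = sum(mem_total_mib)
print("RAM allocated to containers: %d MiB (%.1f GiB)" % (allocated_mib, allocated_mib / 1024))
print("share of host RAM: %.1f%%" % (100 * allocated_mib / host_total_mib))
print("RAM actually used inside containers: %d MiB" % sum(mem_used_mib))
print("swap allocated to containers: %d MiB" % sum(swap_total_mib))
```

The container limits sum to 27648 MiB, i.e. exactly 27 GiB, while the memory reported as used inside the containers is only a small fraction of that.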
I've attached the full vmcore-dmesg.txt for further review. If there is any further information I can provide, please let me know.
Thanks,
Steve