Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Fix Version/s: CU-2.6.32-042stab117.X, OpenVZ-legacy
- Component/s: Containers::Kernel
- Security Level: Public
- Environment: EL6 + vzkernel 042stab120.3 + ZFSonLinux 0.6.5.8-stable
Description
We have a problem with one server running the new 042stab120.3 kernel.
Even with VE_PARALLEL=2, the server starts to hang: the console (both the local tty and ssh sessions) becomes unresponsive, even to a simple Enter.
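For reference, the parallel-start limit is set in the global OpenVZ config; a minimal excerpt of what we have on that node (assuming the stock /etc/vz/vz.conf location):

# /etc/vz/vz.conf (excerpt): cap the number of CTs the vz initscript starts in parallel at boot
VE_PARALLEL=2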
I've noticed that the stall becomes much worse when anything netlink-related runs in the containers that are just booting. Typically, ip link set up dev lo and ip a add for the venet0 interface send the whole HW node into a long stall.
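These are essentially the netlink operations the vzctl start scripts issue inside each CT; a minimal sketch that reproduces them by hand from the HW node (CT ID 101 and the 192.0.2.101 address are placeholders, not our real configuration):

# Run against a freshly started CT: the same netlink calls the start scripts make
vzctl exec 101 ip link set dev lo up
vzctl exec 101 ip addr add 192.0.2.101/32 dev venet0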
We were unable to start more than 3-4 containers on that node with this kernel.
Interestingly enough, there wasn't any problem with our 17 other HW nodes (~1200 CTs). I've checked twice that these HW nodes don't differ in either SW or HW configuration. The only odd thing on the other nodes is that iptables loads much more slowly (though I don't have concrete numbers at hand).
There was only a vzctl hung-task report in the dmesg buffer; I don't think it points to much:
[ 601.797644] INFO: task vzctl:24089 blocked for more than 120 seconds.
[ 601.797830] Tainted: P -- ------------ 2.6.32-042stab120.3 #1
[ 601.798149] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 601.798468] vzctl D ffff88407115b3c0 0 24089 23970 0 0x00000080
[ 601.798472] ffff88400020be08 0000000000000082 ffff88400020bdd0 ffff88207fc04080
[ 601.798475] ffff883ffd439780 000000000112a9fb 0000006df3c32a7c ffff884073736410
[ 601.798479] ffff882000000000 ffff884073736400 0000000100029e26 00000013eef38227
[ 601.798484] Call Trace:
[ 601.798489] [<ffffffff81556016>] __mutex_lock_slowpath+0x96/0x210
[ 601.798494] [<ffffffff81555b3b>] mutex_lock+0x2b/0x50
[ 601.798498] [<ffffffff810ef9ed>] cgroup_kernel_open+0x3d/0x120
[ 601.798503] [<ffffffff810c50d2>] ub_cgroup_init+0x82/0xb0
[ 601.798508] [<ffffffff810c6605>] ? alloc_ub+0xa5/0x100
[ 601.798513] [<ffffffff810c6804>] get_beancounter_byuid+0x114/0x270
[ 601.798519] [<ffffffff810c4f4c>] sys_setluid+0x6c/0xa0
[ 601.798523] [<ffffffff8100b1a2>] system_call_fastpath+0x16/0x1b
For now we had to roll back to a vulnerable kernel.
I'm able to reproduce this at will.
What can I do to help you debug this?
Issue Links
- duplicates: OVZ-6817 2.6.32-042stab120.3 degradation (Resolved)