Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: CU-2.6.32-042stab112.X
Component/s: Containers::Kernel
Security Level: Public
Environment: Latest OpenVZ on latest CentOS 6.
[root@kir-ovz2 ~]# rpm -q vzkernel vzctl
vzkernel-2.6.32-042stab108.8.x86_64
vzctl-4.9.4-1.el6.x86_64
Using shared ploop base as described in http://lists.openvz.org/pipermail/users/2015-July/006360.html
Fixed in Build: vzkernel-2.6.32-042stab112.3
Description
I am running an experiment that starts many "ploop cloned" containers: I create thousands of containers and then start them sequentially, one by one.
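For anyone trying to reproduce this, a sequential timing loop could look roughly like the sketch below. It is not the script used for this report. The function name `time_starts` and the injectable `run` parameter are illustrative assumptions; only the `vzctl start <CTID>` command line comes from the report.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Tuple


def time_starts(
    ctids: Iterable[int],
    run: Callable = subprocess.run,
) -> List[Tuple[int, float]]:
    """Start containers one by one; return (ctid, wall-clock seconds) pairs.

    `run` is injectable (an assumption of this sketch) so the loop can be
    exercised without a real OpenVZ host.
    """
    results = []
    for ctid in ctids:
        cmd = ["vzctl", "start", str(ctid)]
        begin = time.monotonic()
        # check=True makes subprocess.run raise if a start fails, so a
        # broken container does not silently pollute the timings.
        run(cmd, check=True)
        results.append((ctid, time.monotonic() - begin))
    return results
```

Calling `time_starts(range(first, last + 1))` on the host would then give one timing per container, comparable to the `real` values listed below. The CTID range is up to the tester, since the report does not state which IDs were used.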
When I use centos-6-x86_64 as a base, "vzctl start" execution time grows slowly as the number of containers increases. With about 7000 containers running, vzctl start time stays within 3 seconds, which is fine. Here are the start times of the last 10 containers:
real 0m2.866s
real 0m2.760s
real 0m2.739s
real 0m2.784s
real 0m2.956s
real 0m2.914s
real 0m2.854s
real 0m2.751s
real 0m2.747s
real 0m2.774s
When I use centos-7-x86_64, vzctl start execution time very quickly degrades from 3 to more than 30 seconds. With 7000 centos-6 CTs already running, here are the start times of roughly the first 70 centos-7 CTs:
real 0m2.665s
real 0m2.738s
real 0m6.225s
real 0m7.658s
real 0m3.458s
real 0m4.273s
real 0m3.513s
real 0m4.439s
real 0m5.520s
real 0m5.962s
real 0m6.097s
real 0m7.148s
real 0m6.095s
real 0m5.995s
real 0m7.492s
real 0m4.716s
real 0m8.279s
real 0m7.437s
real 0m8.945s
real 0m9.733s
real 0m13.413s
real 0m12.580s
real 0m10.696s
real 0m12.177s
real 0m13.113s
real 0m12.328s
real 0m10.237s
real 0m9.875s
real 0m14.723s
real 0m15.889s
real 0m11.858s
real 0m18.624s
real 0m19.192s
real 0m18.001s
real 0m20.406s
real 0m20.513s
real 0m21.672s
real 0m19.742s
real 0m21.258s
real 0m24.803s
real 0m22.313s
real 0m28.477s
real 0m23.245s
real 0m28.288s
real 0m27.218s
real 0m14.768s
real 0m26.165s
real 0m23.766s
real 0m28.593s
real 0m19.163s
real 0m27.795s
real 0m22.115s
real 0m19.446s
real 0m28.385s
real 0m25.968s
real 0m33.637s
real 0m30.093s
real 0m30.876s
real 0m36.643s
real 0m29.594s
real 0m31.062s
real 0m32.633s
real 0m32.011s
real 0m34.969s
real 0m36.008s
real 0m31.761s
real 0m33.192s
real 0m38.187s
real 0m43.265s
real 0m39.537s
real 0m45.822s
This is at least a 10x slowdown, which makes me think there is something wrong with centos-7 containers.
The system also becomes extremely sluggish on simple operations such as typing a character at a bash prompt.
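The slowdown factor can be checked directly from the `time` output. The sketch below parses `real XmY.YYYs` lines and compares the slowest centos-7 start with the median centos-6 start. The helper names and the choice of "max over median" as the metric are assumptions made for illustration; the sample values are copied from the listings above.

```python
import re
from statistics import median
from typing import List

_REAL = re.compile(r"real\s+(\d+)m([\d.]+)s")


def parse_real_times(text: str) -> List[float]:
    """Extract every `real XmY.YYYs` line emitted by `time`, in seconds."""
    return [int(m) * 60 + float(s) for m, s in _REAL.findall(text)]


def slowdown(baseline: List[float], degraded: List[float]) -> float:
    """Ratio of the slowest degraded start to the median baseline start."""
    return max(degraded) / median(baseline)


# A few samples copied from the centos-6 and centos-7 listings.
centos6 = parse_real_times("real 0m2.866s\nreal 0m2.760s\nreal 0m2.739s")
centos7 = parse_real_times("real 0m2.665s\nreal 0m33.637s\nreal 0m45.822s")
print(f"{slowdown(centos6, centos7):.1f}x")  # prints 16.6x
```

With these samples the ratio is about 16.6, which is consistent with the "at least 10x" estimate.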
This is what top looks like while starting centos-7 containers:
top - 19:47:29 up 3 days, 5:18, 3 users, load average: 1737.71, 2111.78, 1734.63
Tasks: 142194 total, 1050 running, 141136 sleeping, 0 stopped, 8 zombie
Cpu(s): 0.1%us, 1.0%sy, 0.0%ni, 98.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 131786220k total, 128947936k used, 2838284k free, 312972k buffers
Swap: 4194296k total, 4194296k used, 0k free, 28439228k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
180300 root 20 0 118m 102m 824 R 25.7 0.1 0:19.91 top
67 root 20 0 0 0 0 R 21.5 0.0 3:15.10 events/0
74 root 20 0 0 0 0 R 20.1 0.0 3:49.59 events/7
71 root 20 0 0 0 0 R 18.1 0.0 3:58.63 events/4
179 root 20 0 0 0 0 R 17.2 0.0 31:36.23 kswapd0
73 root 20 0 0 0 0 R 15.8 0.0 3:47.76 events/6
69 root 20 0 0 0 0 R 15.4 0.0 4:03.51 events/2
70 root 20 0 0 0 0 R 15.2 0.0 3:14.07 events/3
68 root 20 0 0 0 0 R 14.9 0.0 3:50.70 events/1
72 root 20 0 0 0 0 R 14.7 0.0 2:52.83 events/5
78 root 20 0 0 0 0 S 14.7 0.0 3:50.82 events/11
80 root 20 0 0 0 0 R 13.3 0.0 3:40.46 events/13
59 root RT 0 0 0 0 S 13.0 0.0 5:07.02 migration/14
39 root RT 0 0 0 0 S 12.7 0.0 5:08.76 migration/9
75 root 20 0 0 0 0 S 12.7 0.0 3:29.42 events/8
7 root RT 0 0 0 0 S 12.6 0.0 7:42.39 migration/1
11 root RT 0 0 0 0 S 12.1 0.0 10:58.69 migration/2
81 root 20 0 0 0 0 S 12.1 0.0 3:28.59 events/14
82 root 20 0 0 0 0 S 11.8 0.0 3:22.99 events/15
153791 root 20 0 54036 3252 1940 D 11.7 0.0 2:08.43 systemd
79 root 20 0 0 0 0 R 11.5 0.0 3:17.82 events/12
152775 root 20 0 54036 3232 1908 D 11.4 0.0 2:19.95 systemd
158454 root 20 0 54036 3320 2004 R 11.1 0.0 2:02.55 systemd
153918 root 20 0 54036 3268 1948 D 11.0 0.0 2:13.83 systemd
152111 root 20 0 54036 3324 2004 D 10.6 0.0 1:56.05 systemd
157342 root 20 0 54036 3532 2212 R 10.6 0.0 1:42.83 systemd
153416 root 20 0 54036 3168 1852 D 10.5 0.0 2:09.37 systemd
76 root 20 0 0 0 0 R 10.4 0.0 3:23.14 events/9
151825 root 20 0 54036 3264 1952 D 10.2 0.0 2:08.52 systemd
158664 root 20 0 54036 3316 1996 R 10.1 0.0 1:36.93 systemd
162425 root 20 0 54036 3684 2364 D 9.6 0.0 1:15.52 systemd
173442 root 20 0 54036 3628 2360 D 9.5 0.0 0:33.57 systemd
167763 root 20 0 54036 3684 2364 D 9.4 0.0 1:06.11 systemd
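To summarize a long top listing like this one, the process rows can be tallied by command and state. The sketch below assumes the 12-column layout shown in the header line above. The function name `tally_states` and the folding of per-CPU kernel threads (`events/3` becomes `events`) are choices made for illustration.

```python
from collections import Counter


def tally_states(top_rows: str) -> Counter:
    """Count (COMMAND, S) pairs in top's process rows.

    Assumes the column order:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    """
    counts = Counter()
    for line in top_rows.splitlines():
        fields = line.split()
        # Header and summary lines are skipped: process rows start with a PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        state, command = fields[7], fields[11]
        # Fold per-CPU kernel threads such as events/3 into one bucket.
        counts[(command.split("/")[0], state)] += 1
    return counts


sample = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
67 root 20 0 0 0 0 R 21.5 0.0 3:15.10 events/0
78 root 20 0 0 0 0 S 14.7 0.0 3:50.82 events/11
153791 root 20 0 54036 3252 1940 D 11.7 0.0 2:08.43 systemd
158454 root 20 0 54036 3320 2004 R 11.1 0.0 2:02.55 systemd
152775 root 20 0 54036 3232 1908 D 11.4 0.0 2:19.95 systemd
"""
for (command, state), n in sorted(tally_states(sample).items()):
    print(command, state, n)
```

In the full listing above, 9 of the 12 visible systemd processes are in state D (uninterruptible sleep) and the other 3 are in state R.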
I tried disabling CT fsync (echo 0 > /proc/sys/fs/fsync-enable), but it didn't make much difference (since we're on ploop, I guess).
This is what ioacct shows for a started centos-6 container and a started centos-7 container:
centos-6:
[root@kir-ovz2 ~]# cat /proc/bc/7999/ioacct
read 25571328
write 1212416
dirty 1220608
cancel 8192
missed 0
syncs_total 0
fsyncs_total 0
fdatasyncs_total 0
range_syncs_total 0
syncs_active 0
fsyncs_active 0
fdatasyncs_active 0
range_syncs_active 0
io_pbs 0
fuse_requests 0
fuse_bytes 0
centos-7:
[root@kir-ovz2 ~]# cat /proc/bc/70001/ioacct
read 42647552
write 2465792
dirty 2478080
cancel 12288
missed 0
syncs_total 0
fsyncs_total 1
fdatasyncs_total 0
range_syncs_total 0
syncs_active 0
fsyncs_active 0
fdatasyncs_active 0
range_syncs_active 0
io_pbs 0
fuse_requests 0
fuse_bytes 0
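The two ioacct dumps are easier to compare as ratios. The sketch below parses the `name value` pairs and divides the centos-7 counters by the centos-6 ones. The helper names are assumptions made for illustration; the numbers are copied from the two dumps above.

```python
from typing import Dict


def parse_ioacct(text: str) -> Dict[str, int]:
    """Parse the `name value` pairs printed by /proc/bc/<id>/ioacct."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Shell prompts and other non-counter lines are ignored.
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def ratios(base: Dict[str, int], other: Dict[str, int]) -> Dict[str, float]:
    """other/base for every counter that is non-zero in base."""
    return {k: other[k] / v for k, v in base.items() if v and k in other}


centos6 = parse_ioacct(
    "read 25571328\nwrite 1212416\ndirty 1220608\ncancel 8192\nfsyncs_total 0"
)
centos7 = parse_ioacct(
    "read 42647552\nwrite 2465792\ndirty 2478080\ncancel 12288\nfsyncs_total 1"
)
for name, ratio in ratios(centos6, centos7).items():
    print(f"{name}: {ratio:.2f}x")
```

For these two containers, the centos-7 `read` counter is about 1.67 times the centos-6 one and the `write` counter about 2.03 times. The only other difference is the single fsync recorded for centos-7.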