Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: Vz7.0-Update3
Component/s: CRIU
Security Level: Public
Environment: Linux 3.10.0-327.36.1.vz7.18.7 #1 SMP Tue Oct 11 15:39:22 MSK 2016 x86_64 x86_64 x86_64 GNU/Linux
Description
We run several OVZ host nodes with many cPanel VEs on them. We put up
a few new OVZ7 host nodes with the intention of transitioning to them
at some point in the future. At first I used the ovztransfer.sh
script to transfer a cPanel VE from old OVZ to OVZ7, which worked OK:
the VE started and operated as expected. But when I went to
snapshot/checkpoint the VE, I started running into problems. I cannot
get OVZ7 to snapshot this cPanel VE.
OK, so I thought perhaps it was a botched copy or something related to
the ovztransfer.sh script, since it's not technically an official
script (although it is mentioned in the OVZ documentation). I decided
to create a brand new CentOS 7 VE on the OVZ7 host node, did a fresh
cPanel install, and then migrated the cPanel users from the old server
to the new one in the typical cPanel fashion (transfer tool).
Everything worked as expected, but when I tried to snapshot the new
container it errored out again. I have another container on the same
host node that I can snapshot without a problem.
There definitely seems to be some sort of bug that I'm hitting when
snapshotting this particular cPanel container under OVZ7. It is a
total roadblock: I cannot continue the transition to OVZ7 until I know
that I can checkpoint VEs reliably.
Here are some details:
Hostnode: Virtuozzo Linux release 7.2 / Linux 3.10.0-327.36.1.vz7.18.7
VE: CentOS Linux release 7.2.1511 (Core) - Running a brand new install
of the latest version of cPanel with about 600 active users recently
migrated to it using the cPanel transfer tool / about 250GB of data.
# prlctl snapshot 1035
Creating the snapshot...
PRL_ERR_VZCTL_OPERATION_FAILED (Details: Failed to checkpoint the Container
All dump files and logs were saved to
/vz/private/1035/dump/{ce2b2c58-00ac-4e2f-b4da-e0a0dc594ff4}.fail
Failed tp dump the Container, status pipe unexpectedly closed
Failed to dump Container
Failed to resume Container
Failed to create snapshot
)
Failed to create the snapshot: Unknown
Here is what I think is the relevant info from the dump.log file:
(04.143824) Error (criu/sk-inet.c:158): In-flight connection (l) for 924d83
(04.143831) Error (criu/sk-inet.c:160): In-flight connections can be ignored with the --skip-in-flight option.
(04.143868) Error (criu/cr-dump.c:1322): Dump files (pid: 256864) failed with -1
(04.168423) Error (criu/cr-dump.c:1634): Dumping FAILED.
I ran the snapshot again and got a different error on the second pass,
but it still seems to be related to sockets in some way:
(04.005642) fdinfo: type: 0x4 flags: 02000002/01 pos: 0 fd: 5
(04.005657) 287702 fdinfo 6: pos: 0 flags: 2000002/0x1
(04.005665) Searching for socket 9888dd (family 2.6)
(04.005675) Error (criu/sk-inet.c:202): Name resolved on unconnected socket
(04.005680) ----------------------------------------
(04.005693) Error (criu/cr-dump.c:1322): Dump files (pid: 287702) failed with -1
(04.005726) Waiting for 287702 to trap
(04.005753) Daemon 287702 exited trapping
(04.005761) Sent msg to daemon 5 0 0
pie: 20365: __fetched msg: 5 0 0
pie: 20365: 20365: new_sp=0x7f27094c8008 ip 0x7f271180c20e
I have attached the first dump attempt.
>Host OS: Virtuozzo Linux release 7.2
>Guest OS: CentOS Linux release 7.2.1511 (Core)
>Additional Info: The VE has about 6 IP addresses assigned to it in veid.conf. If I comment out the IPs and start the VE, I can dump it. If I comment out the IPs and replace them with 6 different IPs, I can snapshot the container as well, although there may not be any active connections on those IPs at the time of the snapshot.
It could possibly be related to open connections/sockets for the Dovecot mail server.
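For anyone triaging a similar failure, the error records in the excerpts above can be pulled out of dump.log mechanically. The following is only a minimal sketch, not part of CRIU or vzctl: the names `CriuError`, `iter_criu_errors` and `SAMPLE` are hypothetical, and it assumes each record sits on a single line in the real file, in the `(timestamp) Error (file:line): message` shape seen in this report.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed record shape, taken from the excerpts in this report:
# (04.005675) Error (criu/sk-inet.c:202): Name resolved on unconnected socket
ERROR_RE = re.compile(
    r"^\((?P<ts>\d+\.\d+)\)\s+Error\s+"
    r"\((?P<src>[^:()]+):(?P<line>\d+)\):\s*(?P<msg>.*)$"
)


class CriuError(NamedTuple):
    """One error record from dump.log (hypothetical helper type)."""
    timestamp: float
    source: str
    line: int
    message: str


def iter_criu_errors(lines: Iterable[str]) -> Iterator[CriuError]:
    """Yield only the 'Error' records, skipping informational lines."""
    for raw in lines:
        match = ERROR_RE.match(raw.strip())
        if match:
            yield CriuError(
                float(match["ts"]),
                match["src"],
                int(match["line"]),
                match["msg"],
            )


# Lines copied from the second dump attempt quoted in this report.
SAMPLE = """\
(04.005665) Searching for socket 9888dd (family 2.6)
(04.005675) Error (criu/sk-inet.c:202): Name resolved on unconnected socket
(04.005680) ----------------------------------------
(04.005693) Error (criu/cr-dump.c:1322): Dump files (pid: 287702) failed with -1
"""

for err in iter_criu_errors(SAMPLE.splitlines()):
    print(f"{err.source}:{err.line}: {err.message}")
```

To scan a real log, pass an open file object instead of `SAMPLE.splitlines()`, for example the dump.log saved under the `.fail` directory named in the prlctl output. The first error reported is usually the most useful one, since later errors such as "Dumping FAILED." tend to be consequences of it.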