OpenVZ / OVZ-7063

Active SSH Session to Container is Lost when Suspend/Resume Container



    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Fix Version/s: Vz7.0-Update-next
    • Component/s: CRIU
    • Security Level: Public


      >Description of problem:
      taken from

      Dear All,

      I am trying to install OpenVZ on a RedHat 7.5 machine (an AWS instance). I have successfully installed it and its utilities (mainly with yum; openvz-release-7.0.8-4.vz7.x86_64), and I am able to start, suspend, and resume a container. The container I launch runs CentOS 7.

      [root@ip-172-31-7-139 ec2-user_scripts]# uname -r

      When I launch a container "ct1", I can ssh to it and run something simple such as "top -d 1", which updates every second. If I suspend the container while that ssh session to ct1 is active, the shell running top freezes, as expected. I expect the shell and top to continue once the container is resumed. Instead, the ssh session and its TCP connection appear to break, and I get kicked out of the container when it resumes. The error I get is "packet_write_wait: Connection to ct1 port 22: Broken pipe". I can open a new ssh connection to the container after it resumes with no problem, but I should not be losing the original connection in the first place: it should just hang during the suspend/resume and then continue normally once the resume is done.

      I googled the error and tried adjusting several ssh configuration parameters on both the client and the server side, but none of them helped. For example, I adjusted these:
      - ServerAliveInterval
      - ServerAliveCountMax
      - ClientAliveInterval

      I also tried adjusting some kernel parameters on the hardware node and in the container, following suggestions I found online, but that didn't help either. These are the ones I adjusted:
      net.ipv4.tcp_keepalive_time = 7200
      net.ipv4.tcp_keepalive_intvl = 300
      net.ipv4.tcp_keepalive_probes = 100
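
As a rough check on what these values imply (a sketch based on standard Linux TCP keepalive behavior, not on anything measured in this report): for a socket with SO_KEEPALIVE enabled, an idle connection to an unresponsive peer is dropped after roughly tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes seconds. The helper name keepalive_timeout is hypothetical.

```shell
#!/bin/sh
# keepalive_timeout TIME INTVL PROBES
# Prints the approximate number of seconds before an idle connection to an
# unresponsive peer is dropped, assuming SO_KEEPALIVE is set on the socket.
keepalive_timeout() {
    echo $(( $1 + $2 * $3 ))
}

# Values taken from the sysctl settings listed above.
keepalive_timeout 7200 300 100
```

With the values above this comes to 37200 seconds (over ten hours), so kernel keepalive probing with these settings should not by itself end a connection during a short suspend.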

      Are there any settings to adjust on the hardware node (e.g. kernel parameters) or when launching the container that would fix this behavior? Any suggestions are really appreciated.

      This is the summary of the scenario I am describing:

      # Create container
      prlctl create ct1 --vmtype ct
      # Set up network, DNS, etc.
      # ssh to ct1 and run a simple command like "top -d 1"
      # Suspend container while the ssh session is active
      prlctl suspend ct1
      # Resume container
      prlctl resume ct1
      # Expectation: ssh session should still be intact after resuming the container; instead it gets broken
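
The suspend/resume part of the scenario can be wrapped in a small helper for repeated testing. This is only a sketch: it assumes the container already exists, is running, and has an ssh session open from another terminal. The function name suspend_resume_cycle and the pause parameter are hypothetical and not part of the report; only the prlctl suspend and prlctl resume commands come from the steps above.

```shell
#!/bin/sh
# suspend_resume_cycle CT [PAUSE]
# Suspends container CT, waits PAUSE seconds (default 5, an arbitrary
# illustrative value), then resumes it. Stops at the first failing step.
suspend_resume_cycle() {
    ct="$1"
    pause="${2:-5}"
    prlctl suspend "$ct" || return 1
    sleep "$pause"
    prlctl resume "$ct" || return 1
}
```

For example, "suspend_resume_cycle ct1 10" keeps ct1 suspended for ten seconds; the ssh session opened beforehand can then be checked for the "Broken pipe" error.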

      Thank you for your help.

      Mohamad Sindi
      Massachusetts Institute of Technology (MIT)




            khorenko Konstantin Khorenko
            vvs Vasily Averin