OpenVZ / OVZ-7030

Online container migration fails with remote exception "I/O operation on closed file"

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: Vz7.0-Update8
    • Component/s: CRIU
    • Security Level: Public

      Description

      Description of problem:
      https://forum.openvz.org/index.php?t=msg&th=13504

      [root@vz03 ~]# vzmigrate -vvv --online --require-realtime vz04 222
      ...
      2018-05-23 10:30:10.732: Live migration stage started
      2018-05-23 10:30:36.320: Io multiplexer aborted
      2018-05-23 10:30:36.320: 2018-05-23 10:30:36.321: Phaul service failed to live migrate CT
      2018-05-23 10:30:36.320: 2018-05-23 10:30:36.321: error [-73] : Phaul service failed to live migrate CT
      2018-05-23 10:30:36.321: Phaul service failed to live migrate CT
      2018-05-23 10:30:36.321: Phaul failed to live migrate CT (/var/log/phaul.log)
      2018-05-23 10:30:36.322: 2018-05-23 10:30:36.322: cleaning : destroy CT 222
      2018-05-23 10:30:36.372: 2018-05-23 10:30:36.372: cleaning : 'rm' dir : /vz/private/222
      2018-05-23 10:30:36.372: 2018-05-23 10:30:36.372: can not rename : [/vz/private/222] -> [/vz/private/222.ss6sKg]
      2018-05-23 10:30:36.372: 2018-05-23 10:30:36.373: cleaning : 'rmdir' dir : /vz/root/222
      2018-05-23 10:30:36.372: 2018-05-23 10:30:36.373: can not find entry for delete : [/vz/root/222]
      2018-05-23 10:30:37.373: 2018-05-23 10:30:37.373: unlocking 222

      [root@vz03 ~]# tail -20 /var/log/phaul.log
      10:30:33.214: 285170: Notify (post-network-lock)
      10:30:35.283: 285170: Final FS and images sync
      10:30:35.522: 285170: Sending images to target
      10:30:35.524: 285170: Pack
      10:30:35.561: 285170: Add htype images
      10:30:35.812: 285170: Asking target host to restore
      10:30:36.271: 285170: Remote exception
      10:30:36.271: 285170: I/O operation on closed file
      Traceback (most recent call last):
        File "/usr/libexec/phaul/p.haul", line 9, in <module>
          load_entry_point('phaul==0.1', 'console_scripts', 'p.haul')()
        File "/usr/lib/python2.7/site-packages/phaul/shell/phaul_client.py", line 49, in main
          worker.start_migration()
        File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 161, in start_migration
          self.__start_live_migration()
        File "/usr/lib/python2.7/site-packages/phaul/iters.py", line 232, in __start_live_migration
          self.target_host.restore_from_images()
        File "/usr/lib/python2.7/site-packages/phaul/xem_rpc_client.py", line 26, in __call__
          raise Exception(resp[1])
      Exception: I/O operation on closed file

      Destination server logs:
      [root@vz04 ~]# tail -20 /var/log/phaul-service.log
      10:30:35.562: 817892: Waiting for images to unpack
      10:30:35.813: 817892: Restoring from images
      10:30:35.827: 817892: Starting vzctl restore
      10:30:36.269: 817892: > Restoring the Container ...
      10:30:36.269: 817892: > Mount image: /vz/private/222/root.hdd
      10:30:36.269: 817892: > Container is mounted
      10:30:36.269: 817892: > Setting permissions for image=/vz/private/222/root.hdd
      10:30:36.269: 817892: > (00.000283) Error (criu/util.c:694): Can't read link of fd -404: No such file or directory
      10:30:36.270: 817892: > (00.000295) Error (criu/protobuf.c:77): Unexpected EOF on (null)
      10:30:36.270: 817892: > The restore log was saved in /vz/dump/222/rst-_cQGWZ-18.05.23-10.30/criu_restore.9.log
      10:30:36.270: 817892: > criu exited with rc=17
      10:30:36.270: 817892: > Unmount image: /vz/private/222/root.hdd

      [root@vz04 ~]# tail -20 /vz/dump/222/rst-_cQGWZ-18.05.23-10.30/criu_restore.9.log
      (00.000142) Version: 3.8 (gitid 0)
      (00.000188) Running on vz04.boardreader.com Linux 3.10.0-693.21.1.vz7.47.4 #1 SMP Sat Apr 28 11:48:07 MSK 2018 x86_64
      (00.000237) No inventory.img image
      (00.000283) Error (criu/util.c:694): Can't read link of fd -404: No such file or directory
      (00.000295) Error (criu/protobuf.c:77): Unexpected EOF on (null)
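
The restore log above contains only a handful of lines, but a full CRIU log can be long. As a minimal sketch for triage, the snippet below pulls out the `Error (file:line): message` entries in the format shown in this report. The names `CriuError` and `find_criu_errors` are hypothetical helpers invented for illustration; they are not part of CRIU or p.haul.

```python
import re
from typing import List, NamedTuple


class CriuError(NamedTuple):
    """One 'Error' entry from a CRIU log (hypothetical helper type)."""
    timestamp: float   # seconds since CRIU start, e.g. 0.000283
    source_file: str   # e.g. "criu/util.c"
    line: int          # e.g. 694
    message: str       # text after the "(file:line):" prefix


# Matches entries such as:
#   (00.000283) Error (criu/util.c:694): Can't read link of fd -404: ...
# search() is used so that lines prefixed by phaul-service.log
# ("10:30:36.269: 817892: > ...") are matched as well.
_ERROR_RE = re.compile(r"\((\d+\.\d+)\)\s+Error \(([^():]+):(\d+)\):\s*(.*)$")


def find_criu_errors(log_text: str) -> List[CriuError]:
    """Return every CRIU error entry found in log_text, in order."""
    errors = []
    for raw in log_text.splitlines():
        match = _ERROR_RE.search(raw)
        if match:
            errors.append(
                CriuError(
                    timestamp=float(match.group(1)),
                    source_file=match.group(2),
                    line=int(match.group(3)),
                    message=match.group(4).strip(),
                )
            )
    return errors
```

Running it over the contents of `criu_restore.9.log` or `phaul-service.log` would list the failing source locations without the surrounding informational lines.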

      This was reported to the CRIU project on GitHub; logs are attached to that issue:
      https://github.com/checkpoint-restore/criu/issues/494

      Dmitry Safonov believes this is not a CRIU issue:
      "
      This one looks suspicious and might be the reason of the fail:

      (02.037229) Error (criu/page-xfer.c:379): No parent image found, though parent directory is set: No such file or directory

      Probably, it's a bug in criu integration. libvzctl or something?
      "
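
The "No parent image found, though parent directory is set" message quoted above suggests that an images directory refers to a previous dump that is not reachable. A minimal sketch of how one might check this by hand follows. It assumes that an incremental dump directory records its predecessor as a symlink named `parent`; that name is an assumption here and is passed as a parameter so it can be changed. `check_parent_link` is a hypothetical helper, not a CRIU, libvzctl, or p.haul function, and the report does not confirm that a dangling link is the actual root cause.

```python
import os


def check_parent_link(images_dir: str, link_name: str = "parent") -> str:
    """Classify the parent reference of an images directory.

    Returns:
      "no-parent" - no entry called link_name exists in images_dir
      "dangling"  - the entry exists but its target cannot be reached
      "ok"        - the entry exists and its target is reachable
    """
    link_path = os.path.join(images_dir, link_name)

    # lexists() does not follow symlinks, so it is True even for a
    # symlink whose target is missing.
    if not os.path.lexists(link_path):
        return "no-parent"

    # exists() follows symlinks, so it is False for a dangling link.
    if not os.path.exists(link_path):
        return "dangling"

    return "ok"
```

A "dangling" result on the destination host would be consistent with the quoted error; "no-parent" or "ok" would point the investigation elsewhere.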

            People

            Assignee: Denis V. Lunev (den)
            Reporter: Vasily Averin (vvs)