Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Fix Version/s: Vz7.0-Update-next
Component/s: VCMMD (Memory Manager)
Security Level: Public
Environment: OpenVZ 7, fully YUM updated
Hardware: Various systems
Description
>Description of problem:
After a "yum update" on any of our OpenVZ 7 nodes, we were unable to start or restart VMs or CTs:
[root@node ~]# prlctl start 727
WARNING: You are using a deprecated CLI component that won't be installed by default in the next major release. Please use virsh instead
Starting the CT...
Failed to start the CT: PRL_ERR_VZCTL_OPERATION_FAILED (Details: vcmmd: failed to register Container: Failed to get VCMMD D-Bus name
vcmmd: failed to unregister Container: Failed to get VCMMD D-Bus name
vcmmd: failed to unregister Container: Failed to get VCMMD D-Bus name
Failed to start the Container
)
Service "vcmmd" was reported as stopped and could not be restarted:
[root@node ~]# systemctl restart vcmmd
Job for vcmmd.service failed because the control process exited with error code. See "systemctl status vcmmd.service" and "journalctl -xe" for details.
Attempts to manually run "vcmmd" in interactive mode produced this output:
[root@node ~]# vcmmd -i
2024-01-25 16:00:46 INFO vcmmd: Started
2024-01-25 16:00:46 INFO vcmmd.config: Loading config from file '/etc/vz/vcmmd.conf'
2024-01-25 16:00:46 INFO vcmmd.host: [redacted]: 67101446144 bytes available for VEs
2024-01-25 16:00:46 ERROR vcmmd.host: [redacted]: Memory cgroup vstorage.slice does not exist
2024-01-25 16:00:46 ERROR vcmmd.ldmgr: Failed to load policy "density": Policy not found
2024-01-25 16:00:46 INFO vcmmd.ldmgr: Switch to fallback policy
2024-01-25 16:00:46 INFO vcmmd.ldmgr: Loaded policy "NoOpPolicy"
2024-01-25 16:00:46 CRITICAL vcmmd: Terminating program due to unhandled exception:
2024-01-25 16:00:46 CRITICAL vcmmd: Traceback (most recent call last):
2024-01-25 16:00:46 CRITICAL vcmmd: File "/usr/lib/python3.6/site-packages/vcmmd/util/threading.py", line 43, in run_with_except_hook
2024-01-25 16:00:46 CRITICAL vcmmd: run_original(*args2, **kwargs2)
2024-01-25 16:00:46 CRITICAL vcmmd: File "/usr/lib64/python3.6/threading.py", line 864, in run
2024-01-25 16:00:46 CRITICAL vcmmd: self._target(*self._args, **self._kwargs)
2024-01-25 16:00:46 CRITICAL vcmmd: File "/usr/lib/python3.6/site-packages/vcmmd/ldmgr/policy.py", line 82, in wrapper
2024-01-25 16:00:46 CRITICAL vcmmd: sleep_timeout = f(self, *args, **kwargs)
2024-01-25 16:00:46 CRITICAL vcmmd: File "/usr/lib/python3.6/site-packages/vcmmd/ldmgr/policy.py", line 323, in ksm_controller
2024-01-25 16:00:46 CRITICAL vcmmd: params = self.get_ksm_params()
2024-01-25 16:00:46 CRITICAL vcmmd: File "/usr/lib/python3.6/site-packages/vcmmd/ldmgr/policies/NoOpPolicy.py", line 54, in get_ksm_params
2024-01-25 16:00:46 CRITICAL vcmmd: if self.active_vm < ksm_vms_active_threshold or \
2024-01-25 16:00:46 CRITICAL vcmmd: AttributeError: 'NoOpPolicy' object has no attribute 'active_vm'
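The traceback suggests this sequence. The periodic KSM controller thread calls `get_ksm_params()` on whichever policy is loaded. The fallback `NoOpPolicy` then reads `self.active_vm`, which it never set. The sketch below reproduces that failure pattern in isolation. The class bodies, the simplified single-condition check, and the `KSM_VMS_ACTIVE_THRESHOLD` value are assumptions modelled on the traceback, not vcmmd's actual source. The defensive variant shows one plausible fix (initialising the counter in the constructor); the upstream fix may differ.

```python
# Hypothetical model of the failure in the traceback above.
# Class names mirror the traceback; the bodies are illustrative assumptions.

KSM_VMS_ACTIVE_THRESHOLD = 1  # assumed value, not taken from vcmmd


class Policy:
    """Base policy: its KSM controller thread calls get_ksm_params()."""

    def ksm_controller(self):
        return self.get_ksm_params()


class BrokenNoOpPolicy(Policy):
    """Reads self.active_vm, but nothing ever assigns it."""

    def get_ksm_params(self):
        # Simplified: the real condition continues with "or ..." (elided).
        if self.active_vm < KSM_VMS_ACTIVE_THRESHOLD:
            return {"run": 0}
        return {"run": 1}


class FixedNoOpPolicy(BrokenNoOpPolicy):
    """Defensive variant: the counter exists before the controller runs."""

    def __init__(self):
        super().__init__()
        self.active_vm = 0  # no active VMs tracked until a VE is registered


def try_policy(policy):
    """Run the controller once and report the outcome instead of crashing."""
    try:
        return policy.ksm_controller()
    except AttributeError as exc:
        return "AttributeError: {}".format(exc)


if __name__ == "__main__":
    print(try_policy(BrokenNoOpPolicy()))
    print(try_policy(FixedNoOpPolicy()))
```

The broken class raises the same `AttributeError` as the daemon. The fixed class returns a "KSM off" parameter set instead, which is reasonable for a policy that is meant to do nothing.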
CTs and VMs that were running during the YUM update kept running, but failed to start again once restarted.
The only remedy was to roll back to the last known good version of "vcmmd":
rpm -hUv --force https://download.openvz.org/virtuozzo/releases/openvz-7.0.20-147/x86_64/os/Packages/v/vcmmd-8.0.77-1.vz7.noarch.rpm
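When rolling back many nodes, a small helper can report which ones are not on the build used for the rollback. The sketch below assumes two things. First, the installed version can be read with `rpm -q --queryformat '%{VERSION}-%{RELEASE}' vcmmd`. Second, `8.0.77-1.vz7` (taken from the package file name above) was the only known-good build at the time of this report. Both function names are hypothetical.

```python
import subprocess

# Taken from the rollback package file name in this report.
KNOWN_GOOD = "8.0.77-1.vz7"


def installed_vcmmd_version():
    """Query RPM for the installed vcmmd VERSION-RELEASE string."""
    result = subprocess.run(
        ["rpm", "-q", "--queryformat", "%{VERSION}-%{RELEASE}", "vcmmd"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text output; works on Python 3.6
        check=True,
    )
    return result.stdout.strip()


def is_known_good(installed, known_good=KNOWN_GOOD):
    """True only if the node runs exactly the known-good vcmmd build."""
    return installed.strip() == known_good


if __name__ == "__main__":
    version = installed_vcmmd_version()
    status = "known good" if is_known_good(version) else "differs from known-good build"
    print(version, status)
```

A build that differs is not necessarily broken: a later fixed release would also be reported as "differs". Treat the output as a pointer for manual review.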
>How reproducible:
yum clean all
yum update
prlctl restart <vpsid>
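To confirm that a node hits this specific failure rather than some other vcmmd problem, one option is to look for both the fallback-policy line and the `AttributeError` line in saved `vcmmd -i` output. The signature strings below are copied from the log in this report. The script name and function name are hypothetical.

```python
import sys

# Signature lines copied from the `vcmmd -i` output in this report.
SIGNATURES = (
    'Loaded policy "NoOpPolicy"',
    "AttributeError: 'NoOpPolicy' object has no attribute 'active_vm'",
)


def hits_noop_policy_bug(log_text):
    """Return True if every signature line appears in the captured log."""
    return all(sig in log_text for sig in SIGNATURES)


if __name__ == "__main__":
    text = sys.stdin.read()
    print("affected" if hits_noop_policy_bug(text) else "not matched")
```

Usage, assuming the output was saved to a file first: `python3 check_vcmmd.py < vcmmd-i.log`.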
>Host OS:
Fully YUM-updated OpenVZ 7, running various kernel versions because some nodes had not been rebooted in a while. Uptimes ranged from 17 days to 983 days.
>Guest OS:
Any. All VMs and CTs were affected regardless of guest OS; we run EL7 and EL8 as well as various Debian versions.
>Additional info:
A dozen of our in-house nodes were affected, plus many more client nodes whose owners we still need to contact.
Do you test YUM updates before release? If not, why not? If so, your test procedure probably needs an overhaul.