We have an oVirt cluster set up with Oracle VM Virtualization Manager v4.5.5-1.20.el8 on 4 physical nodes with external storage.
We want highly available VMs with the least possible downtime.
The oVirt docs say that the suitable approach for provisioning a VM with HA enabled is to use the KILL resume behaviour, where a VM in the PAUSED state is eventually killed so that it can be scheduled on another host. We want to avoid this and use AUTO RESUME instead, to minimize downtime.
Test Simulation
I simulated an instantaneous crash by manually powering off the hypervisor host that was running an RHEL VM. The guest VM suffered ping loss for around 2 to 2:30 minutes, until the host was back up.
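For anyone who wants to reproduce the measurement, here is a minimal sketch of how the outage length could be computed from timestamped probe results. The function name `longest_outage` and the sample format (timestamp in seconds, reachable or not) are my own assumptions for illustration, not part of any oVirt tooling.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest continuous run of failed probes, in seconds.

    Each sample is (timestamp_seconds, reachable). An outage starts at the
    first failed probe and ends at the next successful probe. An outage that
    is still open when the samples end is measured up to the last timestamp.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None
    for ts, ok in samples:
        last_ts = ts
        if not ok and outage_start is None:
            # First failed probe of a new outage.
            outage_start = ts
        elif ok and outage_start is not None:
            # Guest answered again: close the outage.
            longest = max(longest, ts - outage_start)
            outage_start = None
    if outage_start is not None and last_ts is not None:
        longest = max(longest, last_ts - outage_start)
    return longest


# Hypothetical one-probe-per-second trace: reachable, then down, then back.
samples = (
    [(0.0, True), (1.0, True)]
    + [(float(t), False) for t in range(2, 137)]
    + [(137.0, True)]
)
outage_seconds = longest_outage(samples)
```

The samples could come from timestamping each probe of a plain `ping` loop against the guest. The trace above is made up and only illustrates the shape of the input.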
Here is the configuration:
- Target Storage Domain for VM Lease -> Selected boot storage
- Resume Behavior -> KILL (no other option available when HA option is selected)
- Priority for Migration -> Low
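For reference, the settings above could be expressed roughly as the following request body for the VM resource of the REST API. This is only a sketch: the element names (`high_availability`, `lease`, `storage_error_resume_behaviour`) reflect my understanding of the oVirt API v4 `Vm` type and should be checked against the API documentation for the installed version. The storage domain ID is a placeholder, and mapping the UI value "Low" to priority `1` is an assumption.

```python
import xml.etree.ElementTree as ET


def build_ha_vm_payload(lease_sd_id: str, priority: int = 1, resume: str = "kill") -> str:
    """Build an XML body describing the HA settings of a VM.

    lease_sd_id: ID of the storage domain that holds the VM lease (placeholder here).
    priority:    run/migration priority; 1 is assumed to correspond to "Low".
    resume:      resume behaviour after a storage I/O error, e.g. "kill".
    """
    vm = ET.Element("vm")

    # High availability switch and priority.
    ha = ET.SubElement(vm, "high_availability")
    ET.SubElement(ha, "enabled").text = "true"
    ET.SubElement(ha, "priority").text = str(priority)

    # VM lease on the selected storage domain.
    lease = ET.SubElement(vm, "lease")
    ET.SubElement(lease, "storage_domain", id=lease_sd_id)

    # Behaviour when the VM was paused because of a storage error.
    ET.SubElement(vm, "storage_error_resume_behaviour").text = resume

    return ET.tostring(vm, encoding="unicode")


# Hypothetical storage domain ID, for illustration only.
payload = build_ha_vm_payload("hypothetical-sd-id")
```

The sketch only builds the body. It does not send anything to the engine, so it shows what I have configured rather than a way around the KILL restriction.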
Expectation
We want the VM to be resumed automatically in similar disaster scenarios (host unavailable / storage I/O error), on the same or a different host, without it being killed and restarted.
What would be the most suitable solution to this?