Got in this morning and, without warning, a host in one of our Xen clusters failed. I couldn't SSH into it and had no console, yet XenCenter still thought it was responding. Attempts to reboot it or migrate a VM off of it timed out, so: hard power-off and reboot. It booted, but came back without any network interfaces. So, we decided to just leave it off and let the other hosts handle the VMs; plenty of redundancy and resources. There goes my morning. Anyway, here's the fix:
SSH into the cluster master (assuming the host that went down wasn't the master; if it was, you have bigger problems).
Find the failed host's UUID, then see which VMs were running on it.
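If you don't already have the host UUID handy, a quick way to find it is a hedged sketch like this (the `params` list is just what I'd ask for; adjust to taste):

```shell
# List every host in the pool with its UUID; the failed host
# should stand out by name-label (or enabled=false).
xe host-list params=uuid,name-label,enabled
```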
xe vm-list resident-on=<UUID> is-control-domain=false
For each VM that is in purgatory, force its power state back to halted:
xe vm-reset-powerstate uuid=<UUID> --force
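If there are a lot of stuck VMs, the two steps above can be combined into one loop; this is a sketch that assumes the failed host's UUID is in `$HOST`:

```shell
# Grab the UUIDs of all non-dom0 VMs still "resident" on the dead
# host (--minimal gives a comma-separated list), then force each
# one's power state back to halted.
for vm in $(xe vm-list resident-on=$HOST is-control-domain=false params=uuid --minimal | tr ',' ' '); do
    xe vm-reset-powerstate uuid=$vm --force
done
```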
See what disks are attached:
xe vm-disk-list vm=<VMNAME>
And the SR (storage repository) they are on:
xe vdi-list sr-uuid=<UUID of SR>
At this point you could restart the VM in Xen, but you will likely get a "VDI is not available" error, so forget the offending VDI(s):
xe vdi-forget uuid=<UUID>
Scan the SR to find them
xe sr-scan uuid=<UUID of SR>
After you rescan the SR, you should have blank VDIs with no metadata. Using the output of the earlier commands, you can re-populate their names and descriptions.
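Re-populating looks something like this; the labels here ("myvm-root" etc.) are made-up placeholders, so substitute whatever you recorded from `vm-disk-list` earlier:

```shell
# Put the name and description back on a rediscovered VDI.
# <UUID> is the new VDI UUID that sr-scan / vdi-list reported.
xe vdi-param-set uuid=<UUID> name-label="myvm-root" \
    name-description="Root disk for myvm"
```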
You will likely have to fsck the disks. You could attach them to a utility VM and fsck them there, but usually they will fsck fine on reboot.
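If you do want to check a disk from a utility VM first, a rough sketch follows; the device number and the in-guest device path are assumptions, so adjust for your setup:

```shell
# Attach the suspect VDI to a running utility VM.
# vbd-create prints the new VBD's UUID, which we capture.
VBD=$(xe vbd-create vm-uuid=<UTILITY VM UUID> vdi-uuid=<VDI UUID> \
      device=1 mode=rw type=Disk)
xe vbd-plug uuid=$VBD

# Inside the utility VM the disk appears as e.g. /dev/xvdb,
# so from there:  fsck -f /dev/xvdb1

# Detach when done so the real VM can use the disk again.
xe vbd-unplug uuid=$VBD
xe vbd-destroy uuid=$VBD
```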