Understanding VMware Fault Tolerance Failover Time: Common Issues and Fixes

Fix long VMware Fault Tolerance failover time with proper vSAN setup, NIC settings, and VM activity checks. Our support team is always here to help you.

If you’re struggling with VMware Fault Tolerance failover time, you’re not alone. Many administrators run into unexpected delays during failovers, especially when vSAN and Fault Tolerance are combined without a proper fault domain configuration. This post covers common reasons for longer failover times and how to fix them.

An Overview

Default Fault Domain Configuration Can Affect Failover
Causes of Unplanned Failovers Without Host Crash

Default Fault Domain Configuration Can Affect Failover

In one case, the cause turned out to be the default settings in the vSAN wizard, which created two fault domains and placed one node in each, leaving the third node without a fault domain. This setup works with both vSAN and Fault Tolerance, but it causes the failover to take about half a minute.

After a third fault domain was created for the leftover node, Fault Tolerance worked as expected and virtually all the problems that came with the long failover time disappeared. If you encounter a similar problem, review your fault domain setup.

This small configuration mistake directly impacted the VMware Fault Tolerance failover time, and fixing it reduced the failover delay significantly.
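
The check behind this fix can be sketched in a few lines of Python. This is only an illustration: the host names, the fault domain names, the assignments mapping, and the hosts_without_fault_domain helper are all hypothetical, and the data would have to come from your own inventory rather than from vSAN itself.

```python
from typing import Dict, List, Optional


def hosts_without_fault_domain(assignments: Dict[str, Optional[str]]) -> List[str]:
    """Return the hosts that have no explicit fault domain assigned."""
    return sorted(host for host, domain in assignments.items() if not domain)


# Hypothetical three-node cluster as described above:
# two hosts sit in fault domains, the third was left over.
assignments = {
    "esxi-01": "FD-1",
    "esxi-02": "FD-2",
    "esxi-03": None,
}

leftover = hosts_without_fault_domain(assignments)
print("Hosts without a fault domain:", leftover)
```

Any host the helper reports is a candidate for its own fault domain, which is the change that resolved the delay in the case above.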

Causes of Unplanned Failovers Without Host Crash

A Primary or Secondary VM can fail over even though its ESXi host has not crashed. In such cases, virtual machine execution is not interrupted, but redundancy is temporarily lost. To avoid this type of failover, be aware of the scenarios below and take appropriate measures:

Partial Hardware Failure Related to Storage

This problem can arise when access to storage is slow or down for one of the hosts. When this occurs, there are many storage errors listed in the VMkernel log.

Fix: Address your storage-related problems (see https://bobcares.com/blog/vmware-storage-drs-configuration/).

Partial Hardware Failure Related to Network

If the logging NIC is not functioning or connections to other hosts through that NIC are down, this can trigger a fault tolerant virtual machine to be failed over so that redundancy can be reestablished.

Fix: Dedicate a separate NIC each to vMotion and FT logging traffic, and perform vMotion only when the VMs are less active, especially on systems affected by hyperthreading vulnerabilities.

Insufficient Bandwidth on the Logging NIC Network

This usually happens because of too many fault tolerant VMs on one host.

Fix:

Distribute FT VM pairs across multiple hosts
Use a 10-Gbit logging network for FT
Verify that the network is low latency
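
To see whether a host carries more FT logging traffic than its link can handle, a rough capacity check like the one below may help. It is a sketch under stated assumptions: the per-VM traffic figures, the host names, the 20% headroom, and the overloaded_hosts helper are hypothetical, and real numbers would have to come from your own monitoring.

```python
from typing import Dict, List


def overloaded_hosts(
    ft_traffic_mbps: Dict[str, List[float]],
    link_mbps: float,
    headroom: float = 0.2,
) -> Dict[str, float]:
    """Return hosts whose summed FT logging traffic exceeds the usable link capacity."""
    # Keep part of the link free; the 20% default is an assumption, not a vendor figure.
    usable = link_mbps * (1.0 - headroom)
    totals = {host: sum(vms) for host, vms in ft_traffic_mbps.items()}
    return {host: total for host, total in totals.items() if total > usable}


# Hypothetical measured FT logging traffic per protected VM, in Mbit/s.
sample = {
    "esxi-01": [300.0, 350.0, 250.0],
    "esxi-02": [200.0],
}

print("1-Gbit link:", overloaded_hosts(sample, link_mbps=1000.0))
print("10-Gbit link:", overloaded_hosts(sample, link_mbps=10000.0))
```

With these sample figures, the first host exceeds the usable capacity of a 1-Gbit link but not of a 10-Gbit link, which matches the two options in the list above: spread the FT pairs out or move to a faster logging network.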

vMotion Failures Due to High VM Activity

If the vMotion migration of a fault tolerant virtual machine fails, it might need to be failed over. This usually occurs when the VM is too active to migrate smoothly.

Fix: Perform vMotion only when the virtual machines are less active.

Excessive Activity on VMFS Volume

File system locking, VM power ons/offs, or multiple vMotions on a single VMFS volume can trigger a failover. A common symptom is multiple SCSI reservation warnings in the VMkernel log.

Fix:

Reduce file system operations
Avoid placing FT-enabled VMs on busy VMFS volumes
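
A quick way to gauge how often the symptom occurs is to count matching lines in a copy of the VMkernel log. The sketch below assumes the warnings mention both "SCSI" and "reservation" on the same line. The exact message wording varies, so treat the pattern, the count_reservation_warnings helper, and the placeholder lines (which are not real log output) as hypothetical.

```python
import re
from typing import Iterable

# Assumption: a relevant warning mentions "SCSI" and "reservation" on one line.
# Adjust the pattern to the messages you actually see in your log.
PATTERN = re.compile(r"scsi.*reservation|reservation.*scsi", re.IGNORECASE)


def count_reservation_warnings(lines: Iterable[str]) -> int:
    """Count lines that look like SCSI reservation warnings."""
    return sum(1 for line in lines if PATTERN.search(line))


# Placeholder lines for illustration only; these are not real VMkernel messages.
sample_lines = [
    "placeholder: SCSI reservation conflict on a device",
    "placeholder: unrelated message",
    "placeholder: reservation timeout reported by the SCSI layer",
]

print("Reservation warnings:", count_reservation_warnings(sample_lines))
```

To run it against a real log, pass an open file object instead of the sample list. A count that rises during power operations or migrations suggests the volume is too busy for FT-enabled VMs.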

Lack of File System Space Prevents Secondary VM Startup

Check whether your / or /vmfs/datasource file systems have available space. If they’re full, you won’t be able to start a new Secondary VM.

Fix: Free up space on the required file systems.
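
The space check can be scripted with Python's standard library. This is a minimal sketch: the free_space_report helper is hypothetical, the paths are taken from the text above and should be adjusted to your environment, and a missing path is reported rather than treated as an error.

```python
import os
import shutil
from typing import Dict, Iterable, Optional


def free_space_report(paths: Iterable[str]) -> Dict[str, Optional[float]]:
    """Map each path to its free space as a fraction of total, or None if it is missing."""
    report: Dict[str, Optional[float]] = {}
    for path in paths:
        if not os.path.exists(path):
            report[path] = None
            continue
        usage = shutil.disk_usage(path)
        # Guard against file systems that report a total size of zero.
        report[path] = usage.free / usage.total if usage.total else 0.0
    return report


for path, free in free_space_report(["/", "/vmfs/datasource"]).items():
    if free is None:
        print(f"{path}: not found on this system")
    else:
        print(f"{path}: {free:.0%} free")
```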

Conclusion

By following these steps and optimizing your setup, you can significantly reduce VMware Fault Tolerance failover time. Ensuring proper network, storage, and VM activity management can help you maintain a high-availability environment without unwanted delays.
