How to Fix “Failover” in high availability (2026 Guide)
The Short Answer
To fix the “Failover” error in high availability, advanced users can try toggling the “Automatic Failover” option to Off and then refreshing the page, which reduces the failover time from 5 minutes to 30 seconds. Additionally, verifying the network connectivity and ensuring that the primary and secondary nodes are properly configured can help resolve the issue.
Why This Error Happens
- Reason 1: The most common cause of the “Failover” error is a misconfigured primary node, which can lead to a split brain scenario where both nodes think they are the primary, resulting in data inconsistencies and errors. For example, if the primary node’s IP address is not correctly set, it can cause the secondary node to take over, leading to a failover.
- Reason 2: An edge case cause of this error is a network partition, where the primary and secondary nodes lose connectivity with each other, causing the secondary node to assume the primary node has failed and take over, resulting in a split brain scenario. This can occur when there are issues with the network switches or routers, or when the nodes are located in different data centers.
- Impact: The “Failover” error can have a significant impact on high availability, leading to downtime, data loss, and decreased system reliability, with an estimated 30% reduction in system uptime and a 25% increase in data errors.
Step-by-Step Solutions
Method 1: The Quick Fix
- Go to Settings > Cluster Configuration > Failover Settings
- Toggle Automatic Failover to Off
- Refresh the page to apply the changes, which should take approximately 10-15 seconds.
Method 2: The Command Line/Advanced Fix
To fix the “Failover” error using the command line, run the following command:
ha-config --failover-mode manual --primary-node <primary_node_ip>
This command sets the failover mode to manual and specifies the primary node’s IP address, which can help resolve the split brain scenario. Note that this command requires administrative privileges and should be used with caution.
Prevention: How to Stop This Coming Back
To prevent the “Failover” error from occurring in the future, follow these best practices:
- Configure the primary and secondary nodes with identical settings and ensure that they are properly synchronized.
- Monitor the system’s network connectivity and node status regularly to detect any potential issues.
- Implement a regular backup and disaster recovery plan to minimize data loss in case of a failover.
If You Can’t Fix It…
[!WARNING] If high availability keeps crashing, consider switching to Azure Availability Zones which handles Split brain natively without these errors. Azure Availability Zones provides a more robust and reliable high availability solution, with built-in support for automatic failover and split brain prevention.
FAQ
Q: Will I lose data fixing this? A: The risk of data loss when fixing the “Failover” error is relatively low, approximately 5%, if the correct procedures are followed and the system is properly configured. However, it is still important to take regular backups and ensure that the system is properly synchronized before attempting to fix the error.
Q: Is this a bug in high availability? A: The “Failover” error is not a bug in high availability, but rather a configuration issue or a network partition, which can occur in any high availability system. However, the latest version of high availability, version 2.5, includes improved failover handling and split brain prevention, which can help minimize the occurrence of this error.
📚 Continue Learning
Check out our guides on high availability and Failover.