Should I flip failed over forests back to their respective masters? What are the risks if I leave them?
29 August 2018 11:27 AM
The notion of "flipping" back control (from the failed-over replica forest back to the master forest) has been covered in previous Knowledgebase articles.
In this Knowledgebase article, we will discuss the pros and cons of leaving failed over forests as they are. Should control be returned to the master forests after a failover event?
Can it be considered good practice to leave forests in their failed-over state?
As long as the database status page shows the original configured master in the sync replicating state, you know it is ready to take over if the configured replica (now the acting master) fails at a later time. This means that High Availability is still preserved across the cluster, even though a failover event has already taken place.
In summary, the main reasons to fail back the forests to their initial configured state are as follows:
In the event of a forest failover, as long as your previous master forests are in their (expected) sync replicating state, the risk of leaving the forests in a failed-over state is minimal. Any disturbance that takes the acting master forest offline (such as a forest restart) will cause failover to happen again, so you continue to have High Availability.
However, forest failover can be indicative of a larger symptom: a particular host that appears to be encountering issues for any number of possible reasons. Keeping track of when forests fail over for a given host can be a useful first line of enquiry into a system that is showing early warning signs of a problem.
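The per-host tracking described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming you already record each failover somewhere: the log format, the host and forest names, and the function names `failovers_per_host` and `hosts_to_investigate` are all hypothetical and are not part of any MarkLogic API.

```python
from collections import Counter
from datetime import datetime

# Hypothetical failover log entries: (ISO-8601 timestamp, host, forest).
# In practice these would come from whatever record you keep of failovers.
FAILOVER_LOG = [
    ("2018-08-01T02:14:00", "host-a", "Documents-1"),
    ("2018-08-09T11:40:00", "host-b", "Documents-3"),
    ("2018-08-15T23:05:00", "host-a", "Documents-2"),
    ("2018-08-27T04:31:00", "host-a", "Documents-1"),
]


def failovers_per_host(events, since=None):
    """Count failover events per host, optionally only those at or after `since`."""
    counts = Counter()
    for timestamp, host, _forest in events:
        if since is None or datetime.fromisoformat(timestamp) >= since:
            counts[host] += 1
    return counts


def hosts_to_investigate(events, threshold, since=None):
    """Return the hosts (sorted by name) with at least `threshold` failovers."""
    counts = failovers_per_host(events, since)
    return sorted(host for host, n in counts.items() if n >= threshold)
```

Calling `hosts_to_investigate(FAILOVER_LOG, threshold=2)` on the sample data singles out `host-a`, which is the kind of repeat offender worth a closer look. The threshold is an arbitrary choice for illustration; a sensible value depends on your cluster.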
From the perspective of system management, flipping failed-over forests back to their respective masters could be considered as part of an ongoing approach to managing and maintaining general cluster health.
In the event of a failover, if the failover details are logged and the forests are then failed back to their respective masters, subsequent failover events become more apparent at a glance: you can quickly review the status tab of a given database to confirm that all the master forests are in their open state (with their replica forests all sync replicating).
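The "at a glance" check above can also be expressed as a small script. The sketch below assumes you have already collected each forest's configured role and current state into a list of records. The record layout, the forest names, and the functions `classify` and `summarize` are hypothetical, and only the two states named in this article ("open" and "sync replicating") are treated as expected.

```python
# Hypothetical status snapshot: one record per forest, giving its configured
# role and the state currently shown for it on the database status page.
SNAPSHOT = [
    {"forest": "Documents-1", "role": "master", "state": "sync replicating"},
    {"forest": "Documents-1-R", "role": "replica", "state": "open"},
    {"forest": "Documents-2", "role": "master", "state": "open"},
    {"forest": "Documents-2-R", "role": "replica", "state": "sync replicating"},
]


def classify(entry):
    """Label a forest as 'normal', 'failed over', or 'needs attention'."""
    role, state = entry["role"], entry["state"]
    # Configured masters are expected to be open and replicas sync replicating.
    expected = "open" if role == "master" else "sync replicating"
    # After a failover the two states are swapped between the pair.
    swapped = "sync replicating" if role == "master" else "open"
    if state == expected:
        return "normal"
    if state == swapped:
        return "failed over"
    return "needs attention"


def summarize(snapshot):
    """Group forest names by the label that classify() assigns to them."""
    summary = {"normal": [], "failed over": [], "needs attention": []}
    for entry in snapshot:
        summary[classify(entry)].append(entry["forest"])
    return summary
```

With the sample snapshot, `summarize(SNAPSHOT)` lists `Documents-1` and `Documents-1-R` under "failed over": High Availability is intact, but the pair is a candidate for failing back. Any state other than the two expected ones is deliberately lumped into "needs attention" rather than interpreted.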
Adopting a policy of logging what happened and then failing the forests back turns each failover into an event that gets triaged. In the longer run, this makes future events easier to spot and could provide data that gives you advance warning of an underlying issue with a given host in your cluster.