High Availability and Failover in MarkLogic FAQ | MarkLogic Support

High Availability and Failover in MarkLogic FAQ 28 October 2021 05:42 PM
How does MarkLogic Server's high availability work in AWS?

AWS provides fault tolerance within a geographic region through Availability Zones (AZs), while MarkLogic provides it through Local Disk Failover (LDF). If you’re using AWS, the best practice is to place each MarkLogic node/EC2 instance in a different Availability Zone within a single region, so that a given data forest is in one AZ (AZ A) while its LDF forest is in a different AZ (AZ B). This way, if access to AZ A is lost, the forests on the host in AZ A fail over to their LDF forests on the host in AZ B, preserving high availability within your MarkLogic cluster.

Further reading:
MarkLogic Fundamentals - How should I scale out my cluster?
MarkLogic Fundamentals - High Availability & False Failovers
What triggers failover in MarkLogic Server?
Detecting and Reporting Failover Events

Should failover be configured for the Security forest?

A cluster is not functional without its Security database. Consequently, it’s important to ensure high availability of the Security database’s forest by configuring failover for that forest.

Further reading:
How many forests should my Security Database have?
Multiple Forests for Security Database
Configuring the Security and Auxiliary Databases to use Failover Forests

Should my forests have more than one Local Disk Failover forest?

Local Disk Failover with one LDF forest per data forest is designed to let the cluster survive the failure of a single host. If you're using AWS, careful forest placement across Availability Zones can preserve high availability even if an entire Availability Zone goes down. With rare exceptions, additional LDF forests are not worth the added complexity and cost.
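The placement rule described above (a data forest and its LDF forest in different Availability Zones) can be checked mechanically. The following is a minimal sketch in plain Python, not MarkLogic tooling: the `ForestPair` type, the host names, and the zone labels are all hypothetical, and a real check would read the forest-to-host and host-to-zone mappings from your own cluster inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForestPair:
    """A data forest and its single LDF forest, each pinned to a host (hypothetical model)."""
    name: str
    data_host: str
    ldf_host: str


def misplaced_pairs(pairs, host_zone):
    """Return the names of pairs whose data forest and LDF forest share a zone."""
    return [
        p.name
        for p in pairs
        if host_zone[p.data_host] == host_zone[p.ldf_host]
    ]


def survives_zone_loss(pairs, host_zone, lost_zone):
    """True if every pair keeps at least one copy outside the lost zone."""
    return all(
        host_zone[p.data_host] != lost_zone or host_zone[p.ldf_host] != lost_zone
        for p in pairs
    )


# Hypothetical three-host cluster with one host per Availability Zone.
HOST_ZONE = {"host-1": "az-a", "host-2": "az-b", "host-3": "az-c"}

# Each data forest's LDF forest lives on a host in a different zone.
PAIRS = [
    ForestPair("content-1", data_host="host-1", ldf_host="host-2"),
    ForestPair("content-2", data_host="host-2", ldf_host="host-3"),
    ForestPair("content-3", data_host="host-3", ldf_host="host-1"),
]

if __name__ == "__main__":
    print("misplaced:", misplaced_pairs(PAIRS, HOST_ZONE))
    for zone in sorted(set(HOST_ZONE.values())):
        print(zone, "loss survivable:", survives_zone_loss(PAIRS, HOST_ZONE, zone))
```

With this layout no pair is flagged, and losing any one zone still leaves a copy of every forest. Placing a data forest and its LDF forest on hosts in the same zone would be reported by `misplaced_pairs` and would fail `survives_zone_loss` for that zone.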
If you configure Local Disk Failover with one LDF forest, coupled with Database Replication and Backups, you will have enough copies of your data to survive failures ranging from a single host to an entire Availability Zone.

Do I still have high availability post failover? What happens to the data forest? How can I fail back my forests to the way they were?

When a failover event occurs, the LDF forest takes over as the acting data forest, and the configured data forest assumes the role of acting LDF forest as soon as it is successfully restarted. At this point, as long as both forests are still available, the cluster remains highly available, but with the forests in the reverse of their originally intended roles.

To fail the forests back into their originally intended roles, wait until the acting data forest (the originally intended LDF) and the acting LDF (the originally intended data forest) are synchronized, then manually restart the acting data forest/intended LDF. At that point, the acting LDF/intended data forest “fails back” to its original role of acting data forest, and the acting data forest/intended LDF once again assumes its original role of acting LDF.

In short, failover is automatic, but failing back requires a manual restart of the acting data forest/intended LDF. When failing back, it's very important to wait until the forests are synchronized: if you fail back before the forests are fully synchronized, you'll lose any data in the acting data forest that has yet to be propagated back to the acting LDF/intended data forest.

Further reading:
Should I flip the failed-over forests to their respective masters?
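The role swap and the "wait for synchronization before failing back" rule can be illustrated with a small state model. This is a toy sketch in plain Python under stated assumptions: `ForestPairState` and its methods are hypothetical names that do not correspond to any MarkLogic API, and `mark_synchronized` merely stands in for whatever forest-status check you use to confirm the forests are in sync.

```python
from dataclasses import dataclass


@dataclass
class ForestPairState:
    """Toy model of one data forest and its LDF forest (hypothetical, not a MarkLogic API).

    `acting_data` names which configured forest is currently the acting
    data forest: "data" (intended roles) or "ldf" (failed over).
    `synchronized` is True once both forests hold the same data.
    """
    acting_data: str = "data"
    synchronized: bool = True

    def fail_over(self):
        """Automatic step: the LDF forest takes over as the acting data forest."""
        self.acting_data = "ldf"
        # The restarted data forest has to catch up before the pair is in sync.
        self.synchronized = False

    def mark_synchronized(self):
        """Stand-in for the forests finishing synchronization."""
        self.synchronized = True

    def fail_back(self):
        """Manual step: restart the acting data forest (the intended LDF).

        Refuses to proceed while the forests are out of sync, because failing
        back early would lose updates that have not yet been propagated.
        """
        if self.acting_data != "ldf":
            raise RuntimeError("forests already hold their intended roles")
        if not self.synchronized:
            raise RuntimeError("forests not synchronized; wait before failing back")
        self.acting_data = "data"


if __name__ == "__main__":
    pair = ForestPairState()
    pair.fail_over()          # roles are now reversed
    pair.mark_synchronized()  # operator confirms the forests are in sync
    pair.fail_back()          # intended roles restored
    print(pair)
```

The guard in `fail_back` mirrors the operational advice: the manual restart is only safe once synchronization has completed, so an automated runbook would put the same check in front of the restart step.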