Failover and Replication Overview
15 January 2021 07:03 PM
Within a MarkLogic deployment, there can be multiple primary and replica objects. Those objects can be forests in a database, databases in a cluster, nodes in a cluster, and even clusters in a deployment. This article walks through several examples to clarify how all these objects hang together.
Shared-disk vs. Local-disk failover
Shared-disk failover requires a shared filesystem visible to all hosts in a cluster. It involves a single copy of each data forest, managed by either its primary host or its failover host (for example, forest1 is assigned to host1 and fails over to host2).
Local-disk failover involves two copies of the data: a primary forest and a local-disk failover replica forest (sometimes referred to as an "LDF"). Primary hosts manage primary forests, and failover hosts manage the corresponding synchronized LDFs (for example, forest1 on host1 fails over to replicaForest1 on host2).
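The difference between the two failover styles can be sketched as a small data model. This is only an illustration, not MarkLogic code or a MarkLogic API: the FailoverSetup class and forest_served_by function are hypothetical names, and the forest and host names are the ones used in the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverSetup:
    """Hypothetical model of one forest's failover arrangement."""
    kind: str
    forest_copies: tuple  # copies of the data on disk
    hosts: tuple          # primary host first, failover host second


# Shared-disk: one copy of the data, visible to both hosts.
shared_disk = FailoverSetup("shared-disk", ("forest1",), ("host1", "host2"))

# Local-disk: two synchronized copies, one per host.
local_disk = FailoverSetup(
    "local-disk", ("forest1", "replicaForest1"), ("host1", "host2")
)


def forest_served_by(setup, host):
    """Return the forest copy that `host` manages when it is in charge."""
    index = setup.hosts.index(host)
    # With a single shared copy, every host manages that same copy;
    # with local-disk failover, each host manages its own copy.
    return setup.forest_copies[min(index, len(setup.forest_copies) - 1)]


print(forest_served_by(shared_disk, "host2"))
print(forest_served_by(local_disk, "host2"))
```

In this sketch, host2 takes over the same forest1 under shared-disk failover, but serves its own replicaForest1 under local-disk failover.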
In the same way that you can have multiple copies of data within a cluster (as seen in local-disk failover), you can also have multiple copies of data across clusters, as seen in either database replication or flexible replication. Within a replicated environment you'll often see references to primary/replica databases or primary/replica clusters. This will often look like forest1 on host1 in cluster1 replicating to forest1 on host1 in cluster2. We can shorten these forest names to c1.h1.f1 and c2.h1.f1.
Note that you can have both local-disk failover and database replication running at the same time. In that case, your primary cluster will have c1.h1.f1 as well as its failover forest c1.h2.rf1, and your replica cluster will have c2.h1.f1 as well as its own failover forest c2.h2.rf1. All of these forest copies should be synchronized both within a cluster (c1.h1.f1 synced to c1.h2.rf1) and across clusters (c1.h1.f1 synced to c2.h1.f1).
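To make the shorthand concrete, here is a minimal sketch that splits a name such as c1.h2.rf1 into its cluster, host, and forest parts. The shorthand is this article's own convention rather than anything MarkLogic enforces, and ForestRef, parse_forest_ref, and sync_scope are hypothetical helper names introduced only for illustration.

```python
import re
from typing import NamedTuple


class ForestRef(NamedTuple):
    cluster: str
    host: str
    forest: str
    is_failover_replica: bool  # True for "rf" forests (the LDF copies)


# Matches the article's shorthand: c<n>.h<n>.f<n> or c<n>.h<n>.rf<n>
_SHORTHAND = re.compile(r"^(c\d+)\.(h\d+)\.(r?f\d+)$")


def parse_forest_ref(name):
    """Parse a shorthand forest name into its components."""
    match = _SHORTHAND.match(name)
    if match is None:
        raise ValueError(f"not a shorthand forest name: {name!r}")
    cluster, host, forest = match.groups()
    return ForestRef(cluster, host, forest, forest.startswith("rf"))


def sync_scope(a, b):
    """Say whether two forest copies are synced within or across clusters."""
    left, right = parse_forest_ref(a), parse_forest_ref(b)
    return "within-cluster" if left.cluster == right.cluster else "cross-cluster"


print(parse_forest_ref("c1.h2.rf1"))
print(sync_scope("c1.h1.f1", "c1.h2.rf1"))
print(sync_scope("c1.h1.f1", "c2.h1.f1"))
```

Under these assumptions, the pair c1.h1.f1 and c1.h2.rf1 is classified as within-cluster (local-disk failover), while c1.h1.f1 and c2.h1.f1 is cross-cluster (replication).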
Configured/Intended vs. Acting
At this point we've got two clusters, each with at least two nodes, where each node has at least one forest, for a total of four forest copies (bear in mind that databases can have dozens or even hundreds of forests, each with its own failover and replication copies). The "configured" or "intended" arrangement is what your deployment looks like by design, when no failover or other event has occurred that would require one of the other forest copies to serve as the primary forest.
Should a failover event occur, c1.h2.rf1 will transition from the intended LDF to the acting primary, and its host c1.h2 will transition from the intended failover host to the acting primary host. At this point, the intended primary forest c1.h1.f1 and its intended primary host c1.h1 will likely both be offline.
Failing back is the process of reverting hosts and forests from their acting arrangement (in this case, acting primary forest c1.h2.rf1 and acting primary host c1.h2) back to their intended arrangement (c1.h1.f1 is both the intended and acting primary forest, and c1.h1 is both the intended and acting primary host).
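The intended vs. acting distinction can be sketched as a tiny state model. This is an illustration under simplifying assumptions, not how MarkLogic implements failover: ForestPair and its methods are hypothetical names, and real failover and failback involve synchronization and host availability checks that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class ForestPair:
    """Hypothetical model of a primary forest and its failover copy."""
    intended_primary: str
    intended_failover: str
    acting_primary: str = ""

    def __post_init__(self):
        # By design, the intended primary is also the acting primary.
        if not self.acting_primary:
            self.acting_primary = self.intended_primary

    def fail_over(self):
        """The failover copy becomes the acting primary."""
        self.acting_primary = self.intended_failover

    def fail_back(self):
        """Revert from the acting arrangement to the intended one."""
        self.acting_primary = self.intended_primary

    @property
    def is_failed_over(self):
        return self.acting_primary != self.intended_primary


pair = ForestPair("c1.h1.f1", "c1.h2.rf1")
pair.fail_over()
print(pair.acting_primary, pair.is_failed_over)
pair.fail_back()
print(pair.acting_primary, pair.is_failed_over)
```

Note that the intended roles never change in this sketch; only the acting role moves, which is the point of keeping the two separate.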
This distinction between intended and acting also applies at the cluster level. In normal operation, c1 is both the intended and acting primary cluster, and c2 is both the intended and acting replica cluster. Should something happen to your primary cluster c1, the intended replica cluster c2 will transition to the acting primary cluster while c1 is offline.