Knowledgebase: MarkLogic Server
Database available without a quorum?
21 May 2020 08:22 AM

Introduction

In the Scalability, Availability & Failover Guide, the node communication section describes a quorum as more than 50% of the nodes in a cluster.

Is it possible for a database to be available for reads and writes, even if a quorum of nodes is not available in the cluster?

The answer is yes: there are configurations and sequences of events that can leave forests online when a quorum no longer exists, that is, when 50% or fewer of the hosts are online.

Details

If any single forest in a database is unavailable, the database is not accessible. Conversely, as long as all of a database's forests are available in the cluster, the database will be available for reads and writes regardless of any quorum issues.

Of course, the Security database must also be available in the cluster for the cluster to function.

Forest Availability: Simple Case

In the simplest case, a forest configured with neither local disk failover nor shared disk failover will be available regardless of any quorum issues, as long as the forest's host is online and still part of the cluster.
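The simple-case rule, combined with the rule that a database needs all of its forests, can be sketched as a small Python model. This is a hypothetical illustration, not a MarkLogic API: the `Forest` class, the `forest_available` and `database_available` functions, and the forest names are all assumptions made for this example, using the three hosts described below.

```python
from dataclasses import dataclass


@dataclass
class Forest:
    # Hypothetical model of a forest with NO failover configured.
    name: str
    host: str  # the only host that can serve this forest


def forest_available(forest: Forest, online_hosts: set) -> bool:
    # Simple case: no local disk or shared disk failover, so the forest is
    # available exactly when its own host is online. Quorum is never consulted.
    return forest.host in online_hosts


def database_available(forests: list, online_hosts: set) -> bool:
    # A database is available only if every one of its forests is available.
    return all(forest_available(f, online_hosts) for f in forests)


# Hypothetical 3-node cluster in which host-b and host-c have been shut down.
online = {"host-a"}

local_db = [Forest("local-1", "host-a"), Forest("local-2", "host-a")]
spread_db = [
    Forest("spread-1", "host-a"),
    Forest("spread-2", "host-b"),
    Forest("spread-3", "host-c"),
]

print(database_available(local_db, online))   # True: all forests are on host-a
print(database_available(spread_db, online))  # False: spread-2 and spread-3 are offline
```

Note that the model never computes a quorum at all, which mirrors the point that forests without failover do not depend on one.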

To explain this case in more detail, consider a 3-node MarkLogic cluster containing the hosts host-a, host-b and host-c. We first initialize host-a as the primary host (the first host set up in the cluster, which holds the master Security database), and then join host-b and host-c to host-a to complete the cluster.

If we then shut down both joining hosts (host-b and host-c), so that only host-a remains online, we would see a chain of messages in the primary host's ErrorLog indicating that the cluster no longer had a quorum:

2020-05-21 01:19:14.632 Info: Detected quorum (3 online, 1 suspect, 0 offline)
2020-05-21 01:19:18.570 Warning: Detected suspect quorum (3 online, 2 suspect, 0 offline)
2020-05-21 01:19:29.715 Info: Disconnecting from domestic host host-b.example.marklogic.com because it has not responded for 30 seconds.
2020-05-21 01:19:29.715 Info: Disconnected from domestic host host-b.example.marklogic.com
2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
2020-05-21 01:19:33.668 Info: Disconnecting from domestic host host-c.example.marklogic.com because it has not responded for 30 seconds.
2020-05-21 01:19:33.668 Info: Disconnected from domestic host host-c.example.marklogic.com
2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)
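To track quorum transitions like these programmatically, a script could scan the ErrorLog for the "Detected ... quorum" messages. The sketch below is a hypothetical helper, not a MarkLogic tool: the regular expression is derived only from the sample lines above (other versions may word them differently), and it assumes, consistent with those lines, that suspect hosts are also counted in the online total, so the cluster size is online + offline.

```python
import re

# Pattern assumed from the sample ErrorLog lines in this article only.
QUORUM_RE = re.compile(
    r"Detected (?P<state>quorum|suspect quorum|no quorum) "
    r"\((?P<online>\d+) online, (?P<suspect>\d+) suspect, (?P<offline>\d+) offline\)"
)


def parse_quorum_line(line):
    """Return (state, online, suspect, offline) for a quorum message, else None."""
    m = QUORUM_RE.search(line)
    if m is None:
        return None
    return (m["state"], int(m["online"]), int(m["suspect"]), int(m["offline"]))


def majority_online(online, offline):
    # Quorum as described in the guide: more than 50% of the cluster's hosts.
    # Assumption: suspect hosts are already included in `online`,
    # so the cluster size is online + offline.
    return 2 * online > online + offline


log = """2020-05-21 01:19:29.715 Info: Disconnected from domestic host host-b.example.marklogic.com
2020-05-21 01:19:29.715 Info: Detected suspect quorum (2 online, 1 suspect, 1 offline)
2020-05-21 01:19:33.668 Warning: Detected no quorum (1 online, 0 suspect, 2 offline)"""

for line in log.splitlines():
    parsed = parse_quorum_line(line)
    if parsed:
        state, online, suspect, offline = parsed
        print(state, majority_online(online, offline))
# prints:
# suspect quorum True
# no quorum False
```

Lines that are not quorum messages (such as the "Disconnected from domestic host" line) are simply skipped.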

Under these circumstances, we would still be able to access the host's admin GUI on port 8001, and it would respond without issue. We could open Query Console on that host on port 8000 and inspect the primary host's databases, and we could also access Monitoring History on port 8002, all directly from the primary host.

In this scenario, the primary host remains online while the joining hosts are offline, and we have not yet set up failover anywhere, so there is no requirement for quorum and host-a remains accessible.

If host-a also happened to have a database whose forests all resided on that host, that database would be available for queries at this time. However, this is a fairly limited use case, because a 3-node cluster would generally have a database whose forests reside on all three hosts, with failover forests configured on alternating hosts.

With that layout, the outcome of losing one host depends on failover. Without failover configured, the database becomes unavailable because one of its forests is offline. With failover forests configured, you would still be able to access the database on the remaining two hosts.

However, if you then shut down a second host, the cluster would lose quorum, which is a requirement for failover.

Forest Availability: Local Disk Failover

For forests configured for local disk failover, the sequence of events is important:

In response to a host failure that makes an "open" forest inaccessible, the forest will fail over to its configured replica forest, provided that a quorum exists and the replica forest was in the "sync replicating" state. The replica forest then transitions to the "open" state, becomes the acting master forest, and is available to the database for both reads and writes.

Additionally, an "open" forest will not go offline in response to another host being evicted from the cluster.

However, once cluster quorum is lost, forest failovers will no longer occur.
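Because failover needs quorum but an already "open" forest does not, the order of host failures matters. The Python sketch below is a hypothetical simulation of the three rules above, not MarkLogic code; its simplifying assumptions are that quorum is evaluated after the failed host leaves, that each forest fails over at most once (master to replica), that no host rejoins, and that the Security database is ignored. The `Forest`, `Cluster`, and `run` names and the forest layouts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Forest:
    name: str
    master_host: str
    replica_host: Optional[str] = None  # None: no local disk failover configured
    replica_synced: bool = True         # stands in for the "sync replicating" state
    acting_host: Optional[str] = None
    is_open: bool = True

    def __post_init__(self):
        if self.acting_host is None:
            self.acting_host = self.master_host


class Cluster:
    def __init__(self, hosts, forests):
        self.hosts = set(hosts)
        self.online = set(hosts)
        self.forests = forests

    def has_quorum(self):
        # Quorum: more than 50% of the cluster's hosts are online.
        return 2 * len(self.online) > len(self.hosts)

    def host_down(self, host):
        self.online.discard(host)
        quorum = self.has_quorum()
        for f in self.forests:
            # Open forests on other hosts are unaffected by this host leaving.
            if not f.is_open or f.acting_host != host:
                continue
            can_fail_over = (
                quorum
                and f.acting_host == f.master_host  # only one hop: master -> replica
                and f.replica_host in self.online
                and f.replica_synced
            )
            if can_fail_over:
                f.acting_host = f.replica_host  # replica becomes the acting master
            else:
                f.is_open = False

    def database_available(self):
        return all(f.is_open for f in self.forests)


def run(specs, order):
    """Build a fresh 3-host cluster, fail hosts in `order`, report DB availability."""
    forests = [Forest(name, master, replica_host=replica) for name, master, replica in specs]
    cluster = Cluster(["host-a", "host-b", "host-c"], forests)
    for host in order:
        cluster.host_down(host)
    return cluster.database_available()


two_forests = [("f1", "host-a", "host-b"), ("f2", "host-b", "host-a")]
alternating = [("f1", "host-a", "host-b"), ("f2", "host-b", "host-c"), ("f3", "host-c", "host-a")]

print(run(two_forests, ["host-b", "host-c"]))  # True: f2 failed over while quorum existed
print(run(two_forests, ["host-c", "host-b"]))  # False: quorum was already gone when host-b failed
print(run(alternating, ["host-c", "host-b"]))  # False: f2 cannot fail over without quorum
```

In the first run the database ends up fully served by host-a with only one of three hosts online; reversing the failure order, or spreading forests across all three hosts, leaves a forest offline.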

Conclusion

Depending on how your forests are distributed in the cluster and on the order of host failures, a database can remain online even when there is no longer a quorum of hosts in the cluster.

Of course, databases with many forests spread across many hosts typically cannot stay online once quorum is lost, because one or more of their forests will become unavailable.

Recommendation

Even though it is possible to have a functioning cluster with less than a quorum of hosts online, you should not architect your high availability solution to depend on it.
