Recreating a Node into an Existing Cluster
08 October 2019 05:44 PM
In some situations an existing cluster node needs to be replaced, for example because of a hardware failure or a planned hardware replacement.
In this Knowledgebase article we will outline the steps necessary to replace the node by reusing the existing cluster configuration without registering it again.
Preparation steps for rejoining a node into the cluster
Downloading MarkLogic for the New Host
MarkLogic Server, and the optional MarkLogic Converters and Filters, can be downloaded from the MarkLogic Developer Community. The most recent versions can be found at the following URLs, which give you the option of downloading by either HTTPS or curl:
If the exact version you are running is not available, you may still be able to download it by getting the download link for the closest current version (8, 9, or 10) and editing the minor version number in the link.
For example, if you need 10.0-1 and the currently available version is 10.0-2, choosing the Download via Curl option gives you a download link that looks like this:
Update the URL with the minor release version you need:
If you are unable to get the version you need this way, then contact MarkLogic Support.
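Editing the link by hand is usually enough, but the substitution can also be scripted. The sketch below is a minimal Python example. It assumes a hypothetical link in which the version appears as `<major>.<minor>-<release>`, for example `MarkLogic-10.0-2`. The real link format may differ, including any query string it carries, so check the output before using it. Only versions that share the requested major.minor prefix are rewritten, which leaves path segments such as `/10.0/` untouched.

```python
import re

# Matches a version string such as "10.0-2" (major.minor-release).
VERSION_RE = re.compile(r"(\d+\.\d+)-(\d+)")

def set_release(url: str, wanted: str) -> str:
    """Return `url` with every 'X.Y-N' version that shares wanted's X.Y replaced by `wanted`.

    `wanted` is the full version you need, e.g. "10.0-1". Versions with a
    different major.minor prefix are left as they are.
    """
    prefix = wanted.split("-", 1)[0]

    def repl(match: re.Match) -> str:
        return wanted if match.group(1) == prefix else match.group(0)

    return VERSION_RE.sub(repl, url)

# Hypothetical example link; the host name is a placeholder, not the real download site.
link = "https://downloads.example.com/binaries/10.0/MarkLogic-10.0-2.x86_64.rpm"
print(set_release(link, "10.0-1"))
```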
Rejoining the Replacement Node to the Cluster
There are two methods to rejoin a host into the cluster, depending on the availability of configuration files.
Method 1: Rejoining the Cluster With Existing Configuration Files
This procedure can only be performed if the existing configuration files (/var/opt/MarkLogic/*.xml) from the lost or old node are available. Otherwise it will fail and can cause a lot of problems.
When the node is started, it reuses the existing configuration settings and assumes the identity of the missing node.
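Restoring the backed-up files amounts to copying them back into the data directory before MarkLogic is started on the replacement host. The following Python function is a minimal sketch of that step. The backup location is yours to supply. The check that server.xml and hosts.xml are present is an extra safeguard added here, not part of the documented procedure. The function does not adjust file ownership or permissions, and it assumes MarkLogic is stopped while it runs.

```python
import shutil
from pathlib import Path

# Default Linux data directory, as used throughout this article.
DATA_DIR = Path("/var/opt/MarkLogic")
# Files this sketch insists on before copying anything (an added safeguard).
REQUIRED = ("server.xml", "hosts.xml")

def restore_config(backup_dir, data_dir=DATA_DIR):
    """Copy every *.xml file from backup_dir into data_dir and return the copied names.

    Run this on the replacement host while MarkLogic is stopped, so that the
    server picks up the old node's identity on its next start.
    """
    backup_dir, data_dir = Path(backup_dir), Path(data_dir)
    files = sorted(backup_dir.glob("*.xml"))
    names = [f.name for f in files]
    missing = [n for n in REQUIRED if n not in names]
    if missing:
        raise FileNotFoundError(f"backup is missing {', '.join(missing)}")
    data_dir.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, data_dir / f.name)  # copy2 also preserves timestamps
    return names
```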
Method 2: Rejoining the Cluster With Configuration Files From Another Node
This procedure is required if no set of configuration files from the old node is available, for example because no file backup was made of /var/opt/MarkLogic/*.xml. It requires manual editing of a configuration file.
As emphasized in the procedures, it is very important to update server.xml and change the <host-id> to match the value defined in hosts.xml and apply the correct license information. Without these changes the node may not start up, may confuse the other nodes, or it may exhibit unexpected behavior.
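The `<host-id>` edit can be made with a text editor, or with a small script such as the Python sketch below. It looks up the host-id that hosts.xml records for the replacement host's name and writes that value into server.xml. The script relies on a few assumptions. It takes `<host>` entries in hosts.xml to contain `<host-id>` and `<host-name>` child elements, and it takes server.xml to contain exactly one `<host-id>` element that should be changed. The function names are illustrative. The script does not update the license information, which still has to be applied separately.

```python
import re
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tag names."""
    return tag.rsplit("}", 1)[-1]

def host_id_for(hosts_xml, host_name):
    """Return the <host-id> of the <host> entry whose <host-name> equals host_name."""
    root = ET.fromstring(hosts_xml)
    for host in root.iter():
        if _local(host.tag) != "host":
            continue
        fields = {_local(child.tag): (child.text or "").strip() for child in host}
        if fields.get("host-name") == host_name:
            return fields["host-id"]
    raise KeyError(f"{host_name} not found in hosts.xml")

def set_server_host_id(server_xml, host_id):
    """Rewrite the first <host-id> in server.xml's text, leaving everything else unchanged."""
    new_xml, count = re.subn(r"<host-id>[^<]*</host-id>",
                             f"<host-id>{host_id}</host-id>", server_xml, count=1)
    if count != 1:
        raise ValueError("no <host-id> element found in server.xml")
    return new_xml
```

A typical use would read `/var/opt/MarkLogic/hosts.xml` as bytes, call `host_id_for(...)` with the host's fully qualified name, and write the result of `set_server_host_id(...)` back to server.xml while MarkLogic is stopped.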
For both methods, the startup process is the same: MarkLogic uses the configuration files to rejoin the cluster. Forests that no longer exist are automatically recreated. Existing forests that have been mounted or copied to the correct location are mounted as before. Forests configured for local disk failover automatically start synchronizing with the online forests. If replication is configured, it starts replicating the forests after the node is started. If neither local disk failover nor replication is configured, the forests can be restored from backup.
Regenerating XDQP Host Certificates
The first step in the process is to check whether the certificate is valid. If you replaced the node using Method 1, the certificate is likely to be valid. If you used Method 2, it is likely to be invalid.
Log into a terminal on the newly replaced host, and extract the private key from /var/opt/MarkLogic/server.xml:
Next, extract the certificate for the new host from /var/opt/MarkLogic/hosts.xml.
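As an alternative to extracting the values by hand, the Python sketch below pulls both values out with the standard XML parser. It makes several assumptions. The element name `<ssl-private-key>` in server.xml is assumed and should be checked against your file. The `<ssl-certificate>` element in hosts.xml is the one named later in this procedure. Both values are assumed to be stored as complete PEM text. The output paths `/tmp/server.key` and `/tmp/server.crt` are illustrative only. The key file is written with owner-only permissions because it contains a secret.

```python
import os
import xml.etree.ElementTree as ET

def _local(tag):
    """Strip the '{namespace}' prefix ElementTree adds to namespaced tag names."""
    return tag.rsplit("}", 1)[-1]

def _first_text(xml_bytes, name):
    for el in ET.fromstring(xml_bytes).iter():
        if _local(el.tag) == name and el.text:
            return el.text.strip()
    raise KeyError(f"<{name}> not found")

def server_host_id(server_xml):
    """The replacement host's own <host-id>, as recorded in server.xml."""
    return _first_text(server_xml, "host-id")

def private_key_pem(server_xml):
    """PEM text of the private key (element name <ssl-private-key> is an assumption)."""
    return _first_text(server_xml, "ssl-private-key") + "\n"

def certificate_pem(hosts_xml, host_id):
    """PEM text of <ssl-certificate> for the <host> entry with the given <host-id>."""
    for host in ET.fromstring(hosts_xml).iter():
        if _local(host.tag) != "host":
            continue
        fields = {_local(c.tag): (c.text or "").strip() for c in host}
        if fields.get("host-id") == host_id:
            return fields["ssl-certificate"] + "\n"
    raise KeyError(f"host-id {host_id} not found in hosts.xml")

def write_secret(path, text):
    """Create or overwrite path with mode 0600 so only the current user can read it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(text)

if __name__ == "__main__":
    with open("/var/opt/MarkLogic/server.xml", "rb") as f:
        server = f.read()
    with open("/var/opt/MarkLogic/hosts.xml", "rb") as f:
        hosts = f.read()
    write_secret("/tmp/server.key", private_key_pem(server))
    with open("/tmp/server.crt", "w") as f:
        f.write(certificate_pem(hosts, server_host_id(server)))
```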
Once you have the private key and the certificate, you can use openssl to compare the MD5 checksums of their moduli and see whether they match.
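The comparison is usually run as two openssl pipelines that feed `-modulus` output into `openssl md5`. The Python sketch below does the same thing. It calls the openssl CLI through `subprocess`, so openssl must be installed. It also assumes an RSA key, because it uses `openssl rsa`. The file paths are whatever you used when extracting the key and certificate. Hashing mirrors the manual check; comparing the two modulus lines directly would work just as well.

```python
import hashlib
import subprocess

def modulus_md5(openssl_output: bytes) -> str:
    """MD5 of openssl '-modulus' output; the same digest 'openssl md5' prints for it."""
    return hashlib.md5(openssl_output).hexdigest()

def _modulus(args):
    # Run openssl and return its raw stdout, a line starting with "Modulus=".
    return subprocess.run(["openssl", *args, "-noout", "-modulus"],
                          check=True, capture_output=True).stdout

def key_matches_cert(key_path, cert_path):
    """True when the RSA private key and the certificate share the same modulus."""
    key_sum = modulus_md5(_modulus(["rsa", "-in", key_path]))
    cert_sum = modulus_md5(_modulus(["x509", "-in", cert_path]))
    return key_sum == cert_sum
```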
If the values match, STOP HERE. The certificate is valid and does not need to be regenerated. If the values do not match, then the certificate needs to be regenerated.
Make note of the <host-id> from /var/opt/MarkLogic/server.xml. This will be used to populate the value for the Common Name (CN) when the certificate is generated.
Create the new self-signed certificate using the server's private key, with the <host-id> from the previous step as the CN. MarkLogic typically sets these certificates to 10 years (3650 days) by default when it first runs, but you can choose another value if needed.
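The certificate can be generated with a single `openssl req -new -x509` call. The Python sketch below builds that command and runs it. The output path `/tmp/new-server.crt` is the one used later in this procedure. The key path `/tmp/server.key` is illustrative. Only the CN is set in the subject, because that is the only field this procedure specifies. Building the argument list separately from running it makes the command easy to inspect first.

```python
import subprocess

def self_signed_cmd(host_id, key_path="/tmp/server.key",
                    out_path="/tmp/new-server.crt", days=3650):
    """openssl command that self-signs a certificate for key_path with CN=<host-id>."""
    return ["openssl", "req", "-new", "-x509",
            "-key", key_path,          # reuse the server's existing private key
            "-days", str(days),        # 3650 days matches MarkLogic's usual default
            "-subj", f"/CN={host_id}",
            "-out", out_path]

def make_self_signed(host_id, **kwargs):
    """Run the command; raises CalledProcessError if openssl fails."""
    subprocess.run(self_signed_cmd(host_id, **kwargs), check=True)
```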
Compare the MD5 checksums with openssl again. This time they should match:
Make a copy of hosts.xml (/tmp/hosts.xml) in which to replace the certificate, and note the host-id for use in a later step.
Edit /tmp/hosts.xml and replace the old certificate for the host with the new certificate. Find the entry with the correct <host-id> and replace the contents of its <ssl-certificate> field with the new certificate from /tmp/new-server.crt.
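If you prefer to script the edit, the Python sketch below swaps the certificate at the text level, so the rest of hosts.xml stays exactly as it was. It makes two assumptions. Each host entry is assumed to be a `<host>` element containing a literal `<host-id>ID</host-id>` and one `<ssl-certificate>` element. The certificate is assumed to be stored as PEM text, the same way it appears in /tmp/new-server.crt. The function raises an error unless exactly one certificate was replaced.

```python
import re

# A <host> element (optionally with attributes); does not match <hosts> or <host-id>.
HOST_RE = re.compile(r"<host(?:\s[^>]*)?>.*?</host>", re.DOTALL)
CERT_RE = re.compile(r"<ssl-certificate>.*?</ssl-certificate>", re.DOTALL)

def replace_host_cert(hosts_xml, host_id, new_cert_pem):
    """Return hosts_xml with the matching host's <ssl-certificate> set to new_cert_pem."""
    target = f"<host-id>{host_id}</host-id>"
    new_field = f"<ssl-certificate>{new_cert_pem.strip()}</ssl-certificate>"
    replaced = 0

    def fix_host(match):
        nonlocal replaced
        block = match.group(0)
        if target not in block:
            return block
        # A function replacement avoids any escape processing of the PEM text.
        new_block, n = CERT_RE.subn(lambda _: new_field, block, count=1)
        replaced += n
        return new_block

    result = HOST_RE.sub(fix_host, hosts_xml)
    if replaced != 1:
        raise ValueError(f"expected one certificate for host-id {host_id}, replaced {replaced}")
    return result
```

A typical use would read `/tmp/hosts.xml` and `/tmp/new-server.crt` as text, call `replace_host_cert(...)` with the host-id you noted, and write the result back to `/tmp/hosts.xml`.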
Replace the existing hosts.xml with the updated copy.
Restart MarkLogic on the node. This can be done from any host in the cluster, using the Admin Interface, the REST Management API endpoint, or Query Console.
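For the REST Management API route, the Python sketch below sends the restart request. It assumes the Management API's `POST /manage/v2/hosts/{id|name}` endpoint accepting a JSON body of `{"operation": "restart"}`. It also assumes the Manage App Server listens on its default port 8002 with digest authentication. The host names are placeholders. Check the endpoint and authentication scheme against the Management API documentation for your MarkLogic version. Building the request separately from sending it lets you inspect it first.

```python
import json
import urllib.parse
import urllib.request

def restart_request(manage_host, target_host, port=8002):
    """Build a POST asking the Management API on manage_host to restart target_host."""
    name = urllib.parse.quote(target_host, safe="")
    url = f"http://{manage_host}:{port}/manage/v2/hosts/{name}"
    body = json.dumps({"operation": "restart"}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})

def send(request, user, password):
    """Send the request using HTTP digest authentication and return the status code."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(urllib.request.HTTPDigestAuthHandler(passwords))
    with opener.open(request) as response:
        return response.status
```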
Verify that the changes to hosts.xml have propagated to all hosts in the cluster, that is, that hosts.xml is now identical on every host. One way to do this is to compare MD5 checksums.
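The checksum comparison can be sketched in Python as shown below. The sketch assumes you have already collected a copy of hosts.xml from each host into local files; how you gather them (for example with scp) is up to you and not covered here. The function returns a yes/no answer together with the per-file checksums, so any host that differs is easy to spot.

```python
import hashlib
from pathlib import Path

def md5sum(path):
    """Hex MD5 of a file's bytes, as printed by md5sum or 'openssl md5'."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def all_identical(paths):
    """Return (True if every file has the same MD5, {path: md5}) for the given copies."""
    sums = {str(p): md5sum(p) for p in paths}
    return len(set(sums.values())) <= 1, sums
```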
You should now be able to set xdqp ssl enabled to true in the group configuration. Check the cluster status page in the Admin Interface to ensure that all hosts have reconnected successfully, or review the ErrorLog files to confirm that there are no SVC-SOCACC errors.
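The ErrorLog check can be done with grep, or with the short Python sketch below. It reads one log file and returns the lines that mention SVC-SOCACC. The default path `/var/opt/MarkLogic/Logs/ErrorLog.txt` is an assumption based on the usual Linux layout, so adjust it for your installation. Entries logged before the restart may still be present, so check the timestamps of any matches.

```python
# Typical ErrorLog location on Linux; an assumption, adjust for your installation.
ERROR_LOG = "/var/opt/MarkLogic/Logs/ErrorLog.txt"

def socacc_lines(log_path=ERROR_LOG):
    """Return the ErrorLog lines that mention SVC-SOCACC, without trailing newlines."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return [line.rstrip("\n") for line in f if "SVC-SOCACC" in line]
```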
This article explains how to replace a node in a cluster directly, reusing the same host name. Another approach is to add a new node to the cluster and transfer the forests, which is explained in the knowledge base article "Replacing a D-Node with local disk failover".
Some of these steps, such as operating system commands and file system locations, may differ on other operating systems. On a different OS, adjust the commands to match your environment.