Knowledgebase: Administration
Recreating a Node into an Existing Cluster
08 October 2019 05:44 PM

Introduction

In some situations an existing cluster node needs to be replaced. There are several reasons for this, such as hardware failure or a planned hardware replacement.

In this Knowledgebase article, we outline the steps necessary to replace the node by reusing the existing cluster configuration, without registering the node with the cluster again.

Important notes:

  • The replacement node must run the same platform (operating system) as all other nodes of the cluster (e.g., Windows, Linux, Solaris). Its CPUs must also have the same word size (e.g., 64-bit or 32-bit).
  • The replacement node must have the same (or a higher) number of CPU cores.
  • The replacement node must have the same (or more) allocated disk space and the same mount points as the old node.
  • The replacement node must have the same hostname as the old node, unless the node is an AWS EC2 instance using MARKLOGIC_EC2=1 (the default when using MarkLogic AMIs).

Preparation steps for re-joining a node into the cluster

  • Install and configure the operating system
    • make sure the mount points match the old setup
    • if the previous storage is healthy, it can be reused (forests located on it will be mounted)
  • Any non-MarkLogic data required on this node (such as XQuery modules, deployment scripts, etc.) must be zipped and copied over manually as part of the staging process
  • Copy over MarkLogic configuration files (/var/opt/MarkLogic/*.xml) from a backup of the old node
    • If xdqp ssl enabled is set to true, change the setting to false. If you can’t do this through the Admin UI, manually update the value of xdqp-ssl-enabled to false in the copied configuration files.
    • To re-enable SSL for XDQP connections once the node has rejoined the cluster, you will need to regenerate the replacement host's certificate. Follow the instructions in the Regenerating XDQP Host Certificates section of this article.
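If the Admin UI cannot be used, the manual change to xdqp-ssl-enabled can be sketched with sed. This is a minimal sketch under assumptions: it assumes the setting appears literally as <xdqp-ssl-enabled>true</xdqp-ssl-enabled> in the copied groups.xml (where group-level settings are normally stored), so inspect your copy before relying on it. The function name is hypothetical.

```shell
# Hypothetical helper: set every <xdqp-ssl-enabled> element in a config file to false.
# Assumes the setting lives in groups.xml (verify in your copy); keeps a .bak backup.
disable_xdqp_ssl() {
  cfg=$1   # e.g. /var/opt/MarkLogic/groups.xml
  cp -p "$cfg" "$cfg.bak"
  # Rewrite into the original file so its owner and mode are preserved
  sed 's|<xdqp-ssl-enabled>true</xdqp-ssl-enabled>|<xdqp-ssl-enabled>false</xdqp-ssl-enabled>|g' \
    "$cfg.bak" > "$cfg"
}

# Usage (after copying the configuration files, before starting MarkLogic):
# disable_xdqp_ssl /var/opt/MarkLogic/groups.xml
```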

Downloading MarkLogic for the New Host

MarkLogic Server and the optional MarkLogic Converters and Filters can be downloaded from the MarkLogic Developer Community. The most recent versions can be found at the following URLs, which give you the option of downloading either over HTTPS or with curl:

If the exact version you are running is not available, you may still be able to download it by getting the download link for the closest current version (8, 9, or 10) and editing the minor version number in the link.

For example, if you need 10.0-1 and the currently available version is 10.0-2, choosing the Download via Curl option gives you a download link that looks like this:

https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-2-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com

Update the URL with the minor release version you need:

https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-1-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com
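If you prefer not to edit the link by hand, a small sketch like the following rewrites the release number in the file name. The helper name is hypothetical, and the example URL is the placeholder link shown above; substitute the link you actually received.

```shell
# Hypothetical helper: swap the release in a MarkLogic download link.
#   $1 = link from "Download via Curl"
#   $2 = release in that link (e.g. 10.0-2)
#   $3 = release you need   (e.g. 10.0-1)
rewrite_release() {
  # Escape the dots in $2 so sed matches them literally
  have_re=$(printf '%s' "$2" | sed 's/\./\\./g')
  printf '%s\n' "$1" | sed "s/MarkLogic-${have_re}-/MarkLogic-$3-/"
}

url='https://developer.marklogic.com/download/binaries/10.0/MarkLogic-10.0-2-amd64.msi?t=SomeHashValue/1&email=myemail%40mycompany.com'
# Prints the same link with MarkLogic-10.0-1-amd64.msi as the file name
rewrite_release "$url" 10.0-2 10.0-1
```

The result can then be passed to curl, for example curl -o MarkLogic-10.0-1-amd64.msi "$(rewrite_release "$url" 10.0-2 10.0-1)".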

If you are unable to get the version you need this way, then contact MarkLogic Support.

Rejoining the Replacement Node to the Cluster

There are two methods to rejoin a host to the cluster, depending on whether configuration files from the old node are available.

  1. Using an older set of configuration files from the node being replaced
  2. Creating a new set of configuration files from another node in the cluster

Method 1: Rejoining the Cluster With Existing Configuration Files

This procedure can only be performed if the existing configuration files (/var/opt/MarkLogic/*.xml) from the lost or old node are available; otherwise it will fail and cause a lot of problems.

  • Perform a standard MarkLogic server installation on the new target node
    • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
    • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
    • Verify local configuration settings in /etc/marklogic.conf (optional)
    • Do not start MarkLogic server
  • Create a new data directory
    • $ mkdir /var/opt/MarkLogic (default location; it might already exist if this is a separate mount point)
    • Verify ownership of the data directory, daemon.daemon by default.
      • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
  • Copy an existing set of configuration files into the data directory
    • $ cp /path/to/old/config/*.xml /var/opt/MarkLogic
    • Verify ownership of the configuration files, daemon.daemon by default.
      • To fix: $ chown daemon:daemon /var/opt/MarkLogic/*.xml
  • Perform a last sanity check
    • Hostname must be the same as the old node, except for AWS EC2 nodes as mentioned above
    • Verify firewall or Security Group rules are correct
    • Verify mount points, file ownership and permissions are correct
  • Start MarkLogic
    • $ service MarkLogic start
  • Monitor the startup process

After starting the node it will reuse the existing configuration settings and assume the identity of the missing node. 
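The last sanity check before starting MarkLogic can be partly scripted. This is a minimal sketch, assuming the default data directory and the default daemon owner; the function name is hypothetical. It compares against the output of uname -n, so pass the old node's hostname in the same form (short or fully qualified) that command reports on your system. The log path in the usage lines is the usual Linux default and is also an assumption.

```shell
# Hypothetical pre-start check for Method 1 (run before "service MarkLogic start").
#   $1 = data directory (e.g. /var/opt/MarkLogic)
#   $2 = hostname of the node being replaced
#   $3 = expected owner of the config files (defaults to daemon)
precheck_method1() {
  data_dir=$1
  expected_host=$2
  owner=${3:-daemon}
  status=0
  # The copied configuration must include at least server.xml and hosts.xml
  for f in server.xml hosts.xml; do
    if [ ! -s "$data_dir/$f" ]; then
      echo "missing or empty: $data_dir/$f" >&2
      status=1
    fi
  done
  # Every configuration file must be owned by the MarkLogic user
  bad=$(find "$data_dir" -maxdepth 1 -name '*.xml' ! -user "$owner")
  if [ -n "$bad" ]; then
    echo "not owned by $owner: $bad" >&2
    status=1
  fi
  # The hostname must match the old node (not required on EC2 with MARKLOGIC_EC2=1)
  if [ "$(uname -n)" != "$expected_host" ]; then
    echo "hostname is $(uname -n), expected $expected_host" >&2
    status=1
  fi
  return $status
}

# Usage:
# precheck_method1 /var/opt/MarkLogic node3.example.com && service MarkLogic start
# tail -f /var/opt/MarkLogic/Logs/ErrorLog.txt   # monitor the startup process
```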

Method 2: Rejoining the Cluster With Configuration Files From Another Node

This procedure is required if no older set of configuration files is available, for example because no backup was made of /var/opt/MarkLogic/*.xml. It requires manually editing a configuration file.

  • Perform a standard MarkLogic server installation on the new target node
    • $ rpm -Uvh /path/to/MarkLogic-<version>.x86_64.rpm or yum install /path/to/MarkLogic-<version>.x86_64.rpm
    • $ rpm -Uvh /path/to/MarkLogicConverters-<version>.x86_64.rpm or yum install /path/to/MarkLogicConverters-<version>.x86_64.rpm (optional)
    • Verify local configuration settings in /etc/marklogic.conf (optional)
  • Start MarkLogic, and perform a normal server setup as a single node. DO NOT join the cluster now.
    • $ service MarkLogic start
    • Perform a basic setup
    • DO NOT join the host to the cluster!
  • Stop MarkLogic, and move the current configuration files in /var/opt/MarkLogic to a new location
    • $ service MarkLogic stop
    • $ mv /var/opt/MarkLogic/*.xml /some/place
  • Copy the set of configuration files from one of the other nodes
    • $ scp <othernode>:/var/opt/MarkLogic/*.xml /var/opt/MarkLogic
    • Verify ownership of the data directory, daemon.daemon by default.
      • To fix: $ chown -R daemon:daemon /var/opt/MarkLogic
  • Make note of the <host-id> for the node being recreated in hosts.xml
    • $ grep -B1 hostname /var/opt/MarkLogic/hosts.xml (replace hostname with the node's host name; -B1 also prints the preceding <host-id> line)
  • Edit /var/opt/MarkLogic/server.xml. Note: this step is critically important to ensure correct operation of the cluster.
    • Use a UTF-8 safe editor like nano or vi
    • Update <host-id> with the value found in /var/opt/MarkLogic/hosts.xml
    • Update <license-key> value if necessary.
    • Update <licensee> value if necessary.
    • Save the changes
  • Perform a last sanity check
    • The <host-id> in server.xml must match the <host-id> of this node's <host> entry in hosts.xml.
      • Important: the host will not start if these values do not match.
    • Hostname must be the same as the old node, unless the node is an AWS EC2 instance using the configuration option MARKLOGIC_EC2=1, which is the default when using the MarkLogic provided AMIs.
    • Firewall or Security Group rules are correct
    • Mount points, ownership and permissions are correct
  • Start MarkLogic and monitor the startup process
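The host-id lookup, the server.xml edit, and the host-id sanity check can be sketched as below. The function names are hypothetical. The sketch assumes, as the grep -B1 step implies, that each host's <host-id> appears before the line holding its host name in hosts.xml, and that server.xml contains a single <host-id> element. The <license-key> and <licensee> values still need to be checked by hand.

```shell
# Hypothetical helper: print the <host-id> seen most recently before the line
# whose element value is the given host name (the same idea as "grep -B1").
host_id_for() {
  awk -v name="$2" '
    /<host-id>/ { id = $0; sub(/.*<host-id>/, "", id); sub(/<\/host-id>.*/, "", id) }
    index($0, ">" name "<") && id != "" { print id; exit }
  ' "$1"
}

# Hypothetical helper: print the <host-id> value from server.xml.
server_host_id() {
  sed -n 's|.*<host-id>\([^<]*\)</host-id>.*|\1|p' "$1" | head -n 1
}

# Hypothetical helper: replace the <host-id> value in server.xml, keeping a .bak copy.
set_server_host_id() {
  cp -p "$1" "$1.bak"
  # Rewrite into the original file so its owner and mode are preserved
  sed "s|<host-id>[^<]*</host-id>|<host-id>$2</host-id>|" "$1.bak" > "$1"
}

# Usage (MarkLogic stopped; node3.example.com is a placeholder):
# id=$(host_id_for /var/opt/MarkLogic/hosts.xml node3.example.com)
# set_server_host_id /var/opt/MarkLogic/server.xml "$id"
# [ "$(server_host_id /var/opt/MarkLogic/server.xml)" = "$id" ] && echo "host-id matches"
```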

As emphasized in the procedures, it is very important to update server.xml and change the <host-id> to match the value defined in hosts.xml and apply the correct license information. Without these changes the node may not start up, may confuse the other nodes, or it may exhibit unexpected behavior.

Wrapping Up

For both methods, the startup process is the same. MarkLogic will use the configuration files to rejoin the cluster. Forests that no longer exist will automatically be recreated. Existing forests that have been mounted or copied to the correct location will be mounted as before. Forests configured for local disk failover will automatically start synchronizing with the online forests. If replication is configured, it will start replicating the forests after the node is started. If neither local disk failover nor replication is configured, the forests can be restored from backup.

Regenerating XDQP Host Certificates

The first step in the process is to check the certificate to see whether it is valid. If you replaced your node using Method 1, the certificate is likely to be valid. If you replaced your node using Method 2, the certificate is likely to be invalid, because server.xml (which holds the private key) was copied from a different node.

Log into a terminal on the newly replaced host, and extract the private key from /var/opt/MarkLogic/server.xml and the host's certificate from /var/opt/MarkLogic/hosts.xml. Start with the private key:

  • $ cp /var/opt/MarkLogic/server.xml /tmp/server.key
  • Edit /tmp/server.key and remove all XML content, leaving only the private key
    • File should start with "-----BEGIN PRIVATE KEY-----"
    • File should end with "-----END PRIVATE KEY-----"

Now extract the certificate for the new host from /var/opt/MarkLogic/hosts.xml.

  • $ grep -A25 my-host.name /var/opt/MarkLogic/hosts.xml > /tmp/server.crt
  • Remove all the data from the file, except the certificate for the new host
    • File should start with "-----BEGIN CERTIFICATE-----"
    • File should end with "-----END CERTIFICATE-----"
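The manual trimming of both files can be sketched with sed and awk. This is a minimal sketch with hypothetical function names. It assumes the PEM BEGIN/END markers may share a line with an XML tag but are otherwise on their own lines, and, following the grep -A25 step, that the host's certificate comes after the line containing its host name in hosts.xml.

```shell
# Hypothetical helper: print the private key from server.xml without the surrounding XML.
extract_private_key() {
  sed -n '/-----BEGIN PRIVATE KEY-----/,/-----END PRIVATE KEY-----/p' "$1" |
    sed -e 's/.*\(-----BEGIN PRIVATE KEY-----\)/\1/' \
        -e 's/\(-----END PRIVATE KEY-----\).*/\1/'
}

# Hypothetical helper: print the first certificate after the line mentioning host $2.
extract_host_cert() {
  awk -v name="$2" '
    !found && index($0, name) { found = 1 }
    found && /-----BEGIN CERTIFICATE-----/ { inside = 1 }
    inside { print }
    inside && /-----END CERTIFICATE-----/ { exit }
  ' "$1" |
    sed -e 's/.*\(-----BEGIN CERTIFICATE-----\)/\1/' \
        -e 's/\(-----END CERTIFICATE-----\).*/\1/'
}

# Usage (my-host.name is a placeholder):
# extract_private_key /var/opt/MarkLogic/server.xml > /tmp/server.key
# extract_host_cert /var/opt/MarkLogic/hosts.xml my-host.name > /tmp/server.crt
```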

Once you have the private key and the certificate, you can compare the MD5 checksums of their moduli using openssl to see whether they match.

  • $ openssl rsa -in /tmp/server.key -noout -modulus | openssl md5; openssl x509 -in /tmp/server.crt -noout -modulus | openssl md5

If the values match, STOP HERE.  The certificate is valid and does not need to be regenerated. If the values do not match, then the certificate needs to be regenerated.
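The same check can be wrapped in a small function. In this hypothetical sketch the modulus strings are compared directly, which is equivalent to comparing their MD5 sums, and the function fails if either openssl command fails; otherwise two empty outputs would hash to the same value and look like a match. It assumes an RSA key, as the openssl rsa command above does.

```shell
# Hypothetical helper: succeed only if the RSA private key ($1) and the
# certificate ($2) share the same modulus.
cert_matches_key() {
  key_mod=$(openssl rsa -in "$1" -noout -modulus) || return 1
  crt_mod=$(openssl x509 -in "$2" -noout -modulus) || return 1
  [ -n "$key_mod" ] && [ "$key_mod" = "$crt_mod" ]
}

# Usage:
# if cert_matches_key /tmp/server.key /tmp/server.crt; then
#   echo "certificate matches the key; no need to regenerate"
# else
#   echo "modulus mismatch; regenerate the certificate"
# fi
```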

Make note of the <host-id> from /var/opt/MarkLogic/server.xml. This will be used as the Common Name (CN) when the certificate is generated. The same value appears in hosts.xml on the line before the host name (replace hostname below with the node's host name):

  • $ grep -B1 hostname /var/opt/MarkLogic/hosts.xml

Create the new self-signed certificate using the server's private key. MarkLogic typically generates these with a validity of 10 years (3650 days) when it first runs, but you can choose another value if needed. Use the <host-id> from the previous step as the CN.

  • $ sudo openssl req -key /tmp/server.key -new -x509 -days 3650 -out /tmp/new-server.crt -subj "/CN=[host-id]"

Compare the MD5 checksums with openssl again; this time they should match:

  • $ openssl rsa -in /tmp/server.key -noout -modulus | openssl md5; openssl x509 -in /tmp/new-server.crt -noout -modulus | openssl md5

Make a working copy of hosts.xml in which to replace the certificate, and keep the <host-id> at hand for the next step.

  • $ cp -p /var/opt/MarkLogic/hosts.xml /tmp/hosts.xml

Edit /tmp/hosts.xml and replace the old certificate for the host with the new certificate.  Find the entry with the correct <host-id> and replace the <ssl-certificate> field with the new certificate in /tmp/new-server.crt
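The certificate swap in /tmp/hosts.xml can also be sketched with awk. This is a hypothetical helper, not a MarkLogic tool. It assumes that inside each <host> entry the <host-id> line comes before that host's certificate, and that the PEM BEGIN/END markers may share a line with the surrounding tags; any opening or closing tag on those lines is kept.

```shell
# Hypothetical helper: print hosts.xml with one host's PEM certificate replaced.
#   $1 = working copy (e.g. /tmp/hosts.xml)
#   $2 = <host-id> of the replaced node
#   $3 = new certificate (e.g. /tmp/new-server.crt)
replace_host_cert() {
  awk -v id="$2" -v crtfile="$3" '
    /<\/host>/ { target = 0 }                          # leaving a <host> entry
    index($0, "<host-id>" id "</host-id>") { target = 1 }
    target && !done && /-----BEGIN CERTIFICATE-----/ {
      prefix = $0; sub(/-----BEGIN CERTIFICATE-----.*/, "", prefix)
      printf "%s", prefix                              # keep e.g. the opening tag
      while ((getline line < crtfile) > 0) {
        if (line ~ /-----END CERTIFICATE-----/) printf "%s", line; else print line
      }
      close(crtfile)
      skipping = 1
      next
    }
    skipping {                                         # drop the old certificate body
      if ($0 ~ /-----END CERTIFICATE-----/) {
        suffix = $0; sub(/.*-----END CERTIFICATE-----/, "", suffix)
        print suffix                                   # keep e.g. the closing tag
        skipping = 0; done = 1
      }
      next
    }
    { print }
  ' "$1"
}

# Usage ([host-id] is the value noted earlier):
# replace_host_cert /tmp/hosts.xml [host-id] /tmp/new-server.crt > /tmp/hosts.new.xml
# cat /tmp/hosts.new.xml > /tmp/hosts.xml   # overwrite in place, keeping owner and mode
```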

Replace the existing hosts.xml with our updated copy

  • $ cp -p /tmp/hosts.xml /var/opt/MarkLogic/hosts.xml

Restart MarkLogic on the node.  This can be done from any host in the cluster, using the Admin Interface, the REST Management API endpoint, or Query Console.

  • Admin Interface: In the left tree menu, click Configure > Hosts > [Hostname], then select the Status tab and click Restart
  • REST Management API: $ curl --anyauth --user username:password -X POST -i --data "state=restart" -H "Content-type: application/x-www-form-urlencoded" http://localhost:8002/manage/v2/hosts/[host-name]
  • Query Console: xdmp:restart((xdmp:host("[host-name]")), "To reload hosts.xml after certificate update")

Verify that the changes to hosts.xml have propagated to all hosts in the cluster: hosts.xml should now be identical on every host. One way of checking this is to compare MD5 checksums.

  • $ md5sum /var/opt/MarkLogic/hosts.xml
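Collecting and comparing the checksums from every host can be sketched as follows. The function name and the host names in the usage lines are hypothetical, and the usage assumes passwordless ssh access to each host.

```shell
# Hypothetical helper: read "md5sum" output lines on stdin and succeed only if
# there is at least one line and every line has the same checksum.
all_same_checksum() {
  n=$(awk '{ print $1 }' | sort -u | wc -l | tr -d ' ')
  [ "$n" -eq 1 ]
}

# Usage:
# for h in node1.example.com node2.example.com node3.example.com; do
#   ssh "$h" md5sum /var/opt/MarkLogic/hosts.xml
# done | all_same_checksum && echo "hosts.xml is identical on all hosts"
```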

You should now be able to set xdqp ssl enabled to true in the group configurations.  Check the cluster status page in the Administrative Interface to ensure all the hosts have reconnected successfully, or review the ErrorLog files to ensure there are no SVC-SOCACC errors in the log.
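Scanning the logs for SVC-SOCACC can be sketched with grep. The helper name is hypothetical, and the default directory and the ErrorLog*.txt file pattern are assumptions based on the usual Linux layout; pass another directory if your logs live elsewhere.

```shell
# Hypothetical helper: print SVC-SOCACC lines from a host's error logs.
# Exit status follows grep: 0 if any were found, 1 if none, 2 if the logs could not be read.
find_socacc_errors() {
  log_dir=${1:-/var/opt/MarkLogic/Logs}
  grep -h 'SVC-SOCACC' "$log_dir"/ErrorLog*.txt 2>/dev/null
}

# Usage:
# find_socacc_errors; echo "exit status: $?"
```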

Additional Notes

This article explains how to directly replace a node in a cluster by using the same host name. Another approach is to add a new node to the cluster and transfer the forests to it, as explained in the Knowledgebase article "Replacing a D-Node with local disk failover".

Some of these steps, such as operating system commands and file system locations, may differ between platforms. On a different OS, adjust the specific commands to match the environment.

Related Reading

Replacing a failed MarkLogic node in a cluster: a step by step walkthrough
