This article helps MarkLogic administrators monitor the health of their MarkLogic cluster. By studying the attached scripts, you will learn how to find out which hosts are down and which forests have failed over, so that you can take the necessary recovery actions.
On a separate Linux host (not a member of the cluster), download the file attachments from this article, making sure that they all reside within the same directory.
Here is a general description of each file:
cluster-name.conf - Example configuration file used by the scripts. It holds the settings for monitoring one MarkLogic cluster.
ml-ck-for-life.sh - A very simple, low-load check that all the nodes of a cluster are up and running.
ml-ck-for-health.sh - A more detailed check of essential cluster functionality, with alerting (paging and/or emails to DBAs) when warranted. This script relies on at least one external XQuery file (mon-report-failed-over-forests.xqy) and uses the REST Management API as well as REST XQuery requests.
mon-report-failed-over-forests.xqy - External XQuery file used by ml-ck-for-health.sh.
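Because the scripts expect to find each other in one directory, a quick pre-flight check can confirm that all four attachments arrived. The sketch below is a hypothetical helper, not one of the attached files. It only tests that the files listed above exist in a given directory (accepting any `*.conf`, since the CONF file gets renamed later) and that the two shell scripts are executable, which running them as `./script.sh` requires.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not one of the attached files).
# Reports any of the article's files missing from a directory, and flags
# shell scripts that are not executable (./script.sh needs the +x bit).
check_monitor_files() {
  local dir=${1:-.} f status=0
  # cluster-name.conf may already have been renamed, so accept any *.conf.
  if ! compgen -G "$dir/*.conf" > /dev/null; then
    echo "missing: a *.conf cluster configuration file" >&2
    status=1
  fi
  for f in ml-ck-for-life.sh ml-ck-for-health.sh mon-report-failed-over-forests.xqy; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      status=1
    elif [[ $f == *.sh && ! -x $dir/$f ]]; then
      echo "not executable (try chmod +x): $f" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_monitor_files /path/to/monitor-dir && echo "all files present"`.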
Preparing the CONF File for Use on Your Cluster
Before running the scripts, customize the cluster-name.conf file for your specific cluster. Start by renaming the file to match the name of your cluster, e.g.,
$ mv cluster-name.conf some-other-name.conf
where "some-other-name" is the actual name of the cluster, or of the application hosted on that cluster.
Next, customize the variables inside the CONF file itself. Here are the contents of the cluster-name.conf file, as downloaded:
CLUSTER_NODES=( node1.my-company.com node2.my-company.com node3.my-company.com )
# MarkLogic Credentials for the REST Management port - 8002
# MarkLogic Credentials for the XQuery eval port - 8000
--------- end of listing ---------
CLUSTER_NAME, provide the cluster-name listed in the cluster's
CLUSTER_NODES, list the host names of every node in your cluster.
USER_PW_MGMT, provide the user name and password for the REST Management user, in the format name:password.
USER_PW_XQ, provide the user name and password for the user who will execute the XQuery scripts, in the format name:password.
UNIX_USER, a local Unix user name with the correct rwx access rights for this directory.
PAGE_ADDRESSES and MAIL_ADDRESSES, the alert addresses that are paged or emailed whenever there is a failover event.
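The listing above shows only part of the downloaded file. As an illustration, the hypothetical `write_example_conf` function below writes a filled-in CONF file; every value in it is a placeholder, and the format of PAGE_ADDRESSES and MAIL_ADDRESSES (assumed here to be space-separated strings) should be checked against the file you downloaded. Because CLUSTER_NODES uses bash array syntax, the file is presumably read with `source`; the equally hypothetical `validate_conf` sources it in a subshell and reports any variable described above that is left empty.

```shell
#!/usr/bin/env bash
# Sketch: a filled-in CONF file plus a check that every variable described
# in this article is set. All values are placeholders, and the address
# format is an assumption.

write_example_conf() {
  cat > "$1" <<'EOF'
CLUSTER_NAME="my-cluster"
CLUSTER_NODES=( node1.my-company.com node2.my-company.com node3.my-company.com )
# MarkLogic Credentials for the REST Management port - 8002
USER_PW_MGMT="mgmt-user:mgmt-password"
# MarkLogic Credentials for the XQuery eval port - 8000
USER_PW_XQ="xq-user:xq-password"
UNIX_USER="mlmon"
# Assumed format: space-separated addresses
PAGE_ADDRESSES="oncall-pager@my-company.com"
MAIL_ADDRESSES="dba-team@my-company.com"
EOF
}

# Source the CONF file in a subshell so its variables do not leak into the
# caller, and report every required variable that is empty or unset.
validate_conf() {
  (
    # shellcheck source=/dev/null
    source "$1" || exit 1
    status=0
    for var in CLUSTER_NAME USER_PW_MGMT USER_PW_XQ UNIX_USER PAGE_ADDRESSES MAIL_ADDRESSES; do
      if [ -z "${!var:-}" ]; then
        echo "$1: $var is not set" >&2
        status=1
      fi
    done
    if [ "${#CLUSTER_NODES[@]}" -eq 0 ]; then
      echo "$1: CLUSTER_NODES is empty" >&2
      status=1
    fi
    # Credentials are documented as name:password.
    for var in USER_PW_MGMT USER_PW_XQ; do
      if [[ -n ${!var:-} && ${!var} != *:* ]]; then
        echo "$1: $var should use the format name:password" >&2
        status=1
      fi
    done
    exit "$status"
  )
}
```

For example, `validate_conf some-other-name.conf || echo "fix the CONF file first"`.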
ml-ck-for-health.sh is designed to be run repeatedly at a fixed interval to keep tabs on system health, for example from a cron job. An interval between 5 and 120 minutes is a reasonable range. With a 10-minute interval, you will be alerted (and possibly woken up) on average 5 minutes after a failover event, and at most 10 minutes after it.
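A crontab entry is the usual way to schedule this. The hypothetical `health_cron_line` helper below builds one from an interval in minutes; the install directory and log file name in the example are assumptions, and paths are assumed to contain no spaces. Since cron's `*/N` step only gives even spacing when N divides the hour, the sketch accepts intervals that divide 60 minutes, or whole hours that divide 24.

```shell
#!/usr/bin/env bash
# Sketch: print a crontab entry that runs ml-ck-for-health.sh every N minutes.
# Arguments: interval in minutes, script directory, CONF file, cluster name.
health_cron_line() {
  local interval=$1 dir=$2 conf=$3 cluster=$4 schedule
  if (( interval >= 1 && interval < 60 )) && (( 60 % interval == 0 )); then
    # e.g. 10 -> "*/10 * * * *" (minutes 0, 10, 20, ...)
    schedule="*/$interval * * * *"
  elif (( interval >= 60 && interval % 60 == 0 )) && (( 24 % (interval / 60) == 0 )); then
    # e.g. 120 -> "0 */2 * * *" (top of every second hour)
    schedule="0 */$((interval / 60)) * * *"
  else
    echo "interval must divide 60 minutes, or be whole hours dividing 24" >&2
    return 1
  fi
  # Run from the script directory so the .xqy file is found next to it,
  # and append all output to a log file in the same directory.
  printf '%s cd %s && ./ml-ck-for-health.sh %s %s >> %s/ml-ck-for-health.log 2>&1\n' \
    "$schedule" "$dir" "$conf" "$cluster" "$dir"
}
```

To install it, assuming the files live in a hypothetical `/opt/ml-monitor`: `( crontab -l 2>/dev/null; health_cron_line 10 /opt/ml-monitor some-other-name.conf MY-CLUSTER-NAME ) | crontab -`.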
Setting up SSH Passwordless Login
In the monitoring script ml-ck-for-health.sh, section (6), FOREST STATUS CHANGE, requires ssh access to the cluster hosts because it greps through the MarkLogic Server ErrorLog files. To let this part of the script run without prompting for a password, set up "ssh passwordless login" between the monitoring host and all the cluster hosts. Many online guides describe how to do this, for example: http://www.tecmint.com/ssh-passwordless-login-using-ssh-keygen-in-5-easy-steps/
Alternatively, you can comment out this monitoring section.
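Section (6) only needs key-based login to be in place; how you create it is up to you. As one possible sketch, the hypothetical `print_ssh_setup` helper below reads CLUSTER_NODES from a CONF file and prints the standard OpenSSH commands (`ssh-keygen`, `ssh-copy-id`) for you to review before running them. `SSH_USER` is an assumption of this sketch: the article does not say which remote account reads the ErrorLog, so it defaults to the current user.

```shell
#!/usr/bin/env bash
# Sketch: print the one-time commands that set up passwordless ssh from the
# monitoring host to every node in a CONF file. Nothing is executed here.
# SSH_USER (hypothetical) is the remote account that can read the MarkLogic
# ErrorLog on the cluster hosts; it defaults to the current local user.
print_ssh_setup() {
  local conf=$1 ssh_user=${SSH_USER:-$(id -un)} key=$HOME/.ssh/id_ed25519
  (
    # shellcheck source=/dev/null
    source "$conf" || exit 1
    # Create a key pair with an empty passphrase, only if none exists yet.
    echo "[ -f $key ] || ssh-keygen -t ed25519 -N '' -f $key"
    # Install the public key on each cluster host (asks for its password once).
    for node in "${CLUSTER_NODES[@]}"; do
      echo "ssh-copy-id -i $key.pub $ssh_user@$node"
    done
  )
}
```

For example, run `print_ssh_setup some-other-name.conf > setup-ssh.sh`, inspect the file, then run `bash setup-ssh.sh`. Afterwards, `ssh -o BatchMode=yes user@node1.my-company.com true` should succeed without any prompt. OpenSSH releases without Ed25519 support would need `-t rsa` instead.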
Also regarding section (6): the “grep” command is set up to search the latest 10 minutes of the ErrorLog. If the script is scheduled to run less often than every 10 minutes, adapt the “grep” command line to cover the full period between script runs.
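One way to make the window follow the schedule is to select log lines by timestamp instead of by a fixed pattern. The hypothetical `recent_log_lines` function below does that with `awk`; it is not the grep used in ml-ck-for-health.sh. It assumes each ErrorLog entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp, that GNU `date` is available, and that the cutoff is computed in the same time zone the log is written in.

```shell
#!/usr/bin/env bash
# Sketch: print ErrorLog lines from the last N minutes, so the search window
# can match the interval between script runs instead of a fixed 10 minutes.
# Assumptions: entries start with "YYYY-MM-DD HH:MM:SS", GNU date is
# installed, and this host uses the same time zone as the log.
recent_log_lines() {
  local minutes=$1 logfile=$2 cutoff
  cutoff=$(date -d "$minutes minutes ago" '+%Y-%m-%d %H:%M:%S') || return 1
  # Timestamps in this format sort correctly as plain strings, so a string
  # comparison against the cutoff selects the recent entries. Lines that do
  # not start with a date (e.g. continuation lines) are skipped.
  awk -v cutoff="$cutoff" \
    '/^[0-9][0-9][0-9][0-9]-/ && substr($0, 1, 19) >= cutoff' "$logfile"
}
```

For a 30-minute schedule, a cluster host could run `recent_log_lines 30 /var/opt/MarkLogic/Logs/ErrorLog.txt | grep -i forest`; the log path is the usual Linux default and the `forest` filter is only illustrative. Over the passwordless ssh session, and assuming the remote login shell is bash, the function can be shipped along with the call: `ssh "$node" "$(declare -f recent_log_lines); recent_log_lines 30 /var/opt/MarkLogic/Logs/ErrorLog.txt"`. Computing the cutoff on the remote host also sidesteps the time zone assumption.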
You are now ready to run the failover monitoring scripts:
$ ./ml-ck-for-health.sh some-other-name.conf MY-CLUSTER-NAME
$ ./ml-ck-for-life.sh some-other-name.conf
where "some-other-name" and MY-CLUSTER-NAME are your actual CONF file name and cluster name, as described above.
Monitoring Multiple Clusters
Given a monitoring machine with a directory of cluster configuration files in the style of cluster-name.conf, you can iterate through those files to monitor a suite of clusters from a single machine. A custom shell script that loops over the cluster CONF files is straightforward to build.
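A minimal sketch of such a loop follows. `monitor_all_clusters` is a hypothetical function; it assumes every CONF file defines CLUSTER_NAME and that ml-ck-for-health.sh takes `<conf-file> <cluster-name>` as shown earlier. `HEALTH_SCRIPT` is an assumed override for the script's location.

```shell
#!/usr/bin/env bash
# Sketch: run the health check once for every *.conf file in a directory.
# Assumes each CONF file defines CLUSTER_NAME, and that the health script
# accepts "<conf-file> <cluster-name>". HEALTH_SCRIPT is a hypothetical
# override for where ml-ck-for-health.sh lives.
monitor_all_clusters() {
  local conf_dir=${1:-.} script=${HEALTH_SCRIPT:-./ml-ck-for-health.sh}
  local conf name failures=0
  for conf in "$conf_dir"/*.conf; do
    [ -e "$conf" ] || continue   # no matches: the glob stays literal
    # $( ) runs in a subshell, so one cluster's settings cannot leak into
    # the next iteration.
    name=$(source "$conf" && printf '%s' "${CLUSTER_NAME:-}")
    if [ -z "$name" ]; then
      echo "skipping $conf: CLUSTER_NAME is not set" >&2
      failures=$((failures + 1))
      continue
    fi
    echo "=== $name ($conf) ==="
    "$script" "$conf" "$name" || failures=$((failures + 1))
  done
  # Succeed only if every cluster was checked without error.
  [ "$failures" -eq 0 ]
}
```

For example, from the directory holding the scripts: `monitor_all_clusters ./clusters`. Running from the scripts' own directory is the safer choice, on the assumption that ml-ck-for-health.sh locates mon-report-failed-over-forests.xqy relative to the current directory. A single cron entry can then cover every cluster.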
Final Thoughts and Limitations
Please be aware that the ml-ck-for-health.sh script is only partially implemented. In particular, the Replication Lag and Replication Failure sections are left as exercises for the user.
This script is presented as a backup, lowest-common-denominator monitoring solution. For more complete monitoring, consider other options such as Splunk or Nagios.