NTP Host Configuration
27 November 2019 05:05 AM
Clock synchronization plays a critical part in the operation of a MarkLogic Cluster.
MarkLogic Server expects the system clocks to be synchronized across all the nodes in a cluster, as well as between Primary and Replica clusters. The acceptable level of clock skew (or drift) between hosts is less than 0.5 seconds; skew greater than 30 seconds will trigger XDMP-CLOCKSKEW errors and can impact cluster availability.
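As a rough illustration of these thresholds, the following shell sketch classifies the skew between two epoch timestamps. The 0.5-second and 30-second limits are from the guidance above; the timestamps and classification labels are illustrative, not MarkLogic output.

```shell
#!/bin/sh
# Classify the skew between two epoch timestamps (seconds, possibly fractional).
classify_skew() {
    skew=$(echo "$1 $2" | awk '{ d = $1 - $2; if (d < 0) d = -d; print d }')
    if awk "BEGIN { exit !($skew > 30) }"; then
        echo "CRITICAL: skew ${skew}s exceeds 30s (XDMP-CLOCKSKEW expected)"
    elif awk "BEGIN { exit !($skew >= 0.5) }"; then
        echo "WARNING: skew ${skew}s exceeds the acceptable 0.5s"
    else
        echo "OK: skew ${skew}s within tolerance"
    fi
}

classify_skew 1574830000.2 1574830000.4   # within tolerance
classify_skew 1574830005.0 1574830000.0   # warning range
classify_skew 1574830100.0 1574830000.0   # critical range
```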
Network Time Protocol (NTP) is the recommended solution for maintaining system clock synchronization. NTP services can be provided by public (internet) servers, private servers, network devices, peer servers and more.
NTP uses a daemon process (ntpd) that runs on each host. The ntpd periodically wakes up, polls the configured NTP servers for the current time, and then adjusts the local system clock as necessary. Time can be adjusted in one of two ways: by stepping the clock immediately to the correct time, or by slewing it, slowly speeding up or slowing down the system clock until it reaches the correct time. The frequency with which ntpd wakes up, called the polling interval, can be tuned to the level of accuracy needed; with the default settings it ranges from 64 to 1024 seconds (roughly 1 to 17 minutes). NTP organizes servers into a hierarchy of layers called strata. Each stratum synchronizes with the stratum above it, and provides synchronization to the stratum below it.
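By default, ntpd chooses between the two adjustment methods based on the size of the measured offset: offsets larger than the step threshold (128 ms by default) are stepped, smaller ones are slewed. A minimal sketch of that decision, assuming the default threshold:

```shell
#!/bin/sh
# Decide how ntpd would correct a given clock offset (in seconds),
# assuming ntpd's default step threshold of 0.128 s.
correction_mode() {
    awk -v off="$1" 'BEGIN {
        if (off < 0) off = -off
        if (off > 0.128) print "step"; else print "slew"
    }'
}

correction_mode 0.050   # small offset: slewed gradually
correction_mode 2.5     # large offset: stepped immediately
```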
Public NTP Reference Servers
There are many public NTP reference servers available for time synchronization. It's important to note that the most common public NTP reference server addresses resolve to a pool of servers, so hosts synchronizing against them may end up using different physical servers. Additionally, the polling frequency recommended for cluster synchronization is usually higher than public servers expect, and excessive polling could result in the reference server throttling or blocking traffic from your systems.
Standalone Cluster
For a cluster that is not replicated or connected to another cluster in some way, the primary concern is that all the hosts in the cluster be in sync with each other, rather than being accurate to UTC.
Replicated Clusters
Clusters that act as either the Primary or a Replica need to be synchronized with each other for replication to work correctly. This usually means that the hosts in both clusters should reference the same NTP servers.
It is common to have multiple servers referenced in the NTP configuration file, /etc/ntp.conf. NTP does not necessarily choose a server based on its order in the file. Because of this, hosts could synchronize with different reference servers, introducing differences in the system clocks between the hosts in the cluster. Many organizations already have devices in their infrastructure that can act as NTP servers: most network devices are capable of it, as are Windows Primary Domain Controllers. These devices can use default polling intervals, which avoids excessive polling against public servers.
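If multiple servers must be listed, ntpd's prefer option can bias server selection toward a single source so that hosts are more likely to agree. A hedged fragment (the hostnames are placeholders):

```
# /etc/ntp.conf fragment: multiple servers, one marked as preferred
server ntp1.example.com prefer
server ntp2.example.com
server ntp3.example.com
```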
Once you have identified your NTP server, you can configure NTP on the cluster hosts. We suggest using a single reference server for all the cluster hosts, then adding the other hosts in the cluster as peers of each node. We also suggest adding an entry for the local host as its own server, assigning it a high stratum number so it is only used as a last resort. Using peers along with the local clock allows the cluster hosts to negotiate and choose one of them to act as the reference server, providing redundancy in case the reference server is unavailable.
The following is a sample ntp.conf file:
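This example is illustrative only; the reference server and peer hostnames (ntp1.example.com, mlhost2.example.com, mlhost3.example.com) are placeholders for your own infrastructure.

```
# Sample /etc/ntp.conf for a MarkLogic cluster host (illustrative)

driftfile /var/lib/ntp/drift

# Single reference server shared by all cluster hosts.
# minpoll/maxpoll 4 = 2^4 = 16 second polling interval.
server ntp1.example.com burst iburst minpoll 4 maxpoll 4

# The other hosts in the cluster, configured as peers.
peer mlhost2.example.com
peer mlhost3.example.com

# The local clock as a last-resort source, fudged to a high stratum
# so it is only selected when nothing else is reachable.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```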
The burst option sends a burst of 8 packets at each polling interval, improving the quality of the time-offset statistics. Using it against a public NTP server is considered abuse.
The iburst option sends a burst of 8 packets when the server is unreachable, which speeds up the initial synchronization. Using it against a public NTP server is considered aggressive.
The minpoll and maxpoll settings are exponents of two, measured in seconds, so a setting of 4 means 2^4 = 16 seconds.
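The exponent-to-seconds mapping can be checked with shell arithmetic:

```shell
#!/bin/sh
# Convert an ntpd minpoll/maxpoll exponent to a polling interval in seconds.
poll_seconds() {
    echo $((1 << $1))
}

poll_seconds 4    # 16 seconds, as in the sample configuration
poll_seconds 6    # 64 seconds (ntpd's default minpoll)
poll_seconds 10   # 1024 seconds, roughly 17 minutes (ntpd's default maxpoll)
```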
The fudge setting alters the stratum assigned to the local clock driver from its default, so that the local clock is only selected when no other source is available.
As always, system configuration changes should always be tested and validated prior to putting them into production use.