Techniques for dividing tasks between hosts in a cluster
22 December 2020 08:41 AM
This article outlines a general strategy for distributing a specific task across every host in a cluster.
There are situations where you would like to execute queries against a number of hosts in a cluster. One example is breaking a query down so that each host only operates on the forests it manages. Using the patterns described in this article, you will be able to build a mechanism to do just that.
Wouldn't it be useful if you could pass in options into xdmp:spawn() to allow the execution of code on a specific host in a cluster?
While this has been filed as an RFE (2763) for consideration in a future release of the product, there are a few options open to you.
From the top down
1. Gather information about each host in your cluster
For this you can call xdmp:hosts(), which returns a sequence of host ids, each corresponding to a host in your cluster. You can then pass each id to xdmp:host-name() to get the name of that host.
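A minimal sketch of this first step is shown here. The wrapper elements (host, id, name) are only illustrative and are not part of any MarkLogic API.

```xquery
xquery version "1.0-ml";

(: List every host in the cluster, pairing each host id with its name.
   The <host>, <id> and <name> elements are illustrative wrappers only. :)
for $host-id in xdmp:hosts()
return
  <host>
    <id>{$host-id}</id>
    <name>{xdmp:host-name($host-id)}</name>
  </host>
```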
2. Create a call to an http endpoint on each host in a cluster
We can build on the first step to generate a list of URIs, each mapping to an endpoint (serviced by a corresponding XQuery module that performs a particular task on that host). One approach is to use fn:concat() to build the link for each host and then call xdmp:document-get() to hit the same application server endpoint on each host.
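The sketch below shows one way this could look. The port (8010), the module path (/task.xqy) and the credentials are all hypothetical placeholders. It also assumes that the same application server exists on every host and uses digest authentication, so adjust these to suit your environment.

```xquery
xquery version "1.0-ml";

(: Assumptions: the port, module path and credentials below are
   hypothetical placeholders, not values taken from a real configuration. :)
declare variable $PORT as xs:string := "8010";
declare variable $ENDPOINT as xs:string := "/task.xqy";

for $host-id in xdmp:hosts()
(: Build the URI for the same endpoint on each host in turn :)
let $uri := fn:concat("http://", xdmp:host-name($host-id), ":", $PORT, $ENDPOINT)
return
  (: An http:// location causes the request to be issued over HTTP :)
  xdmp:document-get($uri,
    <options xmlns="xdmp:document-get" xmlns:http="xdmp:http">
      <http:authentication method="digest">
        <http:username>username</http:username>
        <http:password>password</http:password>
      </http:authentication>
    </options>)
```

Each call returns whatever the endpoint module produces on that host, so the result is one response per host in the cluster.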
3. Isolate forests for a given host
While the above technique might be useful for some purposes, you can gain further precision by building a query that operates exclusively on the forests managed by that host. This variation "pre-screens" a database's forests so that the query only runs against the forests on that host.
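As a sketch of this idea, the module below could be served by the endpoint on each host. It assumes the query runs against the application server's current database and uses a simple document count as a stand-in for the real task.

```xquery
xquery version "1.0-ml";

(: Forests managed by the host evaluating this request :)
let $local-forests := xdmp:host-forests(xdmp:host())
(: Forests attached to the current database (an assumption: swap in
   xdmp:database("your-database-name") to target a different one) :)
let $db-forests := xdmp:database-forests(xdmp:database())
(: Keep only the database forests that live on this host :)
let $forest-ids := $db-forests[. = $local-forests]
return
  (: An empty forest-id sequence would search ALL forests, so guard against it :)
  if (fn:empty($forest-ids))
  then 0
  else xdmp:estimate(cts:search(fn:doc(), cts:and-query(()), (), (), $forest-ids))
```

The guard matters because passing an empty sequence of forest ids to cts:search() does not restrict the search at all. A host with none of the database's forests would otherwise report a count for the whole database.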
This KB article has introduced some fairly simple patterns to allow you to programmatically direct requests to a particular host in a cluster. It also demonstrates a technique for preparing queries to operate at individual forest level.
Such techniques can be useful for performing administrative tasks on an individual host or for auditing the contents of an individual forest (or group of forests). They also allow for even more flexibility when you consider bulk processing tools such as CoRB and XQSync, both of which allow you to select documents based on a custom query (which could be restricted by passing in a sequence of one or more forest ids).
Additionally, as you have the ability to target a specific host in executing a task, you could also use the above techniques to write out a specific properties file to a writable partition on your system (such as /tmp) using a call to xdmp:save().
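As a rough illustration of that last idea, the sketch below writes a small properties file on whichever host evaluates it. The file name (/tmp/host.properties) and the property keys are hypothetical, and it assumes the MarkLogic process has write access to /tmp.

```xquery
xquery version "1.0-ml";

(: Assumptions: the output path and the property keys are hypothetical,
   and /tmp must be writable by the MarkLogic process on this host. :)
let $host-name := xdmp:host-name(xdmp:host())
let $forest-names :=
  for $forest-id in xdmp:host-forests(xdmp:host())
  return xdmp:forest-name($forest-id)
(: Build simple key=value lines, one per property :)
let $properties := fn:concat(
  "host=", $host-name, "&#10;",
  "forests=", fn:string-join($forest-names, ","), "&#10;")
return
  (: Save the text to the filesystem of the host running this request :)
  xdmp:save("/tmp/host.properties", text { $properties })
```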