Techniques for dividing tasks between hosts in a cluster
09 May 2018 05:43 AM

Introduction

This article outlines a general strategy for distributing a specific task across every host in a cluster.

There are situations where you may want to execute queries against a number of hosts in a cluster. One example is breaking a query down so that each part operates only on the forests managed by a particular host. Using the patterns described in this article, you can build a mechanism to do just that.

The problem

Wouldn't it be useful if you could pass options to xdmp:spawn() to execute code on a specific host in a cluster?

This has been filed as an RFE (2763) for consideration in a future release of the product. In the meantime, there are a few options open to you.

From the top down

1. Gather information about each host in your cluster

For this you can call xdmp:hosts(), which returns a sequence of host IDs, each corresponding to a node in your cluster. From there, you can pass each ID to xdmp:host-name() to get that host's name.
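A minimal sketch of this step, pairing each host ID with its name. It uses only xdmp:hosts() and xdmp:host-name(); the "ID : name" output format is purely illustrative.

```xquery
xquery version "1.0-ml";

(: Walk every host in the cluster and report its ID alongside its name. :)
for $host-id in xdmp:hosts()
return fn:concat($host-id, " : ", xdmp:host-name($host-id))
```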

2. Create a call to an http endpoint on each host in a cluster

We can build on the first step to generate a list of URLs, each mapping to an endpoint that is serviced by a corresponding XQuery module which performs a particular task on that host. The approach uses fn:concat() to build the URL for each host and then calls xdmp:document-get() to hit the same application server endpoint on each host.
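The following is a rough sketch of this step under stated assumptions. The port `8040` and the module path `/run-task.xqy` are hypothetical placeholders for an HTTP app server that exists on every host and the module that performs the task. It also assumes the endpoint can be reached without credentials; if the app server requires authentication, xdmp:http-get(), which accepts authentication options, may be a better fit.

```xquery
xquery version "1.0-ml";

(: Hypothetical values: the port of an HTTP app server available on
   every host, and the path of the module that performs the task. :)
declare variable $PORT as xs:string := "8040";
declare variable $ENDPOINT as xs:string := "/run-task.xqy";

for $host-id in xdmp:hosts()
(: Build one URL per host, all pointing at the same endpoint. :)
let $url := fn:concat("http://", xdmp:host-name($host-id), ":", $PORT, $ENDPOINT)
(: Request the endpoint on that host; treat the response as text. :)
return xdmp:document-get(
  $url,
  <options xmlns="xdmp:document-get">
    <format>text</format>
  </options>
)
```

Because each request is served by the app server on the named host, the module behind the endpoint runs locally on that host, which is what makes the forest-level filtering in the next step possible.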

3. Isolate forests for a given host

While the above technique might be useful for some purposes, you can add further precision by building a query that operates exclusively on the forests managed by a given host. Combined with the technique above, this variation lets you "pre-screen" a database's forests so that the code running on each host only operates against the forests on that host.
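One possible sketch of the pre-screening, assuming it runs inside the endpoint module on each host. It looks up each forest's host with the Admin API (admin:forest-get-host()), which may require admin-level privileges, and keeps only the forests on the current host (xdmp:host()). The document count via xdmp:estimate() simply stands in for whatever the real task is.

```xquery
xquery version "1.0-ml";

import module namespace admin = "http://marklogic.com/xdmp/admin"
  at "/MarkLogic/admin.xqy";

let $config := admin:get-configuration()
(: Keep only the current database's forests that live on this host. :)
let $local-forests :=
  for $forest-id in xdmp:database-forests(xdmp:database())
  where admin:forest-get-host($config, $forest-id) eq xdmp:host()
  return $forest-id
return
  (: An empty forest-ids argument would search ALL forests, so guard it. :)
  if (fn:empty($local-forests)) then 0
  else xdmp:estimate(
    (: The fifth argument restricts the search to the given forests. :)
    cts:search(fn:doc(), cts:and-query(()), (), (), $local-forests)
  )
```

The empty-sequence guard matters: passing no forest IDs to cts:search() does not mean "no forests", so a host with no forests for this database would otherwise query the whole database.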

Summary

This KB article has introduced some fairly simple patterns to allow you to programmatically direct requests to a particular host in a cluster. It also demonstrates a technique for preparing queries to operate at individual forest level.

Such techniques can be useful for performing administrative tasks on an individual host and for auditing the contents of an individual forest (or group of forests). They also allow for even more flexibility when combined with bulk processing tools such as CoRB and XQSync, both of which let you select documents with a custom query (which could be restricted by passing in a sequence of one or more forest IDs).
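As an illustration, here is a sketch of a custom URI-selection query restricted to caller-supplied forest IDs. The external variable `$FOREST-IDS` is hypothetical (a comma-separated string that a tool such as CoRB could pass in as a custom module input), the "count first, then URIs" return shape follows the convention CoRB uses for its URIs module, and cts:uris() assumes the database's URI lexicon is enabled.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: comma-separated forest IDs, e.g. "123,456". :)
declare variable $FOREST-IDS as xs:string external;

let $forest-ids :=
  for $id in fn:tokenize($FOREST-IDS, ",")
  return xs:unsignedLong(fn:normalize-space($id))
return
  (: Empty forest IDs would select from every forest, so fail loudly. :)
  if (fn:empty($forest-ids)) then
    fn:error(xs:QName("NO-FORESTS"), "No forest IDs were supplied")
  else
    let $uris := cts:uris((), (), cts:and-query(()), (), $forest-ids)
    (: CoRB-style URIs modules return the count, then the URIs. :)
    return (fn:count($uris), $uris)
```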

Additionally, because you can target a specific host when executing a task, you could also use the above techniques to write a host-specific properties file to a writable location on that host's filesystem (such as /tmp) with a call to xdmp:save().
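A minimal sketch, assuming the module is invoked on each host (for example through the per-host endpoint pattern) and that the calling user has the privilege needed to run xdmp:save(). The file path and property names are hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical path; it resolves on the host that evaluates this module. :)
let $path := "/tmp/marklogic-host.properties"
let $content := text {
  fn:concat(
    "host.id=", xdmp:host(), "&#10;",
    "host.name=", xdmp:host-name(xdmp:host()), "&#10;"
  )
}
(: Write the properties file to this host's local filesystem. :)
return xdmp:save($path, $content)
```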
