Knowledgebase:
How to handle XDQP-TIMEOUT on a busy cluster
27 June 2013 03:00 PM

Introduction

When a cluster is under heavy load, it may log many XDQP-TIMEOUT messages in the error log. Often, a subset of hosts in the cluster becomes so busy that the forests they host are unmounted and remounted repeatedly. Depending on your database and group settings, remounting a forest can be very time-consuming, because every host in the cluster is forced to do the extra work of index and format detection.

Forest Remounts

Every time a forest remounts, the error log shows many messages like these:

2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
2012-08-27 06:50:35.370 Debug: Detecting compatibility for database Last-Login
2012-08-27 06:50:35.370 Debug: Detecting compatibility for database Triggers
2012-08-27 06:50:35.370 Debug: Detecting compatibility for database Schemas
2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
2012-08-27 06:50:35.370 Debug: Detecting compatibility for database Modules
2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
2012-08-27 06:50:35.373 Debug: Detecting compatibility for database Security
2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
2012-08-27 06:50:35.485 Debug: Detecting compatibility for database my-modules
2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
2012-08-27 06:50:35.773 Debug: Detecting compatibility for database App-Services
2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
2012-08-27 06:50:35.773 Debug: Detecting compatibility for database Fab
2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp
2012-08-27 06:50:35.805 Debug: Detecting compatibility for database Documents

... and so on ...

This can go on for several minutes and costs you more downtime than necessary, since you already know the format and indexes for each database.
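To see how much time a remount spends on detection, the Debug lines can be summarized with a short script. The following is a minimal sketch, assuming the log lines follow exactly the format shown above; the function name `detection_summary` is hypothetical and not part of any MarkLogic tooling.

```python
import re
from datetime import datetime

# Matches the Debug lines shown above, for example:
# "2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) Debug: "
    r"(Detecting|Detected) (indexes|compatibility) for database ([^:]+)"
)

def detection_summary(lines):
    """Return (message count per database, seconds between first and last message)."""
    counts = {}
    stamps = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a detection message
        stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f"))
        database = match.group(4).strip()
        counts[database] = counts.get(database, 0) + 1
    elapsed = (max(stamps) - min(stamps)).total_seconds() if stamps else 0.0
    return counts, elapsed
```

Passing the lines of an error log to `detection_summary` gives a rough per-database message count and the wall-clock span the detection messages cover.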

Improving the situation

Here are some suggestions for improving this situation:

  1. Browse to Admin UI -> Databases -> my-database-name
  2. Set ‘format compatibility’ to ‘5.0’
  3. Set ‘index detection’ to ‘none’
  4. Set ‘expunge locks’ to ‘none’

Repeat steps 1-4 for all active databases.
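If you have many databases, repeating these steps by hand is tedious. The sketch below prepares the same three settings as a JSON payload per database. It assumes a MarkLogic version that exposes the Management REST API on port 8002 with a `/manage/v2/databases/{name}/properties` endpoint, and it assumes the hyphenated property names mirror the Admin UI labels; verify both against the documentation for your version. The function names are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import quote

# Settings from steps 2-4 above. The hyphenated property names are an
# assumption (they mirror the Admin UI labels); check them for your version.
DATABASE_SETTINGS = {
    "format-compatibility": "5.0",
    "index-detection": "none",
    "expunge-locks": "none",
}

def database_update(host, database, port=8002):
    """Build (url, json_body) for one database; sending is left to the caller."""
    name = quote(database, safe="")
    url = f"http://{host}:{port}/manage/v2/databases/{name}/properties"
    return url, json.dumps(DATABASE_SETTINGS, sort_keys=True)

def all_updates(host, databases):
    """Prepare the same settings for every active database."""
    return [database_update(host, database) for database in databases]
```

Each pair would then be sent as an HTTP PUT with a JSON content type, using whatever HTTP client and authentication your cluster requires.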

Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

  1. Browse to Admin UI -> Groups -> E-Nodes
  2. Set ‘xdqp timeout’ to 30
  3. Set ‘host timeout’ to 90
  4. Click OK to make this change effective.

The database-level changes shorten recovery time when a node is perceived to be offline, because the server no longer has to detect indexes and formats when forests remount. The group changes make the hosts in that group a little more forgiving before declaring a host offline, which prevents forest unmounting when it isn't really needed.
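The effect of a longer host timeout can be illustrated with a toy model. This is only a sketch, not MarkLogic's actual failure detection: it assumes a host is declared offline once it has been unresponsive for longer than the host timeout. The pause durations are made up, and the earlier timeout of 30 seconds is an assumption used for comparison.

```python
# Toy model, not MarkLogic's actual failure detector: a host is treated as
# offline once it has been unresponsive for longer than the host timeout.
def offline_events(unresponsive_gaps, host_timeout):
    """Return the gaps (in seconds) long enough to trigger an offline declaration."""
    return [gap for gap in unresponsive_gaps if gap > host_timeout]

# Hypothetical pauses, in seconds, observed on a busy host.
gaps = [12, 45, 70, 120]

# 30 is assumed here as the previous host timeout; 90 is the value set above.
before = offline_events(gaps, host_timeout=30)
after = offline_events(gaps, host_timeout=90)
```

In this model, three of the four pauses would trigger a forest unmount with the shorter timeout, while only the longest pause does so with the timeout set to 90.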

If you still see XDQP-TIMEOUT errors after making these changes, the next step is to contact MarkLogic Support for assistance. You should also alert your development team, in case a stray query is causing the data nodes to gather too many results.
