How to handle XDQP-TIMEOUT on a busy cluster | MarkLogic Support

How to handle XDQP-TIMEOUT on a busy cluster
24 February 2020 04:28 PM
Introduction

Sometimes, when a cluster is under heavy load, the error log may show a lot of XDQP-TIMEOUT messages. Often, a subset of hosts in the cluster becomes so busy that the forests they host get unmounted and remounted repeatedly. Depending on your database and group settings, remounting a forest may be very time-consuming, because all hosts in the cluster are forced to do the extra work of index detection.

Forest Remounts

Every time a forest remounts, the error log will show a lot of messages like these:

    2012-08-27 06:50:33.146 Debug: Detecting indexes for database my-schemas
    2012-08-27 06:50:33.146 Debug: Detecting indexes for database Triggers
    2012-08-27 06:50:35.370 Debug: Detected indexes for database Last-Login: sln
    2012-08-27 06:50:35.370 Debug: Detected indexes for database Triggers: sln
    2012-08-27 06:50:35.370 Debug: Detected indexes for database Schemas: sln
    2012-08-27 06:50:35.370 Debug: Detected indexes for database Modules: sln
    2012-08-27 06:50:35.373 Debug: Detected indexes for database Security: sln
    2012-08-27 06:50:35.485 Debug: Detected indexes for database my-modules: sln
    2012-08-27 06:50:35.773 Debug: Detected indexes for database App-Services: sln
    2012-08-27 06:50:35.773 Debug: Detected indexes for database Fab: sln
    2012-08-27 06:50:35.805 Debug: Detected indexes for database Documents: ss, fp
    ... and so on ...

This can go on for several minutes and will cost you more downtime than necessary, since you already know the indexes for each database.

Improving the situation

Here are some suggestions for improving this situation:

1. Browse to Admin UI -> Databases -> my-database-name
2. Set ‘index detection’ to ‘none’
3. Set ‘expunge locks’ to ‘none’
4. Repeat the steps above for all active databases.
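If there are many active databases, the database-level settings above could be scripted instead of clicked through in the Admin UI. The following is a minimal sketch, not a definitive implementation. It assumes the MarkLogic Management REST API is reachable on port 8002 with digest authentication, and that the database properties are named `index-detection` and `expunge-locks` in your server version. The host name, credentials, and helper function names are hypothetical; verify the endpoint and property names against the documentation for your release before use.

```python
import json
import urllib.parse
import urllib.request


def build_database_update(host, database, port=8002):
    """Build (but do not send) a PUT request that sets 'index detection'
    and 'expunge locks' to 'none' for one database.

    Assumption: the Management REST API exposes these settings as the
    JSON properties 'index-detection' and 'expunge-locks'.
    """
    name = urllib.parse.quote(database, safe="")
    url = f"http://{host}:{port}/manage/v2/databases/{name}/properties"
    body = json.dumps(
        {"index-detection": "none", "expunge-locks": "none"}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send_update(request, user, password, timeout=30):
    """Send a prepared request using HTTP digest authentication
    (assumed to be the authentication scheme of the manage app server)."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, request.full_url, user, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(manager)
    )
    with opener.open(request, timeout=timeout) as response:
        return response.status
```

A caller would loop over its own list of active database names, call `build_database_update` for each, and pass the result to `send_update` with an admin user's credentials. Building and sending are kept separate so the request can be inspected before anything is changed on the cluster.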
Now tweak the group settings to make the cluster less sensitive to an occasional busy host:

1. Browse to Admin UI -> Groups -> E-Nodes
2. Set ‘xdqp timeout’ to 30
3. Set ‘host timeout’ to 90
4. Click OK to make this change effective.

The database-level changes speed up forest remounts, and therefore cluster recovery, when a server node is perceived to be offline. The group changes make the hosts in that group a little more forgiving before they declare a host to be offline, which prevents forests from being unmounted when it is not really needed.

If you are still seeing XDQP-TIMEOUT messages after making these changes, the next step is to contact MarkLogic Support for assistance. You should also alert your Development team, in case a stray query is causing the data nodes to gather too many results.

Related Reading

XML Data Query Protocol (XDQP)
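To judge whether XDQP-TIMEOUT messages have become less frequent after these changes, it can help to count them per hour in the error log. The sketch below assumes only that each log line starts with a timestamp in the layout shown in the sample above (`2012-08-27 06:50:33.146 ...`) and that timeout messages contain the literal text `XDQP-TIMEOUT`. The function name is hypothetical, and lines without a leading timestamp are skipped.

```python
import re
from collections import Counter

# Timestamp layout taken from the sample log lines:
# "2012-08-27 06:50:33.146 Debug: ..."
TIMESTAMP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\.\d{3} ")


def count_timeouts_per_hour(lines):
    """Return a Counter mapping 'YYYY-MM-DD HH:00' to the number of
    log lines in that hour that mention XDQP-TIMEOUT."""
    counts = Counter()
    for line in lines:
        if "XDQP-TIMEOUT" not in line:
            continue
        match = TIMESTAMP_RE.match(line)
        if match:
            date, hour = match.group(1), match.group(2)
            counts[f"{date} {hour}:00"] += 1
    return counts
```

The function accepts any iterable of strings, so an open file handle for a host's error log can be passed in directly. Comparing the hourly counts from before and after the settings change gives a rough indication of whether the cluster has become less sensitive to busy hosts.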