Resolving XDMP-EXPNTREECACHEFULL errors
16 February 2023 11:28 AM
Occasionally, while running a query, you may see the following message returned:
The expanded tree cache is the memory pool used for MarkLogic's proprietary representation of XML fragments while they are being used during query processing. XML documents are stored on disk as fragments, in a highly compressed form. As XML fragments are needed during query evaluation, they are retrieved from disk in compressed format and cached in the compressed tree cache. When the query needs to actually retrieve elements, values, or otherwise traverse the contents of one of these fragments, the fragment is uncompressed and cached in the expanded tree cache.
Consequently, the expanded tree cache needs to be large enough to hold a copy of every expanded XML fragment that is needed simultaneously during query processing. (Note that this does not necessarily imply that every fragment used by a given query is needed simultaneously; a lot depends on what a query does and how it is written.) Expanded fragments may be 3-5x the byte count of the original XML.
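As a rough illustration of the 3-5x figure above, the following Python sketch estimates how much expanded tree cache a set of simultaneously needed fragments might occupy. The function name and the 200 MB input are hypothetical, and the multipliers are only the rule of thumb quoted above, not a measured value for any particular content.

```python
def expanded_size_range_mb(xml_mb, low=3.0, high=5.0):
    """Estimate the expanded size (in MB) of xml_mb megabytes of source XML.

    low and high are the rule-of-thumb expansion factors (3x to 5x);
    real expansion depends on the content and is an assumption here.
    """
    if xml_mb < 0:
        raise ValueError("xml_mb must be non-negative")
    return (xml_mb * low, xml_mb * high)


# Hypothetical example: a query that needs 200 MB of XML at the same time.
low_mb, high_mb = expanded_size_range_mb(200)
print(f"Estimated expanded tree cache use: {low_mb:.0f} to {high_mb:.0f} MB")
```

An estimate like this only bounds the fragments a query holds at the same moment; a query that releases fragments as it goes can need far less.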
The error message
There are four approaches to solving this problem:
1. Change the problem query so that it does not need to use as much XML data.
2. Tune the problem query so that it does not need to simultaneously cache as much XML.
3. Increase the size of the expanded tree cache, using the setting under Groups > Default > Configure.
4. Ensure that your content is properly fragmented, if appropriate.
Change the problem query
Approach (1) generally means a change in requirements (for instance, returning only 100 results instead of 500 results).
Tune the problem query
Approach (2) requires a thorough knowledge of XQuery performance tuning. For instance, in some cases it is possible for a particular query to process 20 GB of XML with only 128 MB of expanded tree cache, if the query is written properly. An initial implementation of that query, however, could easily require a 20 GB expanded tree cache. Typically, our professional services staff is involved in these exercises, but if you want to send us the problem query, we're happy to take a quick look and see if we can give you general advice.
Increase size of expanded tree cache (Restart Required)
Approach (3) will work so long as you have sufficient available memory and the memory required is not excessive, but it may be a band-aid for a problem that should really be fixed through approach (2), such as the 20 GB example outlined above.
Alternatively, if there is not sufficient memory to increase total cache size, you can increase the size of the cache partitions by decreasing the number of partitions. More partitions allow more concurrency, but make each individual cache partition smaller, which could make it more likely for the cache to fill up. There is a maximum size for the expanded tree cache of 32768 MB (73728 MB as of v7.0-5 and v8.0-2), and each partition should be no more than 8192 MB.
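The trade-off between partition count and partition size can be checked with some simple arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, and it assumes the total cache is divided evenly among partitions. The limits are the ones quoted above (32768 MB total by default here, 8192 MB per partition).

```python
MAX_PARTITION_MB = 8192  # per-partition ceiling quoted above


def check_cache_config(total_cache_mb, partitions, max_total_mb=32768):
    """Return (per_partition_mb, problems) for a proposed cache setting.

    Assumes the cache is split evenly across partitions. Pass
    max_total_mb=73728 for versions v7.0-5 / v8.0-2 and later.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    problems = []
    if total_cache_mb > max_total_mb:
        problems.append(f"total {total_cache_mb} MB exceeds {max_total_mb} MB")
    per_partition_mb = total_cache_mb / partitions
    if per_partition_mb > MAX_PARTITION_MB:
        problems.append(
            f"partition size {per_partition_mb:.0f} MB exceeds {MAX_PARTITION_MB} MB"
        )
    return per_partition_mb, problems


# Halving the partition count doubles each partition without adding memory.
print(check_cache_config(16384, 8))
print(check_cache_config(16384, 4))
```

Reducing the partition count from 8 to 4 in this example raises each partition from 2048 MB to 4096 MB, at the cost of some concurrency.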
The server will determine the cache settings it uses at startup and log them at Debug level. For example:
This can be used to confirm the size and number of cache partitions in use.
Note that both of the above solutions do require a cluster restart. As a last resort, if a cluster restart is not desired, you may use Query Console to make a call as follows:
This call is undocumented, but it allows you to clear the cache on a host without a server restart. In a cluster, you will have to call it on each host. It clears the entire cache on every host where you invoke it, which results in a temporary performance hit until the cache warms up again.
Ensure content is properly fragmented
Approach (4) reflects the fact that large XML documents can take up a lot of memory during query evaluation, and MarkLogic's fragmentation capabilities are designed with this in mind. Fragmentation allows MarkLogic to load only the needed parts of large documents during query evaluation, thereby reducing memory requirements. Fragmentation does have other ramifications for query evaluation, however, as described in the Developer's Guide. If your expanded tree cache problem occurs while working with large documents, fragmentation may be an appropriate solution. If your problem occurs while working with small documents, fragmentation will not help.