Performance implications of Large Number of Range Indexes.
26 February 2020 06:59 PM
MarkLogic does not enforce a programmatic upper limit on How many indexes you *can* have. This leaves open the question of how many range indexes should be used in your application. The answer is that you should have as many as the application requires, but with the caveat that there are some infrastructure limits that should be taken into account. For instance:
1. More Memory Mapped file Handles (file fd)
OS has limits of how many file handles a given process can have at a given point in time. This limit, therefore, affects how many range index files, and therefore range indexes a given MarkLogic process can have; However, One could configure higher File Handle limits on most platforms (ulimit, vm.max_map_count).
2. More RAM requirement
In-memory footprint of node involves In-memory structures like in-memory-list-cache, in-memory-tree-cache, in-memory-range index, in-memory-reverse-index (if-reverse-query-enabled) , in-memory-triple-index (if-triple-positions-enabled); multiply those with total number of forests + buffer.
A Large number of Range indexes can result in a huge index expansion in memory use. Also, values mentioned above are in addition to memory that would be required for MarkLogic Server to maintain its HTTP servers, perform merges, reindex, re-balance, as well as operations like processing queries, etc.
3. Longer Merge Times (Bigger stands due to Large index expansion)
Large number of Range Index ends up expanding data in forests. Now for a given host size and number of hosts- larger stand sizes in forest will make range index query faster; However it will also make merge times slower. If we want to make Queries and merges all fast with a large number of range indexes, we will need to scale out the number of physical hosts.
4. More CPU, Disk & IO requirement
Merges are IO intensive processes; this, combined with frequent updates/load could result in CPU as well as IO bottlenecks.
5. Longer Forest Mount times
In general, Each configured range index with data takes two memory mapped files per stand.
A typical busy host has on the order of 10 forests, each forest with on the order of 10 stands; So a typical busy host has on the order of 100 stands.
Now for 100 stands -
As we increase our range indexes, at some point of time, Server will take unreasonably long time to start up (unless we throw equivalent processing power).
The amount of time one is willing to wait for the server to start up is not a hard limit, but the question should be "what is 'reasonable' behavior for Server start-up in eyes of Server Admin based on current hardware."
Range Indexes in magnitude of a thousand starts affecting Performance if not managed properly and if above consideration are not accounted for; In most scenarios the solution to the problem is not about "How many indexes can we configure", but rather about "How many indexes do we need".
MarkLogic considers configured range index in the order of 100 as a “reasonable” limit, because it results in “reasonable” behaviors of the Server.
Tips for Best Performance for Solutions with lots of Range Indexes
Before launching your application, review the number of Range Indexes and work to 1) Remove ones that are not being used, and 2) Consolidate any range indexes that are mutually redundant. This will help you get under the prescribed 100 range index limit.
On systems that already have a large number of range indexes (say 100+), merging multiple stands may become a performance issue. Thus, you will need to think about easing the query and merge load, here are some strategies for easing the load on your system: