Knowledgebase:
Performance Implications of a Large Number of Range Indexes
26 February 2020 06:59 PM

Summary

MarkLogic does not enforce a programmatic upper limit on how many range indexes you *can* have. This leaves open the question of how many range indexes you *should* use in your application. The answer is that you should have as many as the application requires, with the caveat that some infrastructure limits should be taken into account. For instance:

1. More memory-mapped file handles (file descriptors)

The operating system limits how many file handles a given process can have open at any point in time. This limit therefore constrains how many range index files, and hence how many range indexes, a given MarkLogic process can have. However, higher file handle limits can be configured on most platforms (ulimit, vm.max_map_count).
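As a rough illustration of where these limits can be inspected, the sketch below reads the open-file limit and, on Linux, the vm.max_map_count value. It assumes a Unix-like host; the function name read_handle_limits is hypothetical, and the values reported are those of the process running the script, which may differ from the limits applied to the MarkLogic process itself.

```python
import resource
from pathlib import Path


def read_handle_limits():
    """Return the open-file limits of this process and the system map-count limit.

    Assumption: run on a Unix-like host. The map-count value is only
    available on Linux; elsewhere it is reported as None.
    """
    # Soft and hard limits on open file descriptors (what `ulimit -n` reports).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

    # Linux exposes the per-process limit on memory-mapped regions here.
    map_count_path = Path("/proc/sys/vm/max_map_count")
    try:
        max_map_count = int(map_count_path.read_text().strip())
    except (OSError, ValueError):
        max_map_count = None  # not Linux, or the file is not readable

    return {
        "nofile_soft": soft,
        "nofile_hard": hard,
        "max_map_count": max_map_count,
    }


if __name__ == "__main__":
    for name, value in read_handle_limits().items():
        print(f"{name}: {value}")
```

Comparing these numbers with the expected count of range index files gives a first indication of whether the limits need to be raised.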

2. Higher RAM requirements

The in-memory footprint of a node includes structures such as in-memory-list-cache, in-memory-tree-cache, in-memory-range-index, in-memory-reverse-index (if reverse queries are enabled), and in-memory-triple-index (if triple positions are enabled). To estimate it, multiply the sum of these by the total number of forests, then add a buffer.

A large number of range indexes can greatly expand the memory used by indexes. The values mentioned above are also in addition to the memory that MarkLogic Server needs to maintain its HTTP servers, perform merges, reindex, and rebalance, as well as to process queries and other operations.
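The estimate described above can be sketched as a small calculation. This is only an illustration: it assumes the rule means (sum of the per-forest in-memory structures) times (number of forests) plus a buffer, the function name is hypothetical, and every size below is a placeholder rather than a MarkLogic default or recommendation.

```python
def estimate_in_memory_footprint_mb(per_forest_sizes_mb, forest_count, buffer_mb=0):
    """Estimate the in-memory footprint of a node in megabytes.

    Assumed interpretation of the rule of thumb:
    (sum of per-forest in-memory structures) * forests + buffer.
    """
    per_forest_total = sum(per_forest_sizes_mb.values())
    return per_forest_total * forest_count + buffer_mb


# Placeholder sizes in MB, chosen only to make the arithmetic easy to follow.
example_sizes_mb = {
    "in-memory-list-cache": 64,
    "in-memory-tree-cache": 16,
    "in-memory-range-index": 2,
    "in-memory-reverse-index": 2,  # counted only if reverse queries are enabled
    "in-memory-triple-index": 16,  # counted only if triple positions are enabled
}

if __name__ == "__main__":
    total = estimate_in_memory_footprint_mb(example_sizes_mb, forest_count=10, buffer_mb=200)
    print(f"Estimated in-memory footprint: {total} MB")
```

This figure covers only the in-memory structures; memory for HTTP servers, merges, reindexing, rebalancing, and query processing comes on top of it.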

Tip: Memory consumption can be reduced by configuring a database to optimize range indexes for minimum memory usage (memory-size). The default setting optimizes for maximum performance (facet-time).

UI: Admin UI > Databases > {database-name} > Configure > range index optimize [facet-time or memory-size]

API: admin:database-set-range-index-optimize

3. Longer merge times (bigger stands due to large index expansion)

A large number of range indexes expands the data stored in forests. For a given host size and number of hosts, larger stands in a forest make range index queries faster, but they also make merges slower. To keep both queries and merges fast with a large number of range indexes, you will need to scale out the number of physical hosts.

4. Higher CPU, disk, and IO requirements

Merges are IO-intensive processes. Combined with frequent updates or loads, they can result in CPU as well as IO bottlenecks.

5. Longer forest mount times

In general, each configured range index with data takes two memory-mapped files per stand.

A typical busy host has on the order of 10 forests, each with on the order of 10 stands, so a typical busy host has on the order of 100 stands.

For 100 stands:

  • With 100 range indexes, there are on the order of 10,000 files to open and map when the server starts up.
  • With 1,000 range indexes, there are on the order of 100,000 files to open and map when the server starts up.
  • With 10,000 range indexes, there are on the order of 1,000,000 files to open and map when the server starts up.
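The arithmetic behind these figures can be sketched as follows, using the numbers given above (two memory-mapped files per range index per stand, about 10 forests with about 10 stands each). The function name is hypothetical. The exact products come out at twice the round figures in the list, which are quoted only as orders of magnitude.

```python
def mapped_files_at_startup(range_indexes, stands, files_per_index_per_stand=2):
    """Approximate number of files to open and map when the server starts.

    Assumes every configured range index has data in every stand.
    """
    return range_indexes * stands * files_per_index_per_stand


if __name__ == "__main__":
    forests = 10            # order-of-magnitude figure for a typical busy host
    stands_per_forest = 10  # order-of-magnitude figure for a typical busy host
    stands = forests * stands_per_forest

    for range_indexes in (100, 1_000, 10_000):
        files = mapped_files_at_startup(range_indexes, stands)
        print(f"{range_indexes:>6} range indexes -> about {files:,} mapped files")
```

Each tenfold increase in range indexes produces a tenfold increase in files to map, which is why start-up time grows with the number of configured indexes.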

As the number of range indexes increases, at some point the server will take an unreasonably long time to start up (unless equivalent processing power is added).

The amount of time one is willing to wait for the server to start up is not a hard limit. The question should instead be: "What is 'reasonable' server start-up behavior in the eyes of the server administrator, given the current hardware?"

Conclusion

Range indexes numbering on the order of a thousand start to affect performance if they are not managed properly and if the considerations above are not accounted for. In most scenarios, the question to ask is not "How many indexes can we configure?" but rather "How many indexes do we need?"

MarkLogic considers a number of configured range indexes on the order of 100 to be a “reasonable” limit, because it results in “reasonable” server behavior.

Tips for Best Performance for Solutions with lots of Range Indexes

Before launching your application, review the number of range indexes and work to 1) remove ones that are not being used, and 2) consolidate any range indexes that are mutually redundant. This will help you stay under the recommended limit of about 100 range indexes.

On systems that already have a large number of range indexes (say 100+), merging multiple stands may become a performance issue, so you will need to think about easing the query and merge load. Here are some strategies for easing the load on your system:

  1. Increase merge-max-size from 32768 to 49152 on your database. This will create larger stands and will lower the number of merges that need to be performed.
  2. The configuration setting "preload mapped data" defaults to false. Leaving it as false speeds up the merging of forest stands. Bear in mind that this comes at the cost of slower query performance immediately after forests mount.
  3. If your system begins to slow down due to merging activity, you can spread the load by adding more hosts & forests to your cluster. The smaller forests and stands will merge and load faster when there are more CPU cores and IO bandwidth to service them.
