Best Practice for Adding an Index in Production
08 February 2021 02:05 PM
Summary
It is sometimes necessary to add or remove an index on your production cluster. For a large database with more than a few GB of content, the resulting reindexing can be time- and resource-intensive, and it can degrade query performance while the server is reindexing. This article describes strategies for avoiding some of the pain points associated with changing your database configuration on a production cluster.
Preparing your Server for Production
In general, high-performance production search implementations run with tight controls on the automatic features of MarkLogic Server.
The xdqp and host timeout settings prevent the server from disconnecting prematurely when a data node is busy, which could otherwise trigger a false failover event. However, these changes also affect how long a legitimate failover takes in an HA configuration.
Preparing to Re-index
When an index configuration must be changed in production, you should:
When you have Database Replication Configured
If you have to add or modify indexes on a database that has database replication configured, make sure the same changes are made on the Replica cluster as well. Starting with MarkLogic Server version 9.0-7, index data is also replicated from the Master to the Replica, but the server does not automatically check whether both sides have the same index settings. Reindexing is disabled by default on a replica cluster. However, when the database replication configuration is removed (such as after a disaster), the replica database will reindex as necessary. It is therefore important that the Replica database index configuration matches the Master's, to avoid unnecessary reindexing.
Note: If you are on a version prior to 9.0-7, it is recommended that you update index settings on the Replica database before updating them on the Master database. Changes to the index settings on the Replica database only affect newly replicated documents and will not trigger reindexing of existing documents.
Further reading: Master and Replica Database Index Settings; Database Replication - Indexing on Replica Explained
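Because the server does not check that the Master and Replica index settings match, it can help to compare them yourself before and after a change. The following is a minimal sketch in Python, not an official tool. It assumes you have already exported each database's properties as JSON (for example from the Management REST API) and loaded them into dictionaries. The property names in INDEX_KEYS are an assumed, incomplete selection, and index_setting_diff is a hypothetical helper name. Adjust both to your own configuration.

```python
import json

# Assumed, incomplete selection of index-related database property names.
# Extend this tuple to cover the settings you actually use.
INDEX_KEYS = (
    "range-element-index",
    "range-element-attribute-index",
    "range-path-index",
    "range-field-index",
    "word-positions",
)


def _normalize(value):
    """Make lists of index definitions order-insensitive for comparison."""
    if isinstance(value, list):
        return sorted(json.dumps(item, sort_keys=True) for item in value)
    return value


def index_setting_diff(master: dict, replica: dict, keys=INDEX_KEYS) -> dict:
    """Return {key: (master_value, replica_value)} for each setting that differs."""
    diff = {}
    for key in keys:
        m, r = master.get(key), replica.get(key)
        if _normalize(m) != _normalize(r):
            diff[key] = (m, r)
    return diff
```

An empty result means the compared settings match. Any key that appears in the result should be reconciled before replication is removed or the Replica is promoted, since that is when the replica database would start reindexing.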
Reindexing works by reloading all the URIs that are affected by the index change. This process tends to create many new and deleted fragments, which then need to be merged. Because reindexing is very CPU and disk I/O intensive, the reindexer throttle can be set to 3 or 2 to minimize the impact of the reindex.
After the Re-index
After the re-index has completed, it is important to return to the old settings by disabling the reindexer and setting index-detection back to none. If you are reindexing over several nights or weekends, be sure to allow some time for merging to complete. For example, if your regular busy time starts at 5 AM, you may want to disable the reindexer at around midnight so that all merging is completed before business hours.
By following the above recommendations, you should be able to complete a large re-index without any disruption to your production environment.
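The settings changes described above can be sketched as two small property sets plus a check for the nightly window. This is an illustrative Python sketch under stated assumptions, not a definitive procedure. The keys "reindexer-enable", "reindexer-throttle", and "index-detection" are assumed to be the database property names your administration interface accepts, and the throttle is assumed to range from 1 to 5. The function names and the 20:00 to midnight window are hypothetical.

```python
from datetime import time


def reindex_start_properties(throttle: int = 3) -> dict:
    """Settings to apply when a reindex window opens (throttle assumed 1-5)."""
    if not 1 <= throttle <= 5:
        raise ValueError("throttle must be between 1 and 5")
    return {
        "reindexer-enable": True,
        "reindexer-throttle": throttle,
        "index-detection": "automatic",
    }


def reindex_stop_properties() -> dict:
    """Settings to restore afterwards: reindexer off, index detection none."""
    return {"reindexer-enable": False, "index-detection": "none"}


def in_reindex_window(now: time, start: time, stop: time) -> bool:
    """True if now falls in [start, stop), including windows crossing midnight."""
    if start <= stop:
        return start <= now < stop
    return now >= start or now < stop


# Hypothetical window: reindex from 20:00 until midnight, leaving the hours
# before a 5 AM busy period free for merges to finish.
WINDOW_START, WINDOW_STOP = time(20, 0), time(0, 0)


def properties_for(now: time) -> dict:
    """Pick the property set that should be in effect at the given time."""
    if in_reindex_window(now, WINDOW_START, WINDOW_STOP):
        return reindex_start_properties(throttle=3)
    return reindex_stop_properties()
```

A scheduler could call properties_for with the current time and apply the returned settings to the database. Stopping at midnight rather than just before 5 AM is deliberate: merges triggered by the reindex keep running after the reindexer is disabled.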