What happens if a reindex process is running and a scheduled backup begins?
22 November 2016 01:29 PM

Introduction

One issue that has been brought to our attention arises when a System Administrator sets up a nightly scheduled backup and leaves it to run in the background. Meanwhile, another team (or System Administrator) performs a major product upgrade or changes the application in a way that requires a large reindexing process to start.

It is generally recommended that processes such as reindexing be treated as maintenance tasks and scheduled to run outside peak hours, primarily because of the additional load they can place on both the CPU and the IO subsystem. It is equally common for backups to be scheduled outside peak hours, largely so that they capture a complete copy of all the updates made during that working day.

The purpose of this Knowledgebase article is to describe what happens when a scheduled backup starts while a reindex is in progress.

The time cost of reindexing

MarkLogic Server has been designed so that reindexing can be stopped without any negative impact on your queries. This matters because a large and complex reindex may take a significant amount of time and, as such, may need to be spread over multiple maintenance windows.

If a reindex does not run through to completion, your existing queries (those that ran fine before) will continue to run, because they can still take advantage of the index data currently on disk.

This makes it easier to upgrade to newer releases of the product and allows you to add new functionality to your MarkLogic application iteratively, without breaking existing features.
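To make the resumable behavior concrete, here is a minimal Python sketch of a reindex spread over several maintenance windows. It is a hypothetical model only, not MarkLogic's implementation or API: the `Reindexer` class, its `position` checkpoint and the per-window `budget` are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Reindexer:
    """Hypothetical model of a resumable reindex (not MarkLogic's implementation)."""
    doc_ids: list[str]
    position: int = 0  # checkpoint: index of the next document to reindex
    reindexed: set[str] = field(default_factory=set)

    def run_window(self, budget: int) -> int:
        """Reindex at most `budget` documents, then stop; return how many were done."""
        done = 0
        while self.position < len(self.doc_ids) and done < budget:
            self.reindexed.add(self.doc_ids[self.position])
            self.position += 1  # advance the checkpoint so a later window resumes here
            done += 1
        return done

    @property
    def complete(self) -> bool:
        return self.position >= len(self.doc_ids)


reindexer = Reindexer([f"doc{i}" for i in range(10)])
window = 0
while not reindexer.complete:
    window += 1
    print(f"window {window}: reindexed {reindexer.run_window(4)} documents")
```

With ten documents and a budget of four per window, the sketch completes on the third window, picking up each time from the saved `position` rather than starting over.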

When a scheduled backup takes place

In any mission-critical situation, it is your data that is key. For this reason, a backup is considered more important (and is given a higher priority) than a reindex: when a scheduled backup runs, the reindexer's work is placed on hold until the backup has run to completion.

After the backup has completed, the reindexer (if enabled) will automatically start up again from where it left off.
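The pause-and-resume behavior described above can be modelled with a small, hypothetical Python simulation. The tick-based scheduler, `docs_per_tick` and `backup_ticks` are illustrative assumptions rather than MarkLogic settings; the point is only that the reindexer makes no progress while the backup runs and then continues from its checkpoint.

```python
def simulate(total_docs: int, docs_per_tick: int, backup_ticks: range,
             max_ticks: int = 1000) -> list[tuple[int, str, int]]:
    """Hypothetical model: the reindexer yields to a backup, then resumes.

    Returns (tick, activity, documents reindexed so far) for each tick
    until the reindex completes.
    """
    position = 0  # checkpoint: documents reindexed so far
    timeline: list[tuple[int, str, int]] = []
    for tick in range(max_ticks):
        if position >= total_docs:
            break  # reindex complete
        if tick in backup_ticks:
            # Backup has priority: the reindexer is on hold, so no progress is made.
            timeline.append((tick, "backup (reindex on hold)", position))
            continue
        # No backup running: resume from the checkpoint.
        position = min(total_docs, position + docs_per_tick)
        timeline.append((tick, "reindexing", position))
    return timeline


for entry in simulate(total_docs=10, docs_per_tick=2, backup_ticks=range(2, 5)):
    print(entry)
```

With ten documents, two per tick and a backup occupying ticks 2 to 4, progress stays at 4 during the backup and the reindex finishes at tick 7 instead of tick 4. That gap corresponds to the period in which no reindexing progress would be reported.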

The problem with this approach

The main issue with this approach is that reindexing is a time-bound activity that also provides regular feedback to the user, both through the Admin Interface on port 8001 and through ErrorLog messages.

This can easily cause concern: a reindex is started with the expectation that it will complete within a given amount of time, only for the user to find that it has not.

When the user investigates the ErrorLogs, they will see evidence of the backup starting, but they will likely also notice a period during which the reindexer reports no progress.

This is, understandably, a cause for concern.

Admin UI Improvements

From MarkLogic 8.0-3, we added further messaging to the Admin UI on port 8001 to help address this situation. From this release onwards, if a backup takes place, the Admin UI shows a message indicating that rebalancing (or reindexing) has been disabled while a backup or restore is running.
