Knowledgebase:
Removing a Fast Data Directory or Large Data Directory From a Forest
27 November 2019 05:11 AM

Introduction

MarkLogic Server offers Fast Data Directories and Large Data Directories so that customers can make better use of their available infrastructure. An organization can offload large objects to cheaper storage, or improve performance by placing portions of a forest on SSDs. These directories are defined at the forest level, usually when the forest is created.

Removing Fast or Large Data Directories

There are two primary methods to remove these directories from a forest.

  • Rebalance to a new forest
  • Backup/Restore to a new forest

Rebalancing to a New Forest

This method uses the server's rebalancing mechanism to move data out of the forest that has the Fast/Large Data Directories. New forests can be defined as part of this process, but this is not required. The advantage of this method is that it does not require any downtime. The primary disadvantage is that it can increase the I/O and CPU load on the servers while the data is moved between forests, and it can result in data being moved more than once. If needed, these issues can be mitigated by adjusting the rebalancer priority and merge settings.

Backup/Restore to a New Forest

This method allows a simple one-for-one swap of a forest that has a Fast/Large Data Directory for one that does not. The advantage of this method is that, depending on the size of the forest, it can be completed faster than rebalancing. There are two disadvantages. The first is that the forest being replaced must stay in read-only mode from the time the backup is taken until the restore to the new forest is complete. The second is that it requires some downtime when switching between the old and new forests. These issues can be mitigated with careful planning.

Procedures for Using Rebalance

  • Create the new forest/s
  • Attach the new forest/s to the database AND retire the existing forest/s
    • This causes the database to rebalance and move the data from the old forest/s to the new forest/s.
  • Detach the old forest/s from the database once the forest/s no longer have active documents or active fragments.
  • Delete the old forest/s
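
The detach step above depends on confirming that each retired forest has fully drained. The following Python sketch shows one way that check might look. It assumes the active document and active fragment counts have already been collected by whatever means you normally use to read forest status; the dictionary keys `active_documents` and `active_fragments` and the `get_counts` callback are hypothetical names chosen for illustration, not part of any MarkLogic API.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical shape for one forest's counts:
# {"active_documents": int, "active_fragments": int}
ForestCounts = Dict[str, int]


def is_drained(counts: ForestCounts) -> bool:
    """Return True when a forest holds no active documents or fragments."""
    return counts["active_documents"] == 0 and counts["active_fragments"] == 0


def forests_safe_to_detach(
    old_forests: Iterable[str],
    get_counts: Callable[[str], ForestCounts],
) -> List[str]:
    """List the retired forests that rebalancing has fully emptied.

    get_counts is an assumed caller-supplied function that returns the
    current counts for a forest name.
    """
    return [name for name in old_forests if is_drained(get_counts(name))]
```

Only forests returned by `forests_safe_to_detach` would be candidates for the detach and delete steps; any forest still reporting active documents or fragments should be left attached until rebalancing finishes.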

Procedures for Using Backup/Restore

  • Put the forest/s in read-only mode and perform a forest-level backup
    • Database-level backups can be used, but the whole database will need to be in read-only mode when the backups are started.
  • Create the new forest/s.  Do not attach them to the database yet.
  • Restore the backup to the new forest/s
  • Verify the old forest/s and new forest/s have the same active document and active fragment counts.
  • Detach the old forest/s and attach the new forest/s
  • Delete the old forest/s
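
The verification step above can be expressed as a simple comparison before the swap. This is a minimal Python sketch, assuming the counts for the old and the restored forest have already been read from forest status; the key names `active_documents` and `active_fragments` are hypothetical and used only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical key names for the two counts the procedure asks to compare.
COMPARED_KEYS = ("active_documents", "active_fragments")


def count_mismatches(
    old: Dict[str, int], new: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Return {key: (old_value, new_value)} for every count that differs.

    An empty result means the restored forest matches the original on
    both counts.
    """
    return {
        key: (old[key], new[key])
        for key in COMPARED_KEYS
        if old[key] != new[key]
    }
```

If `count_mismatches` returns anything other than an empty dictionary, the old forest should stay attached while the difference is investigated.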
