
Learn

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Community

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

Company

Stay on top of everything MarkLogic

Be the first to know! News, product information, and events delivered straight to your inbox.

Sign Me Up

 
Knowledgebase:
Database Replication: Same Fragment count but Different Database sizes
21 September 2017 10:09 AM

Introduction

Database Replication replicates fragments/documents from a source (master) database to a target (replica) database. You may see different database sizes between the master and replica databases, even when their active fragment counts are the same. This article gives an overview of the variables involved and the reasons behind this observation.

Database Replication:

Database Replication operates at the forest level: journal frames are copied from a forest in the master database and replayed on the corresponding forest in the foreign replica database. Because the replica builds its own stands as it replays those frames, a group of documents that sits in a single stand on the master does not necessarily end up in a single stand on the replica. In other words, the distribution of fragments across stands can differ between the master and the replica.

Also note that master and replica forests can be distributed differently across the hosts in each cluster. Even when they are distributed identically (each master database forest mapped to the replica database forest of the same name), you may still see a different number of stands between them.
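The following is a toy Python model, not MarkLogic code, that illustrates the point above. It assumes a hypothetical `flush_threshold` (how many fragments an in-memory stand holds before it is saved as an on-disk stand) that differs between the two clusters, for example because of different memory settings. Replaying the same journal into both forests yields the same active fragment count but a different number of stands and a different spread of fragments across them. An update is modeled as marking the old fragment deleted and inserting a new one.

```python
from dataclasses import dataclass, field


@dataclass
class Stand:
    active: set = field(default_factory=set)  # URIs of live fragments in this stand
    deleted: int = 0                          # deleted fragments still occupying space


@dataclass
class Forest:
    flush_threshold: int  # hypothetical: fragments held in memory before a stand is saved
    stands: list = field(default_factory=list)
    in_memory: Stand = field(default_factory=Stand)

    def _mark_deleted(self, uri):
        # The old fragment stays in its stand as a deleted fragment.
        for stand in self.stands + [self.in_memory]:
            if uri in stand.active:
                stand.active.remove(uri)
                stand.deleted += 1

    def replay(self, op, uri):
        """Replay one (toy) journal frame on this forest."""
        if op in ("update", "delete"):
            self._mark_deleted(uri)
        if op in ("insert", "update"):
            self.in_memory.active.add(uri)
        if len(self.in_memory.active) + self.in_memory.deleted >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Save the in-memory stand to "disk" if it holds anything.
        if self.in_memory.active or self.in_memory.deleted:
            self.stands.append(self.in_memory)
            self.in_memory = Stand()

    def active_count(self):
        return sum(len(s.active) for s in self.stands + [self.in_memory])

    def layout(self):
        """(active, deleted) fragment counts per saved stand."""
        return [(len(s.active), s.deleted) for s in self.stands]


# The same journal is replayed on both forests.
journal = [("insert", f"/doc{i}.xml") for i in range(10)]
journal += [("update", "/doc1.xml"), ("delete", "/doc2.xml"), ("insert", "/doc10.xml")]

master = Forest(flush_threshold=4)
replica = Forest(flush_threshold=6)
for forest in (master, replica):
    for op, uri in journal:
        forest.replay(op, uri)
    forest.flush()

print(master.active_count(), master.layout())    # 10 [(2, 2), (4, 0), (4, 0)]
print(replica.active_count(), replica.layout())  # 10 [(4, 2), (6, 0)]
```

Both forests end with 10 active fragments, but the master has three stands and the replica two, with the deleted fragments sitting in differently sized stands.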

Database Size, Deleted Fragments and Merges:

The current size of a database depends on a number of factors, such as the number of documents, the indexes, and the deleted fragments held in its stands. Deleted fragments continue to occupy space until a merge removes them, so the number of deleted fragments in any stand in turn depends on the merge policy, the background merge process, available processing cycles, the Linux memory configuration, memory usage at any given time, and the application's usage pattern.
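As a rough illustration of why deleted fragments matter, here is a toy Python size model, not MarkLogic code. It assumes a hypothetical fixed `BYTES_PER_FRAGMENT` and represents each forest as a list of stands given as illustrative (active, deleted) fragment counts. A merge is simplified to collapsing all stands into one and discarding deleted fragments; real merge policies are more selective about which stands get merged and when.

```python
BYTES_PER_FRAGMENT = 1_000  # hypothetical average on-disk size per fragment, indexes included


def active_fragments(stands):
    """Count of live fragments across (active, deleted) stand tuples."""
    return sum(active for active, _ in stands)


def forest_size(stands):
    """Deleted fragments take up space until a merge removes them."""
    return sum((active + deleted) * BYTES_PER_FRAGMENT for active, deleted in stands)


def merge(stands):
    """Toy merge: combine every stand into one and drop deleted fragments."""
    return [(active_fragments(stands), 0)]


master = [(2, 2), (4, 0), (4, 0)]  # illustrative stand layout on the master
replica = [(4, 2), (6, 0)]         # illustrative stand layout on the replica

master_after_merge = merge(master)  # suppose only the master has merged so far

print(active_fragments(master_after_merge), forest_size(master_after_merge))  # 10 10000
print(active_fragments(replica), forest_size(replica))                        # 10 12000
```

Both forests report 10 active fragments, yet the replica is larger until its own merges purge the two deleted fragments it is still carrying.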

Conclusion:

The master cluster and the replica cluster are separate entities. Although connected, they operate independently. The replica database on the target cluster provides data consistency with the master. However, how the data is spread across stands, including how long deleted fragments are retained, will differ between the master and replica clusters. Hence you may see different sizes for the master and replica databases, even when their active fragment counts are the same.
