Knowledgebase : Performance Tuning
SUMMARY When changing the amount of RAM on your MarkLogic Server host, there are additional considerations such as cache settings and swap space. GROUP CACHE SETTINGS As a _‘RULE OF THUMB’_, the memory allocated to group caches (List, Compressed Tree...
BEST PRACTICE FOR ADDING AN INDEX IN PRODUCTION SUMMARY It is sometimes necessary to remove or add an index to your production cluster. For a large database with more than a few GB of content, the resulting workload from reindexing your database can be ...
INTRODUCTION MarkLogic Server is engineered to scale out horizontally by easily adding forests and nodes. Be aware, however, that when adding resources horizontally, you may also be introducing additional demand on the underlying resources. DETAILS On...
SUMMARY MarkLogic Server clusters are built on a distributed, shared nothing architecture. Typical query loads will maximize resource utilization only when database content is evenly distributed across D-node hosts in a cluster. That is, optimal perform...
SUMMARY A forest reindex timeout error may occur when there are transactions holding update locks on documents for an extended period of time. A reindexer process is started as a result of a database index change or a major MarkLogic Server upgrade. T...
INTRODUCTION Users of Java-based batch processing applications, such as CoRB, XQSync, mlcp and the Hadoop connector, may have seen an error message containing "PREMATURE EOF, PARTIAL HEADER LINE READ". Depending on how exceptions are managed, this may cau...
SUMMARY Here we discuss various methods for sharing metering data with Support: telemetry in MarkLogic 9 and exporting monitoring data. DISCUSSION TELEMETRY In MARKLOGIC 9, enabling telemetry collects, encrypts, packages, and sends diagnostic and sys...
SUMMARY Executing searches as "unfiltered" is an important performance optimization, particularly for large result sets. This article describes how "filtered" and "unfiltered" searches work, and what tradeoffs each option entails. In general, unfilter...
SUMMARY This article discusses the MarkLogic group-level caches and Linux Huge Page configurations. GROUP CACHES MarkLogic utilizes caches to increase retrieval performance of frequently-accessed objects. In particular, MarkLogic caches: 1. Expand...
MARKLOGIC DEFAULT GROUP LEVEL CACHE AND HUGE PAGES SETTINGS The table below shows the default (and recommended) group level cache settings based on a few common RAM configurations for the 9.0-9.1 release of MarkLogic Server: TOTAL RAM LIST CACHE ...
INTRODUCTION This article gives you enough information to understand the output from Query Console's profiler. DETAILS QUERY Consider the following XQuery: xquery version '1.0-ml'; let $path := '/Users/chamlin/Downloads/...
BACKGROUND A database consists of one or more forests. A forest is a collection of documents (mostly XML trees, thus the name), implemented as a physical directory on disk. Each forest holds a set of documents and all their indexes. When a new d...
SUMMARY Hung messages in the ErrorLog indicate that MarkLogic Server was blocked while waiting on host resources, typically I/O or CPU. DEBUG LEVEL The presence of Debug-level Hung messages in the ErrorLog does not indicate a critical problem, but ...
INDEXING BEST PRACTICES Indexing in MarkLogic occurs when a document is added or updated. When adding a new index, the server runs an estimate of all the fragments that match and then proceeds to reload those URIs that match. Indexing/reindexing can be ...
SUMMARY This article contains a high level overview of _TRANSPARENT HUGE PAGES_ and _HUGE PAGES_. It covers the configuration of _Huge Pages_ and offers advice as to when _Huge Pages_ should be used and how they can be configured. To the Linux kernel, ...
SUMMARY MarkLogic performs best if swap space is not used. There are other knowledge base articles that discuss sizing recommendations when configuring your MarkLogic Server. This article discusses the Linux swappiness setting that can limit the amount o...
INTRODUCTION: THE DECIMAL TYPE In order to be compliant with the XQuery specification and to satisfy the needs of customers working with financial data, MarkLogic Server implements a decimal type, available in XQuery and server-side JavaScript. Decimal...
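As a generic illustration of why a decimal type matters for financial data, the sketch below uses Python's standard decimal module, not MarkLogic's own decimal type. It assumes nothing about MarkLogic's implementation and only shows the binary floating-point rounding problem that decimal arithmetic avoids.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1 or 0.2 exactly,
# so their sum does not compare equal to 0.3.
float_sum = 0.1 + 0.2
float_matches = (float_sum == 0.3)

# Decimal values built from strings are stored exactly,
# so the same sum compares equal to 0.30.
decimal_sum = Decimal("0.10") + Decimal("0.20")
decimal_matches = (decimal_sum == Decimal("0.30"))

print("float sum equals 0.3:  ", float_matches)
print("decimal sum equals 0.30:", decimal_matches)
```

The amounts here are arbitrary sample values chosen only to make the rounding difference visible.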
SUMMARY This article will help MarkLogic Administrators and System Architects who need to understand how to provision the I/O capacity of their MarkLogic installation. MARKLOGIC DISK USAGE Databases in MarkLogic Server are made up of forests. Indivi...
With the release of MarkLogic Server versions 8.0-8 and 9.0-4, detailed memory use, broken out by major areas, is periodically recorded to the error log. These diagnostic messages can be useful for quickly identifying memory resource consumption at a glanc...
MONITORING HISTORY The Monitoring History feature allows you to capture and view critical performance data from your cluster. Monitoring History capture is enabled at the group level. Once the performance data has been collected, you can view the data i...
SUMMARY When restarting very large forests, some customers have noted that it may take a while for them to mount. While the forests are mounting, the database is unable to come online, thus impacting the availability of your main site. This article show...
WHICH I/O SCHEDULERS ARE RECOMMENDED FOR MARKLOGIC? Three I/O schedulers are recommended for use with MarkLogic Server: * deadline configured by setting elevator=deadline as a kernel boot parameter * noop configured by setting elevator=noop as a ker...
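To check which scheduler a Linux block device is currently using, one option is to read /sys/block/<device>/queue/scheduler, where the kernel lists the available schedulers and marks the active one in square brackets. The following is a minimal sketch of parsing that text; the helper name active_scheduler and the sample strings are hypothetical and not taken from the article.

```python
import re
from typing import Optional


def active_scheduler(sysfs_text: str) -> Optional[str]:
    """Return the scheduler name enclosed in square brackets, or None.

    sysfs_text is expected to look like the contents of
    /sys/block/<device>/queue/scheduler, e.g. "noop [deadline] cfq".
    """
    match = re.search(r"\[([^\]]+)\]", sysfs_text)
    return match.group(1) if match else None


# Illustrative sample contents, not captured from a real host.
sample = "noop [deadline] cfq"
print(active_scheduler(sample))
```

On a real host, the input would come from reading the sysfs file for each device that holds forest data.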
INTRODUCTION MarkLogic Server's 'DatabaseClient' instance represents a database connection sharable across threads. The connection is stateless, except that authentication is done the first time a client interacts with the database via a Document Manager...
SUMMARY This article briefly looks at the performance implications of ad hoc queries versus passing external variables to a query in a module. DETAILS Programmatically, you can achieve similar results by dynamically generating ad hoc queries on the ...
SUMMARY MarkLogic does not enforce a programmatic upper limit on how many indexes you *can* have. This leaves open the question of how many range indexes should be used in your application. The answer is that you should have as many as the application re...
PERFORMANCE IMPLICATIONS OF UPDATING MODULE AND SCHEMA DATABASES This article briefly looks at the performance implications of adding or modifying modules or schemas to live (production) databases. DETAILS When XQuery modules or schemas are referenced...
OVERVIEW Performance issues in MarkLogic Server typically involve either 1) UNNECESSARY WAITING ON LOCKS or 2) OVERLARGE WORKLOADS. The goal of this knowledgebase article is to give a high level overview of both of these classes of performance issue, as ...
SUMMARY This article lists some common system and MarkLogic Server settings that can affect the performance of a MarkLogic cluster. DETAILS From MarkLogic System Requirements [http://developer.marklogic.com/products/marklogic-server/requirements-7.0]:...
_This article is a snapshot of the talk that Jason Hunter and Franklin Salonga gave at MARKLOGIC WORLD 2014, also titled, "Performance Theory: Tales From The MarkLogic Support Desk." Jason Hunter is Chief Architect and Frank Salonga is Lead Engineer ...
SUMMARY There are index settings that may be problematic if your documents contain encoded binary data (such as Base64 encoded binary). This article identifies a couple of these index settings and explains the potential pitfall. DETAILS When WORD...
OVERVIEW UPDATE TRANSACTIONS run with _readers/writers locks_, obtaining locks as needed for documents accessed in the transaction. Because update transactions only obtain locks as needed, update statements always see the latest version of a document. Th...
SUMMARY When used as a file system, GFS needs to be tuned for optimal performance with MarkLogic Server. RECOMMENDATIONS Specifically, we recommend tuning the demote_secs and statfs_fast parameters. The demote_secs parameter determines the amount of t...
INTRODUCTION MarkLogic is supported on the XFS filesystem. The minimum system requirements can be found here: https://developer.marklogic.com/products/marklogic-server/requirements-9.0 [https://developer.marklogic.com/products/marklogic-server/requirements...
WHAT IS DLS? The Document Library Service (DLS) enables you to create and maintain versions of managed documents in MarkLogic Server. Access to managed documents is controlled using a check-out/check-in model. You must first check out a managed docu...
SUMMARY Some MarkLogic Server sites are installed in a 1GB network environment. At some point, your cluster growth may require an upgrade to 10GB ethernet. Here are some hints for knowing when to migrate up to 10GB ethernet, as well as some ways to work a...
INTRODUCTION The performance and resource consumption of E-nodes are determined by the kind of queries executed, in addition to the distribution and amount of data. For example, if there are 4 forests in the cluster and the query is asking for only the top-...
SUMMARY Sometimes, following a manual merge, a number of deleted fragments -- usually a small number -- are left behind after the merge completes. In a system that is undergoing steady updates, one will observe that the number of deleted fragments will go ...
INTRODUCTION Slow journal frame log entries will be logged at WARNING level in your ErrorLog file and will mention something like this: > .....journal frame took 28158 ms to journal... EXAMPLES 2016-11-17 18:38:28.476 Warning: forest Documents JOURNA...
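One way to get a quick picture of how often these warnings occur is to scan the ErrorLog for the "journal frame took N ms" phrase quoted above and collect the durations. The sketch below assumes only that phrase; the function name slow_journal_frames, the threshold parameter, and the sample lines are hypothetical, and the full log line layout is not assumed.

```python
import re
from typing import Iterable, List

# Matches the duration in messages such as
# "...journal frame took 28158 ms to journal..."
FRAME_PATTERN = re.compile(r"journal frame took (\d+) ms", re.IGNORECASE)


def slow_journal_frames(lines: Iterable[str], threshold_ms: int = 0) -> List[int]:
    """Return journal frame durations (ms) at or above threshold_ms."""
    durations = []
    for line in lines:
        match = FRAME_PATTERN.search(line)
        if match:
            duration = int(match.group(1))
            if duration >= threshold_ms:
                durations.append(duration)
    return durations


# Illustrative input: the first line reuses the fragment quoted in the
# article; the others are made up for the example.
sample_lines = [
    ".....journal frame took 28158 ms to journal...",
    "an unrelated log line",
    "journal frame took 120 ms to journal",
]
print(slow_journal_frames(sample_lines, threshold_ms=1000))
```

In practice the lines would be read from the ErrorLog file, and the threshold would be set to whatever duration is worth investigating on that system.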
SUMMARY The XDMP-INMMTREEFULL, XDMP-INMMLISTFULL, XDMP-INMMINDXFULL, XDMP-INMREVIDXFULL, XDMP-INMMTRPLFULL & XDMP-INMMGEOREGIONIDXFULL [http://docs.marklogic.com/guide/admin/databases#id_59939] messages are INFORMATIONAL ONLY.  These messages indicate t...
INTRODUCTION There are two ways of leveraging SSDs that can be used independently or simultaneously. FAST DATA DIRECTORY In the forest configuration for each forest, you can configure a Fast Data Directory. The Fast Data Directory is designed for fast...
INTRODUCTION MarkLogic Server is a highly scalable, high performance Enterprise NoSQL database platform. Configuring a MarkLogic cluster to run as virtual machi...
INTRODUCTION MarkLogic Server uses its list cache to hold search term lists in memory. If you're attempting to execute a particularly non-selective or inefficient query, your query will fail due to the size of the search term lists exceeding the allocate...
MarkLogic Server is designed to scale horizontally, and goes to great effort to make sure queries can be parallelized independently of one another. Nevertheless, there are occasions where users will run into an issue where, when invoked in parallel, so...
SUMMARY XDMP-CANCELED indicates that a query or operation was cancelled either explicitly or as a result of a system event. XDMP-EXTIME also indicates that a query or operation was cancelled, but the reason for the cancellation is the result of the ...
Introduction The first time a query is executed, it will take longer to execute than subsequent runs. This extra time for the first run becomes more pronounced when importing large libraries. Why is this so and is there anything that we can do to improv...