Introduction
MarkLogic Server is a highly scalable, high performance Enterprise NoSQL database platform. Configuring a MarkLogic cluster to run as virtual machines follows the tuning best practices associated with highly distributed, high performance database applications. Avoiding resource contention and oversubscription is critical for maintaining the performance of the cluster. The objective of this guide is to provide a set of recommendations for configuring virtual machines running MarkLogic for optimal performance. The guide is organized into sections for each computing resource, and provides each recommendation along with its rationale. The contents of this guide are intended as best practice recommendations and are not a replacement for tuning resources based on specific usage profiles. Additionally, several of these recommendations trade off performance for flexibility in the virtualized environment.
General
Recommendation: Use the latest version of Virtual Hardware
The latest version of Virtual Hardware provides performance enhancements and increased configuration maximums over older Virtual Hardware versions. Be aware that you may have to update the host, cluster, or data center first. For example, ESXi 7.0 introduces virtual hardware version 17, but VMs imported or migrated from older versions may not be automatically upgraded.
Recommendation: Use paravirtualized device drivers in the guest operating system
Paravirtualized hardware provides advanced queuing and processing off-loading features to maximize Virtual Machine performance. Additionally, paravirtualized drivers batch interrupts and requests to the physical hardware, which provides optimal performance for resource-intensive operations.
VMware Tools provides guest OS drivers for paravirtual devices that optimize the interaction with VMkernel and offload potentially processor-intensive tasks such as packet segmentation.
Recommendation: Disable VMware Daemon Time Synchronization of the Virtual Machine
By default, the VMware Tools daemon synchronizes the Guest OS clock to the Host OS (hypervisor) once per minute, which may interfere with ntpd or chronyd settings. Through the vSphere Client, you can disable time synchronization between the Guest OS and Host OS in the virtual machine settings.
VMware Docs: Configuring Virtual Machine Options
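When VMware Tools (or open-vm-tools) is installed, periodic synchronization can also be checked and disabled from inside the guest. A minimal sketch using the vmware-toolbox-cmd utility:

    # Report whether periodic time synchronization is currently enabled
    vmware-toolbox-cmd timesync status

    # Disable periodic time synchronization with the host
    vmware-toolbox-cmd timesync disable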
Recommendation: Disable Time Synchronization during VMware operations
Even when daemon time synchronization is disabled, time synchronization still occurs during some VMware operations, such as Guest OS boots and reboots and resuming a virtual machine. Disabling VMware clock synchronization completely requires editing the .vmx file for the virtual machine to set several synchronization properties to false. Details can be found in the following VMware blog:
VMware Blog: Completely Disable Time Synchronization for your VM
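As a sketch, the commonly documented set of .vmx properties looks like the following; edit with the virtual machine powered off, and verify the exact property list against the blog post above for your ESXi version (some sources write the values as "0" rather than "FALSE"):

    tools.syncTime = "FALSE"
    time.synchronize.continue = "FALSE"
    time.synchronize.restore = "FALSE"
    time.synchronize.resume.disk = "FALSE"
    time.synchronize.shrink = "FALSE"
    time.synchronize.tools.startup = "FALSE"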
Recommendation: Use the noop scheduler for VMware instances rather than deadline
The NOOP scheduler is a simple FIFO queue that uses the minimal amount of CPU/instructions per I/O, performing only basic merging and sorting. Since the hypervisor and storage layer schedule I/O themselves, additional guest-side sorting adds overhead without benefit.
Red Hat KB: IO Scheduler Recommendations for Virtualized Hosts
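A quick sketch for a guest using the traditional single-queue block layer (the device name sda is illustrative; on newer kernels using blk-mq, the equivalent no-op scheduler is named none):

    # Show the available schedulers; the active one is bracketed
    cat /sys/block/sda/queue/scheduler

    # Switch the device to noop at runtime
    echo noop > /sys/block/sda/queue/scheduler

    # To persist across reboots, add elevator=noop to the kernel
    # boot parameters in the GRUB configuration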
Recommendation: Remove any unused virtual hardware devices
Each virtual hardware device (Floppy disks, CD/DVD drives, COM/LPT ports) assigned to a VM requires interrupts on the physical CPU; reducing the number of unnecessary interrupts reduces the overhead associated with a VM.
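Devices are best removed through the virtual machine's settings in the vSphere Client; for reference, the corresponding .vmx entries look roughly like the following (device numbering varies per VM, so treat these keys as illustrative):

    floppy0.present = "FALSE"
    serial0.present = "FALSE"
    parallel0.present = "FALSE"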
Processor
Socket and Core Allocation
Recommendation: Only allocate enough vCPUs for the expected server load, keeping in mind that the general recommendation is to maintain two cores per forest.
Rationale: Context switching between physical CPUs for instruction execution on virtual CPUs creates overhead in the hypervisor.
Recommendation: Avoid oversubscription of physical CPU resources on hosts with MarkLogic Virtual Machines. Ensure proper accounting for hypervisor overhead, including interrupt operations, storage network operations, and monitoring overhead, when allocating vCPUs.
Rationale: Oversubscription of physical CPUs can cause contention for processor-intensive operations in MarkLogic. Proper accounting ensures adequate CPU resources are available for both the hypervisor and any MarkLogic Virtual Machines.
Memory
General
Recommendation: Set up memory reservations for MarkLogic Virtual Machines.
Rationale: Memory reservations guarantee the availability of Virtual Machine memory when leveraging advanced vSphere functions such as Dynamic Resource Scheduling. Creating a reservation reduces the likelihood that MarkLogic Virtual Machines will be vMotioned to an ESX host with insufficient memory resources.
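Reservations are normally configured in the vSphere Client under the VM's resource settings. For reference, a sketch of the equivalent .vmx entries for a 64 GB reservation; sched.mem.min is expressed in MB, and sched.mem.pin = "TRUE" corresponds to reserving all guest memory. Treat these keys as an assumption to verify against your ESXi version:

    sched.mem.min = "65536"
    sched.mem.pin = "TRUE"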
Recommendation: Avoid combining MarkLogic Virtual Machines with other types of Virtual Machines.
Rationale: Overcommitting memory on a single ESX host can result in swapping, causing significant performance overhead. Additionally, memory optimization techniques in the hypervisor, such as Transparent Page Sharing, rely on Virtual Machines running the same operating systems and processes.
Swapping Optimizations
Recommendation: Configure VM swap space to leverage host cache when available.
Rationale: During swapping, leveraging the ESXi host's local SSD for swap will likely be substantially faster than using shared storage. Host cache is unavailable when running a Fault Tolerant VM or using vMotion, but it provides a performance improvement for VMs in an HA cluster.
Huge / Large Pages
Recommendation: Configure Huge Pages in the guest operating system for Virtual Machines.
Rationale: Huge Pages configured in the guest operating system are locked in memory and cannot be swapped, so the guest swaps other memory first, protecting MarkLogic's memory from being paged out.
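A minimal sketch of reserving static Huge Pages in a Linux guest (the page count below is illustrative; MarkLogic reports a recommended Huge Page range in its ErrorLog at startup, and pages are 2 MB by default on x86_64):

    # Reserve 11264 x 2 MB huge pages (22 GB) at runtime
    sysctl -w vm.nr_hugepages=11264

    # Persist the setting across reboots
    echo "vm.nr_hugepages = 11264" >> /etc/sysctl.conf

    # Verify the reservation
    grep Huge /proc/meminfo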
Recommendation: Disable Transparent Huge Pages in Linux kernels.
Rationale: The Transparent Huge Page implementation in the Linux kernel includes memory compaction functionality. Compaction operations are system-level processes that are resource intensive, potentially starving the MarkLogic process of resources. Using static Huge Pages is the preferred memory configuration for several high performance database platforms, including MarkLogic Server.
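A sketch for disabling THP at runtime on common distributions (paths can vary; some older RHEL kernels use /sys/kernel/mm/redhat_transparent_hugepage):

    # Disable THP and its defrag/compaction behavior immediately
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # To persist across reboots, add transparent_hugepage=never to the
    # kernel boot parameters in the GRUB configuration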
Disk
General
Recommendation: Use Storage IO Control (SIOC) to prioritize MarkLogic VM disk IO.
Rationale: Several operations within MarkLogic require prioritized, consistent access to disk IO. Implementing a SIOC policy helps guarantee consistent performance when multiple VMs contend for disk access over shared links.
Recommendation: When possible, store VMDKs with MarkLogic forests on separate aggregates and LUNs.
Rationale: Storing data on separate aggregates and LUNs reduces disk seek latency when IO intensive operations are taking place, for instance when multiple hosts merge simultaneously.
Disk Provisioning
Recommendation: Use Thick Provisioning for MarkLogic data devices
Rationale: Thick provisioning prevents oversubscription of disk resources. This will also prevent any issues where the storage appliance does not automatically reclaim free space, which can cause writes to a LUN to fail.
NetApp Data ONTAP Discussion on Free Space with VMware
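From the ESXi shell, a thick provisioned data disk can be created with vmkfstools; eagerzeroedthick zeroes all blocks at creation time, avoiding first-write zeroing costs (the datastore and paths below are examples):

    # Create a 100 GB eager-zeroed thick virtual disk on a VMFS datastore
    vmkfstools -c 100G -d eagerzeroedthick /vmfs/volumes/datastore1/mlnode1/ml-data.vmdk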
SCSI Adapter Configuration
Recommendation: Allocate separate SCSI adapters for guest operating system files and database storage. Additionally, add a storage adapter per tier of storage used by MarkLogic (e.g., an isolated adapter with a virtual disk for the fast data directory).
Rationale: Leveraging two SCSI adapters provides additional queuing capacity for high IOPS virtual machines. Isolating IO also allows tuning of data sources to meet specific application demands.
Recommendation: Use paravirtualized SCSI controllers in Virtual Machines.
Rationale: Paravirtualized SCSI controllers reduce management overhead associated with operation queuing.
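A sketch of the resulting .vmx entries, with one paravirtual adapter for the OS disk and a second for MarkLogic data disks (adapter numbering is illustrative):

    scsi0.present = "TRUE"
    scsi0.virtualDev = "pvscsi"
    scsi1.present = "TRUE"
    scsi1.virtualDev = "pvscsi"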
Virtual Disks versus Raw Device Mappings
Recommendation: Use Virtual Disks rather than Raw Device Mappings.
Rationale: VMFS provides optimized block alignment for virtual machines. Ensuring that MarkLogic VMs are placed on VMFS volumes with sufficient IO throughput and dedicated physical storage reduces management complexity while optimizing performance.
Multipathing
Recommendation: Use round robin multipathing for iSCSI, FCoE, and Fibre Channel LUNs.
Rationale: Multipathing allows the usage of multiple storage network links; using round robin ensures that all available paths will be used, reducing the possibility of storage network saturation.
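From the ESXi shell, the path selection policy can be inspected and switched per device; the device identifier below is a placeholder:

    # Show the current path selection policy for a device
    esxcli storage nmp device list -d naa.xxxxxxxxxxxxxxxx

    # Switch the device to round robin
    esxcli storage nmp device set -d naa.xxxxxxxxxxxxxxxx -P VMW_PSP_RR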
vSphere Flash Read Cache
Recommendation: Enable vSphere Flash Read Cache to enhance database performance. When possible, configure a cache size of 20% of the total database size.
Rationale: vSphere Flash Read Cache provides read caching for commonly accessed blocks. MarkLogic can take advantage of localized read cache for many operations including term list index resolution. Additionally, offloading read requests from the backing storage array reduces contention for write operations.
Network
General
Recommendation: Use a dedicated physical NIC for MarkLogic cluster communications and a separate NIC for application communications. If multiple NICs are unavailable, use separate VLANs for cluster and application communication.
Rationale: Separating communications ensures optimal bandwidth is available for cluster communications while spreading the networking workload across multiple CPUs.
Recommendation: Use dedicated physical NICs for vMotion traffic on ESXi hosts running MarkLogic. If additional physical NICs are unavailable, move vMotion traffic to a separate VLAN.
Rationale: Separating vMotion traffic onto separate physical NICs, or at the very least a VLAN, reduces overall network congestion while providing optimal bandwidth for cluster communications. Additionally, NIOC policies can be configured to ensure resource shares are provided where necessary.
Recommendation: Use dedicated physical NICs for IP storage if possible. If additional physical NICs are unavailable, move IP storage traffic to a separate VLAN.
Rationale: Separating IP storage traffic onto separate physical NICs, or at the very least a VLAN, reduces overall network congestion while providing optimal bandwidth for cluster communications. Additionally, NIOC policies can be configured to ensure resource shares are provided where necessary.
Recommendation: Use Network I/O Control (NIOC) to prioritize MarkLogic inter-cluster communication traffic.
Rationale: Since MarkLogic is a shared-nothing architecture, guaranteeing consistent network communication between nodes in the cluster provides consistent and optimal performance.
Network Adapter Configuration
Recommendation: Use enhanced vmxnet virtual network adapters (VMXNET3 on current ESXi versions) in Virtual Machines.
Rationale: Enhanced vmxnet virtual network adapters can leverage both Jumbo Frames and TCP Segmentation Offloading to improve performance. Jumbo Frames allow for an increased MTU, reducing TCP header transmission overhead and CPU load. TCP Segmentation Offloading allows packets up to 64KB to be passed to the physical NIC for segmentation into smaller TCP packets, reducing CPU overhead and improving throughput.
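From inside the guest, the adapter type and offload state can be verified (the interface name eth0 is illustrative):

    # Confirm the NIC is bound to the vmxnet3 driver
    ethtool -i eth0

    # Confirm TCP Segmentation Offload is enabled
    ethtool -k eth0 | grep tcp-segmentation-offload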
Jumbo Frames
Recommendation: Use jumbo frames with MarkLogic Virtual Machines, ensuring that all components of the physical network support jumbo frames and are configured with an MTU of 9000.
Rationale: Jumbo frames increase the payload size of each TCP/IP frame sent across a network. As a result, the number of packets required to send a set of data is reduced, reducing overhead associated with the header of a TCP/IP frame. Jumbo frames are advantageous for optimizing the utilization of the physical network. However, if any components of the physical network do not support jumbo frames or are misconfigured, large frames are broken up and fragmented causing excessive overhead.
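A sketch of setting and validating the MTU from a Linux guest (the interface and target are examples; 8972 bytes of ICMP payload plus 28 bytes of ICMP/IP headers equals a 9000-byte frame):

    # Set the guest interface MTU; the virtual switch and physical
    # network must also be configured for MTU 9000
    ip link set dev eth0 mtu 9000

    # Validate end to end with fragmentation disallowed
    ping -M do -s 8972 <remote-host>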
Analyzing Resource Contention
Processor Contention
Virtual Machines with CPU utilization above 90% and CPU ready metrics of 20% or higher are experiencing CPU contention; in practice, any sustained CPU ready time warrants investigation.
The key metric for processor contention is %RDY (CPU ready time).
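In vCenter, the ready counter is reported as milliseconds summed over the sample interval, so for a single vCPU:

    CPU ready % = (ready in ms / (interval in s * 1000)) * 100

For the default 20-second real-time interval, a ready value of 4,000 ms on one vCPU works out to 4000 / 20000 = 20%, the contention threshold above. For multi-vCPU VMs, divide the result by the number of vCPUs to get a per-vCPU figure.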
Memory Contention
Interpreting memory contention metrics requires an understanding of VMware memory management techniques:
- Transparent Page Sharing
  - Enabled by default in the hypervisor
  - Deduplicates memory pages between virtual machines on a single host
- Balloon Driver
  - Leverages the guest operating system's internal swapping mechanisms
  - Implemented as a high-priority system process that balloons to consume memory, forcing the operating system to swap older pages
  - Indicated by the MEMCTL/VMMEMCTL metric
- Memory Page Compression
  - Only enabled when memory becomes constrained on the host
  - Breaks large pages into smaller pages, then compresses the smaller pages
  - Can generate up to a 2:1 compression ratio
  - Increases processor load when reading and writing pages
  - Indicated by the ZIP metric
- Swapping
  - Hypervisor-level swapping
  - Swapping usually happens to the .vswp file allocated for the VM
  - The .vswp file can be stored with the VM, or in a custom location such as a local disk on the ESXi host
  - Can use SSD host cache for faster swapping, though hypervisor swapping remains costly even on SSD
  - Indicated by the SWAP metric
Free memory metrics less than 6% or memory utilization above 94% indicate VM memory contention.
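Ballooning and hypervisor swapping can also be observed from inside the guest when VMware Tools is installed; a quick sketch:

    # Guest memory currently reclaimed by the balloon driver, in MB
    vmware-toolbox-cmd stat balloon

    # Guest memory currently swapped by the hypervisor, in MB
    vmware-toolbox-cmd stat swap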
Disk Contention
Disk contention exists if the value for kernelLatency exceeds 4 ms or deviceLatency exceeds 15 ms. Device latencies greater than 15 ms indicate an issue with the storage array, potentially oversubscription of the LUNs used by VMFS or RDMs on the VM, or a misconfiguration in the storage processor. Additionally, a queueLength counter greater than zero may indicate a less than optimal queue size set for an HBA, or queuing on the storage array.
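The same thresholds can be watched live in esxtop, whose disk counters roughly correspond to the vCenter metrics above:

    # Run esxtop on the ESXi host, then press d (adapters) or u (devices)
    #   DAVG/cmd - device latency from the storage array; investigate above ~15 ms
    #   KAVG/cmd - VMkernel latency (queuing); investigate above ~4 ms
    #   QUED     - commands queued; sustained nonzero values suggest queue depth tuning
    esxtop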
Network Contention
Dropped packets, indicated by the droppedTx and droppedRx metrics, are usually a reliable indicator of a network bottleneck or misconfiguration.
High latency for a virtual switch configured with load balancing can indicate a misconfiguration in the selected load balancing algorithm. In particular, if using the IP-hash balancing algorithm, check the physical switch to ensure all ports are configured for EtherChannel or 802.3ad. High latency may also indicate a misconfiguration of jumbo frames somewhere along the network path; ensure all devices in the network have jumbo frames enabled.
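From the ESXi shell, VMkernel paths (vMotion or IP storage networks, for instance) can be checked for end-to-end jumbo frame support with vmkping; the target address is a placeholder:

    # Send a 9000-byte frame with fragmentation disallowed
    vmkping -d -s 8972 <target-ip>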
References
VMware Performance Best Practices for vSphere 5.5 - http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.5.pdf
VMware Resource Management - http://pubs.vmware.com/vsphere-51/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-51-resource-management-guide.pdf