17 February 2016 01:17 PM
There are situations where the XDMP-BACKDIRSPACE error occurs while backing up a database. This article explains how this condition can occur and describes a number of strategies to troubleshoot it and determine the root cause.
Under normal operating conditions, when there is enough disk space to complete a backup, MarkLogic Server should not report the XDMP-BACKDIRSPACE error. When it does, the most likely cause is a bad disk configuration, a disk that has unmounted or, simply, insufficient disk space.
We will begin by exploring how to narrow down which server has the disk issue, and then list some things to look into in order to identify the cause.
How Administrators Can Narrow Down the Particular Node in the Cluster
The XDMP-BACKDIRSPACE error indicates that one or more hosts in a cluster do not have sufficient space to complete a backup operation. Because MarkLogic Server implements a shared-nothing architecture, a database backup operation causes each MarkLogic Server host that holds forests for the database to back up its own forests to the specified path, as seen by that host. If any of those forests fails to be backed up because of insufficient disk space, the entire database backup operation fails with the XDMP-BACKDIRSPACE error.
If the backup was executed from the Admin UI, the error will be reported in the Admin UI. However, the host where the error actually occurred is not reported.
To identify the node that is reporting insufficient disk space, look at the MarkLogic Server ErrorLogs of all hosts that hold forests for that database. The XDMP-BACKDIRSPACE error will be logged in the ErrorLog of the host where it occurred. Once the problem node is identified, perform the checks below to determine whether it has sufficient disk space.
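When a cluster has many hosts, the ErrorLog search can be scripted. The following is a minimal sketch, assuming the ErrorLog files from each host have already been copied to (or are reachable from) one machine; the function name and the way log paths are passed in are hypothetical and not part of MarkLogic.

```python
from pathlib import Path

ERROR_CODE = "XDMP-BACKDIRSPACE"


def find_backup_space_errors(log_paths, code=ERROR_CODE):
    """Return {log path: [matching lines]} for every log that mentions the code."""
    hits = {}
    for log_path in log_paths:
        path = Path(log_path)
        if not path.is_file():
            continue  # skip logs that could not be collected from a host
        # errors="replace" keeps the scan going if a log holds undecodable bytes
        with path.open(errors="replace") as handle:
            matches = [line.rstrip("\n") for line in handle if code in line]
        if matches:
            hits[str(path)] = matches
    return hits
```

Any path that appears in the returned dictionary belongs to a host worth inspecting with the checks that follow.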
Things to look at on the problem node
1) Free i-nodes and free disk space
The server could have enough free disk space, but if the Linux file system has reached its configured i-node limit, the server will appear to be out of disk space. The "df -hi" command shows whether you have free i-nodes. If you are i-node constrained, configure more i-nodes.
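The same two numbers that "df -h" and "df -hi" report can be read programmatically on the problem node. This is a sketch only, assuming a POSIX host; the function names, the needed_bytes parameter, and the default threshold of 1000 free i-nodes are illustrative choices, not MarkLogic settings.

```python
import os


def space_report(path):
    """Summarise free bytes and free i-nodes for the file system holding path."""
    st = os.statvfs(path)  # POSIX only; not available on Windows
    return {
        "free_bytes": st.f_bavail * st.f_frsize,  # space usable by non-root users
        "total_inodes": st.f_files,
        "free_inodes": st.f_favail,  # i-nodes usable by non-root users
    }


def constraint(report, needed_bytes, min_free_inodes=1000):
    """Return 'space', 'inodes', or None for a report from space_report()."""
    if report["free_bytes"] < needed_bytes:
        return "space"
    # Some file systems report 0 total i-nodes (no fixed limit); skip the check then.
    if report["total_inodes"] > 0 and report["free_inodes"] < min_free_inodes:
        return "inodes"
    return None
```

Calling space_report() on the backup directory and passing the result to constraint() with the expected backup size distinguishes a disk that is short on space from one that is short on i-nodes.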
2) Disk mount errors
There may be network problems that cause the remote disk to unmount frequently. Look for disk mounting errors in /var/log/messages (Linux) or the System Log (Windows).
It is also possible that you are using non-standard or unreliable mount options. Different remote file systems have different mount option recommendations. Verify that your mount options are sufficient for the workload.
3) Is your host running on a VM?
Many virtual machine environments provide memory and disk to the guest OS only as needed. This type of configuration is a source of problems for many resource-intensive applications. In general, you need to configure your VM host to have fixed, pre-assigned memory, disk, CPU and network resources.
4) Configuration comparison among nodes
If there is no apparent free disk space issue, you should compare the disk configuration of the problem node with that of other nodes in the cluster using the "fdisk -l", "cat /etc/fstab" and "mount" commands.
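Comparing "cat /etc/fstab" output by eye is error-prone when options are listed in a different order on each host. The sketch below is one possible way to diff two captured fstab files; it assumes the standard whitespace-separated fstab layout (device, mount point, type, options), and the function names are hypothetical.

```python
def parse_fstab(text):
    """Map mount point -> (device, fs type, sorted options) from fstab-style text."""
    entries = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mount_point, fs_type, options = fields[:4]
        # Sort options so "rw,hard" and "hard,rw" compare as equal
        entries[mount_point] = (device, fs_type, tuple(sorted(options.split(","))))
    return entries


def diff_mounts(good_text, problem_text):
    """Return {mount point: (good entry, problem entry)} where the two hosts differ."""
    good, problem = parse_fstab(good_text), parse_fstab(problem_text)
    return {
        mount: (good.get(mount), problem.get(mount))
        for mount in sorted(set(good) | set(problem))
        if good.get(mount) != problem.get(mount)
    }
```

Device identifiers such as UUIDs legitimately differ between hosts, so differences in the first field need to be judged by hand; differences in file system type or mount options on the backup path are the more interesting findings.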
5) Corrupt sector/block?
Check disk health. chkdsk (Windows) or fsck (Linux) can be used to check the disk for bad sectors and blocks.
6) Disk I/O hardware?
Check disk I/O. On Windows you could use the system monitoring tools; on Linux you could use "iostat" or "dstat", or inspect the platform's /proc disk statistics files and sar files, to assess disk I/O health.
7) Privilege issue?
If MarkLogic Server is running as a non-root user, check the file system privileges of all mounted drives accessed by the MarkLogic Server process.
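One direct way to test privileges is to try creating a file in each directory while logged in as the user that the MarkLogic Server process runs as. The following is a minimal sketch under that assumption; the function names are hypothetical, and the directories to check (for example the backup path) must be supplied by the administrator.

```python
import tempfile


def can_write(directory):
    """Return True if the current user can create a file in directory."""
    try:
        # TemporaryFile creates a file and removes it again when closed
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers missing directories, permission errors and read-only mounts
        return False


def unwritable(directories):
    """Return the directories in which the current user cannot create files."""
    return [d for d in directories if not can_write(d)]
```

Because the result depends on who runs it, running the script as root will not reveal a privilege problem that affects only the MarkLogic Server user.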