Knowledgebase:
File system errors during backup to NFS mounted drive and recommended NFS mount options
07 July 2015 01:43 PM

Summary

There are situations where the SVC-DIRREM, SVC-DIROPEN and SVC-FILRD errors occur on backups to an NFS mounted drive. This article explains how this condition can occur and describes a number of recommendations to avoid such errors.

Under normal operating conditions, with a remote drive mounted using proper options, MarkLogic Server should not report SVC-xxxx errors. Most likely, these errors result from an improperly mounted NFS drive or from other I/O issues.

We will begin by exploring ways to identify which server has the disk issue, and then list some things to check in order to find the cause.

Error Log and Sys Log Observation

The following errors are typical MarkLogic Error Log entries seen during an NFS backup that indicate an I/O subsystem error. The System Log files may include similar messages.

        Error: SVC-DIRREM: Directory removal error: rmdir '/Backup/directory/path': {OS level error message}

        Error: SVC-DIROPEN: Directory open error: opendir '/Backup/directory/path': {OS level error message}

        Error: Backup of forest 'forest-name' to 'Backup path' SVC-FILRD: File read error: open '/Backup/directory/path': {OS level error message}

These SVC- error messages include the {OS level error message} retrieved from the underlying OS platform using the generic C runtime library function strerror(). These messages are typically something like "Stale NFS file handle" or "No such file or directory".
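
To see which hosts are affected, it can help to count these error codes in each host's error log. The following is a minimal sketch, not a MarkLogic tool: the function name count_svc_errors is hypothetical, and the log path in the usage note is an assumption about the default Linux install location.

```shell
# Count occurrences of each backup-related SVC- error code in a log file.
# Usage: count_svc_errors /path/to/ErrorLog.txt
count_svc_errors() {
    for code in SVC-DIRREM SVC-DIROPEN SVC-FILRD; do
        # grep -c prints the number of matching lines (0 when none match);
        # "|| true" keeps a zero-match result from being treated as a failure
        n=$(grep -c -- "$code:" "$1" || true)
        printf '%s %s\n' "$code" "$n"
    done
}
```

For example, run count_svc_errors /var/opt/MarkLogic/Logs/ErrorLog.txt on every host in the cluster (adjust the path if your logs live elsewhere) and compare the counts.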

If only a subset of hosts in the cluster are generating these types of errors ...

You should compare the problem host's NFS configuration with the rest of the hosts in the cluster to make sure all of the configurations are consistent.

  • Compare NFS versions (rpm -qa | grep -i nfs)
  • Compare NFS configurations (mount -l -t nfs, cat /etc/mtab, nfsstat)
  • Compare platform versions (uname -mrs, lsb_release -a)
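
When comparing mount options between a problem host and a healthy one, the same options may simply appear in a different order. The following sketch compares two option strings regardless of order. The function names normalize_opts and diff_opts are hypothetical helpers; the option strings would be copied by hand from each host's mount output.

```shell
# Print a comma-separated option string as one option per line, sorted,
# so that two hosts' settings can be compared regardless of option order.
normalize_opts() {
    printf '%s\n' "$1" | tr ',' '\n' | LC_ALL=C sort
}

# Print the options present on only one of the two hosts.
# Lines prefixed "<" appear only in the first string, ">" only in the second.
diff_opts() {
    a=$(mktemp)
    b=$(mktemp)
    normalize_opts "$1" > "$a"
    normalize_opts "$2" > "$b"
    # diff exits non-zero when the files differ, grep when nothing matches;
    # neither case is an error here
    diff "$a" "$b" | grep '^[<>]' || true
    rm -f "$a" "$b"
}
```

For example, diff_opts "rw,hard,tcp,vers=3" "tcp,rw,soft,vers=3" reports that the first host uses hard while the second uses soft.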

NFS mount options 

MarkLogic recommends the following NFS mount settings: 'rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

  • VERS=3 : The NFS client must use NFS version 3 or above
  • TCP : NFS must be configured to use TCP instead of the default UDP
  • NOAC : To improve performance, NFS clients cache file attributes. Every few seconds, an NFS client checks the server's version of each file's attributes for updates. Changes that occur on the server in those small intervals remain undetected until the client checks the server again. The noac option prevents clients from caching file attributes so that applications can more quickly detect file changes on the server.
    • In addition to preventing the client from caching file attributes, the noac option forces application writes to become synchronous so that local changes to a file become visible on the server immediately. That way, other clients can quickly detect recent writes when they check the file's attributes.
    • Using the noac option provides greater cache coherence among NFS clients accessing the same files, but it exacts a significant performance penalty. As such, judicious use of file locking is encouraged instead. The DATA AND METADATA COHERENCE section of the nfs(5) man page contains a detailed discussion of these trade-offs.
    • NOTE: The noac option is a combination of the generic option sync, and the NFS-specific option actimeo=0.
  • ACTIMEO=0 : Using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same value, here "0". If this option is not specified, the NFS client uses the defaults for each of these options listed above.
  • NOINTR : Selects whether to allow signals to interrupt file operations on this mount point. If neither option is specified (or if nointr is specified), signals do not interrupt NFS file operations. If intr is specified, system calls return EINTR if an in-progress NFS operation is interrupted by a signal.
    • Using the intr option is preferred to using the soft option because it is significantly less likely to result in data corruption.
    • The intr / nointr mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels.
  • BG : If the bg option is specified, a timeout or failure causes the mount command to fork a child which continues to attempt to mount the export. The parent immediately returns with a zero exit code. This is known as a "background" mount.
  • HARD (vs soft) : Determines the recovery behavior of the NFS client after an NFS request times out. If neither option is specified (or if the hard option is specified), NFS requests are retried indefinitely. If the soft option is specified, then the NFS client fails an NFS request after retrans retransmissions have been sent, causing the NFS client to return an error to the calling application.
    • Note: A so-called "soft" timeout can cause silent data corruption in certain cases. As such, use the soft option only when client responsiveness is more important than data integrity. Using NFS over TCP or increasing the value of the retrans option may mitigate some of the risks of using the soft option. 
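
As a sketch of how the recommended settings might be applied and checked: the server name nfs-server, the export /export/backup and the function name missing_opts below are hypothetical placeholders, and /Backup is simply the mount point used in the log examples above.

```shell
# Recommended option string, copied from the article.
RECOMMENDED='rw,bg,hard,nointr,noac,tcp,vers=3,timeo=300,rsize=32768,wsize=32768,actimeo=0'

# Hypothetical server, export and mount point; substitute your own:
#   mount -t nfs -o "$RECOMMENDED" nfs-server:/export/backup /Backup

# Print each recommended option that is absent from the option string in $1.
missing_opts() {
    for opt in $(printf '%s' "$RECOMMENDED" | tr ',' ' '); do
        case ",$1," in
            *",$opt,"*) ;;                 # option present, nothing to report
            *) printf '%s\n' "$opt" ;;     # option missing from the given string
        esac
    done
}
```

Passing the option field of the relevant /etc/fstab entry to missing_opts lists any recommended options that were left out. The check is a plain string comparison, so it is best run against the configured options rather than against what the kernel reports for a live mount, which may spell some options differently.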

Issue persists => Further debugging 

If the issue persists after you have checked the NFS configuration and implemented the MarkLogic recommended NFS mount settings, you will need to debug the NFS connection while the issue is occurring. Enable rpcdebug for NFS on the hosts showing the NFS errors, and then analyze the resulting syslogs for the period in which the errors occur.

        rpcdebug -m nfs -s all

The resulting logs may give you additional information to help understand what the source of the failures is.
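
Because this debug output is verbose, it is usually worth switching it off again once the failure has been captured. The following is a minimal sketch of such a capture window. The helper names run and capture_nfs_debug and the DRY_RUN variable are hypothetical; rpcdebug itself must be run as root, and its -c option clears the flags that -s set.

```shell
# Run a command, or only print it when DRY_RUN=1 (useful for a rehearsal).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Enable NFS debug output, wait while the problem is reproduced,
# then clear the debug flags again.
# Usage: capture_nfs_debug [seconds]   (defaults to 60)
capture_nfs_debug() {
    seconds=${1:-60}
    run rpcdebug -m nfs -s all     # set all NFS debug flags
    run sleep "$seconds"           # reproduce the backup failure in this window
    run rpcdebug -m nfs -c all     # clear the flags once the window has passed
}
```

The debug messages are written to the kernel log, so they can be read with dmesg or in the system log files after the window closes.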

 
