How to handle Socket receive errors from a Non-Recognized IP Address
28 July 2015 06:20 PM


Sometimes, when a host is removed from a cluster in an improper manner -- e.g., by some means other than the Admin UI or Admin API, a remote host can still try to communicate with its old cluster, but the cluster will recognize it as a "foreign IP" and will log a message like the one below:

2014-12-16 00:00:20.228 Warning: XDQPServerConnection::init( SVC-SOCRECV: Socket receive error: wait Timeout


XDQP is the internal protocol that MarkLogic uses for internal communications amongst the hosts in a cluster and it uses port 7999 by default. In this message, the local host is receiveng socket connections from foreign host


Debugging Procedure, Step 1

To find out if this message indicates a socket connection from an IP address that is not part of the cluster, the first place is to look is in the hosts.xml files. If the IP address in not found in the hosts.xml, then it is a foreign IP. In that case, the following are the steps will help to identify the the processes that are listening on port 7999.


Debugging Procedure, Step 2

To find out who is listening on XDQP ports, try running the following command in a shell window on each host:

      $ sudo netstat -tulpn | grep 7999

You should only see MarkLogic as a listner:

     tcp 0 0* LISTEN 1605/MarkLogic

If you see any other process listening on 7999, yopu have found your culprit. Shot down those processes and the messages will go away.


Debugging Procedure, Step 3

If the issue persists, run tcpdump to trace packets to/from "foreign" hosts using the following command:

     tcpdump -n host {unrecognized IP}

Shutdown MarkLogic on those hosts. Also, shutdown any other applications that are using port 7999.


Debugging Procedure, Step 4

If the cluster are hosts on AWS, you may also want to check on your Elastic Load Balancer ports. This may be tricky, because instances will change IP addresses if they are rebooted, so  work with AWS Support to help you find the AMI or load balancer instance that is pinging your cluster.

In the case that the "foreign host" is an elastic load balancer, be sure to remove port 7999 from its rotation/scheduler. In addition, you should set the load balancer to use port 7997 for the heartbeat functionality.

(0 vote(s))
Not helpful

Comments (0)