27 October 2015 03:58 PM
The performance and resource consumption of E-nodes are determined by the kind of queries executed, in addition to the distribution and amount of data. For example, if there are 4 forests in the cluster and the query asks for only the top 10 results, then the E-node receives a total of 4 x 10 results and must determine the top 10 among these 40. If there are 8 forests, then the E-node has to sort through 8 x 10 results.
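The fan-in arithmetic above can be sketched in a few lines of Python. This is only an illustration of why the E-node's work grows with the number of forests; it is not how the server actually merges results. The function names and the (score, document id) tuple shape are assumptions made for this example.

```python
import heapq


def candidates_examined(num_forests, k):
    """Number of results the E-node receives when each forest returns its own top k."""
    return num_forests * k


def merge_top_k(per_forest_results, k):
    """Merge per-forest result lists into a single global top k.

    per_forest_results: one list per forest of (score, doc_id) tuples.
    Hypothetical shape, chosen only for this illustration.
    """
    # Each forest contributes at most k candidates.
    candidates = [hit for forest in per_forest_results for hit in forest[:k]]
    # Keep the k highest-scoring candidates overall.
    return heapq.nlargest(k, candidates, key=lambda hit: hit[0])


if __name__ == "__main__":
    print(candidates_examined(4, 10))
    print(candidates_examined(8, 10))
```

Doubling the forest count doubles the number of candidates the E-node has to hold and compare for the same top-10 query.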
Performance Test for Sizing E-Nodes:
To size E-nodes, it’s best to determine first how much workload a single E-node can handle, and then scale up accordingly.
Set up your performance test so it is at scale and so that it only talks to a single E-node. Start the Application Server settings with something like
Crank up the number of threads for the test from low to high, and observe the amount of resources being used on the E-node (cpu, memory, network). Measure both response time and throughput during these tests.
As you increase the number of threads, you will eventually run out of resources on the E-node - most likely memory. The idea is to identify the number of active threads when the system's memory is exceeded, because that is the maximum number of threads that your E-node can handle.
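A minimal sketch of such a ramp-up test, in Python, might look like the following. It assumes you supply your own send_request callable that issues one query against the single E-node, and that you record peak memory on the E-node for each step with whatever monitoring you already use. All names here (run_step, max_sustainable_threads, the sample values) are hypothetical and exist only for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_step(send_request, threads, requests_per_thread):
    """Run one load step and return (throughput in requests/s, mean latency in s).

    send_request is a caller-supplied function that issues a single request.
    """
    def worker(_):
        latencies = []
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            send_request()
            latencies.append(time.perf_counter() - start)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        latencies = [l for result in pool.map(worker, range(threads)) for l in result]
    elapsed = time.perf_counter() - started
    return len(latencies) / elapsed, sum(latencies) / len(latencies)


def max_sustainable_threads(samples, memory_limit_mb):
    """Return the highest thread count reached before memory was exceeded.

    samples: (threads, peak_memory_mb) pairs, one per load step.
    Returns None if even the smallest step exceeded the limit.
    """
    best = None
    for threads, peak_mb in sorted(samples):
        if peak_mb > memory_limit_mb:
            break  # first step that runs out of memory ends the ramp
        best = threads
    return best
```

For example, with made-up samples [(8, 4000), (16, 7000), (32, 15000), (64, 33000)] and a 16000 MB limit, max_sustainable_threads returns 32, which would then be the starting point for the E-node's thread setting.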
Additional Tuning of E-nodes
As you continue to decrease the thread count and make other adjustments, the mean time to failure will likely increase until the system reaches equilibrium before all of its memory is consumed. At that point, we do not expect to see any additional memory failures.
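The step-down process can be sketched as a simple loop. This is an illustration only: survives_soak stands for a hypothetical, caller-supplied check that runs a sustained test at the given thread count and reports whether it completed without a memory failure, and the step size is an arbitrary choice.

```python
def step_down_threads(start_threads, survives_soak, step=4, floor=1):
    """Lower the thread count until a sustained test no longer fails.

    survives_soak(threads) -> bool is supplied by the caller (hypothetical).
    Returns the first thread count that survives, or None if none does.
    """
    threads = start_threads
    while threads >= floor:
        if survives_soak(threads):
            return threads
        threads -= step  # back off and try again
    return None
```

In practice each check is a long-running test, so a coarse step followed by a finer one near the boundary keeps the number of runs manageable.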
Swap, RAM & Cache for E-nodes
Growing your Cluster
As your application, data and usage change over time, it is important to periodically revisit your cluster sizing and re-run your performance tests.