Elasticsearch sniffing best practices: What, when, why, how

Elasticsearch powers search experiences for many of the tools and apps in use today, from operational analytics dashboards to maps showing the closest restaurants with patios so you can get out of the house. In all of those implementations, the connection between application and cluster is made via an Elasticsearch client.

Optimizing the connection between the client and the Elasticsearch cluster is extremely important for the end user’s experience. A typical Elasticsearch client configuration is just the URL of the node to connect to. But there is much more you can do, and one way to optimize this connection is sniffing.

Here’s how sniffing works, when you should use it, and how to know when you should avoid it.

What is sniffing?

Elasticsearch is a distributed system, which means its indices live on multiple nodes connected to each other, forming a cluster. One of the main advantages of a distributed system, other than fault tolerance, is that data is sharded across multiple nodes, allowing searches to run much faster than they would on a single huge node.

A typical client configuration is a single URL that points to one node of the Elasticsearch cluster. While this is the simplest configuration, its main disadvantage is that all of the requests you make will be sent to that specific coordinating node. Since this puts a single node under stress, overall performance may be affected.

One solution is to pass a static list of nodes to the client, so your requests will be distributed evenly among the nodes.

Or you can enable a feature called sniffing.

With a static list of nodes, there’s no guarantee that the nodes will always be up and running. For example, what happens if you take a node down to upgrade — or you add new nodes?

If you enable sniffing, the client will start calling the _nodes/_all/http endpoint, and the response will be a list of all the nodes that are present in the cluster along with their IP addresses. Then the client will update its connection pool to use all of the new nodes and keep the state of the cluster in sync with the client’s connection pool. Note that even if the clients download the full list of nodes, the master-only nodes will not be used for generic API calls. 
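As a rough illustration of that update step, here is a minimal Python sketch that turns a nodes response into a list of addresses for a connection pool. The sample response is hand-written (node IDs, names, and addresses are made up), the function name is hypothetical, and the master-only check is a simplification of what a real client does, so treat it as a sketch rather than any client's actual implementation.

```python
from typing import Any, Dict, List

# Hand-written sample shaped like a GET _nodes/_all/http response.
# Node IDs, names, and addresses are made up for illustration.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "nodes": {
        "id-1": {"name": "es-data-1", "roles": ["data", "ingest"],
                 "http": {"publish_address": "10.0.0.11:9200"}},
        "id-2": {"name": "es-data-2", "roles": ["data", "ingest", "master"],
                 "http": {"publish_address": "10.0.0.12:9200"}},
        "id-3": {"name": "es-master-1", "roles": ["master"],
                 "http": {"publish_address": "10.0.0.13:9200"}},
    }
}


def sniffed_addresses(response: Dict[str, Any]) -> List[str]:
    """Return the HTTP publish addresses a client could add to its pool."""
    addresses = []
    for node in response.get("nodes", {}).values():
        # Skip master-only nodes: they should not serve generic API calls.
        if node.get("roles") == ["master"]:
            continue
        http = node.get("http")
        if not http or "publish_address" not in http:
            continue  # this node does not expose an HTTP address
        addresses.append(http["publish_address"])
    return sorted(addresses)


connection_pool = sniffed_addresses(SAMPLE_RESPONSE)
print(connection_pool)  # ['10.0.0.11:9200', '10.0.0.12:9200']
```

The master-only node (es-master-1) is left out, while the node that combines the master role with data and ingest roles stays in the pool.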

Sniffing solves this discovery issue. So why isn’t it enabled by default? Great question!

When to sniff?

Sniffing can be a double-edged sword. If you try to call the _nodes/_all/http endpoint, you’ll see a list of nodes and their respective endpoints. But there are a couple of questions to consider:

  • What happens if your Elasticsearch cluster lives in its own network? 
  • What if your Elasticsearch cluster lives behind a load balancer?

The short answer to both: you’ll get completely useless IP addresses because you’re in a different network.

You can try this yourself with Docker. Spin up an Elasticsearch instance (one is enough) and call _nodes/_all/http from your local machine. You’ll see the IP address of your node won’t be the same IP address you just used.

Use the following command to boot an Elasticsearch instance:

docker run \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:7.8.0

You can now read the node IP with the following command, which uses jq to make the response easier to read:

curl -s localhost:9200/_nodes/_all/http | jq '.nodes[].http.publish_address'

Finally, you can copy the IP address printed in the terminal and try to send a request to it:

curl {ip_address}:9200 | jq .

As you can see, you won’t get a successful response.

This means if you enable sniffing in a client while the cluster sits in another network, the client will add all of the sniffed nodes to its connection pool, because it has no way to know those IP addresses are unreachable. Every query against one of those nodes will then fail.

Since the address you configured initially (the only one that actually works) does not appear in the sniffed node list, it’ll be discarded, and you’ll get a “no living connections” error very quickly.
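The failure mode can be reproduced without a cluster. The following Python sketch simulates it under simplifying assumptions: the Pool class, its method names, and the can_reach callback are all hypothetical stand-ins for a real client and real network calls, the internal address is made up, and dead nodes are simply dropped instead of being retried after a timeout.

```python
from typing import Callable, List


class NoLivingConnections(Exception):
    """Raised when every node in the pool has been marked dead."""


class Pool:
    def __init__(self, seed_nodes: List[str], can_reach: Callable[[str], bool]):
        self.nodes = list(seed_nodes)
        self.can_reach = can_reach  # stands in for a real network call

    def apply_sniff_result(self, sniffed: List[str]) -> None:
        # The sniffed list replaces the pool. The seed node survives only
        # if the cluster advertises it under the same address.
        self.nodes = list(sniffed)

    def request(self) -> str:
        """Return the first node that answers, dropping the ones that fail."""
        for node in list(self.nodes):
            if self.can_reach(node):
                return node
            self.nodes.remove(node)  # mark the node as dead
        raise NoLivingConnections("no living connections")


# From the client's network, only the published port on localhost answers.
reachable = {"localhost:9200"}
pool = Pool(["localhost:9200"], can_reach=lambda node: node in reachable)
print(pool.request())  # localhost:9200

# The node advertises its internal address (made up here).
pool.apply_sniff_result(["172.17.0.2:9200"])
try:
    pool.request()
except NoLivingConnections as err:
    print("error:", err)  # error: no living connections
```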

But we can fix that.

To resolve this problem, you can configure Elasticsearch to bind to one address but advertise another. The http.publish_host configuration option does exactly this. Now try to run the Docker command above with this new configuration:

docker run \
  -p 9200:9200 \
  -e "discovery.type=single-node" \
  -e "http.publish_host=localhost" \
  docker.elastic.co/elasticsearch/elasticsearch:7.8.0

Now, if you run the following command again:

curl -s localhost:9200/_nodes/_all/http | jq '.nodes[].http.publish_address'

In the terminal you’ll see:

"localhost/{ip_address}:9200"

If you configure the publish host, then the official clients (v7 and above) are smart enough to use the host address instead of the IP.
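To make the "host address instead of the IP" behavior concrete, here is a small Python sketch of how a publish address in either form could be split into a host and a port. The function name is hypothetical and the logic is an approximation of the idea, not code taken from an official client.

```python
from typing import Tuple


def parse_publish_address(address: str) -> Tuple[str, int]:
    """Split a publish_address into (host, port), preferring the host name."""
    host_name = None
    if "/" in address:
        # "hostname/ip:port" form, returned when a publish host is configured.
        host_name, address = address.split("/", 1)
    ip, port = address.rsplit(":", 1)
    # Strip the brackets of an IPv6 literal such as "[::1]:9200".
    ip = ip.strip("[]")
    return (host_name or ip, int(port))


print(parse_publish_address("localhost/172.17.0.2:9200"))  # ('localhost', 9200)
print(parse_publish_address("172.17.0.2:9200"))            # ('172.17.0.2', 9200)
```

With the publish host set, the client ends up connecting to localhost:9200, which is reachable from outside the container, rather than to the container's internal IP.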

The main takeaway from this is that you should know your infrastructure before you enable sniffing. There are many solutions to this IP address issue, and there is no silver bullet, because it all depends on your system configuration.

The typical development setup is to have the Elasticsearch cluster in the same network as your client, but this usually can’t be replicated in production, since it would lead to security issues and your infrastructure is likely more complex. You could configure the load balancer to handle those IP addresses. Or, as Elastic does in Elastic Cloud, you can let the proxy handle failing nodes so the client will always send the queries to the proxy, which will then send them to the appropriate node.

When not to sniff?

There are many situations where sniffing could cause some issues, including:

  • The Elasticsearch user your client is authenticating with doesn’t have the right permissions (monitoring_user role) to access the nodes API.
  • You’re working with cloud providers. 

Usually, cloud providers hide Elasticsearch behind a proxy, which would make the sniffing operation useless since the addresses and hostnames returned may have no meaning in your network. Typically, those cloud providers handle the sniffing and pooling complexity for you, so you don’t need to enable sniffing yourself.

If you’re using Elastic Cloud, the official clients will short-circuit most operations internally, such as the connection pool handling, to avoid spending time on operations that have already been done.

Other issues, as we saw before, can occur when working with Docker or Kubernetes. Unless you configure the publish host option, the sniffing result will be unusable.

As a rule of thumb: If Elasticsearch lives in a different network from your client — or there is a load balancer — sniffing should be disabled unless there’s some configuration in the infrastructure that allows you to use it accurately.

How to sniff?

Clients offer multiple sniffing strategies. Let’s analyze them:

Sniffing at startup

As the name suggests, when you enable this option, the client will attempt to execute a single sniff request, either during client initialization or on first use.

Sniffing on connection failure

If you enable this option, the client will attempt to execute a sniff request every time a node is faulty, which means a broken connection or a dead node.

Sniff interval

In addition to sniffing on startup and sniffing on failures, sniffing periodically can benefit scenarios where clusters are often scaled horizontally during peak hours. An application might have a healthy view of a subset of the nodes. But without sniffing periodically, it’ll never find the nodes that have been added as part of horizontal scaling.
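The three strategies can be compared side by side with a toy model. The Python sketch below shows when each one would trigger a sniff. Everything in it is an assumption made for illustration: the SniffingPool class, its parameter names (option names differ between the official clients), the fetch_nodes callback that stands in for the nodes API call, and the fake clock used so the example runs instantly.

```python
import time
from typing import Callable, List, Optional


class SniffingPool:
    """Toy connection pool showing when each sniffing strategy fires."""

    def __init__(
        self,
        seed_nodes: List[str],
        fetch_nodes: Callable[[], List[str]],  # stands in for the nodes API call
        sniff_on_start: bool = False,
        sniff_on_failure: bool = False,
        sniff_interval: Optional[float] = None,  # seconds; None disables it
        clock: Callable[[], float] = time.monotonic,
    ):
        self.nodes = list(seed_nodes)
        self.fetch_nodes = fetch_nodes
        self.sniff_on_failure = sniff_on_failure
        self.sniff_interval = sniff_interval
        self.clock = clock
        self.last_sniff = clock()
        self.sniff_count = 0
        if sniff_on_start:
            self.sniff()  # one sniff during initialization

    def sniff(self) -> None:
        self.nodes = list(self.fetch_nodes())
        self.last_sniff = self.clock()
        self.sniff_count += 1

    def before_request(self) -> None:
        # Periodic sniffing: refresh once the interval has elapsed.
        if self.sniff_interval is not None:
            if self.clock() - self.last_sniff >= self.sniff_interval:
                self.sniff()

    def on_node_failure(self, node: str) -> None:
        # Drop the faulty node, then optionally rediscover the cluster.
        if node in self.nodes:
            self.nodes.remove(node)
        if self.sniff_on_failure:
            self.sniff()


# Simulated cluster that scales from two nodes to three.
cluster = ["node-a:9200", "node-b:9200"]
now = [0.0]  # fake clock, advanced by hand
pool = SniffingPool(["node-a:9200"], lambda: cluster, sniff_on_start=True,
                    sniff_interval=60, clock=lambda: now[0])
cluster.append("node-c:9200")
pool.before_request()  # interval not elapsed: node-c is still unknown
now[0] = 61.0
pool.before_request()  # interval elapsed: the pool is refreshed
print(pool.nodes)      # ['node-a:9200', 'node-b:9200', 'node-c:9200']
```

In this model, a pool created with only sniff_on_start would keep its two-node view indefinitely, which is the horizontal-scaling gap that periodic sniffing closes.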

Custom configurations

In some cases, you may want to have more fine-grained control over the sniffing procedure. The clients are flexible enough to allow you to configure a custom sniffing endpoint, or you can override the sniffing logic entirely and provide one of your own.

Conclusions

When you enable sniffing, you’ll make your application more resilient and able to adapt to changes. Before doing so, you should know your infrastructure so you can decide which solution is best for you. The best solution might even be to not adopt sniffing.

If you’d like to avoid thinking about sniffing and connection pool configuration and instead use a simple connection string, give Elastic Cloud a try with a free 14-day trial of our Elasticsearch Service.

Source: Elastic