

Get your Elasticsearch on: 5 conclusions


ElasticSearch and the “ELK Stack”

 

ElasticSearch is a full-text search engine application based on Apache’s Lucene information retrieval library. It is written in Java, distributed by nature, and fully RESTful with JSON support.

While ElasticSearch can be used with many third-party applications, it’s very common to use it as part of the ELK Stack (ELK = ElasticSearch, Logstash, Kibana). Indexes can be sharded (distributed across multiple hosts) and each shard can have replicas.

In the following lines we’ll explore common challenges and problems related to ElasticSearch and how to overcome them.

 

The right size for the right load

 

ElasticSearch can be used on a single node (even on your laptop) or as part of a bigger deployment (think Hadoop and big data). If you go big, you need to carefully design your node distribution, and specifically your shards. ElasticSearch indexes can be sharded, and shards can have replicas too. At the moment you create an index you can set how many shards it will contain, distribute those shards across your servers, and add replicas to the primary shards. But there is a little catch: once your index is created, you can’t change the number of primary shards.

Sizing your sharded index is therefore very important. If you also want high availability, you need to add replicas to your shards and distribute them across many servers. Note that you can add as many replicas as you want at any time, but the number of primary shards can’t be modified.

The challenge: Size your production cluster so it will withstand the load you expect and keep proper performance and high availability throughout its lifecycle.

Solution: Size your cluster carefully. Take into account that a shard can only hold so much data: because shards are Lucene indexes, and each of those can hold a specific maximum number of documents, you need to set up your sharding configuration with those limits and your desired workload capacity in mind. Also remember that sharding lets you distribute your load, and adding replicas to your shards makes you highly available. The more distributed your index is, the faster it will respond.

And some extra notes:

  • Adding too many shards is as bad as having too few. Again, set the right size for your indexes according to the expected load, keeping system limitations in mind.
  • ElasticSearch is a RESTful application. For big setups, adding a web load balancer in front of the cluster is what you want to do.
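
To make this concrete, here is a minimal sketch (the index name and numbers are illustrative, assuming an ElasticSearch 5.x cluster): the primary shard count is fixed at creation time, while the replica count can be changed later through the index settings API.

PUT my_logs
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}

PUT my_logs/_settings
{
  "number_of_replicas": 2
}

Trying to change number_of_shards on an existing index will be rejected; the only way out is to create a new index with the right shard count and reindex your data into it.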

 

Mappings: What you SHOULD know

 

Mapping is the process ElasticSearch uses to define how a document and its fields are stored in the indexes. You can use mappings to define which fields are treated as full text, which ones as numbers, dates, geo-location data, and so on.

Mappings can be set both manually and dynamically, and you can also set rules for dynamic mappings. A warning here: if you allow dynamic mappings to go unrestricted, you are bound to suffer what is called a mapping explosion.

The challenge: Avoid mapping explosions!

Solution: Your task here is to set your manual mappings correctly, and to add the specific configurations (see the ElasticSearch documentation) that restrain your dynamic mappings and prevent mapping explosions in your system.

A mapping example next:

PUT my_index 
{
  "mappings": {
    "user": { 
      "_all":       { "enabled": false  }, 
      "properties": { 
        "title":    { "type": "text"  }, 
        "name":     { "type": "text"  }, 
        "age":      { "type": "integer" }  
      }
    },
    "blogpost": { 
      "_all":       { "enabled": false  }, 
      "properties": { 
        "title":    { "type": "text"  }, 
        "body":     { "type": "text"  }, 
        "user_id":  {
          "type":   "keyword" 
        },
        "created":  {
          "type":   "date", 
          "format": "strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
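
To restrain dynamic mappings, a minimal sketch (index and type names are illustrative, assuming ElasticSearch 5.x) is to cap the total number of fields an index will accept and to make unexpected fields fail loudly instead of silently growing the mapping:

PUT my_index/_settings
{
  "index.mapping.total_fields.limit": 1000
}

PUT my_index/_mapping/user
{
  "dynamic": "strict"
}

With "dynamic": "strict", indexing a document that contains an unmapped field is rejected outright; with "dynamic": false, the extra fields would simply not be indexed.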

 

 

Never trust defaults, especially for production systems

 

Much like blindly following a wizard, letting your production ElasticSearch cluster run on default settings is the best and fastest way to live with Mr. Murphy at your side for a long time. Defaults may be fine for the test node on your laptop, but they will hit you on clustered production systems.

The challenge: Set the proper configuration in order to keep your cluster from failing due to generic configuration defaults.

Solution: Read the documentation, and especially the flags that need to be changed for specific cluster sizes and configurations. Begin by changing the default cluster name (elasticsearch) to something more production-like; this also keeps rogue nodes from joining your production cluster. Review all the settings that need to be adjusted for your expected load and cluster layout, and run some stress tests in order to fine-tune things until you are happy with your production configuration.
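
As a small sketch of the kind of defaults worth overriding (host names and values are illustrative, assuming a three-node ElasticSearch 5.x cluster), in /etc/elasticsearch/elasticsearch.yml:

# A cluster name of your own keeps rogue nodes from auto-joining
cluster.name: prod-logging

# A stable, human-readable name for this node
node.name: es-node-1

# Bind to the node's real address instead of the loopback default
network.host: 10.0.0.11

# Seed hosts used for node discovery
discovery.zen.ping.unicast.hosts: ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

# Quorum of master-eligible nodes (n/2 + 1) to avoid split brain
discovery.zen.minimum_master_nodes: 2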

It's very important to keep in mind that documentation is the key here. Not only read the official documentation and things documented by others, but also document what you find yourself and why you set specific settings on your cluster. Human memory is unreliable, but a text file with your notes is not. You can also script your installation and configuration (fully automating your setup) and keep that script in any git-based solution.

Next, a simple way to automate the ElasticSearch install and configuration using common Linux tools:

#!/bin/bash

# Import the Elastic GPG signing key
rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

# Create the yum repository definition for ElasticSearch 5.x
cat <<EOF > /etc/yum.repos.d/elasticsearch.repo
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF

# Update the system and install the ElasticSearch package
yum -y update
yum -y install elasticsearch

# Uncomment and set the listen address and HTTP port
sed -r -i 's/^#network.host.*/network.host: 127.0.0.1/g' \
  /etc/elasticsearch/elasticsearch.yml
sed -r -i 's/^#http.port.*/http.port: 9200/g' \
  /etc/elasticsearch/elasticsearch.yml

# Start the service now and enable it at boot
systemctl start elasticsearch
systemctl enable elasticsearch

# END

 

 

Fatty index templates

 

You can create index templates that will be applied to new indexes at creation time. That will save you a lot of work, but beware: large templates go hand in hand with large mappings. If your templates are too large and complex, you’ll suffer long debug and update times. This is where the KISS principle helps (https://en.wikipedia.org/wiki/KISS_principle).

The challenge: Keep your templates simple to maintain, diagnose and improve.

Solutions: The first solution is, of course, to keep your templates on a carb-free diet (not really): a lean template is much easier to maintain and improve. A second solution is to use dynamic templates. A dynamic template lets you define custom mappings that are applied to dynamically added fields.

A template example:

PUT _template/template_1
{
  "template": "te*",
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "type1": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z YYYY"
        }
      }
    }
  }
}

 

A dynamic template example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "strings_as_keywords": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }
      ]
    }
  }
}

 

 

Watch out for combinatoric explosions!

 

This is another source of problems in ElasticSearch. A combinatorial explosion happens when aggregations get nested several levels deep, creating an uncontrollable (even exponential) number of buckets and finally exhausting all available memory on the server.

The challenge: Avoid combinatorial explosions.

Solution: Set the proper collection mode. ElasticSearch collection modes control how a child aggregation performs and how data is collected during a search.

See the following example (taken from ElasticSearch documentation):

{
  "aggs" : {
    "actors" : {
      "terms" : {
         "field" : "actors",
         "size" :  10
      },
      "aggs" : {
        "costars" : {
          "terms" : {
            "field" : "actors",
            "size" :  5
          }
        }
      }
    }
  }
}

 

Running this example (as explained in the ElasticSearch documentation) on actors, co-stars for each actor, and co-stars for each co-star will create so many field buckets that your RAM will be exhausted in short order, especially if you have a lot of data.

Now, see the same example using a collection mode:

{
  "aggs" : {
    "actors" : {
      "terms" : {
         "field" :        "actors",
         "size" :         10,
         "collect_mode" : "breadth_first" 
      },
      "aggs" : {
        "costars" : {
          "terms" : {
            "field" : "actors",
            "size" :  5
          }
        }
      }
    }
  }
}

 

Using the breadth_first collection mode in this example builds and trims the data tree one level at a time, keeping the node’s memory usage under control and avoiding the dreaded combinatorial explosion.

 

Final words on ElasticSearch and a few words of caution

ElasticSearch is an excellent program, but it’s not easy to set up, especially for complex situations. In all the challenges presented in this article, the main key is documenting yourself on how things work inside ElasticSearch and how to make them work as you expect. Knowing the most common problems and how to avoid them is the first step. Knowing the best installation and configuration practices is the second step, and it is no less important than the first. Finally, knowing your data and how to improve the way ElasticSearch handles it is the last step to reach the proper setup for you.

So here's what you can do: Document yourself on how things work, configure your system accordingly, and fine-tune the way your data is being handled inside the system. And, as always, never forget a towel!

 

 

 

 
