Elasticsearch (ES) – Vol.2

Miloš Joković

March 1, 2017

After we introduced you to Elasticsearch (ES), described its benefits and walked through its installation and configuration, it is time to apply the acquired theoretical knowledge in practice. We hope that you have mastered the previous material and are ready for new lessons. :)

Before we start, we recommend adopting the specific terminology related to this open source server.

A cluster is a collection of nodes. The number of nodes is not limited, and together the nodes hold all the data. Each cluster has a unique name; the default value is ‘elasticsearch’.

Nodes also have unique names and hold indexes, which consist of shards that store the indexed documents. When a search runs within a cluster, the nodes cooperate with each other.

Indexes are collections of documents, and each document has its own type. Index names are unique and must be written in lowercase. It is recommended to name an index according to the content we want to store in it. Indexes are used for operations on documents such as add, update, search and delete.

An index defines document types, and each type consists of a name and a mapping.

A mapping is usually defined when the index is created, and the mapping of an existing field cannot be changed later (new fields can still be added). It describes the fields of a document type: for each field we can specify its name and type, such as string, date, integer or object. If you skip defining the mapping for a type before adding documents to the index, ES creates a default mapping based on the document, i.e. on the indexed data itself.
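The default (dynamic) mapping described above can be illustrated with a simplified Python sketch that guesses a field type from each value of a sample document. It assumes ES 5.x behaviour, where new string fields become `text` with a `keyword` sub-field, integers become `long` and floating-point numbers become `float`. Date detection is deliberately reduced to a single ISO-8601 pattern, and `infer_mapping` is a hypothetical helper, not part of any ES client.

```python
import re

# Simplified stand-in for ES date detection: only "yyyy-MM-dd" with an optional
# "THH:mm:ss" time part is recognized here (real ES accepts more formats).
_DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?$")


def _infer_field(value):
    """Return a mapping for one JSON value, or None if no type can be guessed."""
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "float"}
    if isinstance(value, str):
        if _DATE_RE.match(value):
            return {"type": "date"}
        # ES 5.x maps a new string field to "text" plus a "keyword" sub-field
        return {"type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}}
    if isinstance(value, dict):
        return infer_mapping(value)  # inner object: its own "properties"
    if isinstance(value, list) and value:
        return _infer_field(value[0])  # arrays take the type of their elements
    return None  # null or empty array: no mapping is added for it yet


def infer_mapping(doc):
    """Guess a mapping for a JSON document, roughly like ES dynamic mapping."""
    properties = {}
    for field, value in doc.items():
        mapping = _infer_field(value)
        if mapping is not None:
            properties[field] = mapping
    return {"properties": properties}


# A hypothetical document: a JSON object made of key-value pairs
doc = {"title": "Hello", "published": "2017-03-01", "views": 3,
       "rating": 4.5, "draft": False, "author": {"name": "Milos"}}
print(infer_mapping(doc))
```

Explicitly defined mappings always take precedence; this guessing only happens for fields ES has not seen before.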

When creating an index, you define the number of primary shards; by default, an index has 5. Once the index is created, the number of primary shards cannot be changed. Choose this value carefully: every shard is a separate Lucene index that consumes resources, and every search has to query all of the shards, so a larger value burdens the CPU.
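Why the number of primary shards is fixed becomes clearer once you see how ES picks a shard for a document: the documented formula is `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document `_id`. This sketch uses CRC32 only as a stand-in for the Murmur3 hash ES actually uses, and `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(doc_id, number_of_primary_shards):
    """Pick the primary shard for a document id: hash(_routing) % shard count.

    CRC32 replaces ES's Murmur3 hash so the sketch needs no third-party package.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


ids = [f"doc-{i}" for i in range(1000)]
# Count documents that would land on a different shard if 5 shards became 6
moved = sum(1 for d in ids if shard_for(d, 5) != shard_for(d, 6))
print(f"{moved} of {len(ids)} documents would map to a different shard")
```

Changing the divisor sends most existing documents to a different shard, so ES could no longer find them where they were stored; that is why the count is fixed at creation time.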

For example, suppose you send a search request that should return 10 results, and the index you search has 20 shards. The search runs separately on each shard, and each shard returns its 10 best results. In the next step, these 200 partial results are merged and ES chooses the top 10. As you can see, with 20 shards ES had to do the work of collecting 200 results only to discard 190 of them as unnecessary.
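The search example above is a scatter-gather over shards. This minimal sketch models it with made-up random scores: each shard returns its top `size` hits, and the coordinating node merges them and keeps the global top `size`. In real ES the first (query) phase returns only document ids and scores, and the documents themselves are fetched afterwards; that detail is left out here.

```python
import heapq
import random


def search_shard(shard_docs, size):
    """Query one shard: return its `size` best (score, doc_id) pairs."""
    return heapq.nlargest(size, shard_docs)


def search_index(shards, size=10):
    """Coordinating node: ask every shard for `size` hits, keep the global top `size`."""
    collected = []
    for shard_docs in shards:
        collected.extend(search_shard(shard_docs, size))
    top = heapq.nlargest(size, collected)
    return top, len(collected)


# Hypothetical data: 20 shards, each holding 100 documents with random scores
rng = random.Random(42)
shards = [[(rng.random(), f"s{s}-d{d}") for d in range(100)] for s in range(20)]
top, collected = search_index(shards, size=10)
print(collected, len(top))  # 200 hits gathered, only 10 returned
```

The merged result always matches a search over all documents at once, because every document in the global top 10 must also be in the top 10 of its own shard.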

Lucene, on which ES is built, has no notion of index types. Types are an ES implementation: ES stores the type of each document in the meta field “_type” and, in the background, adds a filter on that field when you work with a specific type.
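A toy model of how types sit on top of a single Lucene index: every document carries a `_type` field, and searching within a type is just an extra filter on that field. The document list and the `search_type` helper are hypothetical.

```python
# All documents of one index live in a single Lucene index; the type is only a field.
lucene_index = [
    {"_type": "post", "_id": "1", "title": "Hello"},
    {"_type": "comment", "_id": "2", "text": "Nice post"},
    {"_type": "post", "_id": "3", "title": "Shards"},
]


def search_type(docs, doc_type, predicate=lambda d: True):
    """Search within one type: the user's query plus an implicit _type filter."""
    return [d for d in docs if d["_type"] == doc_type and predicate(d)]


print([d["_id"] for d in search_type(lucene_index, "post")])
```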

A document is the basic unit that can be indexed. It is a JSON object containing zero or more fields, i.e. key-value pairs. Values may be strings, dates, objects…

A replica shard is a copy of a primary shard. Replicas provide redundancy in case a node fails and improve search performance. A replica shard is never started on the same node as its primary shard, because if that node failed, both copies would be lost and the replica could not be promoted to a primary shard. Having replicas means having more copies from which the same document or data can be read. This is useful when a single index containing the required documents receives a large number of requests in a short period of time. Note, however, that a large number of replicas slows down indexing, since every document has to be written to each replica as well. Hence, when requests occur only occasionally, it is not recommended to create many replicas.

Nodes are the machines (servers) running ES. Make sure that every node runs the same version of ES and pay attention to the configuration files, because the nodes have to synchronize the indexed documents. The cluster name is also an important parameter: if several nodes run on the same network and the cluster.name parameter in their configuration files has the same value, all of them belong to the same cluster. If we added another node to our architecture, ES would allocate the replicas to it, because replicas are never run on the same node as their primary shards. In most cases, one node is sufficient to meet all needs.
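The allocation rule described above (a replica never lives on the same node as its primary) can be sketched as a simple round-robin placement. This is not ES's real allocation algorithm, which also balances shards across nodes; `allocate`, the node names and the `P`/`R` labels are assumptions for illustration.

```python
def allocate(nodes, primaries, replicas):
    """Place each shard's copies on distinct nodes; copies that do not fit stay unassigned.

    Assumes at least one node. Returns (assignment, unassigned), where
    assignment maps node name -> list of copies like "P0" (primary) or "R0" (replica).
    """
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(primaries):
        copies = [f"P{shard}"] + [f"R{shard}"] * replicas
        # Rotate the node list so different shards start on different nodes
        start = shard % len(nodes)
        order = nodes[start:] + nodes[:start]
        # Each node receives at most one copy of this shard
        for copy, node in zip(copies, order):
            assignment[node].append(copy)
        unassigned.extend(copies[len(order):])
    return assignment, unassigned


# One node: every replica is left unassigned. Two nodes: everything fits.
print(allocate(["node-1"], primaries=5, replicas=1))
print(allocate(["node-1", "node-2"], primaries=5, replicas=1))
```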

Examples of calls

Now you are fully prepared to apply this knowledge in practice. We will use the Chrome browser with the RESTED extension installed; RESTED is a REST client that runs as a browser extension.
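If you prefer scripting the calls instead of clicking through RESTED, the same HTTP requests can be built with Python's standard library. This sketch assumes a local node on the default port 9200, a hypothetical index `blog` with type `post` (ES 5.x URL style `/index/type/id`), and that the columns explained below come from `GET /_cat/indices?v`. The requests are only constructed; the commented-out lines show how one would be sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed: local node on the default port


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request to Elasticsearch."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


list_indices = es_request("GET", "/_cat/indices?v")
add_doc = es_request("PUT", "/blog/post/1", {"title": "Hello", "views": 3})

# To actually send a request against a running node:
# with urllib.request.urlopen(list_indices) as resp:
#     print(resp.read().decode("utf-8"))
```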

Explanation of the response columns:

pri – number of primary shards

rep – number of replicas

docs.count – number of documents

docs.deleted – number of deleted documents

health – yellow – this value of the health parameter means that some replicas are not allocated. When you create an index, ES by default also creates a replica for each primary shard. Since we have a single node, the replicas cannot be allocated; allocation becomes possible only when we add more nodes to the cluster. Once the replicas are allocated to another node, the value of the health parameter becomes green.
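The health values can be summarized as a small function over shard states: red when a primary shard is unassigned, yellow when all primaries are assigned but some replica is not, and green when everything is assigned. The `(kind, assigned)` tuples are a hypothetical simplification of the shard information ES keeps.

```python
def index_health(shards):
    """Derive index health from a list of (kind, assigned) tuples.

    kind is "primary" or "replica"; assigned is True if the copy is on a node.
    """
    if any(kind == "primary" and not assigned for kind, assigned in shards):
        return "red"
    if any(kind == "replica" and not assigned for kind, assigned in shards):
        return "yellow"
    return "green"


# One node, 5 primaries with 1 replica each: the replicas cannot be placed.
single_node = [("primary", True)] * 5 + [("replica", False)] * 5
print(index_health(single_node))  # yellow
```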

Note

Creating an index in advance gives us more control, letting us configure the index and define its mapping. If we instead simply add a document, ES creates the index with default settings, some of which (such as the number of primary shards) cannot be changed afterwards.
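Creating the index in advance means sending a body with settings and mappings to `PUT /<index>` before the first document arrives. The helper below only builds that body. The index name `blog`, the type `post` and its fields are hypothetical, and the type-level `mappings` layout assumes ES 5.x.

```python
import json


def create_index_body(number_of_shards, number_of_replicas, doc_type, properties):
    """Body for PUT /<index> that fixes shards, replicas and an explicit mapping."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        },
        # ES 5.x style: mappings are grouped by document type
        "mappings": {doc_type: {"properties": properties}},
    }


body = create_index_body(3, 1, "post", {
    "title": {"type": "text"},
    "published": {"type": "date"},
    "views": {"type": "integer"},
})
# Send as the body of PUT http://localhost:9200/blog
print(json.dumps(body, indent=2))
```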

You can see other examples of calls here.

Excellent search performance, scalability and denormalized data storage are just some of the great Elasticsearch features. The process of installation and configuration is pretty simple, yet the benefits of ES are many. We hope that the information and practical examples we provided were useful. We invite you to take advantage of everything this brilliant open source server offers.
