Elasticsearch (ES) – Vol.1

Miloš Joković
February 21, 2017

A guide to the installation and configuration of Elasticsearch (ES) – Vol.1

Meet Elasticsearch (ES), an open source server and one of the most popular search engines, known for its outstanding performance, near real-time search, and easy installation and configuration. There is no time to waste: sit back and enjoy discovering why this search engine is used by companies such as LinkedIn, Wikipedia, Target, eBay and WordPress!

What is Elasticsearch (ES)?

Elasticsearch (ES) is an open source server based on Lucene, developed in the Java programming language and released as an open source project under the terms of the Apache license. It provides a distributed, full-text search engine with an HTTP web interface and is optimized for text search. The ES server exposes a RESTful API, and clients communicate with it using the JSON format.

It is easy to configure and offers many search options. Its optimized text search covers searching for specific data, matching the initial letters of keywords, and indexing.

ES is one of the most popular search engines, used by giants such as LinkedIn, Wikipedia, Target, eBay, WordPress and others.

Why should you use Elasticsearch (ES)?

The Elasticsearch (ES) server can be used to search all types of documents, and it provides scalable search. ES is a Near Real Time (NRT) search engine: by default, a newly indexed document becomes searchable within about one second of being entered, even though the work is spread across a distributed architecture. Because ES is distributed, an index can be divided into several shards. Each index can have one or more replicas, and each node hosts one or more shards containing the data.
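
To make the idea of dividing an index into shards more concrete, here is a minimal Python sketch. It is only an illustration: the function names are made up for this example, and `zlib.crc32` is a stand-in for whatever hash function ES uses internally. Replicas are not modelled.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document id to a shard number in the range [0, number_of_shards)."""
    # crc32 is a stand-in hash chosen for this sketch, not the hash ES uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def distribute(doc_ids, number_of_shards):
    """Group document ids by the shard they would be stored on."""
    shards = {number: [] for number in range(number_of_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_shards)].append(doc_id)
    return shards


if __name__ == "__main__":
    # Hypothetical document ids, spread over three shards.
    for shard, ids in distribute(["1", "2", "3", "4", "5"], 3).items():
        print(shard, ids)
```

The point of the sketch is that the same id always lands on the same shard, so a node only needs to hold a part of the index.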

We communicate with ES through its JSON-based RESTful API.

The reasons for using Elasticsearch are the following:

  • excellent search performance
  • scalability
  • denormalized data storage
  • replacement for MongoDB or RavenDB.

When developing applications, we often need to read from multiple tables and merge the data in order to get the desired information. The SQL query then looks something like this:

select t1.name
     , t2.price
     , t3.address
     , t4.description
from table1 as t1
         inner join table2 as t2 on t1.ID = t2.ID
         inner join table3 as t3 on t2.ID = t3.ID
         inner join table4 as t4 on t3.ID = t4.ID;

The query may take longer depending on the size of the tables involved, which has a negative impact on performance. By converting the results to JSON format, we create documents that can be indexed by ES.

Example of the results in JSON format:

{
    "name": "…",
    "price": "…",
    "address": "…",
    "description": "…"
}

ES performs very well because it searches documents that have already been indexed and that contain all the necessary information; hence, the SQL query does not have to be executed every time a search is run. Avoiding the wait for SQL query execution is what makes ES searches fast.
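
The conversion from joined rows to JSON documents can be sketched in Python. This is a minimal illustration, not a recommended pipeline: it uses an in-memory SQLite database in place of a real one, and the table contents and the helper names `create_sample_db` and `build_documents` are invented for the example.

```python
import json
import sqlite3

# The same four-table join as in the text, run once to build the documents.
JOIN_QUERY = """
    SELECT t1.name, t2.price, t3.address, t4.description
    FROM table1 AS t1
    INNER JOIN table2 AS t2 ON t1.ID = t2.ID
    INNER JOIN table3 AS t3 ON t2.ID = t3.ID
    INNER JOIN table4 AS t4 ON t3.ID = t4.ID
"""


def create_sample_db():
    """Create an in-memory database with one hypothetical row per table."""
    connection = sqlite3.connect(":memory:")
    connection.executescript("""
        CREATE TABLE table1 (ID INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE table2 (ID INTEGER PRIMARY KEY, price REAL);
        CREATE TABLE table3 (ID INTEGER PRIMARY KEY, address TEXT);
        CREATE TABLE table4 (ID INTEGER PRIMARY KEY, description TEXT);
        INSERT INTO table1 VALUES (1, 'pasta');
        INSERT INTO table2 VALUES (1, 2.5);
        INSERT INTO table3 VALUES (1, 'Main Street 1');
        INSERT INTO table4 VALUES (1, 'Durum wheat pasta');
    """)
    return connection


def build_documents(connection):
    """Run the join and return one denormalized dict per result row."""
    connection.row_factory = sqlite3.Row  # rows become addressable by column name
    return [dict(row) for row in connection.execute(JOIN_QUERY)]


if __name__ == "__main__":
    for document in build_documents(create_sample_db()):
        print(json.dumps(document, indent=4))
```

Each printed document already contains everything a search needs, so the join is paid once at indexing time rather than on every search.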

How does the installation and configuration of Elasticsearch (ES) work?

The ES server can be downloaded from the official Elastic website. To run it, extract the downloaded archive and, from the newly created folder, run the startup script in a terminal: bin\elasticsearch.bat on Windows, or bin/elasticsearch on other operating systems.

If everything went well, the server's startup log appears in your terminal.

To stop the ES server, press Ctrl+C in the terminal.

Note

To use ES, Java must be installed (it can be downloaded from the official Java website) and the JAVA_HOME environment variable must be set.

Be sure to download the correct version (x86 or x64) for your OS.

Now that the ES server is running, we can communicate with it through its JSON (JavaScript Object Notation) REST API, available on localhost at port 9200.

How does text search work?

ES stores every word from the text as a term, and for each term it records the documents in which the term appears, as well as its position in each document. It also stores the total number of occurrences of the term across all documents.
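
The structure described above is commonly called an inverted index. The following Python sketch shows the idea under simplifying assumptions: terms are produced by lowercasing and splitting on whitespace, which is far cruder than the text analysis a real search engine performs, and the function name and sample documents are invented for illustration.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Return (postings, totals) for a {doc_id: text} mapping.

    postings maps term -> {doc_id: [positions of the term in that document]}
    totals maps term -> number of occurrences across all documents
    """
    postings = defaultdict(dict)
    totals = defaultdict(int)
    for doc_id, text in documents.items():
        # Simplified analysis: lowercase, then split on whitespace.
        for position, term in enumerate(text.lower().split()):
            postings[term].setdefault(doc_id, []).append(position)
            totals[term] += 1
    return postings, totals


if __name__ == "__main__":
    # Hypothetical documents.
    postings, totals = build_inverted_index({
        1: "fresh pasta and pasta sauce",
        2: "tomato sauce",
    })
    print(dict(postings["sauce"]), totals["sauce"])
```

Looking up a term is then a dictionary access rather than a scan of every document, which is why this layout suits text search.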


To search documents, we use filters and queries.

The search can be carried out in several ways:

  • Query string:

We send the search text in the “q” URL parameter, and the URL also names the index and the type of document we are searching. This method works well for simple queries but is not recommended for more complex ones.

Example of a request:

http://localhost:9200/ecommerce/product/_search?q=pasta

Example of a request if you want to search across the field name:

http://localhost:9200/ecommerce/product/_search?q=name:pasta

  • Query DSL:

We pass the search parameters in the request body in JSON format, which is clearer and more flexible.

Example of a request: http://localhost:9200/ecommerce/product/_search

{
    "query": {
        "match": {
            "name": "pasta"
        }
    }
}
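
Both ways of searching can be prepared from Python with the standard library alone. The sketch below only builds the requests and does not send them, so it runs without a server. The helper names are invented for this example, and using POST for the Query DSL request is an assumption of the sketch rather than something the examples above prescribe.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9200"


def query_string_url(index, doc_type, query):
    """Build a query-string search URL, e.g. .../_search?q=name:pasta."""
    # safe=":" keeps the field:value separator readable in the URL.
    return f"{BASE_URL}/{index}/{doc_type}/_search?{urlencode({'q': query}, safe=':')}"


def query_dsl_request(index, doc_type, field, text):
    """Build (but do not send) a Query DSL request with a match query."""
    body = {"query": {"match": {field: text}}}
    return Request(
        f"{BASE_URL}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption of this sketch
    )


if __name__ == "__main__":
    print(query_string_url("ecommerce", "product", "name:pasta"))
    request = query_dsl_request("ecommerce", "product", "name", "pasta")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

With a server running, passing the request object to `urllib.request.urlopen` would send it and return the JSON response.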

The _search endpoint must be included in the URL when searching. The scope of the search depends on what the request specifies:

  • http://localhost:9200/_search – searches all indexes and all document types
  • http://localhost:9200/index_name/_search – searches all document types within the given index
  • http://localhost:9200/movies/movie/_search – searches only documents of the specified type within the given index.
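
The three scopes listed above differ only in the URL path, which a small helper can make explicit. This is a sketch with an invented function name; it only assembles the URL and sends nothing.

```python
def search_url(index=None, doc_type=None, base_url="http://localhost:9200"):
    """Build a _search URL for all indexes, one index, or one type in an index."""
    if doc_type is not None and index is None:
        raise ValueError("a document type can only be given together with an index")
    parts = [base_url.rstrip("/")]
    if index is not None:
        parts.append(index)
    if doc_type is not None:
        parts.append(doc_type)
    parts.append("_search")
    return "/".join(parts)


if __name__ == "__main__":
    print(search_url())                  # all indexes and types
    print(search_url("movies"))          # all types within one index
    print(search_url("movies", "movie"))  # one type within one index
```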

For more information, see the official documentation.

Configuration of ES

The configuration file is located in the config folder and has the .yml extension. In our case, the file name is elasticsearch.yml.

Configuration file offers various options, some of which are:

  • cluster.name: testaplication (for example, if we want to change the default value, elasticsearch)
  • node.master: true
  • node.data: true

For example, by combining the node.master and node.data parameters, we can set a node:

  • to be a master and keep the indexed data
  • not to be a master, but only to store the data
  • to be a master without keeping any data, acting as a coordinator.
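
The combinations above can be written out as a small Python helper that produces the two configuration lines and a description of the resulting role. The function names are invented for this sketch, and the case where both parameters are false is not covered by the list above, so it is only labelled as such.

```python
def node_settings(master: bool, data: bool) -> str:
    """Render the two settings as they would appear in elasticsearch.yml."""
    return f"node.master: {str(master).lower()}\nnode.data: {str(data).lower()}"


def describe_node(master: bool, data: bool) -> str:
    """Describe the role that results from the two settings."""
    if master and data:
        return "master that also keeps the indexed data"
    if data:
        return "not a master, only stores the data"
    if master:
        return "master without data, acting as a coordinator"
    return "neither master nor data (not covered in the list above)"


if __name__ == "__main__":
    for master, data in [(True, True), (False, True), (True, False)]:
        print(node_settings(master, data))
        print("  ->", describe_node(master, data))
```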

Even when we have only a single node, the ES server will look for other nodes in the cluster, which wastes resources. This can be avoided by setting the parameter:

discovery.zen.ping.multicast.enabled: false

The nodes of the cluster can also be listed explicitly with the parameter:

discovery.zen.ping.unicast.hosts: ["host1", "host2"]

Calling GET http://localhost:9200/_cluster/health returns the status of the cluster.
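
A health check like this can be scripted in Python with the standard library. The sketch below separates fetching from interpreting, so the interpretation can be tried without a running server. The function names are invented for this example; the `status` field and its green, yellow and red values come from the cluster health response.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://localhost:9200/_cluster/health"

STATUS_MEANING = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primary shards are allocated, but some replicas are not",
    "red": "at least one primary shard is not allocated",
}


def fetch_cluster_health(url=HEALTH_URL, timeout=5.0):
    """Call the health endpoint and return the decoded JSON response."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_health(health):
    """Turn a health response into a one-line summary."""
    status = health.get("status", "unknown")
    return f"{status}: {STATUS_MEANING.get(status, 'unrecognized status')}"


if __name__ == "__main__":
    # Hypothetical response fragment; use fetch_cluster_health() against a live server.
    print(summarize_health({"status": "yellow"}))
```

On a single-node setup a yellow status is common, since replicas have no second node to be placed on.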

This was just a short introduction to Elasticsearch and the opportunities it provides. “Repetition is the mother of knowledge” in every case, especially when it comes to the practical application of acquired theoretical knowledge. To work effectively with this open source server, you need to become familiar with the specific terminology used in ES, after which you’ll be ready to create your own calls.
