First steps with ElasticSearch

Samuel Useche

May 9th, 2017

What is ElasticSearch?

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases.

It is based on Lucene, is written in Java, and is released as open source under the terms of the Apache License. All of its functionality is exposed through a REST interface that receives and sends data in JSON format, which hides Lucene's internal details. Because of this, it can be used from any platform, not only Java: from Python, .NET, PHP or even from a browser with JavaScript. It is also persistent: whatever we index in it survives a restart of the server.
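Because the interface is plain HTTP with JSON bodies, any language with an HTTP client can talk to Elasticsearch. As a minimal sketch, the Python code below uses only the standard library and assumes a node listening on http://localhost:9200 (the default port used later in this post); the helper names build_request and send are hypothetical and not part of any official Elasticsearch client.

```python
import json
import urllib.request

# Assumption: a local node on the default HTTP port.
BASE_URL = "http://localhost:9200"


def build_request(method, path, body=None, base_url=BASE_URL):
    """Build (but do not send) a request for the Elasticsearch REST API.

    `body`, if given, is any JSON-serializable Python object.
    """
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        # Elasticsearch 2.x does not require this header; later versions do.
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, send(build_request("GET", "/")) would return the same node information that the curl test later in this post prints.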

ElasticSearch vs. NoSQL databases

What about systems like Postgres, for example, that come with full-text search and ACID transactions? (Other examples are the full-text capabilities of MySQL, MongoDB, Riak, etc.) While you can implement basic search with Postgres, there is a huge gap in both performance and features. Elasticsearch can “cheat” and do a lot of caching, with no concern for multi-version concurrency control and other complicating things. Search is also more than finding a keyword in a piece of text: it’s about applying domain-specific knowledge to implement good relevancy models, giving an overview of the entire result space, and doing things like spell checking and autocompletion, all while being fast.

Elasticsearch is commonly used alongside another database. A database system with a stronger focus on constraints, correctness and robustness, and on being readily and transactionally updatable, holds the master record, which is then asynchronously pushed to Elasticsearch.
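A common way to push records from the primary database is the _bulk endpoint, which takes newline-delimited JSON: one action line followed by one document line per record, with a final newline. The sketch below only builds that payload. It assumes, hypothetically, that each record is a dict with an id key, reuses the tutorial index and helloworld type from the example later in this post, and leaves out how the push is scheduled asynchronously (a queue, a periodic job, and so on).

```python
import json


def build_bulk_body(records, index="tutorial", doc_type="helloworld"):
    """Return an NDJSON string for the Elasticsearch _bulk endpoint.

    Hypothetical record layout: a dict with an "id" key; all other
    keys become the document source.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index, "_type": doc_type, "_id": str(record["id"])}}
        source = {key: value for key, value in record.items() if key != "id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent as the body of a POST request to /_bulk; the response reports the outcome for each document separately.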

Some basic concepts within ElasticSearch

Cluster

A cluster is a set of one or more nodes (servers) that together hold all the data, distributed and indexed. Each cluster is identified by a name; by default it is called “elasticsearch”.

Node

A node is a single server that is part of a cluster: it stores data and takes part in the cluster's indexing and search tasks. Nodes are also identified by a name; in Elasticsearch 2.x, a node that is not given a name is named after a random Marvel character.

By default they are configured to be part of a cluster with the name “elasticsearch”.

A cluster can have as many nodes as you want. If no cluster with the configured name exists when a node starts, the node creates that cluster and joins it.

Index

An index is a collection of documents that have similar characteristics. Each index is identified by a name, which must be lowercase, and we use that name to refer to the index when indexing, searching, updating and deleting documents.
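Index names appear directly in request paths such as /tutorial/helloworld/1, used later in this post. The hypothetical helper below builds such a path for Elasticsearch 2.x (index, then type, then ID) and performs only a partial check of the index name: it rejects empty or non-lowercase names, while Elasticsearch enforces further naming rules that this sketch does not cover.

```python
def document_path(index, doc_type, doc_id):
    """Build the /index/type/id path used by the Elasticsearch 2.x document API.

    Partial validation only: the index name must be non-empty and lowercase.
    """
    if not index or index != index.lower():
        raise ValueError(f"invalid index name {index!r}: must be non-empty and lowercase")
    return f"/{index}/{doc_type}/{doc_id}"
```

For example, document_path("tutorial", "helloworld", 1) returns "/tutorial/helloworld/1", while document_path("Tutorial", "helloworld", 1) raises ValueError.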

ElasticSearch installation

The installation will be done on Ubuntu Linux, since Linux is the most common environment for servers. Elasticsearch runs on the Java virtual machine, so Java must be installed first.

Add the Oracle Java PPA to apt:

sudo add-apt-repository -y ppa:webupd8team/java

Update your apt package database:

sudo apt-get update

Install the latest stable version of Oracle Java 8 with this command (and accept the license agreement that pops up):

sudo apt-get -y install oracle-java8-installer

Finally, verify that Java is installed:

java -version

Download the Elasticsearch package. This guide uses version 2.3.1.

wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/deb/elasticsearch/2.3.1/elasticsearch-2.3.1.deb

Then install it in the usual Ubuntu way with dpkg.

sudo dpkg -i elasticsearch-2.3.1.deb

To make sure Elasticsearch starts automatically when the server boots, enable its systemd service:

sudo systemctl enable elasticsearch.service

Configuring Elasticsearch

Open the main configuration file, elasticsearch.yml, with nano or your favorite text editor:

sudo nano /etc/elasticsearch/elasticsearch.yml

Remove the # character at the beginning of the lines for cluster.name and node.name to uncomment them, and then update their values. Your first configuration changes in the /etc/elasticsearch/elasticsearch.yml file should look like this:

. . .
cluster.name: mycluster1
node.name: "My First Node"
. . .

Testing Elasticsearch

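Start the service with sudo systemctl start elasticsearch.service (use restart instead of start if it is already running, so that the new configuration is read). Elasticsearch then listens on port 9200. You can test it by sending a simple GET request with curl, the command-line tool for transferring data with URLs: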

curl -X GET 'http://localhost:9200'

You should see the following response:

Output of curl

{
  "name" : "My First Node",
  "cluster_name" : "mycluster1",
  "version" : {
    "number" : "2.3.1",
    "build_hash" : "bd980929010aef404e7cb0843e61d0665269fc39",
    "build_timestamp" : "2016-04-04T12:25:05Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
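When provisioning servers, this check can be automated. A minimal Python sketch, assuming the JSON body above has already been fetched (for example with urllib.request.urlopen("http://localhost:9200")): the hypothetical check_node function confirms that the node joined the cluster named in elasticsearch.yml and returns the reported version.

```python
import json


def check_node(info, expected_cluster):
    """Verify the cluster name in a GET / response and return the version number."""
    if info.get("cluster_name") != expected_cluster:
        raise RuntimeError(
            f"node {info.get('name')!r} is in cluster {info.get('cluster_name')!r}, "
            f"expected {expected_cluster!r}"
        )
    return info["version"]["number"]
```

With the response shown above and expected_cluster set to "mycluster1", it returns "2.3.1"; a node still using the default “elasticsearch” cluster name would raise RuntimeError.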

Using Elasticsearch

To start using Elasticsearch, let’s add some data first. As already mentioned, Elasticsearch uses a RESTful API, which supports the usual CRUD operations: create, read, update and delete. To work with it, we’ll use curl again.
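The four CRUD operations map onto HTTP methods and document paths. The sketch below shows that mapping for the Elasticsearch 2.x document API installed above; the function name crud_request is hypothetical, it only returns the method, path and body without sending anything, and later Elasticsearch versions changed some of these paths (mapping types were later removed). For creation it uses PUT with an explicit ID; the POST request used below also works when the ID is given in the path.

```python
def crud_request(operation, index, doc_type, doc_id, body=None):
    """Return (method, path, body) for a CRUD operation on one document
    in the Elasticsearch 2.x document API. Nothing is sent."""
    path = f"/{index}/{doc_type}/{doc_id}"
    if operation == "create":
        return "PUT", path, body
    if operation == "read":
        return "GET", path, None
    if operation == "update":
        # Partial updates go to the _update endpoint, wrapped in "doc".
        return "POST", path + "/_update", {"doc": body}
    if operation == "delete":
        return "DELETE", path, None
    raise ValueError(f"unknown operation: {operation!r}")
```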

You can add your first entry with the command:

curl -X POST 'http://localhost:9200/tutorial/helloworld/1' -d '{ "message": "Hello World!" }'

You should see the following response:

Output

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"_shards":{"total":2,"successful":1,"failed":0},"created":true}

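With curl, we have sent an HTTP POST request to the Elasticsearch server. The URI of the request was /tutorial/helloworld/1, which has three parts: tutorial is the index of the data, helloworld is its type, and 1 is the ID of our entry within that index and type. The same three values are echoed back in the _index, _type and _id fields of the response.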
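The hypothetical helper below splits a document path such as /tutorial/helloworld/1 into its index (tutorial), type (helloworld) and ID (1), using the same field names as the response above. It is a sketch for the Elasticsearch 2.x path layout only.

```python
def document_coordinates(path):
    """Split a 2.x document path like /tutorial/helloworld/1 into
    the index, type and ID that Elasticsearch reads from it."""
    parts = path.strip("/").split("/")
    if len(parts) != 3:
        raise ValueError(f"expected /index/type/id, got {path!r}")
    index, doc_type, doc_id = parts
    return {"_index": index, "_type": doc_type, "_id": doc_id}
```

Applied to /tutorial/helloworld/1, it returns {"_index": "tutorial", "_type": "helloworld", "_id": "1"}, the same values as in the response.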

You can retrieve this first entry with an HTTP GET request.

curl -X GET 'http://localhost:9200/tutorial/helloworld/1'

The result should look like:

Output:

{"_index":"tutorial","_type":"helloworld","_id":"1","_version":1,"found":true,"_source":{ "message": "Hello World!" }}
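Client code usually only needs the stored document, which Elasticsearch returns under _source. A minimal sketch with the hypothetical helper get_source, assuming the response body is available as a string. Note that a missing document comes back with found set to false and HTTP status 404, so urllib.request.urlopen would raise HTTPError; the body of that error can still be passed to this function.

```python
import json


def get_source(response_text):
    """Return the stored document from a GET /index/type/id response,
    or None if Elasticsearch reports that the document was not found."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]
```

For the response above, get_source returns {"message": "Hello World!"}.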

So far we have added data to Elasticsearch and retrieved it. To learn about the other operations, please check the API documentation.

In conclusion, using Elasticsearch to search large volumes of data offers many advantages.

Citriom - All Rights Reserved