ElasticSearch is an open source tool developed with Java. It is a Lucene-based, scalable, full-text search engine, and a data analysis tool.

A huge amount of data is produced every moment in today’s world of information technology: on social media, on video sharing sites, and in medium- and large-sized companies that provide services in communication, health, security and other areas. This is an ocean of information, and in information technology we call it big data. A significant part of this data is unstructured, scattered, and meaningless on its own.

For this reason, this data has to be recorded, accessed, analyzed, and processed. Like similar search engines, ElasticSearch is a tool developed to deal with these big data problems.

ElasticSearch is powerful and flexible, and two of its biggest advantages are that it is real-time and distributed. Today, ElasticSearch is used for content search, data analysis, and queries in projects such as Mozilla, Foursquare, and GitHub.

In order to explore ElasticSearch, we will have a closer look at its basic features and concepts.

Full-Text Search

When the data stored in a database grows, queries performed on that data become slow. To remedy this, full-text search engines index and catalog the words in the text fields. As a result, queries are answered faster and performance stays good, even with large-scale data. ElasticSearch provides powerful full-text search capabilities such as multi-language support, a powerful query language, and auto-completion.
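The idea of indexing the words in text fields can be sketched with a tiny inverted index. This is only an illustration of the general technique, not ElasticSearch's or Lucene's actual implementation; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted IDs of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical sample documents, keyed by ID.
documents = {
    1: "ElasticSearch is a full-text search engine",
    2: "Lucene is a search library",
    3: "Big data needs fast search and analysis",
}
inverted_index = build_inverted_index(documents)
print(search(inverted_index, "search engine"))
```

A lookup touches only the entries for the query words instead of scanning every document, which is why this approach keeps responding quickly as the data grows.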

Index

ElasticSearch is a document-oriented search engine. Each record in ElasticSearch is a structured JSON document. In other words, data that is sent to ElasticSearch for indexing is a JSON document. All fields of the documents are indexed by default and can be used in a single query.

Compared to database management systems, ElasticSearch indices may be considered databases. Just as a database is a collection of organized information, an ElasticSearch index is a collection of structured JSON documents.

Type

Types can be considered tables, again compared to database management systems. Indices may contain one or more types.
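The database analogy can be made concrete with a small sketch. It assumes the index/type/ID addressing scheme of the ElasticSearch versions this article describes; the helper name is hypothetical, and the index and type names are the ones used in the example later in this article.

```python
from urllib.parse import quote

# Rough correspondence between ElasticSearch terms and database terms.
ANALOGY = {"index": "database", "type": "table", "document": "record"}


def document_path(index, doc_type, doc_id):
    """Build the /index/type/id path that addresses a single document."""
    parts = (index, doc_type, str(doc_id))
    # Percent-encode each part so that spaces or slashes cannot break the path.
    return "/" + "/".join(quote(part, safe="") for part in parts)


print(document_path("kodcucom", "article", 1))
```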

Mapping

Mapping is the process of defining how a document should be mapped to the search engine. Types are created according to the mapping information. ElasticSearch creates the mapping automatically (dynamic mapping) based on the data sent (for example, string, integer, double, boolean). You can override the default mapping by defining a new mapping explicitly.
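To give a feel for automatic mapping, here is a simplified sketch that guesses a field type from each value of a parsed JSON document. It uses the type names listed above and is an assumption-laden illustration: the real engine applies more detailed rules (for instance, it can also detect dates), and the function names and sample document are hypothetical.

```python
def infer_field_type(value):
    """Guess a mapping type name from a parsed JSON value."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    # Anything else (null, arrays, nested objects) is out of scope for this sketch.
    return "unknown"


def infer_mapping(document):
    """Return a {field name: guessed type} dictionary for one document."""
    return {field: infer_field_type(value) for field, value in document.items()}


# Hypothetical document used only for illustration.
sample = {"title": "Hello", "views": 3, "rating": 4.5, "published": True}
print(infer_mapping(sample))
```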

RESTful API

ElasticSearch is driven by a RESTful API. Almost every action can be performed through this API by sending JSON over HTTP.
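The "JSON over HTTP" pattern can be sketched with Python's standard library. The helper below only builds a request, so it can be inspected without a running server; the helper names, the sample path, and the sample body are hypothetical, and the address is the default local one used in the installation section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default local address, an assumption here


def build_request(method, path, body=None):
    """Create (but do not send) an HTTP request carrying an optional JSON body."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def send(request, timeout=5.0):
    """Send the request and decode the JSON response; needs a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# The HTTP verb selects the action; the path selects what it applies to.
store = build_request("PUT", "/kodcucom/article/1", {"title": "Hello"})
fetch = build_request("GET", "/kodcucom/article/1")
print(store.get_method(), store.full_url)
print(fetch.get_method(), fetch.full_url)
```

Calling `send(store)` would perform the request, provided a server is listening at `BASE_URL`.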

How To Install?

Installation consists of downloading the latest ElasticSearch distribution, unzipping it, and running the executable file appropriate to your operating system.

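For Unix systems (the -f flag keeps the process in the foreground; releases from 1.0 onward run in the foreground by default and no longer accept it):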

bin/elasticsearch -f

For Windows:

bin/elasticsearch.bat

To check that the service is running, send a request from the terminal:

curl -X GET http://localhost:9200/

Alternatively, open http://localhost:9200/ in the browser. If either request returns a JSON response describing the node, the service is running as expected and we are ready to work with ElasticSearch.
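The same check can be scripted. The sketch below assumes only that a healthy node answers with HTTP 200 and a JSON body; it does not rely on any particular field in the response, and the function name is hypothetical.

```python
import http.client
import json
import urllib.request


def is_running(url="http://localhost:9200/", timeout=2.0):
    """Return True if the URL answers with HTTP 200 and a JSON body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            json.loads(response.read().decode("utf-8"))
            return response.status == 200
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a body that is not JSON.
        return False


if __name__ == "__main__":
    print("ElasticSearch is up" if is_running() else "ElasticSearch is not reachable")
```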

ElasticSearch is schema free. It does not require definitions such as index, type, and field type before indexing. When a record is added, ElasticSearch tries to identify the data structure, indexes the data, and makes it searchable. If desired, index, type, field, and field type definitions can be changed before or after adding a record.

It is important to understand the flexibility provided by ElasticSearch. Being able to keep documents whose fields differ in type, name, and number in the same index is undoubtedly a plus. For example, Solr, another popular full-text search engine, requires fields and field types to be defined in advance. When a new field has to be added, all records must be transferred to Solr again. ElasticSearch does not have such restrictions, and this can be compared to the table and column independence provided by NoSQL architectures.

Let’s experience this flexibility by adding our first record to ElasticSearch. As mentioned above, ElasticSearch is driven by a RESTful API, so we will add the record using curl (client URL library).
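A record-adding request of this kind can be sketched as follows. The helper renders a curl command for the index kodcucom, the type article, and the ID 1; the document fields and the helper name are hypothetical, and the Content-Type header is included as an assumption because newer releases expect it.

```python
import json
import shlex


def curl_index_command(index, doc_type, doc_id, document,
                       base_url="http://localhost:9200"):
    """Render a curl command that stores `document` under /index/type/id."""
    url = "{}/{}/{}/{}".format(base_url, index, doc_type, doc_id)
    parts = [
        "curl", "-X", "PUT", url,
        "-H", "Content-Type: application/json",
        "-d", json.dumps(document),
    ]
    # shlex.quote protects the header and the JSON body from the shell.
    return " ".join(shlex.quote(part) for part in parts)


# Hypothetical document fields; only the index, type, and ID come from the article.
article = {"title": "Hello ElasticSearch", "author": "kodcu"}
print(curl_index_command("kodcucom", "article", 1, article))
```

Running the printed command against a local node sends the JSON document with an HTTP PUT to the path that names its index, type, and ID.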

Even though we did not create an index called kodcucom or a type called article beforehand, ElasticSearch creates both with its standard settings, based on the added record.

The record (a JSON document) with the ID value 1 has the type article and belongs to the index kodcucom.

Cluster

ElasticSearch has been built to scale horizontally. If more capacity is needed, it is sufficient to increase the number of nodes. In this case, the cluster will reorganize itself in order to take advantage of the extra hardware.

Standard ElasticSearch installations share the same default cluster name, so nodes on the same network, however many there are, find and connect to each other automatically. ElasticSearch configuration files are located in the ElasticSearchHomeDirectory/config folder. To change the cluster name, edit the corresponding row (the cluster.name setting) in the elasticsearch.yml file.

Client support is available for many platforms such as Java, PHP, Python, Perl, Ruby, and .NET. Check out the full list here: Clients.

Conclusion

ElasticSearch is quite elastic among its peers in terms of both configuration and usage, and it is also an attractive option for systems working with big data, where search operations and data analysis may result in I/O bottlenecks.

In future articles, I hope to cover topics such as basic CRUD operations with ElasticSearch, the provided Java API, and the usage of this useful tool in a web project.