Elasticsearch: One Tool That’s Never Stretched Too Thin

Insight. It’s the whole reason we analyze all the data we’re collecting and storing. Yet insight doesn’t magically leap out of data – it’s usually teased out by someone posing a question and then searching for the answer. This is why search, especially fast, full-text search, is such a powerful technology.

Remember, it was Google’s advanced search technology that allowed it to outperform its rivals, generating the revenue that built its empire and still produces the cash that expands its domain. Consider that Google, which started as an Internet search company, has made investments in self-driving cars rivaling those of actual car manufacturers.

The power of search puts the cumulative knowledge of the global Internet at your fingertips. We’ll take a look at Elasticsearch, which can do the same for the insights lurking in your company’s data not only by scooping up the specific data you’re looking for but also by providing easy on-ramps to anomaly detection.

What’s Elasticsearch?

Elasticsearch is smart data storage, full-text search and analytics rolled into one. It’s “smart” storage because it’s distributed, highly indexed, and document-based. Elasticsearch stores data, but it isn’t like filesystems, which are schemes for storing files on raw hardware. Elasticsearch’s job is at least one level of abstraction higher: the very quick retrieval and analysis of the textual information stored in those files.

Elasticsearch also isn’t a shared folder or simple bit bucket: every field in every document is thoroughly indexed. A shared folder merely makes files accessible to many users over a network. Elasticsearch, on the other hand, makes answers far more accessible by performing full-text search that returns exact (and close) matches to the query.
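To make the document-based model concrete, here is a minimal sketch of what indexing a document looks like. The index name (`products`), the fields, and the local URL are assumptions for illustration; the `PUT /<index>/_doc/<id>` endpoint shape is Elasticsearch's standard document API.

```python
import json

# A hypothetical product document. Elasticsearch stores JSON documents
# and, by default, indexes every field, so each one below is searchable.
doc = {
    "name": "LED desk lamp",
    "description": "Adjustable-arm lamp with warm white light",
    "price": 34.99,
    "in_stock": True,
}

# Indexing is one HTTP request: PUT /<index>/_doc/<id>
index_url = "http://localhost:9200/products/_doc/1"
body = json.dumps(doc)
```

Any HTTP client can send this request; no special driver is required.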

Fast, full-text search is great on its own but becomes a game changer when scaled to large amounts of data, which is where Elasticsearch really shines: a single cluster deployed across multiple nodes.

The power of combining storage and search

The speed and high availability Elasticsearch provides (especially in a cluster) rely on distributing both the data and the analytics workload in ways more traditional text search tools can’t, because those tools are written to trawl through text serially. Elasticsearch makes data storage and full-text search work at massive scale and speed, and is thus a prime example of the kind of tool that makes “Big Data” a reality. Generating, gathering and storing data at the scale of petabytes is really only the first step: finding relevant information in that data through search is a whole other challenge.

Elasticsearch solves a tricky paradox brought on by big data: the more data you have, the more useful it is, because statistics computed over it are more reliable (especially when your analysis is granular and detailed), yet combing through more data takes more time. Large-scale analytics has therefore traditionally required batch jobs that could take hours to run across an enterprise.

How does it work?

Elasticsearch is a standalone search engine built on the high-performance Apache Lucene library. Lucene is written in Java, and so is Elasticsearch, which hides Lucene’s complexities behind a RESTful API and runs as a server process. Developers communicate with an Elasticsearch instance via a web client written in the language of their choice. This abstraction of the intricacies of distributed data storage gives developers maximum flexibility and makes Elasticsearch a snap to integrate with their other tools.
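Because the API is plain HTTP and JSON, a full-text query can be built in any language. The sketch below assumes a hypothetical `articles` index with a `body` field; the `GET /<index>/_search` endpoint and the `match` query type are Elasticsearch's own.

```python
import json

# A minimal full-text search request, as any HTTP client could send it.
query = {
    "query": {
        "match": {                      # "match" performs analyzed, full-text matching
            "body": "anomaly detection"
        }
    },
    "size": 10,                         # return at most 10 hits
}

search_url = "http://localhost:9200/articles/_search"
payload = json.dumps(query)
```

The same request works identically from curl, Java, Go, or a browser, which is exactly the language-agnostic flexibility the REST API provides.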

Flexibility is a virtue in a powerful tool, especially when it comes to anomaly detection, one of the key ways to extract important insights from your mountains of data.

Elasticsearch and anomaly detection

Extracting those insights is made much easier by Elasticsearch aggregations, which make it simple to generate statistics about your data. These stats can be very useful, especially at the speed Elasticsearch is able to retrieve them. Real-time metrics like average social media mentions per day, or the number of desk lamps bought online per geographic region, become even more valuable when fed into other tools.
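An aggregation request is just another JSON body. As a sketch of the "per geographic region" example above, the request below buckets documents by region and averages a price field; the index and field names are hypothetical, while the `terms` and `avg` aggregation types are built into Elasticsearch.

```python
import json

agg_request = {
    "size": 0,  # we only want the aggregation results, not individual hits
    "aggs": {
        "by_region": {
            "terms": {"field": "region"},   # bucket documents by region
            "aggs": {
                # compute the average price within each region bucket
                "avg_price": {"avg": {"field": "price"}}
            },
        }
    },
}

payload = json.dumps(agg_request)
```

Setting `size` to 0 skips returning matching documents entirely, so the response carries only the computed statistics.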

Case in point: real-time anomaly detection vendor Anodot uses Elasticsearch as an important part of its SaaS product, mainly for grouping related anomalies in order to produce concise alerts. This concise reporting of anomalies is a competitive edge for Anodot, since it eliminates the problem of alert storms (common in competing products) for the analysts using the system. Anodot also uses Elasticsearch internally to catch anomalies created by glitches in its own algorithms, effectively performing meta-anomaly detection.

Machine learning powers Anodot’s anomaly detection system. Anomaly detection within Elasticsearch itself is possible, since it has its own machine learning capabilities (provided by X-Pack). Anodot, however, offers multivariate anomaly detection and grouping, along with advanced algorithms that scale well to millions of metrics, such as its Vivaldi algorithm for detecting seasonality, which combines the accuracy of autocorrelation with the speed and low computational cost of the Fourier transform. It therefore makes a lot of sense for Anodot’s customers who are also using Elasticsearch to simply send the data over to Anodot.

With its speed, resiliency, powerful built-in full-text search, and ease of integration, Elasticsearch is a popular choice for the architects of tech’s leading software.

About the author

Andy Robert

Andy Robert provides helpful information and assets for those looking for high-quality tech services. His mission is to provide authentic and reliable information so you can make an informed decision about the cloud.