ElasticSearch: Was Google Right about Simplicity?

November 13, 2012

When the Google Search Appliance became available nine or 10 years ago, I was the victim of a Google briefing. The eager Googler showed me the functions of the original Google Search Appliance. I was not impressed. As I wrote in The Google Legacy, the GSA was a “good start” and showed promise.

But one thing jumped out at me. Google’s product planners had identified the key weakness or maybe “flaw” in most of the enterprise search solutions available a decade ago: complexity. No single Googler could install Autonomy, Endeca, Fast Search & Transfer, or Convera without help from the company. Once the system was up and running, not even a Googler could tune the system, perform reliable hit boosting, or troubleshoot indexers which could not update. Not surprisingly, most of the flagship enterprise search systems ran up big bills for the licensees. One vendor went down in flames because there were not enough engineers to keep the paying customers happy. So ended an era of complexity with the Google Search Appliance.

I may have been wrong.

I just read “Indexing BigData with ElasticSearch.” If you are not familiar with ElasticSearch (formerly Compass), think about the Compass search engine and the dozens of companies surfing on Lucene/Solr to get in the search game. Even IBM uses Lucene/Solr to slash development costs and free up expensive engineers for more value-added work like the wrappers that allow Watson to win a TV game show. I have completed for IDC an analysis of 13 open source search vendors, and some of these profiles are available for only $3,500 each. See http://www.idc.com/getdoc.jsp?containerId=236511 for an example.

Is your search system as easy to learn to ride as a Big Wheel toy? If not, there may be some scrapes and risks ahead. In today’s business climate, who wants to incur additional risks or costs in pursuit of a short cut only a developer can appreciate? Not me or the CFOs I know. A happy quack to http://www.bigwheeltricycle.net/ for this image.

The write up explains how to perform Big Data indexing with ElasticSearch. I urge you to read the write up. Consider this key passage:

The solution finally appeared in the name of ElasticSearch, an open-source Java based full text indexing system, based on the also open-source Apache Lucene engine, that allows you to query and explore your data set as you collect it. It was the ideal solution for us, as doing BigData analysis requires a distributed architecture.

Sounds good. With a fresh $10 million in funding, ElasticSearch seems poised to revolutionize the world of enterprise search, big data, and probably business intelligence, search based applications, and unified information access. Why not? Most open source vendors exercise considerable license in an effort to differentiate themselves from next generation solutions such as CyberTap, Digital Reasoning, and others pushing the envelope of findability technology.

The impression the write up made upon me was that Google’s decade-old claim that enterprise search was way too complicated may have been correct. Now, the Google Search Appliance is no easy thing to get working. But compared to the hoops the BugSense people went through, the GSA 7007 and GSA 9009 are Big Wheel children’s toys. And Big Data? Who really knows what that means in today’s world of automated CART functions, recursive algorithms, and relaxed Bayesian analyses?

If I had to score the complexity of the ElasticSearch implementation based on this single write up, I would score Google an A minus. ElasticSearch would garner from me a D minus. If you did not have a raft of code wizards on hand, ElasticSearch might be kept after class.

You can buy the IDC analyses of 13 open source search vendors and figure out which ones are the systems I recommend as “enterprise ready” with regard to productization, support, full time staff, and high value extensions. If you want a short cut and enjoy simplicity, go with the GOOG. If you like computer science projects, consider such options as ElasticSearch or the little known Summa system. Just my opinion, and it pains me to say, “Google was right. Most systems are far too complex for licensees.” Imagine that. Simplicity, support, brand, and engineering excellence are more important than complexity.


Stephen E. Arnold monitors search, content processing, text mining
and related topics from his high-tech nerve center in rural Kentucky.
He tries to winnow the goose feathers from the giblets. He works with colleagues
worldwide to make this Web log useful to those who want to go
"beyond search". Contact him at sa [at] arnoldit.com. His Web site
with additional information about search is arnoldit.com.