AWS CloudSearch

Both the cloud and on-premise environments offer a wide range of options for storing and retrieving data. Every major cloud provider, from Amazon to Azure to Google, has its own search solution. In addition to these proprietary offerings, there are open source platforms such as Elasticsearch and Apache Solr.

In short, they all offer similar features with little difference. I would say there are two main differences between AWS CloudSearch and the two open source options, Elasticsearch and Solr.

Data import is a batch process in AWS CloudSearch. If you have streaming data or need immediate data updates, then go for Elasticsearch or Solr.

If you don’t want to worry about infrastructure, backups, and patches, then go with AWS CloudSearch. Out of the box, it is a true cloud product.

Both Elastic.co and AWS offer Elasticsearch as a service, which simplifies the infrastructure part. Elastic.co, in fact, offers its service on the AWS cloud. However, Elasticsearch and Solr are more popular than CloudSearch, so it is easier to find resources online for these two than for AWS CloudSearch.

Thus, I embarked on a journey to take up AWS CloudSearch, and you know what, it is not that difficult (though I ran into some nagging issues and had my share of frustrating moments). To begin with, I took the manual route: I extracted the data from my RDBMS (SQL Server), uploaded it to CloudSearch, indexed it, and used the rudimentary UI provided by AWS. I was able to search within an hour. The biggest advantage I see in the AWS CloudSearch data upload is that it takes a CSV file and converts it to JSON by itself. You can write a batch program to upload the data in chunks of up to 5 MB each. In addition to CSV, it supports several other file types such as PDF, Excel, PPTX, and DOCX.
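To give a rough idea of what such a batch program could look like, here is a minimal sketch in Python. It is not the exact program I used. It assumes the CSV has a column that can serve as the document id, that empty cells should simply be left out, and that the boto3 library is installed for the upload step. The function names csv_to_batches and upload_batches are my own, and the endpoint URL is whatever document endpoint your search domain shows in the AWS console.

```python
import csv
import io
import json

MAX_BATCH_BYTES = 5 * 1024 * 1024  # CloudSearch accepts up to 5 MB per document batch


def csv_to_batches(csv_text, id_column, max_bytes=MAX_BATCH_BYTES):
    """Yield CloudSearch JSON document batches (as bytes), each at most max_bytes.

    Note: a single document larger than max_bytes is still emitted on its own;
    this sketch does not try to split or reject oversized documents.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    docs, size = [], 2  # 2 bytes for the enclosing "[" and "]"
    for row in reader:
        # Assumption: empty cells are dropped rather than sent as empty strings.
        fields = {k: v for k, v in row.items() if k != id_column and v != ""}
        doc = json.dumps({"type": "add", "id": row[id_column], "fields": fields})
        doc_bytes = len(doc.encode("utf-8")) + 1  # +1 for the "," separator
        if docs and size + doc_bytes > max_bytes:
            yield ("[" + ",".join(docs) + "]").encode("utf-8")
            docs, size = [], 2
        docs.append(doc)
        size += doc_bytes
    if docs:
        yield ("[" + ",".join(docs) + "]").encode("utf-8")


def upload_batches(batches, doc_endpoint_url):
    """Send each batch to the domain's document endpoint (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the chunking above works without boto3

    client = boto3.client("cloudsearchdomain", endpoint_url=doc_endpoint_url)
    for batch in batches:
        client.upload_documents(documents=batch, contentType="application/json")
```

Calling upload_batches(csv_to_batches(text, "id"), endpoint) would then push the whole file, one batch at a time. The size counter slightly overestimates each batch, which keeps every batch safely under the limit.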

With both Solr and Elasticsearch, you need to provision a Linux server, then install and configure the software as you would anything else you download. Even if you take the service route, you still need to worry about backups, upgrades, applying patches, etc. One big advantage of these two is that you can run them on-premise as well, while AWS CloudSearch is available only on the AWS cloud. Beyond that, Elastic also has a data visualization tool, Kibana, which comes as part of a suite (ELK: Elasticsearch, Logstash, Kibana). AWS CloudSearch offers only indexing and search; visualization is a separate product, QuickSight (I haven’t looked at this yet, but I plan to).

I will write more about the programmatic approach in my next entry. Please drop me a line if you can’t wait and wish to see it in action!