Getting started with Druid: A high-performance, column-oriented, distributed data store.

The goal of this post is to be able to work with arbitrary data generated from http://www.json-generator.com/: ingesting it into Druid, then performing filtering and aggregation on the data. We'll start with timeseries data and then look for ways to work with non-timeseries data.

Generating data

I used the following JSON spec to generate data from json-generator:

Ingestion

We generated sample data from json-generator.com. Now we need to generate the index file, which is like a metadata document that Druid uses to ingest the files. The ingestion spec is a JSON document of the following structure: The

Crossfilter and dc.js are awesome libraries. They are neatly coupled and let you create interactive visualizations for the web. The problem arises when you're dealing with massive data: the browser hits its memory limits, filtering becomes too slow, and such datasets become impossible to handle. What we can do is move the loading of the data and the computationally intensive filtering to the server. Your charts are still rendered in the browser using dc.js; every time a user interacts with the charts, we send an AJAX request to the server to compute the filter, and the results are
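To make the server-side filtering step described above a little more concrete, here is a minimal TypeScript sketch of what the server might compute for one chart. The `Row` and `Filter` shapes, the `computeGroup` function, and the sample fields are all hypothetical names chosen for illustration; they are not part of crossfilter, dc.js, or the original post. The sketch assumes the server holds the rows in memory and mirrors crossfilter's convention that a chart's own filter does not affect its own group.

```typescript
// Hypothetical record shape: a flat object of string or number fields.
interface Row {
  [field: string]: string | number;
}

// Hypothetical filter sent by the browser: one per chart,
// naming the field and the values currently selected on that chart.
interface Filter {
  field: string;
  values: Array<string | number>;
}

// Count rows per distinct value of `groupField`, applying every active
// filter except the one on `groupField` itself.
function computeGroup(
  rows: Row[],
  filters: Filter[],
  groupField: string
): Record<string, number> {
  // Ignore empty filters and the filter belonging to the chart being computed.
  const active = filters.filter(
    (f) => f.field !== groupField && f.values.length > 0
  );
  const counts: Record<string, number> = {};
  for (const row of rows) {
    const keep = active.every((f) => {
      const v = row[f.field];
      return v !== undefined && f.values.indexOf(v) !== -1;
    });
    if (!keep) continue;
    const g = row[groupField];
    if (g === undefined) continue; // skip rows that lack the grouping field
    const key = String(g);
    counts[key] = (counts[key] ?? 0) + 1;
  }
  return counts;
}

// Usage with made-up sample data: the user has selected "female" on a gender chart.
const rows: Row[] = [
  { gender: "female", eyeColor: "blue" },
  { gender: "male", eyeColor: "brown" },
  { gender: "female", eyeColor: "brown" },
];
const result = computeGroup(
  rows,
  [{ field: "gender", values: ["female"] }],
  "eyeColor"
);
// result is { blue: 1, brown: 1 }
```

In a real setup, a function like this would sit behind the endpoint that the AJAX request hits, with the browser sending the current filters and receiving one set of counts per chart.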