Evolving from #RDBMS to #NoSQL + #SQL

Posted on May 3rd, 2016

Jim Scott @MAPR spoke about #ApacheDrill, which has a query language that extends ANSI SQL. Drill provides an interface that uses this SQL extension to access data in underlying data stores, whether they are SQL databases, NoSQL databases, CSV files, etc.

The OJAI API has the following advantages:

Gson (in #Java) needs two lines of code to serialize an object to #JSON to place into the database, and one line to deserialize it.

Idempotent – so there is no need to worry about replaying actions twice if there is an issue.
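To make the idempotency point concrete, here is a minimal sketch in Python. It does not use Gson or the OJAI API: the standard `json` module stands in for the serializer, and `table`, `put`, and the `_id` key are hypothetical names chosen for illustration. It only shows why a write keyed by document id can be replayed safely.

```python
import json

# Hypothetical in-memory table keyed by document id (a stand-in, not the OJAI API).
table = {}

def put(doc_json: str) -> None:
    """Insert or replace a document by its "_id"; replaying the call changes nothing."""
    doc = json.loads(doc_json)   # deserialize the JSON text
    table[doc["_id"]] = doc      # same key, same value: safe to repeat

order = {"_id": "order-1", "items": ["pen", "ink"], "total": 12.5}
payload = json.dumps(order)      # serialize the document to JSON text

put(payload)
snapshot = json.dumps(table, sort_keys=True)

put(payload)                     # replay, e.g. after a suspected failure
assert json.dumps(table, sort_keys=True) == snapshot
```

Because the write replaces the document stored under its id rather than appending, a client that is unsure whether a request succeeded can simply send it again.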

Drill requires Java, but not Hadoop, so it can run on a desktop.

Schema on the fly – will take different data formats and join them together: e.g. csv + JSON

Data is accessed directly from the underlying databases without first needing to transform it into a metastore

Security – plugs into authentication mechanism of the underlying dbs. Mechanisms can go through multiple chains of ownership. Security can be done on row level and column level.
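As a rough illustration of the "schema on the fly" idea mentioned above (joining csv + JSON), the following Python sketch joins CSV rows with JSON records on a shared key without declaring a schema first. This is not how Drill is implemented or invoked; the sample data, the `join_csv_json` function, and the `user_id` key are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical sample data; with Drill these would be files queried in place.
CSV_TEXT = "user_id,name\n1,Ada\n2,Grace\n"
JSON_LINES = (
    '{"user_id": 1, "city": "London"}\n'
    '{"user_id": 2, "city": "New York"}\n'
)

def join_csv_json(csv_text, json_lines, key):
    """Inner-join CSV rows with JSON records on `key`, discovering fields as they are read."""
    # Index the JSON records by key. CSV values are always strings, so keys are
    # compared as strings. If a key repeats, the last record wins (a simplification).
    by_key = {}
    for line in json_lines.splitlines():
        if line.strip():
            record = json.loads(line)
            by_key[str(record[key])] = record

    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        match = by_key.get(row[key])
        if match is not None:
            merged = dict(row)
            # Add the JSON fields, keeping the CSV copy of the join key.
            merged.update({k: v for k, v in match.items() if k != key})
            joined.append(merged)
    return joined

rows = join_csv_json(CSV_TEXT, JSON_LINES, "user_id")
```

Each element of `rows` carries the fields of both sources, which is the effect a join across the two formats would produce.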

Commands extend SQL to allow access to lists in a JSON structure:

CONTAINS

SUM

Can create views to output to parquet, csv, json formats

FLATTEN – explode an array in a JSON structure to display as multiple rows with all other fields duplicated
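The behaviour described for FLATTEN can be sketched in plain Python. This only mimics the row-level effect; it is not Drill code, and the `flatten` helper and the sample records are hypothetical.

```python
def flatten(rows, column):
    """Emit one output row per element of row[column], copying all other fields."""
    out = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)        # duplicate the other fields
            new_row[column] = element  # replace the array with a single element
            out.append(new_row)
    return out

records = [
    {"name": "Ada", "langs": ["SQL", "Java"]},
    {"name": "Grace", "langs": ["COBOL"]},
]

flat = flatten(records, "langs")
# flat now holds three rows: two for Ada (one per language) and one for Grace.
```

In this sketch a record whose array is empty produces no output rows.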