Thursday, June 11, 2015

Teradata is gearing up to embrace Presto

Teradata has joined the open source Presto community. Presto is the distributed SQL query engine used by big data innovators and users such as Facebook and Netflix. It comes close to being a universal SQL query engine, and it supports any Hadoop distribution.

Justin Borgman's post cites Facebook as a heavy user, with impressive usage statistics. Presto is not only SQL for querying Hadoop; it can also query other kinds of data platforms (MySQL, Postgres, Kafka, and Cassandra) in a data lake*, including mash-ups of data from diverse sources.
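To make the mash-up idea concrete, here is a minimal sketch of how a single Presto statement might join a Hadoop (Hive) table with a MySQL table. Presto addresses tables as catalog.schema.table; everything else here (the catalog names `hive` and `mysql`, the schemas, tables, and columns, and the helper functions) is hypothetical and only for illustration. The code just builds the SQL text; it does not connect to a cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified Presto table name (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"


def build_mashup_query() -> str:
    """Build one SQL statement that joins data from two different sources."""
    # Hypothetical names: a Hive catalog over Hadoop and a MySQL catalog.
    clicks = qualified("hive", "web", "clicks")
    users = qualified("mysql", "crm", "users")
    return (
        "SELECT u.country, count(*) AS clicks\n"
        f"FROM {clicks} c\n"
        f"JOIN {users} u ON c.user_id = u.id\n"
        "GROUP BY u.country"
    )


if __name__ == "__main__":
    print(build_mashup_query())
```

The point of the sketch is that both sources appear in the same FROM/JOIN clause, so the person writing the query does not have to move data from one system to the other first.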

Teradata has a multiyear roadmap to make Presto enterprise-ready, and it should be ready in 2016, which is not far away. Of course, Teradata will be looking forward to contributions from the open source community.

SQL Server 2016, with its PolyBase feature, can also query Hadoop using T-SQL and is very flexible. It can use the splendid BI capabilities built into SQL Server 2016 and is ready now. Besides Microsoft, many other vendors also offer ways to take big data and slice and dice it.

*data lake: A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Unlike data marts, which are optimized for data analysis by storing only some attributes and dropping data below the level of aggregation, a data lake is designed to retain all attributes, especially when you do not yet know what the scope of the data or its use will be.
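The difference between the two can be sketched in a few lines. The records and field names below are made up for illustration; the "lake" keeps every attribute of every event, while the "mart" keeps only a per-country total and drops everything below that level of aggregation.

```python
from collections import defaultdict

# Hypothetical raw events, stored as a data lake would keep them:
# every record and every attribute is retained.
raw_events = [
    {"user": "a", "country": "US", "device": "phone", "amount": 10},
    {"user": "b", "country": "US", "device": "laptop", "amount": 5},
    {"user": "c", "country": "DE", "device": "phone", "amount": 7},
]


def build_mart(events):
    """Aggregate amount per country, discarding all other attributes."""
    totals = defaultdict(int)
    for event in events:
        totals[event["country"]] += event["amount"]
    return dict(totals)


mart = build_mart(raw_events)
print(mart)
```

A question such as "how much came from phones?" can still be answered from `raw_events`, but not from `mart`, because the device attribute was dropped during aggregation.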