PolyBase is a technology designed to query data stored in the Hadoop Distributed File System (HDFS). It is integrated into SQL Server Parallel Data Warehouse (PDW) and lets you use SQL queries to read data from, and write data to, HDFS.

Put simply, we can create an external table in PDW that references data in Hadoop, and then write ordinary SQL statements to query that data in HDFS. This approach also lets a single query join native relational tables with Hadoop data.
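The external-table idea can be illustrated without a Hadoop cluster. The sketch below is a toy analogy in Python, not PolyBase's syntax or API: an "external table" stores only metadata (column names, a location and a field terminator), and its rows are read from the source each time a query scans it, so they can be joined with rows from a native relational table. The `ExternalTable` class, the `hash_join` helper, the sample data and the pipe terminator are all hypothetical, and an in-memory text stream stands in for an HDFS file.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, TextIO


@dataclass
class ExternalTable:
    """Toy stand-in for an external table: it stores metadata only, never rows."""

    columns: List[str]
    open_location: Callable[[], TextIO]  # hypothetical: plays the role of an HDFS path
    field_terminator: str = "|"

    def scan(self) -> Iterator[Dict[str, str]]:
        # Rows are read from the source every time the table is scanned,
        # instead of being loaded into the database ahead of time.
        with self.open_location() as f:
            for fields in csv.reader(f, delimiter=self.field_terminator):
                yield dict(zip(self.columns, fields))


def hash_join(native_rows: Iterable[dict], external_rows: Iterable[dict],
              native_key: str, external_key: str) -> Iterator[dict]:
    """Inner join: index the native (relational) rows, then probe with external rows."""
    index: Dict[str, List[dict]] = {}
    for row in native_rows:
        index.setdefault(row[native_key], []).append(row)
    for ext in external_rows:
        for nat in index.get(ext[external_key], []):
            yield {**nat, **ext}


# Hypothetical sample data: pipe-delimited click-stream lines, as they might sit in HDFS.
CLICKS = "alice|/home\nbob|/cart\nalice|/checkout\n"
clicks = ExternalTable(["user_id", "url"], lambda: io.StringIO(CLICKS))

# A "native" relational table, held in memory for the example.
customers = [{"user_id": "alice", "region": "EU"}, {"user_id": "bob", "region": "US"}]

result = list(hash_join(customers, clicks.scan(), "user_id", "user_id"))
for row in result:
    print(row)
```

The point of the design is that the join never requires the click data to be copied into the relational side first; replacing the `StringIO` factory with one that opens a real text file would leave the query code unchanged.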

Let’s look at the benefits of using PolyBase:

There is no need to understand MapReduce, whose jobs are mostly written in Java. You can write simple SQL queries to process and load data from HDFS.

We can connect to it (PDW) from SQL Server Object Explorer in Visual Studio and from SQL Server 2014.

It can fetch data directly from HDFS and bypass MapReduce, whereas most Hadoop components, such as Hive and Sqoop, use MapReduce to extract data. This makes PolyBase cost-effective: it can decide when and where to use MapReduce and when to access HDFS directly.
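The trade-off behind that decision can be illustrated with a deliberately simplified cost model. This is a hypothetical sketch in Python, not PolyBase's actual optimizer logic, and every constant in `CostModel` is invented: pushing a filter down to MapReduce pays a fixed job start-up cost but sends only the matching rows back, while direct HDFS access sends every row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Hypothetical constants, invented only to illustrate the trade-off."""

    mapreduce_startup_s: float = 30.0    # fixed cost of launching a MapReduce job
    hadoop_scan_rows_per_s: float = 1e6  # rows the Hadoop cluster can filter per second
    transfer_rows_per_s: float = 2e5     # rows per second moved from HDFS into PDW


def direct_cost(rows: int, m: CostModel) -> float:
    # Read every row straight from HDFS and filter it on the SQL side.
    return rows / m.transfer_rows_per_s


def pushdown_cost(rows: int, selectivity: float, m: CostModel) -> float:
    # Pay the job start-up, filter inside Hadoop, then move only qualifying rows.
    return (m.mapreduce_startup_s
            + rows / m.hadoop_scan_rows_per_s
            + rows * selectivity / m.transfer_rows_per_s)


def choose_plan(rows: int, selectivity: float, m: CostModel = CostModel()) -> str:
    """Return 'mapreduce' if pushing the filter down is estimated to be cheaper."""
    if pushdown_cost(rows, selectivity, m) < direct_cost(rows, m):
        return "mapreduce"
    return "direct"


print(choose_plan(100_000, 0.01))        # small input: start-up cost dominates
print(choose_plan(1_000_000_000, 0.01))  # huge input, very selective filter
print(choose_plan(1_000_000_000, 1.0))   # huge input, filter keeps every row
```

With these invented constants the three calls print `direct`, `mapreduce` and `direct`: a small input is not worth a job launch, a large and selective scan is, and a filter that discards nothing gains nothing from running inside Hadoop.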

It can be integrated with BI tools such as Power Pivot, Power View, and SSRS.

There is no extra burden of learning Hadoop, MapReduce, or Java, since most developers already know SQL and can work with it easily.

Syntax examples: