Wednesday, October 07, 2015

SQL evolves to U-SQL, language under development at Microsoft

While SQL covered the RDBMS landscape U-SQL covers a much larger data landscape.

At the same time as the announcement of Azure Data Lake Services, a new language under development at Microsoft, the U-SQL language was also announced. For the Azure Data Lake Service and what it means to business read here.

With the advent of Big Data and the task of mining all kinds of data, RDBMS suddenly found itself at a disadvantage. Structured Query Language (SQL) could only address what is in a relational data store. U-SQL was born to address this challenge posed by Big Data defined by volume, velocity and variety.

What is U-SQL
U-SQL deep dives into Big Data to extract the most relevant information. It is a powerful language (in the words of Microsoft):

Process any type of data. From analyzing BotNet attack patterns from security logs to extracting features from images and videos for machine learning, the language needs to enable you to work on any data.

Use custom code easily to express your complex, often proprietary business algorithms. The example scenarios above may all require custom processing that is often not easily expressed in standard query languages, ranging from user defined functions, to custom input and output formats.

Scale efficiently to any size of data without you focusing on scale-out topologies, plumbing code, or limitations of a specific distributed infrastructure.

Compared to HIVE, a SQL-Based language U-SQL is flexible and does not have the limited capability to address the 'variety' in non-structured data requiring schema generation prior to running queries. U-SQL should prove more easy to use than Hive for complex scenarios.

U-SQL has been designed as declarative SQL based language with native extensibility through user code in C#. This approach:

Unifies SQL and C#

Unifies structured and Unstructured

Unifies declarative and custom code

U-SQL is based on SCOPE which is based on existing prior languages, ANSI-SQL, T-SQL and HIVE. U-SQL should present a less steeper curve for those who are using SQL already.