Microsoft's fond enough of Hadoop to give the platform a special place in its Azure cloud as HDInsight -- and fond enough, it seems, to keep HDInsight updated with the latest stable release of Hadoop, version 2.4. This time around, the update includes changes contributed back to key Hadoop subprojects -- contributions Microsoft says draw on technology from one of its proprietary software offerings.

The updated HDInsight features contributions Microsoft made to two key Hadoop projects, Hive and Stinger. Hive, an Apache Software Foundation project for Hadoop, is a data warehousing system that allows data to be kept in Hadoop's distributed storage system and queried with a SQL-like syntax. Stinger is a project created by Hortonworks, the Hadoop vendor that Microsoft has worked with for HDInsight and other projects, and is designed to build on Hive by making it orders of magnitude faster.

The real kicker lies in the way Microsoft described these contributions in a post on its SQL Server blog: "This update [to HDInsight] includes interactive querying with Hive using advancements based on SQL Server technology, which we are also contributing back to the Hadoop ecosystem through project Stinger."

Microsoft is claiming this update will give HDInsight users a hundred-fold increase in performance, but the offhand note about SQL Server's tech is most intriguing. It probably doesn't imply Microsoft plans to open source SQL Server or any of its key technologies. Rather, it's more a tacit admission that Microsoft believes it understands this territory well and can bank on that for the sake of its customers.

When Microsoft's ventures into Hadoop-land were still new, back in 2012, O'Reilly Radar analyst Edd Dumbill noted how Microsoft's strategy was customer- and developer-centric: "By embracing Hadoop," he wrote, "Microsoft allows its customers to access the rapidly growing Hadoop ecosystem and take advantage of a growing talent pool of Hadoop-savvy developers." Hence, the JavaScript layer Microsoft added to Hadoop to make it easier to program; hence the added connectivity between Hadoop and Microsoft products like Excel and SQL Server; and hence the growing ease with which Microsoft contributes to the Hadoop ecosystem, either directly on its own or through partners like Hortonworks. In the long run, it's also about keeping Microsoft's own proprietary ecosystem well fed.

Microsoft's work with Hadoop is at least as much about making Microsoft's own products more useful and appealing to the broadest possible market as it is about Hadoop itself. As long as products like Excel or Power BI are on the front end, it matters less to Microsoft that Hadoop's on the back end.