Microsoft is working with Hadoop core committers from Hortonworks to bring the …

At Microsoft's PASS Summit in Seattle today, Microsoft Corporate Vice President Ted Kummert outlined the company's strategy for tackling big data within and outside the enterprise. A big part of those plans involves wiring SQL Server 2012 (formerly known by the codename “Denali”) to the Hadoop distributed computing platform, and bringing Hadoop to Windows Server and Azure. “The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago,” Kummert said in his keynote. SQL Server 2012 will ship in the first half of next year.

The addition of Hadoop-based data analysis will allow Microsoft customers to process and analyze large quantities of unstructured data, both within and outside the enterprise. It's also part of Microsoft's larger strategy with its Windows Azure Marketplace service, and an effort to create a “data ecosystem” where companies can access data streams of various types from a public cloud, analyze and transform them, combine them with their own data, and share or sell their own data sets as well.

In an interview with Ars Technica, Microsoft SQL Server general manager Doug Leland said that Microsoft has formed a strategic partnership with Hortonworks, a Hadoop support consultancy formed by some of Hadoop's core developers from Yahoo, to help implement the platform on Windows Server and in the Azure cloud. “They're helping us accelerate development,” Leland said, “and providing technical expertise to Microsoft to help us deliver our Hadoop distributions on Windows Server and Windows Azure.”

Leland also said that Microsoft will submit any additions or changes it makes to Hadoop to Apache, and that the company is “making a commitment to compatibility” with the open-source project.

Microsoft's Hadoop efforts include an ODBC driver for Hive, the Hadoop query engine, which will provide “direct realtime querying from business intelligence tools into Hadoop,” Leland said. The Azure Hadoop service and the Hive ODBC driver will be available in preview before the end of 2011, with a community technology preview for Hadoop on Windows Server arriving next year.

At The Data Warehousing Institute conference in August, Microsoft had previewed a Hadoop-to-SQL Server connector based on the Sqoop SQL-to-Hadoop import and export tool. Kummert announced today that the connector is now officially released. The Microsoft SQL Server Connector for Apache Hadoop is compatible with SQL Server 2008, and will allow delimited text files and sequence files in the Hadoop Distributed File System, as well as Hive tables, to be imported into a SQL Server database. The connector can be used to run MapReduce queries against HDFS data, along with Hive queries, and then pull the resulting data sets into Microsoft SQL Server for further analysis with relational tools.
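
To make that workflow concrete, here is a minimal sketch of the general pattern: delimited output from a Hadoop job is loaded into a relational table and then queried with SQL. It is not the Microsoft connector or Sqoop. Python's built-in sqlite3 module stands in for SQL Server, and the sample data, the page_hits table, and the function names are all hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical tab-delimited output of a Hadoop job: one (page, hits) pair per line.
HADOOP_OUTPUT = "home\t120\nsearch\t75\ncheckout\t30\n"


def load_delimited(conn, text, delimiter="\t"):
    """Parse delimited text and insert it into a relational table.

    Returns the number of rows loaded. The table name and schema are
    assumptions made for this sketch.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS page_hits (page TEXT, hits INTEGER)")
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [(page, int(hits)) for page, hits in reader]
    conn.executemany("INSERT INTO page_hits (page, hits) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def top_pages(conn, min_hits):
    """Follow-up analysis with ordinary SQL once the data is relational."""
    cur = conn.execute(
        "SELECT page, hits FROM page_hits WHERE hits >= ? ORDER BY hits DESC",
        (min_hits,),
    )
    return cur.fetchall()


# sqlite3 in-memory database stands in for a SQL Server instance.
conn = sqlite3.connect(":memory:")
loaded = load_delimited(conn, HADOOP_OUTPUT)
result = top_pages(conn, 50)
```

In a real deployment the import step would be handled by the connector itself, and the target would be a SQL Server database rather than an in-memory stand-in.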

Cloudy business intelligence

Kummert also showed off two business intelligence tools tied to SQL Server 2012. The first was a prototype of a project codenamed Microsoft Data Explorer, a set of tools that will allow users to pull data from spreadsheets, SQL databases, other files, and the Windows Azure Marketplace to create reports that can be published and shared within the organization, and pushed back out to Azure's data marketplace. The reports can also be pulled into PowerPivot or Excel for further analysis.

The other tool highlighted during Kummert's keynote was Power View, previously known by the codename of “Project Crescent.” Power View, demoed by Microsoft at its Worldwide Developer Conference earlier this year, is a Web-based ad-hoc business intelligence tool that will be included with SQL Server 2012, and ties into Windows Azure Marketplace. Microsoft announced today that it will support touch-based interfaces, such as those provided with Windows 8, to allow users to manipulate data sets by touch. A version of Power View with the touch interface capabilities will be available by the end of next year.

Sean Gallagher
Sean is Ars Technica's IT and National Security Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland. Email: sean.gallagher@arstechnica.com // Twitter: @thepacketrat