Microsoft Readies Major Push Into Big Data

Alex Woodie

Microsoft has a lot of irons in the fire. Always has and always will. But judging from its recent acquisition of Revolution Analytics, the early success of its hosted machine learning service, and the forthcoming public launch of a MapReduce analog called “Cosmos,” the Redmond, Washington software giant is set to make big data an even bigger part of its go-to-market strategy.

Microsoft is reportedly gearing up to publicly launch a new big data storage and crunching service called Cosmos, which is currently an internal service available only to Microsoft divisions. The distributed system, which runs on the Azure cloud, is used to process petabytes’ worth of data generated by MSN, Bing, and other Microsoft properties, and can scale to upwards of 40,000 nodes.

According to a 2011 Microsoft Research paper, Cosmos uses a massively parallel processing framework based on Dryad that works in a manner similar to MapReduce. Microsoft exposes a SQL-like language called SCOPE (Structured Computation Optimized for Parallel Execution) for Cosmos, which simplifies programming for 5,000 internal engineers.

Massive amounts of data are run through Cosmos every day, letting Microsoft know what people are searching for, what sites people are visiting, and what ads people are clicking on. It’s similar in many respects to the types of jobs that Yahoo developed Hadoop to run, but it’s all Microsoft technology.
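For readers unfamiliar with the MapReduce pattern that Cosmos is said to resemble, the kind of counting job described above can be sketched in a few lines of Python. This is only an illustration of the general map, shuffle, and reduce steps on a single machine; it is not SCOPE, Dryad, or anything Microsoft has published, and the log records and function names are invented for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical log records split into partitions, standing in for data
# spread across nodes. Real Cosmos data and SCOPE syntax are not shown here.
LOG_PARTITIONS = [
    [("search", "weather"), ("click", "ad-17"), ("search", "news")],
    [("search", "weather"), ("search", "weather"), ("click", "ad-17")],
]


def map_phase(partition):
    """Emit a ((event_type, value), 1) pair for every log record."""
    return [((event_type, value), 1) for event_type, value in partition]


def shuffle(mapped_partitions):
    """Group the emitted counts by key, as a shuffle step would across nodes."""
    grouped = defaultdict(list)
    for key, count in chain.from_iterable(mapped_partitions):
        grouped[key].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each key."""
    return {key: sum(counts) for key, counts in grouped.items()}


def count_events(partitions):
    """Run map, shuffle, and reduce over all partitions."""
    mapped = [map_phase(p) for p in partitions]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    for key, total in sorted(count_events(LOG_PARTITIONS).items()):
        print(key, total)
```

In a real distributed system the map and reduce steps would run in parallel on many machines, and a SQL-like layer such as SCOPE would let engineers express the same aggregation as a query rather than hand-written functions.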

Now it appears Cosmos will be exposed to the public. According to longtime Microsoft watcher Mary Jo Foley, Microsoft is planning to offer an externally facing version of Cosmos that would complement HDInsight, the Hadoop service that Microsoft also offers on Azure.

Microsoft is planning to launch a series of tools for Cosmos, Foley says, including an analysis-engine component codenamed “Kona,” a storage-engine piece codenamed “Cabo,” and a version of SCOPE called SQL-IP. Together, the group of hosted tools could provide competition for Google’s Dataflow, the service launched in 2014 to help people develop and run big data pipelines.

All of a sudden, it appears that Microsoft is all over big data and data science. Its acquisition of Revolution Analytics two weeks ago gave the company a big boost as a provider of R tools and services. Revolution Analytics had developed a way to parallelize R code and run it at massive scale in Hadoop, Netezza, and other large-scale repositories for data analytics. Microsoft will clearly benefit from adding Revolution’s R technology to its services.

Microsoft also appears well poised to capitalize on the burgeoning interest in machine learning and predictive analytics with Azure ML, the hosted machine learning environment it launched last summer. The offering boasts a number of pre-built templates and visual workloads that are designed to make it easier for users to take advantage of machine learning and start generating insights from massive amounts of data across the healthcare, financial services, transportation, and retail industries.

Say what you want about Microsoft and its (many) business strategies. You may not use any of Microsoft’s many products, including Windows 8, Bing, SQL Server, Windows Phone, Xbox, or Dynamics ERP. The days of an unstoppable monopoly dictating terms to an entire industry are long over, and Microsoft is not a big name in big data, but based on its recent moves, that could be changing.