Wednesday, June 27, 2012

• Updated 6/30/2012 with status of the live data source for Codename “Social Analytics.”

MyBig Data in the Cloud article for Visual Studio Magazine’s July 2012 issue asserts “Microsoft has cooked up a feast of value-added big data cloud apps featuring Apache Hadoop, MapReduce, Hive and Pig, as well as free apps and utilities for numerical analysis, publishing data sets, data encryption, and uploading files to SQL Azure and blobs.” Here’s the introduction:

Competition is heating up for Platform as a Service (PaaS) providers such as Microsoft Windows Azure, Google App Engine, VMware Cloud Foundry and Salesforce.com Heroku, but cutting compute and storage charges no longer increases PaaS market share. So traditional Infrastructure as a Service (IaaS) vendors, led by Amazon Web Services (AWS) LLC, are encroaching on PaaS providers by adding new features to abstract cloud computing functions that formerly required provisioning by users. For example, AWS introduced Elastic MapReduce (EMR) with Apache Hive for big data analytics in April 2009. In October 2009, Amazon added a Relational Database Services (RDS) beta to its bag of cloud tricks to compete with SQL Azure.

Note: My The battle for cloud services: Microsoft vs. Amazon article of 6/18/2012 for TechTarget’s SearchCloudApplications.com site compares the feature of the two cloud giants’ approaches to implementing Hadoop, MapReduce and Hive in Windows Azure and Amazon Web Services.

Microsoft finally countered with a multipronged Apache Hadoop on Windows Azure preview in December 2011, aided by Hadoop consultants from Hortonworks Inc., a Yahoo! Inc. spin-off. Microsoft also intends to enter the highly competitive IaaS market; a breakout session at the Microsoft Worldwide Partner Conference 2012 will unveil Windows Azure IaaS for hybrid and public clouds. In late 2011, Microsoft began leveraging its technical depth in business intelligence (BI) and data management with free previews of a wide variety of value-added Software as a Service (SaaS) add-ins for Windows Azure and SQL Azure (see Table 1).

Codename

Description

Link to Tutorial

“Social Analytics”

Summarizes big data from millions of tweets and other unstructured social data provided by the “Social Analytics” Team

Table 1. The SQL Azure Labs team and the StreamInsight unit have published no-charge previews of several experimental SaaS apps and utilities for Windows Azure and SQL Azure. The Labs team characterizes these offerings as "concept ideas and prototypes," and states that they are "experiments with no current plans to be included in a product and are not production quality."

In this article, I'll describe how the Microsoft Hadoop on Windows Azure project eases big data analytics for data-oriented developers and provide brief summaries of free SaaS previews that aid developers in deploying their apps to public and private clouds. (Only a couple require a fee for the Windows Azure resources they consume.) I'll also include instructions for obtaining invitations for the previews, as well as links to tutorials and source code for some of them. These SaaS previews demonstrate to independent software vendors (ISVs) the ease of migrating conventional, earth-bound apps to SaaS in the Windows Azure cloud.

• This article went to press before the Windows Azure Team’s Meet Azure event on 6/7/2012, where the team unveiled the “Spring Wave” of new features, upgrades and updates to Windows Azure, including Windows Azure Virtual Machines, Virtual Networks, Web Sites and other new and exciting services. Also, the team terminated Codename “Data Transfer” project on 6/2/2012 and Codename “Social Analytics” lab as of 6/21/2012. However, as of 6/27/2012, the “Social Analytics” data stream from the Windows Azure Marketplace Data Market was still operational, so the downloadable C# code for the Microsoft Codename “Social Analytics” Windows Form Client still works.As of 6/30/2012, the data from the Windows Azure Marketplace Data Market was no longer available, so I’ll modify the sample code to use the data I generated (see below note). Stay tuned for availability of the bits.

Note: I modified my working version of the project to copy the data from about a million rows in the DataGridView to a DataGrid.csv file, which can be loaded on demand. Copies of this file and the associated source file for the client’s chart are available from my SkyDrive account. I will update the sample code to use the DataGrid.csv file if the Data Market steam becomes unavailable.

The dual Web role application has been running in Microsoft's South Central US (San Antonio) data center since September 2009. I believe it is the oldest continuously running Windows Azure application.

About Me

I'm a Windows Azure Insider, a retired Windows Azure MVP, the principal developer for OakLeaf Systems and the author of 30+ books on Microsoft software. The books have more than 1.25 million English copies in print and have been translated into 20+ languages.

Full disclosure: I make part of my livelihood by writing about Microsoft products in books and for magazines. I regularly receive free evaluation software from Microsoft and press credentials for Microsoft Tech•Ed and PDC. I'm also a member of the Microsoft Partner Network.