Bigger Data Comes to Windows

The next step in integrating Windows and big data was taken today, continuing Microsoft's push into expanded business intelligence capabilities.

Hortonworks Data Platform (HDP) for Windows was commercially
released today, offering another way to run Apache Hadoop "big data"
workloads.

The product had been at the beta stage as of February. While many Apache
Hadoop workloads run on Linux servers, Hortonworks now offers native
support for both Linux servers and Windows Server, with "a common user
experience," according to Hortonworks' announcement. Moreover, Hortonworks
claims that its platform is 100 percent open source, which isn't the case
with some Hadoop implementations.

Microsoft collaborated with Hortonworks on the HDP for Windows product.
Hadoop is an Apache open source project, largely fostered by Yahoo, and
some of the Yahoo Hadoop team members later joined Hortonworks.
Microsoft's collaboration with Hortonworks should add support for
organizations running Hadoop in mixed computing environments.

The collaboration also paves the way for Microsoft's big data business
intelligence tools. Microsoft PowerPivot for Excel and Power View for
SharePoint Services can both be used to display Hadoop query results.
Hadoop is an open source framework built around MapReduce, a model for
scale-out processing of large volumes of structured and unstructured data
across clusters, and it allows ad hoc queries to be run. In theory, then,
Microsoft's tools will make it easier to chart such data and gain insights.
Microsoft worked with the Apache Software Foundation on the Open Database
Connectivity (ODBC) driver for Hive, Hadoop's data warehouse system, to
build support for its business intelligence tools.
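As a rough illustration of the MapReduce model described above, here's a minimal, single-process Python sketch of the classic word-count job. It only mimics the map, shuffle and reduce phases in memory; it is not Hadoop's API, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical. On a real cluster, Hadoop distributes the map and reduce tasks across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key before they reach reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a single machine."""
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["big data on Windows", "Hadoop on Windows Server", "big clusters"]
    print(word_count(logs))
```

Because each map call looks at only one record and each reduce call sees only one key's values, both phases can run in parallel on separate machines, which is what lets Hadoop scale out.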

There's also a System Center integration effort with Apache Ambari, which
enables System Center to manage Hadoop clusters alongside other computing
assets. The Web-based Ambari tool is used to install, monitor and manage
Apache Hadoop clusters.

Microsoft is also touting the ability of HDP for Windows to work with its
own Windows Azure HDInsight Service. Supposedly, users of HDP for Windows
can "migrate seamlessly" to Microsoft's cloud-based Windows Azure Hadoop
implementation. Microsoft also has its own Hadoop implementation for
Windows, which is called "Microsoft HDInsight Server for Windows."

Microsoft's Windows Azure HDInsight Service is currently in beta. It
may be released this summer, according to a recent talk by expert Andrew
Brust. He noted that Microsoft still needs to do some work on HDInsight to
get the tooling up to speed for enterprise use.

Hortonworks' HDP for Windows 1.1 product can be downloaded from Hortonworks'
Web site. It contains Hadoop components such as Pig, Hive and Sqoop. HDP
for Windows 1.1 runs on Windows Server 2008 or Windows Server 2012.
