Super hot off the press: the Early Release of my book Mastering Azure Analytics: Architecting in the cloud with Azure Data Lake, HDInsight and Spark is now available! You can get the first two chapters right away, and new chapters are right around the corner. Here's what I cover: Microsoft Azure has over 20 platform-as-a-service (PaaS) offerings that can act in support of a big data analytics solution. So which one is right for your project? This practical book helps you understand the breadth of Azure services by organizing them into a reference framework you can use when crafting your own...
Read more →

If you are getting an exception when running an Azure Data Lake Analytics U-SQL query against a CSV source file, you may see an inscrutable error like the following: ERROR VertexFailedFast. Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][0] with error: Vertex user code error. DESCRIPTION Vertex failed with a fail-fast error RESOLUTION DETAILS Vertex SV1_Extract[0][0].v2 {43B20D9E-E63F-48AF-8E9A-FFFAE288FCB8} failed Error: Vertex user code error exitcode=CsExitCode_StillActive Errorsnippet=An error occurred while processing adl://adlmvp.azuredatalakestore.net/Zoiner/ExtentAligned/On_Time_On_Time_Performance_2014_1_OneColumn.csv In my experience so far, the root causes you can check for that trigger this error include: The columns in your U-SQL query don't line up with the number...
Read more →

I commonly get the question, "What's the difference between Event Hubs and IoT Hubs?" Both are positioned at the edge of a cloud analytics solution and, after all, both are responsible for storing ingested telemetry as messages or events. While you can get very deep answering this question, there are three high-level, fundamental differences that can help you make the right selection. #1 Messaging Directions Event Hubs provides what I like to call a "multi-consumer" queue that defers state management responsibility (e.g., progress reading through the queue) to the consumer. This type of queue is great for ingesting huge...
Read more →
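To make the "state management is deferred to the consumer" idea from the Event Hubs excerpt above a bit more concrete, here is a minimal Python sketch. It is not the Event Hubs SDK: EventLog, Consumer, and the sample readings are hypothetical names I made up for illustration, and the whole thing runs in memory. The point it tries to show is that the log only stores events, while each consumer keeps its own offset.

```python
class EventLog:
    """Append-only log (hypothetical). It never tracks who has read what."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)

    def read_from(self, offset, max_count=10):
        """Return up to max_count events starting at the given offset."""
        return self._events[offset:offset + max_count]


class Consumer:
    """Each consumer owns its offset, i.e. its progress through the log."""

    def __init__(self, name, log):
        self.name = name
        self.log = log
        self.offset = 0

    def poll(self, max_count=10):
        batch = self.log.read_from(self.offset, max_count)
        # The consumer, not the log, records how far it has read.
        self.offset += len(batch)
        return batch


log = EventLog()
for reading in ["t=20.1", "t=20.4", "t=20.9"]:
    log.append(reading)

# Two independent consumers read the same events at their own pace.
dashboard = Consumer("dashboard", log)
archiver = Consumer("archiver", log)

first = dashboard.poll(max_count=2)   # dashboard reads two events
everything = archiver.poll()          # archiver reads all three
rest = dashboard.poll()               # dashboard resumes where it left off
```

Because reading never removes anything from the log, adding a third consumer later would not disturb the other two; it would simply start from its own offset.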

Last year, Microsoft released the Visio template you can use to create really cool-looking 3D perspective illustrations of Azure topologies. These templates are visually interesting and also super helpful in getting high-level points across to an introductory audience. To that end, I felt there was an illustration missing in the official blueprints provided by Microsoft, and so I created my own for IoT. It explores some of the options for creating a lambda architecture on Azure that can be used to process the telemetry produced by IoT devices. You can download a PDF of my template here: Download IoT...
Read more →

If you're migrating from HDInsight on Windows to HDInsight on Linux, you are probably upgrading to Spark 1.6 from Spark 1.3 and shifting from Zeppelin to Jupyter. This turns out to involve some pretty fundamental changes. If you, like me, get cross-eyed trying to navigate the Scala docs, I'll be creating a few posts about key changes with examples of the new syntax. In this post, we'll examine how to save a DataFrame to a permanent table that can be queried via Hive or via external tools. In the previous version of Spark, you could have simply called saveAsTable against a DataFrame...
Read more →

If you're migrating from HDInsight on Windows to HDInsight on Linux, you are probably upgrading to Spark 1.6 from Spark 1.3 and shifting from Zeppelin to Jupyter. This turns out to involve some pretty fundamental changes. If you, like me, get cross-eyed trying to navigate the Scala docs, I'll be creating a few posts about key changes with examples of the new syntax. In this post, we'll begin at the beginning: loading a CSV text file. In the previous version of HDI you would have loaded a text file using the following syntax: val textLines = sparkContext.textFile("wasb:///subfolder/myfile.csv") In Spark 1.6 the...
Read more →

When most folks think of applying Azure Search, they are thinking about the traditional text search scenario: find documents that contain the text "run", and it will match (thanks to its support for Natural Language Processing and linguistic stemming) documents that contain "run" and "running". While it's true Azure Search does a great job supporting full text search, I encourage thinking about its application with a broader lens: as the external index to another data store. For example, Azure Table Storage has long been bemoaned as not having support for secondary indexes (it only supports a single composite...
Read more →
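As a rough illustration of the "external index to another data store" idea from the Azure Search excerpt above, here is a small in-memory Python sketch. It does not call Azure Search or Azure Table Storage; the table contents, build_index, and lookup are all hypothetical stand-ins. The primary store can only be addressed by its (partition key, row key) pair, and a separate index maps a non-key field back to those keys.

```python
from collections import defaultdict

# Stand-in for the primary store: entities are addressable only by
# their (partition_key, row_key) pair. Sample data is made up.
table = {
    ("orders-2016", "0001"): {"customer": "contoso", "total": 40},
    ("orders-2016", "0002"): {"customer": "fabrikam", "total": 75},
    ("orders-2016", "0003"): {"customer": "contoso", "total": 15},
}


def build_index(rows, field):
    """Build an external index: field value -> list of primary keys."""
    index = defaultdict(list)
    for key, entity in rows.items():
        index[entity[field]].append(key)
    return index


def lookup(index, rows, value):
    """Query the index first, then fetch each entity by its primary key."""
    return [rows[key] for key in index.get(value, [])]


# The index lives outside the primary store, much as a search service would.
by_customer = build_index(table, "customer")
contoso_orders = lookup(by_customer, table, "contoso")
```

The trade-off in a design like this is that the index has to be kept in sync with the primary store whenever entities are added, changed, or removed.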

If you're coding on the bleeding edge, you may find yourself in a situation where the version of the App Services SDK you want has not been officially released yet, but it's there on GitHub. To give you an example, I needed to get at the still burgeoning support for EventProcessorHost in Web Jobs via the Microsoft.Azure.WebJobs.ServiceBus assembly. How do you get these packages into your solution? The process for getting at the "nightly" builds turned out to be surprisingly simple. Within Visual Studio, right click on your solution and choose Manage NuGet Packages for Solution. In the top-right corner of...
Read more →

While it might be obvious to some, I still frequently find clients confused about the type of messaging Azure Notification Hubs provides. In this post, I thought I would tackle that by clarifying what it does not provide. First of all, like the title suggests, Azure Notification Hubs does not itself provide SMS text messaging to mobile devices. What it does provide is in-app push notifications. In other words, your recipients need to have your mobile app installed on their device before they can receive notifications. There is often confusion on this because SMS and push notifications are...
Read more →

For those of you familiar with launching HDInsight with Spark clusters on Windows (which was the only option), you may be surprised to find that this is no longer an option. Here is what you used to see: With the latest update, Linux is the new black, and it's your only option for running Spark with HDInsight. To be clear, various Hadoop options on Windows are still available with HDInsight, just not Spark. This change brings with it some other ramifications if you are coming from HDInsight + Spark on Windows. The most notable is the notebook (sorry, I...
Read more →