Parsing Akamai logs using Azure HD Insight Spark Cluster.

You have seen many videos on Hadoop/Spark cluster, where a ubiquitous example for map reduce is used of counting the words "Banana" from a clean text files. But, in real-life your log files are not this clean, and they are not on cluster itself. Clusters are expensive affairs, so how do you programmatically create cluster and automate your processing?

Here is a presentation about developing a real-life application using Spark cluster. In this presentation, we will parse Akamai logs kept on an Azure storage. We will introduce some of the tools available. After having the script run in Jupyter notebook, we will automate the solution which can be started by a call to an endpoint.