Category: Apache Hadoop

Yesterday I gave a talk at the CopenhagenR – useR Group on how to leverage the power of R with HDInsight and Microsoft Azure. I recorded the presentation, which was spiced up by some really good questions from the attendees. I hope it’s all audible.

For any questions on the topic, please contact me via the comments below this post.

Less than a week after the 50-year anniversary of Moore’s Law, the 2015 edition of Global Azure Bootcamp is taking place at more than 183 locations around the world. Talk about global computing at scale! It’s a truly amazing effort driven by the local Azure communities across the planet, and I have been fortunate enough to participate in the Copenhagen camp.

Ever since I started grad school in Madrid, I have been very interested in statistics and quantitative methods in general. At Microsoft I have dived even deeper into the world of big data and machine learning, topics on which I have given a ton of presentations since I started as a Tech Evangelist.

At the Global Azure Bootcamp I gave a presentation on how to build your own recommendation engine using Hadoop in Azure through the service named HDInsight. The case is to build a song recommender that can suggest new songs to users based on the songs they have listened to in the past and the songs that other users have listened to (i.e., user-based collaborative filtering). Essentially, the goal is to build the kind of recommender that Spotify, YouTube and other popular media services use.
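To make the idea of user-based collaborative filtering a bit more concrete, here is a minimal sketch in plain Python. It is not the HDInsight demo from the talk, just a small in-memory illustration. The users, songs, play counts and the function names `cosine_similarity` and `recommend` are all made up for this example, and cosine similarity is only one of several similarity measures you could use.

```python
from math import sqrt

# Hypothetical listening history: user -> {song: play count}.
plays = {
    "anna": {"song_a": 5, "song_b": 3, "song_c": 1},
    "bo": {"song_a": 4, "song_b": 2, "song_d": 5},
    "carl": {"song_c": 4, "song_e": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity between two users' play-count dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0  # no songs in common, so treat the users as unrelated
    dot = sum(u[s] * v[s] for s in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, plays, top_n=3):
    """Suggest songs the user has not played, weighted by similar users."""
    own = plays[user]
    scores = {}
    for other, history in plays.items():
        if other == user:
            continue
        sim = cosine_similarity(own, history)
        if sim <= 0:
            continue  # skip users with nothing in common
        for song, count in history.items():
            if song not in own:
                # A song scores higher when similar users played it often.
                scores[song] = scores.get(song, 0.0) + sim * count
    # Highest score first; ties are broken alphabetically by song name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ranked[:top_n]]


print(recommend("anna", plays))
```

In this toy data set, "bo" shares two songs with "anna", so the song that only "bo" has played ends up ranked ahead of the one that only "carl" has played. With real listening data the same pairwise comparison gets expensive fast, which is where a cluster comes into the picture.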

I recorded the presentation and uploaded it to YouTube. Check it out below. Also, check out the PowerPoint presentation below the YouTube video if you are particularly interested in the slides used in the talk. If you have any questions on this topic, do not hesitate to drop a comment. I will try to be quick to get back to you! 🙂

Today I was invited to speak at the very first meetup of the Rhus Meetup Club (in Aarhus, Denmark) on the topics of R, RHadoop, HDInsight and Azure Machine Learning. And it was a lot of fun! 🙂

Back when I was doing my postgraduate studies, I was writing a heck of a lot of R code (as the only student in class; everybody else was using commercial tools like Stata or SAS). And, man, I wish there had been something like HDInsight when I was a student! It could have helped me out a whole lot when I was writing my thesis and working with the tons of patent data my supervisor had given me (from PATSTAT, approx. 75 million patent-sized full-text entries). My poor HP Envy was struggling a lot back then.

I think the presentation was well received — people didn’t complain too much at least! 😉 However, next time I think I will cut down on the material to cover. It’s a really complicated soup you’re cooking when you’re mixing in concepts like R, Hadoop (DFS), MapReduce, YARN, Azure Blob Storage, Azure Machine Learning and more — in just 60 minutes! I will give the same presentation in the CopenhagenR useR group later this month and I think there are some things I can improve before then.

Last week I gave a presentation at Campus Days 2014 on how to build a recommendation engine using HDInsight 3.1 in Azure. In this blog post I will go through the steps to replicate the demo I gave in the presentation. I will start from scratch with a brand new HDInsight cluster (version 3.1) and describe each of the necessary steps.

If you are just interested in downloading my PowerPoint presentation you can do that here:

Let’s Get Started

The steps to complete this demo might seem complicated. But really they are not! So don’t lose faith while going through this blog post (even though it might be long). You will get there and hopefully you will have learned something in the process.

In order to follow this guide you need to complete a few prerequisites:

An Azure subscription with sufficient funds (about $10 should be enough)