Issue #26

May 22 2014

Editor Picks

We recently caught up with Emmett Shear, CEO of Twitch. We were keen to learn more about his background, how data and Data Science have influenced Twitch's growth to this point, and what role they have to play going forward...

A few weekends ago, I made the decision to casually brush up on my neural networks. Why? Well, for starters neural networks are super interesting. Additionally, I was keen to revisit the topic given all the activity around "deep learning" in the Twittersphere. Julia turned out to be the perfect language for digging into the guts of a machine learning algorithm...

Nobody has figured out how to spot the most influential spreaders of information in a real-world network. Now that looks set to change with important implications, not least for the superspreaders themselves...

Data Science Articles & Videos

The Next Big Thing You Missed: Airbnb’s Human Brains Crunch Data Better Than Computers
In 2011, Airbnb had a problem. The room-sharing site was growing fast, but so were customer complaints. People just couldn’t figure out how to use the service. The issue was so severe, Airbnb was getting an average of one customer service call for every room booked. To figure out how to fix this problem, the company asked Newman to look at the data...

The term Big Data is going to disappear in the next 2 years. Statistics will be what remains.
There is no question that big data have hit the business, government and scientific sectors. However, there is plenty of misleading hype around the terms 'big data' and 'data science'. This presentation gives a professional statistician's view on these terms, illustrates the connection between data science and statistics, and highlights some challenges and opportunities from a statistical perspective...

The Mind-Blowing Possibilities of plot.ly
We were fortunate enough to have Matt Sunquist of plot.ly come to our campus recently to talk about something that is his passion: sharing data for the purpose of data literacy...

Can We do Better than R-squared?
The R2 calculated in Excel is often used as a measure of how well a model explains a response variable. There's a hidden trap, though. R2 will increase as you add terms to a model, even if those terms offer no real explanatory power. By using the R2 that Excel so helpfully provides, we can fool ourselves into believing that a model is better than it is. Below I'll demonstrate this and show an alternative that can be implemented easily in R...
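
The trap described above can be reproduced in a few lines. The sketch below is not taken from the linked article: it uses Python rather than R, synthetic data, and adjusted R2 as one common alternative (the excerpt does not say which alternative the author demonstrates). The helper names solve and r_squared are hypothetical.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def r_squared(columns, y):
    """Fit least squares with an intercept; return (R2, adjusted R2)."""
    n, p = len(y), len(columns)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * y[i] for i, r in enumerate(rows)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    # Adjusted R2 penalizes each extra term through the degrees of freedom.
    adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return r2, adj

# Synthetic data (an assumption for illustration): y depends only on x.
rng = random.Random(0)
n = 50
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
# Ten pure-noise predictors with no relationship to y.
noise = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(10)]

results = [r_squared([x] + noise[:k], y) for k in range(len(noise) + 1)]
for k, (r2, adj) in enumerate(results):
    print(f"{k:2d} noise terms: R2={r2:.4f}  adjusted R2={adj:.4f}")
```

Because each model nests the previous one, plain R2 can only stay flat or rise as noise terms are added, while adjusted R2 is never larger than R2 and can fall when a term adds nothing.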

Ranking algorithms and the NFL (Part 1 of a series)
I recently picked up Who’s #1?: The Science of Rating and Ranking, a really fun read on the many ways to take a list of items and order them by some score. Obviously, rankings are a huge topic of interest in sports, and my day job is working on recommender systems, so I saw this as the natural intersection of these things...

VC Firm names Algorithm to its Board of Directors
Deep Knowledge Ventures, a firm that focuses on age-related disease drugs and regenerative medicine projects, says the program, called VITAL, can make investment recommendations about life sciences firms by poring over large amounts of data...

Jobs

The Algorithms Engineering (AE) team owns the research, development and innovation for the algorithms driving the Netflix product including Personalization and Search. We are looking for an experienced machine learning leader to join our team and become the technical point of reference for a brilliant team of researchers and developers...

Training & Resources

A Primer on Deep Learning
In a presentation I gave at Boston Data Festival 2013 and at a recent PyData Boston meetup I provided some history of the method and a sense of what it is being used for presently. This post aims to cover the first half of that presentation, focusing on the question of why we have been hearing so much about deep learning lately...

I am currently working towards the Johns Hopkins Data Science Specialization at Coursera. I posted my initial, and very positive, impressions when I was about half-way through the first four-week block. My impressions are still very favorable at completion. Now that the course is complete, I can post my complete thoughts for the first three courses...

Books

"This is the best general-readership book on applied statistics that I've read. Short review: if you're interested in science, economics, or prediction: read it. It's full of interesting cases, builds intuition, and is a readable example of Bayesian thinking."