Issue #35

July 24 2014

Editor Picks

Dropout: A Simple Way to Prevent Neural Networks from Overfitting
Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem...

Under the covers, Airbnb has quietly begun an ambitious effort to painstakingly mine the treasure trove of data contained in the site’s customer reviews and host descriptions to create a smarter way of traveling. It turns out Airbnb is more than a travel website: it’s a stealth big data company...

Data Science Articles & Videos

Leading from the Back: Making Data Science Work at a UX-driven Business
MailChimp's success as a start-up wasn't built on data. It was built on a user experience that placed an intuitive and friendly interface on email marketing and removed much of the busy work. So how does a company whose business is not data use its massive data set? John Foreman, author of the Excel-based data science book Data Smart and Chief Scientist for MailChimp, discusses what it means to "lead from the back" in data science, even if that sometimes means breaking out a spreadsheet in favor of Hadoop...

This post is inspired by the “metacademy” suggestions for “leveling up your machine learning.” They make some halfway decent suggestions for beginners. The problem is, these suggestions won’t give you a view of machine learning as a field; they’ll only teach you about the subjects of interest to authors of machine learning books, which is different...

Creating the "Dropbox of your Genome": Reid Robison Interview
We recently caught up with Reid Robison, MD, MBA and CEO at Tute Genomics. We were keen to learn more about his background, his perspectives on the evolution of genomics, what he's working on now at Tute - and how machine learning is helping...

From Boom to Bust: Building a Predictive Quarterback Model
This past off-season I took it upon myself to develop a metric for evaluating quarterback prospects for the NFL draft. My goal was to create a metric that could ultimately help predict which draft-eligible quarterbacks would be most likely to succeed in the NFL by identifying which traits quarterback prospects had in common with successful NFL quarterbacks when they were coming out of college...

Data Mining at NASA to Teaching Data Science at GMU: Kirk Borne Interview
We recently caught up with Kirk Borne, trans-disciplinary Data Scientist and Professor of Astrophysics and Computational Science at George Mason University. We were keen to learn more about his background, his ground-breaking work in data mining and how it was applied at NASA, as well as his perspectives on teaching data science and how he is contributing to the education of future generations...

Doing Data Science in a Startup: The Hard Truth
I hate to break it to you, but a high-tech Internet startup is not a natural environment to do research. Most startups come into existence around a very applicable and practical idea (hopefully), which either requires no scientific research or the core research was already done by the founders before the startup came to be. However, there are a number of advantages that can make startups a much more attractive working experience than classic academic-style research...

Aspiring Data Scientist? Here Are Some At Work Project Ideas
Do you find yourself wanting to move into Data Science but keep hearing "get some data, analyze it, and you'll be fine..."? Have you developed many of the base skills for data science, such as programming, data analysis, and/or visualization but are unsure of how to apply them? Are you looking to differentiate yourself from the ever-growing pile of aspiring "data scientists" who have taken the usual Coursera classes and done Kaggle competitions? You are not alone...

Jobs

MailChimp's Data Science Team is seeking a software developer to help us build internal tools and processes. We don’t care about pedigree or what languages or stacks you’ve worked in; we’re just looking for performance-minded developers who listen hard and change fast. In fact, if you’d rather send us code than polish up your resumé, that works for us. You’ll work with our data scientists and our product developers to turn research into internal services that can move enormous piles of data for statistical analysis...

Training & Resources

Fuzzy Matching with Yhat
Ever had to manually comb through a database looking for duplicates? Anyone that's ever had a data entry job probably knows what I'm talking about. It's not fun! In this post I'm going to show you how you can write a simple, yet effective algorithm for finding duplicates in your data...

Books

"Stigler is unrivaled as a statistician who researches the history of statistics. This covers the famous mathematicians and statisticians who developed the foundation on which probability and statistics blossomed in the 20th Century. He is thorough and accurate and his writing is always clear and interesting. ..."