Issue #55

Dec 11 2014

Special Message: We would like to make a shameless plug for a new book by our very own Sebastian Gutierrez: Data Scientists at Work. More details are in the "Books" section below, but rest assured it is the perfect holiday read ;)

Editor Picks

The Current State of Machine Intelligence
I spent the last three months learning about every artificial intelligence, machine learning, or data-related startup I could find: my current list has 2,529 of them, to be exact. Yes, I should find better things to do with my evenings and weekends, but until then...

Simulating Decisions to Improve Them
One of the jobs of the Data Science team is to help zulily make better decisions through data. One way that manifests itself is via experimentation. Like most ecommerce sites, zulily continuously runs experiments to improve the customer experience...

Data Science Articles & Videos

Cyrus Vance Jr.’s ‘Moneyball’ Approach to Crime
A glance at New York City crime statistics might lead you to conclude that Cyrus Vance Jr., the district attorney of New York County, no longer works in what William Travers Jerome, who held the job more than a century ago, once called “the mouth of hell.” ...

Experiments at Airbnb
While the basic principles behind controlled experiments are relatively straightforward, using experiments in a complex online ecosystem like Airbnb during fast-paced product development can lead to a number of common pitfalls. Some, like stopping an experiment too soon, are relevant to most experiments. Others, like the issue of introducing bias on a marketplace level, start becoming relevant for a more specialized application like Airbnb. We hope that by sharing the pitfalls we’ve experienced and learned to avoid, we can help you to design and conduct better, more reliable experiments for your own application...
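A quick way to see why stopping too soon is a pitfall: simulate A/A tests, where both arms share the same true conversion rate, and compare a team that tests once at the end with one that stops at the first significant peek. This is a generic sketch, not Airbnb's methodology; the 10% rate, 10 looks, and 100 users per arm per look are made-up parameters, and the pooled two-proportion z-test is just one common choice of test.

```python
import math
import random

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def simulate_aa_tests(n_sims=300, looks=10, users_per_look=100,
                      rate=0.1, alpha=0.05, seed=0):
    """Simulate A/A tests (both arms share the same true rate) and return
    (false-positive rate when testing once at the end,
     false-positive rate when stopping at the first significant look)."""
    rng = random.Random(seed)
    fixed_hits = peeking_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = n = 0
        significant_at_any_look = False
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = two_proportion_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                # A team that peeks would stop here and ship a "winner".
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < alpha  # p from the final look only
    return fixed_hits / n_sims, peeking_hits / n_sims

fixed_fpr, peeking_fpr = simulate_aa_tests()
print(f"test once at the end: {fixed_fpr:.1%}, "
      f"stop at first significant peek: {peeking_fpr:.1%}")
```

Because the final look is one of the peeks, the peeking false-positive rate can never be lower than the fixed-horizon one; with ten looks it typically lands well above the nominal 5%.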

Neural Networks Demystified [Part 4: Backpropagation]
Backpropagation as simple as possible, but no simpler. Perhaps the most misunderstood part of neural networks, backpropagation of errors is the key step that allows ANNs to learn. In this video, I give the derivation and thought processes behind backpropagation using high school level calculus. ...
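To make the chain rule concrete, here is a minimal sketch of backpropagation for a deliberately tiny network: one input, one sigmoid hidden unit, one linear output, and a squared-error loss. It is not the network from the video; the weights and inputs are arbitrary, and a finite-difference check is included to confirm the hand-derived gradients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x, y):
    """Tiny network: one input, one sigmoid hidden unit, one linear output."""
    h = sigmoid(w1 * x)            # hidden activation
    y_hat = w2 * h                 # prediction
    loss = 0.5 * (y_hat - y) ** 2  # squared error
    return h, y_hat, loss

def backprop(w1, w2, x, y):
    """Gradients of the loss w.r.t. w1 and w2, via the chain rule."""
    h, y_hat, _ = forward(w1, w2, x, y)
    d_yhat = y_hat - y          # dLoss/dy_hat
    d_w2 = d_yhat * h           # dLoss/dw2 = dLoss/dy_hat * dy_hat/dw2
    d_h = d_yhat * w2           # error sent back to the hidden unit
    d_z = d_h * h * (1.0 - h)   # sigmoid'(z) = h * (1 - h)
    d_w1 = d_z * x              # dz/dw1 = x
    return d_w1, d_w2

def numerical_gradients(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    def loss(a, b):
        return forward(a, b, x, y)[2]
    g1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    g2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return g1, g2

print(backprop(0.5, -1.2, 2.0, 1.0))
print(numerical_gradients(0.5, -1.2, 2.0, 1.0))
```

Gradient checking like this is a cheap way to catch mistakes in a hand-written backward pass before training anything.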

Using Data for a More Transparent Government
Our team at the Data Science for Social Good Fellowship, in collaboration with the Harris School of Public Policy, developed an automated system that uses machine-learning methods to identify earmarks in congressional documents. Using this approach, we construct the first publicly available database of earmarks that covers every year back to 1995...

The Doom that Came to Puppet
Posts generated by a Markov chain trained on the Puppet documentation and the assorted works of H. P. Lovecraft...
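For the curious, a word-level Markov chain text generator fits in a few lines. This is a sketch of the general technique, not the code behind the Puppet/Lovecraft posts; the corpus is a made-up stand-in, and `order=2` (each state is the previous two words) is an assumed choice.

```python
import random
from collections import defaultdict

def build_chain(text, order=2):
    """Map each run of `order` consecutive words to every word observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return chain

def generate(chain, max_words=20, seed=None):
    """Random walk: start at a random state, repeatedly sample an observed successor.
    Duplicates in each successor list make frequent continuations more likely."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:  # dead end: this state never had a successor in the corpus
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Hypothetical mixed corpus standing in for Puppet docs plus gothic fiction.
corpus = (
    "the agent applies the catalog to the node . "
    "the catalog describes the desired state of the node . "
    "the ancient thing sleeps beneath the node and dreams of the catalog . "
    "the desired state of madness is unknowable ."
)
chain = build_chain(corpus, order=2)
print(generate(chain, max_words=20, seed=42))
```

Higher orders copy longer runs of the source verbatim; lower orders produce stranger mashups.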

Jobs

Senior Data Scientist - Shutterstock
As a Senior Data Scientist, you will be joining the team responsible for pushing technology boundaries in areas such as language translation, image recognition, natural language processing, and search ranking. Your work will directly empower the Shutterstock customer experience seen by millions of customers daily, and will enable new and unique customer features that drive Shutterstock's best-in-class image and video search engine...

Training & Resources

Practical Data Science in Python
This notebook accompanies my talk on "Data Science with Python". The goal of this talk is to demonstrate some high-level, introductory concepts behind (text) machine learning. The concepts are accompanied by concrete code examples in this notebook, which you can run yourself (after installing IPython, see below), on your own computer...

A Tutorial on Principal Component Analysis
Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. The goal of this paper is to dispel the magic behind this black box...
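The tutorial frames PCA in terms of the eigenvectors of the data's covariance matrix. As a stdlib-only sketch of that idea, the code below finds the leading eigenvectors with power iteration plus deflation rather than a full eigendecomposition; that is a swapped-in technique, not the paper's method, and the function names and the `iters` default are assumptions.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) of rows = observations, columns = features."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def principal_components(rows, k=1, iters=500, seed=0):
    """Top-k principal components via power iteration plus deflation.
    Returns (variance, unit direction) pairs, largest variance first."""
    cov = covariance_matrix(rows)
    d = len(cov)
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # remaining covariance is zero: no variance left
                break
            v = [x / norm for x in w]
        variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
        components.append((variance, v))
        # Deflation: subtract this component so the next pass finds the next one.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components

# Points lying exactly on the line y = 2x: all variance is along (1, 2) / sqrt(5).
rows = [[float(x), 2.0 * x] for x in range(10)]
for variance, direction in principal_components(rows, k=2):
    print(round(variance, 4), [round(c, 4) for c in direction])
```

For real work, `numpy.linalg.eigh` or an SVD is faster and more robust; power iteration converges slowly when the top eigenvalues are close together.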

Books

Data Scientists at Work
JUST RELEASED: A collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession...

"In this book, you will see how some of the world's top data scientists work across a dizzyingly wide variety of industries and applications – each leveraging their own blend of domain expertise, statistics, and computer science to create tremendous value and impact..."
- Peter Norvig, Director of Research, Google