Issue #141

Aug 4 2016

Editor Picks

Make Algorithms Accountable
Algorithms are ubiquitous in our lives. They map out the best route to our destination and help us find new music based on what we listen to now. But they are also being employed to inform fundamental decisions about our lives...

Build Algorithms Like You Give a Damn
For the second year in a row, WrangleConf did not disappoint. The conversation picked up right where last year’s left off: on the ethics of our craft. Last year the focus was on the humans building algorithms and the humans whom algorithms affect. This year, the discussion expanded in scope to consider the growing number of people who interact with data science teams...

Does sentiment analysis work? A tidy analysis of Yelp reviews
Sentiment analysis is often used by companies to quantify general social media opinion (for example, using tweets about several brands to compare customer satisfaction). One of the simplest and most common sentiment analysis methods is to classify words as “positive” or “negative”, then to average the values of each word to categorize the entire document. But does this method actually work?...
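The word-averaging method described above can be sketched in a few lines of Python. This is only an illustration, not the linked article's code: the tiny lexicon, the function names `score_document` and `classify`, and the zero threshold are all assumptions made for the example.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration):
# +1 marks a "positive" word, -1 a "negative" one.
LEXICON = {
    "great": 1, "love": 1, "delicious": 1,
    "terrible": -1, "rude": -1, "bland": -1,
}

def score_document(text, lexicon=LEXICON):
    """Average the lexicon values of the words in `text` that appear in the lexicon."""
    words = re.findall(r"[a-z']+", text.lower())
    values = [lexicon[w] for w in words if w in lexicon]
    if not values:
        return 0.0  # no known words: treat the document as neutral
    return sum(values) / len(values)

def classify(text, lexicon=LEXICON):
    """Label a document by the sign of its average word score."""
    score = score_document(text, lexicon)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify("The service was rude but the pasta was delicious and great")` averages one negative and two positive words and labels the review positive. The sketch ignores negation ("not great") and context, which is one reason to ask whether the method actually works.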

An experiment in trying to predict Google rankings
In late 2015, JR Oakes and his colleagues undertook an experiment to attempt to predict Google ranking for a given webpage using machine learning. What follows are their findings, which they wanted to share with the SEO community...

Team USA by the Numbers
Is being an Olympic champion determined by your genes? Longtime readers will remember the three-part series from 2014 exploring this question. That project was prompted by David Epstein’s claim, based on his book The Sports Gene, that given a roster of Olympic athletes and their weights and heights, he could predict their events with high accuracy. Using a variety of machine learning models I determined that they could achieve about 30 percent accuracy, which made me dubious of Epstein’s claims...

Dreaming of names with RBMs
A classic problem in natural language processing is named entity recognition. Given a text, we have to identify the proper nouns. But what about the generative mirror image of this problem - i.e. named entity generation? What if we ask a model to dream up new names of people, places and things?...

Jobs

At Penguin Random House, the Data Science & Analytics group is an agile team composed of data scientists, software engineers, front-end developers, and industry experts capable of tackling any data-oriented problem.

As a junior data scientist on this team, you will have an opportunity to work on a variety of high-profile projects while working closely with senior data scientists and key decision makers across the organization to help solve analytical problems of strategic value. Our areas of focus include price elasticity, consumer research, marketing attribution, and title segmentation, as well as ad-hoc analysis and data exploration. Your domain of expertise will be equal parts feature engineering and statistical analysis – or equivalently, machine learning...