Issue #87

July 23, 2015

Editor Picks

Two examples of why machine learning is becoming the most powerful way to increase revenue
From recommendations and personalization to ads and e-commerce, companies like Google, Facebook, Amazon, Netflix, and LinkedIn have been increasing revenue and engagement with machine learning for years. The success stories that follow show how we’re leveling the playing field by helping product teams and publishers leverage the same technology as these tech giants, without the need to build it in house...

Exploring the shapes of stories using Python and sentiment APIs
Using two hacks and a multinomial logistic regression model of n-grams with TF-IDF features, a pre-trained sentiment model can score the long-range sentiment of the text of stories, books, and movies. The models do a reasonable job of summarizing the “shapes of stories” directly from text...

A Message from this week's Sponsor

Want to be a Data Scientist, but don't know where to start?
Learn essential Data Science skills in SlideRule's Intro to Data Science Workshop. In this online bootcamp, you'll learn R, data wrangling, analytics and visualization by working on real projects, with 1-on-1 mentorship from expert Data Scientists from LinkedIn, Glassdoor, Trulia and Stripe.

Spots are limited; registration ends in 48 hours!

Data Science Articles & Videos

Deriving the Reddit Formula
A few things about Reddit's hot formula have always bothered me. The formula has obviously been a success when it comes to setting the Internet on fire, but I have to wonder...

Is there a simple algorithm for intelligence?
The question I explore here is whether there is a simple set of principles which can be used to explain intelligence. In particular, and more concretely, is there a simple algorithm for intelligence?...

Split Testing for Geniuses
You are sitting at a slot machine with two levers, labeled A and B. When you pull a lever, sometimes a dollar comes out of the slot and sometimes not. The casino tells you that each lever has a fixed chance of giving you a dollar (its success rate) but, of course, they don’t tell you what it is. Since you don’t have any way of distinguishing them to start, you pull lever A and a dollar comes out (Yippee!). What do you do next?...

Kaggle Competition Tips & Summaries
Over the years, I’ve participated in a few Kaggle competitions and written a bit about my experiences. This page contains pointers to all my posts, and will be updated if/when I participate in more competitions....

Data Science at Agari: Forwarder Classification
Among the challenges that our engineering team faces is classifying an email-sending entity as a forwarder. At Agari, we are primarily interested in the authentication of emails from originating senders...

Machine learning to predict San Francisco crime
In today’s post, we document our submission to the recent Kaggle competition aimed at predicting the category of San Francisco crimes, given only their time and location of occurrence. As a reminder, Kaggle is a site where one can compete with other data scientists on various data challenges. We took this competition as an opportunity to explore the Naive Bayes algorithm. With the few steps discussed below, we were able to quickly move from the middle of the pack to the top 33% on the competition leaderboard, all the while continuing with this simple model!...

Deepdream: Avoiding Kitsch
Yes yes, #deepdream. But as Memo Atkin and others point out, this is going to kitsch as rapidly as Walter Keane and lolcats unless we can find a way to stop the massive firehose of repetitive #puppyslug that has been opened by a few websites letting us upload selfies...

Jobs

The successful candidate will serve as a Data Scientist reporting to the Mayor’s Office of Criminal Justice. Responsibilities will include: gather and convert data into insights to guide policy development and evaluation; work with City agencies to integrate data sets and develop data-informed strategies; utilize programming languages such as SAS, SQL, R, SPSS, and Python; develop new approaches to collecting data not presently incorporated into City systems; work closely with operations, policy, and analytic teams to establish and validate models and approaches; and perform special projects and initiatives as assigned...

Training & Resources

Pyxley: Python Powered Dashboards
We have written a Python package, called Pyxley, to not only help simplify the development of web applications, but to provide a way to easily incorporate custom JavaScript for maximum flexibility...