Issue #7

January 9, 2014

Editor Picks

Google can identify and transcribe all the views it has of street numbers in France in less than an hour, thanks to a neural network that’s just as good as human operators. Now its engineers reveal how they developed it...

We recently caught up with Dave Sullivan, Founder and CEO of Blackcloud BSG - the company behind Ersatz - and host of the San Francisco Neural Network Aficionados group. We were keen to learn more about his background (from history grad to Machine Learning expert), recent developments in Neural Networks/Deep Learning and how Machine Learning as a Service (MLaaS) is evolving...

ConvNetJS is a Javascript library for training Deep Learning models (mainly Neural Networks) entirely in your browser. Open a tab and you're training. No software requirements, no compilers, no installations, no GPUs, no sweat...

Data Science Articles & Videos

5 Things I’ve Learned About Data Science
A few months ago I started a new job for the first time in 10 years, leaving my comfortable home at a government FFRDC for an exciting opportunity with the new data science and analytics team at FitnessKeeper. Here are a few of the things I’ve learned...

How I Made $500k With Machine Learning And High Frequency Trading
This post will detail what I did to make approx. 500k from high frequency trading from 2009 to 2010. Since I was trading completely independently and am no longer running my program I’m happy to tell all. My trading was mostly in Russell 2000 and DAX futures contracts...

The Mathematics Of Gamification
At Foursquare, we maintain a database of 60 million venues. Like many existing crowd-sourced datasets (Quora, Stack Overflow, Amazon Reviews), we assign users points or votes based on their tenure, reputation, and the actions they take. Superusers like points and gamification. But data scientists like probabilities and guarantees. We’re interested in making statements like, “we are 99% confident that each entry is correct.” How do we allocate points to users in a way that rewards them for behavior but allows us to make guarantees about the accuracy of our database?...

How Netflix Re-Engineered Hollywood
To understand how people look for movies, the video service created 76,897 micro-genres. We took the genre descriptions, broke them down to their key words, … and built our own new-genre generator. Through a combination of elbow grease and spam-level repetition, we discovered that Netflix possesses not several hundred genres, or even several thousand, but 76,897 unique ways to describe types of movies...

How To Build A Mind
Joscha Bach presents a foray into the present, future and ideas of Artificial Intelligence. Are we going to build (beyond) human-level artificial intelligence one day? Very likely. When? Nobody knows, because the specs are not fully done yet. But let me give you some of those we already know, just to get you started...

FrankenImage: An Image-Based, Non-Photorealistic Renderer
The goal of FrankenImage is to reconstruct input (target) images with pieces of images from a large image database. FrankenImage is deliberately in contrast with traditional photomosaics... [it] aims for component database images to be as large as possible in the final composition, taking advantage of structure in each database image, instead of just its average color...

Six Novel Machine Learning Applications
Cutting-edge startups (as well as established tech companies and Universities) are increasingly finding new, novel, and exciting ways to apply powerful machine learning tools such as neural networks to existing problems in many different industries. Below is a list of six of the most interesting applications...

How Google Is Using People Analytics To Completely Reinvent HR
Most companies on the top 20 market cap list could be accurately described as “old school,” because most can attribute their success to being nearly half a century old, having a long established product brand, or through great acquisitions. Google’s market success can instead be attributed to what can only be labeled as extraordinary people management practices that result from its use of “people analytics.”...

Prismatic's Schema For Server & Client-Side Data Shape Validation
Aria Haghighi introduces Schema, a Clojure and ClojureScript library for declaring and validating the shape of data. One of the difficulties with bringing Clojure into a team is the overhead of understanding the kind of data (e.g., list of strings versus nested map from long to string to double) that a function expects and returns. While a full-blown type system is one solution to this problem, we present a lighter weight solution: schemas...

Jobs

Netflix takes its data seriously and leverages it as part of our core culture to make data-driven decisions to steer product development, and we've only scratched the surface in the types of deeper analytics we'd like to do! Want to help? We're looking for an additional data engineer/scientist to work directly with the product development and streaming platform teams to build better data frameworks and dig into analytics regarding the quality and performance of the streaming experience...

Training & Resources

There is a lot of noise around the "R versus Contender X" for Data Science. I think the two main competitors right now that I hear about are Python and Julia. I'm not going to weigh into the debates because I go by the motto: "Why not just use something that works?". So I thought I'd point out a few cool things that R can do...

There are many methods and much research put into outlier detection. Start by making some assumptions and design experiments where you can clearly observe the effects of those assumptions against some performance or accuracy measure. I recommend working through a stepped process from extreme value analysis, proximity methods and projection methods...

In this post, I’m going to show you how to use Machine Learning (as it couldn’t be otherwise) to quickly check whether there’s a covariate shift between training data and production data. You read it right: Machine Learning to learn whether machine-learned models will perform well or not...
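
For readers who want a feel for the idea before clicking through, here is a minimal sketch of one common way to check for covariate shift. It may not match the linked post's exact method. The approach: label training rows 0 and production rows 1, fit a classifier to tell them apart, and look at its held-out accuracy. Accuracy near 0.5 suggests the two samples look alike; accuracy well above 0.5 suggests a shift. The names `fit_stump` and `domain_classifier_accuracy` are hypothetical, the data is synthetic, and a one-feature threshold rule stands in for a real model.

```python
import random


def fit_stump(xs, ys):
    """Pick the threshold and direction that best separate labels 0 and 1."""
    best_acc, best_t, best_dir = -1.0, xs[0], 1
    for t in sorted(set(xs)):
        for direction in (1, -1):
            preds = [1 if direction * (x - t) > 0 else 0 for x in xs]
            acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
            if acc > best_acc:
                best_acc, best_t, best_dir = acc, t, direction
    return best_t, best_dir


def domain_classifier_accuracy(train_values, prod_values, seed=0):
    """Held-out accuracy of a rule that guesses which sample a value came from."""
    rng = random.Random(seed)
    # Label 0 = training sample, label 1 = production sample (an assumed convention).
    rows = [(x, 0) for x in train_values] + [(x, 1) for x in prod_values]
    rng.shuffle(rows)
    half = len(rows) // 2
    fit_rows, eval_rows = rows[:half], rows[half:]
    t, d = fit_stump([x for x, _ in fit_rows], [y for _, y in fit_rows])
    hits = sum((1 if d * (x - t) > 0 else 0) == y for x, y in eval_rows)
    return hits / len(eval_rows)


if __name__ == "__main__":
    rng = random.Random(42)
    train = [rng.gauss(0.0, 1.0) for _ in range(200)]
    prod_same = [rng.gauss(0.0, 1.0) for _ in range(200)]     # same distribution
    prod_shifted = [rng.gauss(3.0, 1.0) for _ in range(200)]  # mean moved by 3
    print("no shift:", domain_classifier_accuracy(train, prod_same))
    print("shifted: ", domain_classifier_accuracy(train, prod_shifted))
```

In practice you would swap the threshold rule for whatever classifier you already trust and feed it all of your features; the equal-sized samples here keep 0.5 as the natural "no shift" baseline.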

P.S. Did you enjoy the newsletter? Do you have friends/colleagues who might like it too? If so, please forward it along - we would love to have them onboard :)
