Issue #51

Nov 13 2014

Editor Picks

The Hipster Effect: An IPython Interactive Exploration
This week I started seeing references all over the internet to this paper: The Hipster Effect: When Anticonformists All Look The Same. It essentially describes a simple mathematical model of conformity and non-conformity among a mutually interacting population, and finds some interesting results...

The Learning Behind Gmail Priority Inbox
The Priority Inbox feature of Gmail ranks mail by the probability that the user will perform an action on that mail. Because “importance” is highly personal, we try to predict it by learning a per-user statistical model, updated as frequently as possible. This research note describes the challenges of online learning over millions of models, and the solutions adopted...

Data Science Articles & Videos

Music Information Retrieval using Scikit-learn
Music information retrieval (MIR) is an interdisciplinary field bridging the domains of statistics, signal processing, machine learning, musicology, biology, and more. In this talk, Steve Tjoa from Humtap surveys common research problems in MIR, including music fingerprinting, transcription, classification, and recommendation, and recently proposed solutions in the research literature...

Text and Image Analysis: From pixels to characters and back
While text and images differ in many ways and can exist independently, they are in fact complementary and non-competing communication mediums, and to get a holistic view of the world, we would need to analyze both. Understanding images is as important as understanding text, as together they provide a more accurate picture of reality...

Data Science with Hadoop - predicting airline delays - part 1
Every year approximately 20% of airline flights are delayed or cancelled, resulting in significant costs to both travellers and airlines. As our example use-case, we will build a supervised learning model that predicts airline delay from historical flight data and weather information...

Python as part of a production machine learning stack [at Stripe]
While the vast majority of transactions facilitated by Stripe are honest, we do need to protect our merchants from rogue individuals and groups seeking to "test" or "cash" stolen credit cards. To combat this sort of activity, Stripe uses Python (together with Scala and Ruby) as part of its production machine learning pipeline to detect and block fraud in real time. In this talk, I'll go through the scikit-based modeling process for a sample data set that is derived from production data to illustrate how we train and validate our models...

Memory Networks
We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly...

Markov Chains vs Simulation: Flipping a Million Little Coins
I saw an interesting question on Reddit the other day. The problem was about estimating the amount of decaying radioactive isotopes in a sample after a set amount of time. I don’t think anyone in the thread brought up Markov chains, but that’s what I immediately thought of...
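To make the two approaches concrete, here is a minimal sketch, assuming each atom independently decays with a fixed probability per time step (one biased "little coin" per atom per step). It is not the linked post's code; the function names are hypothetical. The Markov view treats one atom as a two-state chain (undecayed, decayed) with "decayed" absorbing, so the expected survivor count falls out of repeated multiplication by the transition matrix. The simulation flips every coin explicitly.

```python
import random


def markov_expected_remaining(n_atoms, p_decay, steps):
    """Expected number of undecayed atoms via a two-state Markov chain.

    State 0 = undecayed, state 1 = decayed (absorbing). Assumes atoms are
    independent and share the same per-step decay probability.
    """
    # Row-stochastic transition matrix for a single atom over one time step.
    transition = [[1.0 - p_decay, p_decay],
                  [0.0, 1.0]]
    # The atom starts out certainly undecayed.
    dist = [1.0, 0.0]
    for _ in range(steps):
        # Row vector times matrix: new_dist[j] = sum_i dist[i] * P[i][j].
        dist = [dist[0] * transition[0][0] + dist[1] * transition[1][0],
                dist[0] * transition[0][1] + dist[1] * transition[1][1]]
    # By linearity of expectation, scale the per-atom survival probability.
    return n_atoms * dist[0]


def simulate_remaining(n_atoms, p_decay, steps, seed=0):
    """Monte Carlo estimate: flip one biased coin per surviving atom per step."""
    rng = random.Random(seed)
    remaining = n_atoms
    for _ in range(steps):
        # rng.random() is uniform on [0, 1); an atom survives with prob 1 - p_decay.
        remaining = sum(1 for _ in range(remaining) if rng.random() >= p_decay)
    return remaining
```

With `n_atoms=10000`, `p_decay=0.1` and `steps=10`, the chain gives exactly 10000 × 0.9^10 ≈ 3486.8, while `simulate_remaining` returns an integer that scatters around that value from seed to seed. The chain answers the expectation directly; simulation becomes the more flexible tool when atoms stop being independent or identical.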

Random feedback weights support learning in deep neural networks
We show that a network can learn to extract useful information from signals sent through random feedback connections. In essence, the network learns to learn. We demonstrate that this new mechanism performs as quickly and accurately as backpropagation on a variety of problems and describe the principles which underlie its function...
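The core trick can be sketched in a few lines: where backpropagation sends the output error back through the transpose of the forward weights (W2ᵀe), the random-feedback scheme sends it through a fixed random matrix B instead. The sketch below is a toy setup chosen here for illustration, not the authors' code: a two-layer linear network learning a random linear map with online updates. All names, layer sizes, scales and the learning rate are assumptions.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rand_matrix(rows, cols, scale, rng):
    """Matrix with entries drawn uniformly from [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def train_feedback_alignment(n_in=3, n_hidden=10, n_out=2,
                             steps=5000, lr=0.01, seed=0):
    """Train a linear 2-layer net with random feedback; return (initial, final) loss."""
    rng = random.Random(seed)
    target = rand_matrix(n_out, n_in, 1.0, rng)   # unknown linear map to learn
    w1 = rand_matrix(n_hidden, n_in, 0.1, rng)    # input -> hidden
    w2 = rand_matrix(n_out, n_hidden, 0.1, rng)   # hidden -> output
    b = rand_matrix(n_hidden, n_out, 0.5, rng)    # fixed random feedback, never trained

    test_xs = [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(200)]

    def mean_loss():
        # Average squared-error loss 0.5 * ||y - t||^2 over a fixed test set.
        total = 0.0
        for x in test_xs:
            y = matvec(w2, matvec(w1, x))
            t = matvec(target, x)
            total += 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, t))
        return total / len(test_xs)

    initial = mean_loss()
    for _ in range(steps):
        x = [rng.gauss(0, 1) for _ in range(n_in)]
        h = matvec(w1, x)                     # linear hidden layer
        y = matvec(w2, h)
        e = [yi - ti for yi, ti in zip(y, matvec(target, x))]
        # Backprop would use W2^T e here; feedback alignment uses B e.
        delta_h = matvec(b, e)
        for i in range(n_out):                # delta rule for the output layer
            for j in range(n_hidden):
                w2[i][j] -= lr * e[i] * h[j]
        for i in range(n_hidden):             # hidden layer driven by random feedback
            for j in range(n_in):
                w1[i][j] -= lr * delta_h[i] * x[j]
    return initial, mean_loss()
```

Calling `initial, final = train_feedback_alignment()` should show `final` far below `initial`. As training proceeds, W2 tends to come into alignment with Bᵀ, which is what lets the random feedback carry useful error information.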

Jobs

As a Senior Data Scientist in the Incubator Organization, this position involves utilizing novel tools for "big data" science. The individual will work in the Hadoop environment to analyze large volumes of data, ultimately using statistical and data mining tools including clustering, classification, and regression models to understand and predict consumer needs. The Senior Data Scientist will present technical insights to management to inform product development...

Recurrent Neural Networks with Word Embeddings
In this tutorial, you will learn how to do Word Embeddings using Recurrent Neural Networks architectures with Context Windows - in order to perform Semantic Parsing / Slot-Filling (Spoken Language Understanding)...

Books

In this handbook, data expert Q. Ethan McCallum has gathered 19 colleagues from every corner of the data arena to reveal how they’ve recovered from nasty data problems...

"This book gives an unbiased presentation of machine learning with solid theoretical justifications. It discusses the principles behind the design of learning algorithms by introducing and using the most modern tools and concepts in learning theory. This helps answer many fundamental questions..."