Issue #28

June 5, 2014

Editor Picks

I am a Machine Learning (ML) and Natural Language Processing enthusiast. For my university dissertation I created a realtime sentiment analysis classifier for Twitter. My talk is about the experience and the lessons learned... showing how easy it can be to build an ML SaaS using some of the amazing libraries, such as NLTK, ZMQ and MrJob, that have helped me...

Scikit-learn is an awesome tool allowing developers with little or no machine learning knowledge to predict the future! But once you’ve trained a scikit-learn algorithm, what now? In this talk, I describe how to deploy a predictive model in a production environment using scikit-learn and RabbitMQ. You’ll see a realtime content classification system to demonstrate this design...

Data Science Articles & Videos

A Growing Number of Applications are being built with Spark
The number of companies that are using (or plan to use) Spark in production has exploded over the last year. The surge in popularity of the Apache Spark ecosystem stems from the maturation of its individual open source components and the growing community of users...

Convolutional Network Demo from 1993, featuring Yann LeCun
This is a demo of "LeNet 1", the first convolutional network that could recognize handwritten digits with good speed and accuracy. It was developed between 1988 and 1993 in the Adaptive System Research Department, headed by Larry Jackel, at Bell Labs in Holmdel, NJ...

On the Importance of Text Analysis for Stock Price Prediction
We investigate the importance of text analysis for stock price prediction. In particular, we introduce a system that forecasts companies’ stock price changes (UP, DOWN, STAY) in response to financial events reported in 8-K documents. Our results indicate that using text boosts prediction accuracy over 10% (relative) over a strong baseline that incorporates many financially-rooted features...

Bandits for Recommendation Systems
In this blog post, we will discuss the bandit problem and how it relates to online recommender systems. Then, we'll cover some classic algorithms and see how well they do in simulation...
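
For readers new to the bandit setting, here is a minimal sketch of one classic algorithm, epsilon-greedy, run in simulation. It is not taken from the linked post: the function name `epsilon_greedy`, the Bernoulli click-through rates, and the parameter values are all assumptions chosen for illustration.

```python
import random

def epsilon_greedy(true_rates, steps=10000, epsilon=0.1, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms.

    true_rates are hypothetical per-item click probabilities (an assumption).
    Returns (pull counts, estimated rates) per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    counts = [0] * n_arms      # how often each arm was shown
    values = [0.0] * n_arms    # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values

counts, values = epsilon_greedy([0.05, 0.10, 0.20])
print(counts, [round(v, 3) for v in values])
```

In a recommender, each arm would stand for an item to show and the reward for a click; `epsilon` controls how much traffic is spent exploring rather than showing the current best guess.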

Statistical Language Wars: The Infograph
A feature all programming communities have in common is the numerous debates about why their programming language of choice is better, more advanced, faster, holier etc. In today’s data science community, it seems like these discussions are omnipresent with advocates of SAS, SPSS, R, Python, Julia, etc. battling and challenging each other on every online medium...

Everything You Wanted to Know about the Kernel Trick
The goal of this writeup is to provide a high-level introduction to the "Kernel Trick" commonly used in classification algorithms such as Support Vector Machines (SVM) and Logistic Regression. My target audience is those who have had some basic experience with machine learning, yet are looking for an alternative introduction to kernel methods...
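
As a quick illustration of the idea (not taken from the linked writeup), the sketch below checks numerically that a degree-2 polynomial kernel on 2-D inputs gives the same value as an explicit inner product in a 3-D feature space. The kernel choice, the feature map `phi`, and the sample points are assumptions picked for the example.

```python
import math

def dot(a, b):
    """Plain inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def phi(x):
    """Explicit feature map for the kernel (x . z)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly_kernel(x, z):
    """Same quantity computed without ever building phi(x) or phi(z)."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
explicit = dot(phi(x), phi(z))   # inner product in feature space
implicit = poly_kernel(x, z)     # kernel evaluated in input space
print(explicit, implicit)
```

The two numbers agree up to floating-point rounding, which is the point of the trick: an algorithm that only needs inner products can work in the larger feature space while computing in the original one.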

Jobs

Are you interested in applying machine learning or data mining to problems that truly improve people’s lives? We’re looking for a mathematician/data scientist eager to tackle unique challenges in the realm of predicting weather’s impact on business. You will work on a skilled team of passionate data scientists and meteorologists. Projects you may encounter range from predicting the electricity output of a solar park in Arizona to predicting how much ice cream is going to be sold next week in Chicago...

Books

"Outlier Detection for Temporal Data covers topics in temporal outlier detection, which have applications in numerous fields. It starts with the basic topics then moves on to state of the art techniques in the field."