Issue #17

March 20, 2014

Editor Picks

In this post I will give you an unbiased look at the political views of the current President, Barack Obama, using data science. This is the result of research done in 2013, in which I assembled nearly 1 gigabyte of Barack Obama's speeches and remarks scraped from the White House’s website. I analysed every word and every sentence using a combination of distributed computing, artificial intelligence and natural language processing algorithms...

I’ve been finding it fascinating to watch on as data scientists discuss the death of data science. It all started with a rather sensationalized post over at Slashdot: “Data Science is Dead” by Miko Matsumura, Vice President at Hazelcast. It wasn’t even a question for discussion, but rather a declaration of a foregone conclusion. The subtitle for the piece was “Not only is Data Science not a science, it’s not even a good job prospect.” Well, those are fightin’ words to a data scientist like me!...

We wanted a way to show off the Yhat WebSocket API, so we threw together a small node.js app that does real-time named entity recognition using nltk. It's not perfect, but considering it took me about an hour to build, I think it's off to a good start!...

Data Science Articles & Videos

How Deep Learning Analytics Mimic the Mind
Due to the recent acquisition of DeepMind by Google for an estimated $500+ million, and the movement of some academic experts to high-profile tech giants, there has been a lot of buzz surrounding the potential impact deep learning will have in the field of analytics. At FICO, we’re excited about this emerging machine learning technology and want to share how we think it fits into the world of analytics...

Predicting Student Exam’s Scores by Analyzing Social Network Data
In this paper, we propose a novel method for the prediction of a person’s success in an academic course. By extracting log data from the course’s website and using network analysis, we were able to model and visualize the social interactions among the students in a course… we successfully used several regression and machine learning techniques to predict the success of students in a course...

Gridspace uses NLP to Make your Meetings more Efficient
Everyone hates meetings. They take up a lot of time and in many cases it’s not entirely clear that at the end of them anything has actually been accomplished. Everyone has different notes and action items and usually there’s no centralized place for a record of what transpired. There’s gotta be a better way. All of that is what Gridspace hopes to change...

The Holy Grail of Trading has been Found
Think JPM's zero trading-day losses in 2013 were impressive? Prepare to have your mind blown. The chart below shows the daily net trading income of high-frequency trading titan Virtu, taken from its just-filed IPO prospectus. The punchline: in 4 years of trading, Virtu has had one, one, day in which it lost money...

R vs Python – Round 3
My friend Randy Olson and I got into the habit of arguing about the relative qualities of our favorite languages for data analysis and visualization. I am an enthusiastic R user while Randy is a fan of Python. One thing we agree on is that our discussions are meaningless unless we actually put R and Python to a series of tests to showcase their relative strengths and weaknesses...

The best of both Worlds: Hierarchical Linear Regression in PyMC3
The power of Bayesian modelling really clicked for me when I was first introduced to hierarchical modelling. In this blog post we will provide an intuitive explanation of hierarchical/multi-level Bayesian modeling; show how this type of model can easily be built and estimated in PyMC3...

Let’s play with Theano
This article will briefly present Theano, a machine learning library, and introduce it with a small regression problem. It will describe how to play with Theano to classify a single word as French or English using a classifier over only 4 features...

Jobs

We are looking for talented data scientists to join the Kaggle team. We have branched out beyond our core data mining competitions to build end-to-end solutions on an industry-by-industry basis. Our first industry is energy, where we're building solutions that can transform the world's largest industry...

Training & Resources

PyBrain - a simple neural networks library in Python
We have already written a few articles about Pylearn2. Today we’ll look at PyBrain. It is another Python neural networks library, and this is where the similarities end. They’re like day and night: Pylearn2 - Byzantinely complicated, PyBrain - simple...

Deep Learning is a new area of Machine Learning research, which has been introduced with the objective of moving Machine Learning closer to one of its original goals: Artificial Intelligence. This website is intended to host a variety of resources and pointers to information about Deep Learning...

Neural Networks Class by Hugo Larochelle
These are the videos I use to teach my Neural Networks class at Université de Sherbrooke. The videos, along with the slides and research paper references, are also available...