Description

This tutorial offers a hands-on introduction to machine learning and to
applying it in a Kaggle competition. We will introduce attendees to
machine learning concepts, examples and workflows, while building up
their skills to solve an actual problem. By the end of the tutorial,
attendees will be familiar with a real data science workflow: feature
preparation, modeling, optimization and validation.

Packages used in the tutorial will include IPython notebook,
scikit-learn, pandas and NLTK. We’ll use IPython notebook for
interactive exploration and visualization to gain a basic understanding
of what’s in the data. From there, we’ll extract features and train a
model using scikit-learn, which will bring us to our first submission.
We’ll then learn how to structure the problem for offline evaluation
and use scikit-learn’s clean model API to train many models
simultaneously and perform feature selection and hyperparameter
optimization.
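
To give a flavor of that last step, here is a minimal sketch of offline evaluation with feature selection and hyperparameter search in scikit-learn. It is an illustration under assumptions, not the tutorial's actual material: the toy texts and labels are made up, and the choice of TF-IDF features, chi-squared selection, logistic regression and the parameter grid are all hypothetical stand-ins for whatever the competition data calls for.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for a competition training set.
texts = [
    "great movie with a wonderful cast",
    "really enjoyed this brilliant film",
    "a fantastic and moving story",
    "loved the acting and the music",
    "superb direction and a clever plot",
    "an excellent and charming comedy",
    "terrible movie with a boring cast",
    "really hated this dreadful film",
    "a dull and tedious story",
    "awful acting and annoying music",
    "poor direction and a messy plot",
    "a weak and forgettable comedy",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Feature extraction, feature selection and the model share one pipeline,
# so every cross-validation fold refits all three on its own training split.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed grid: 2 feature-selection settings x 3 regularization strengths.
param_grid = {
    "select__k": [5, "all"],
    "clf__C": [0.1, 1.0, 10.0],
}

# Offline evaluation: stratified 3-fold cross-validation on the training data.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# n_jobs=1 keeps the sketch simple; n_jobs=-1 would fit candidates in parallel.
search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy", n_jobs=1)
search.fit(texts, labels)

print("best parameters:", search.best_params_)
print("cross-validated accuracy:", search.best_score_)

# The best pipeline is refit on all the data and can score unseen text.
prediction = search.predict(["a wonderful and clever film"])
print("prediction:", prediction[0])
```

Because the whole pipeline sits inside the search, feature selection never sees the held-out fold, which keeps the offline score an honest estimate of leaderboard performance.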

At the end of the session, attendees will have time to work on their own
to improve their models and make multiple submissions to climb the
leaderboard, just like in a real competition. We hope attendees will
leave the tutorial not only having learned the core data science
concepts and workflow, but also having had a great time doing it.