Intermediate Machine Learning with scikit-learn

‘Machine learning’ is simply what we call the algorithmic extraction of knowledge from data. Over the last two decades, the ability to perform complex analysis of data, moving beyond the basic tools of statistics, has been steadily developed and refined. Over a similar period, Python has grown to be the premier language for data science, and scikit-learn has grown to be the main toolkit used within Python for general-purpose machine learning.

This course moves beyond the topics covered in Beginning Machine Learning with scikit-learn. A recap of a few essential concepts is given for students starting here. We first discuss unsupervised machine learning techniques, then look at the data preparation and “massaging” that is always needed for robust models. Finally, we address best practices for the robust and generalizable modeling techniques needed for real-world data science.

What you'll learn and how you can apply it

Recap: Classification vs. Regression vs. Clustering

Unsupervised machine learning

Feature engineering and feature selection

Pipelines

Better train/test splits
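
As a taste of how several of these topics fit together, here is a minimal sketch, not taken from the course materials. The dataset (iris), the choice of estimators, and every parameter value are illustrative assumptions. It chains feature scaling and an unsupervised dimensionality reduction step into a pipeline, then scores the pipeline with stratified k-fold cross-validation rather than a single train/test split.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, chosen only for illustration
X, y = load_iris(return_X_y=True)

# The pipeline bundles preprocessing with the final estimator.
# During cross-validation each step is refit on the training folds only,
# so the held-out fold never influences the scaler or the PCA projection.
model = make_pipeline(
    StandardScaler(),                   # feature scaling
    PCA(n_components=2),                # unsupervised dimensionality reduction
    LogisticRegression(max_iter=1000),  # classifier
)

# Five stratified folds keep class proportions similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print("Accuracy per fold:", scores)
print("Mean accuracy:", scores.mean())
```

Wrapping the preprocessing steps in the pipeline, instead of transforming the whole dataset up front, is what keeps the cross-validation estimate honest.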

This training course is for you because...

You are an aspiring or beginning data scientist.

You have a comfortable intermediate-level knowledge of Python and a very basic familiarity with statistics and linear algebra.

You are a working programmer or student who is motivated to expand your skills to include machine learning with Python.

You have some familiarity with the fundamentals of machine learning or have taken the Beginning Machine Learning with scikit-learn live training class.

About your instructor

David Mertz was most recently a Senior Trainer and Senior Software Developer for Anaconda, Inc., where he created and structured the company's training program. He was a Director of the Python Software Foundation (PSF) for six years and remains co-chair of its Trademarks Committee and of the PSF Scientific Python Working Group. David worked for nine years with D. E. Shaw Research, which built the world's fastest, highly specialized (down to the ASICs and network layer) supercomputer for performing molecular dynamics.

David wrote the widely read columns Charming Python and XML Matters for IBM developerWorks, short books for O'Reilly, and the Addison-Wesley book Text Processing in Python. He has spoken at multiple OSCons, PyCons, and AnacondaCon, and was invited to be a keynote speaker at PyCon-India, PyCon-UK, PyCon-ZA, PyCon Belarus, PyCon Cuba, and PyData SF.

David is pleased to find Python becoming the default high-level language for most scientific computing projects.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.