About This Book

Learn effective strategies and best practices to improve and optimize machine learning systems and algorithms

Ask – and answer – tough questions of your data with robust statistical models, built for a range of datasets

Who This Book Is For

If you want to find out how to use Python to start answering critical questions of your data, pick up Python Machine Learning – whether you want to get started from scratch or want to extend your data science knowledge, this is an essential and unmissable resource.

What You Will Learn

Explore how to use different machine learning models to ask different questions of your data

Learn how to build neural networks using Pylearn2 and Theano

Find out how to write clean and elegant Python code that will optimize the strength of your algorithms

Discover how to embed your machine learning model in a web application for increased accessibility

Predict continuous target outcomes using regression analysis

Uncover hidden patterns and structures in data with clustering

Organize data using effective pre-processing techniques

Get to grips with sentiment analysis to delve deeper into textual and social media data

In Detail

Machine learning and predictive analytics are transforming the way businesses and other organizations operate. Being able to understand trends and patterns in complex data is critical to success, becoming one of the key strategies for unlocking growth in a challenging contemporary marketplace. Python can help you deliver key insights into your data – its unique capabilities as a language let you build sophisticated algorithms and statistical models that can reveal new perspectives and answer key questions that are vital for success.

Python Machine Learning gives you access to the world of predictive analytics and demonstrates why Python is one of the world's leading data science languages. If you want to ask better questions of data, or need to improve and extend the capabilities of your machine learning systems, this practical data science book is invaluable. Covering a wide range of powerful Python libraries, including scikit-learn, Theano, and Pylearn2, and featuring guidance and tips on everything from sentiment analysis to neural networks, you'll soon be able to answer some of the most important questions facing you and your organization.

Style and approach

Python Machine Learning connects the fundamental theoretical principles behind machine learning to their practical application in a way that focuses you on asking and answering the right questions. It walks you through the key elements of Python and its powerful machine learning libraries, while demonstrating how to get to grips with a range of statistical models.

Editorial Reviews

About the Author

Sebastian Raschka

Sebastian Raschka is a PhD student at Michigan State University, where he develops new computational methods in the field of computational biology. He has been ranked as the number one most influential data scientist on GitHub by Analytics Vidhya. He has years of experience in Python programming and has conducted several seminars on the practical applications of data science and machine learning. Talking and writing about data science, machine learning, and Python motivated Sebastian to write this book in order to help people develop data-driven solutions without necessarily needing a machine learning background. He has also actively contributed to open source projects, and methods that he implemented are now successfully used in machine learning competitions such as Kaggle. In his free time, he works on models for sports predictions, and when he is not in front of the computer, he enjoys playing sports.

Top customer reviews

First some general, higher-level thoughts and comments before I dive into specifics:

MY BACKGROUND: Data Scientist; B.S. in Economics and M.S. in Business Analytics; experienced (though by no means expert) user of Scikit-learn

OVERALL THOUGHTS: I've purchased and read (virtually) every Machine Learning book that aims to teach the reader the basics of ML using the Scikit-learn library as the main focus. I've found them to be...less than satisfactory. The examples in other books often use ML techniques in contexts for which they are not intended and/or contexts in which they are not used in the real world (among other issues I have found within them). In stark contrast, Python Machine Learning by Sebastian Raschka is stunningly impressive, not only for the breadth and depth of coverage, but also for the manner in which the information is presented to the reader.

To date, I have not encountered a book on ML that incorporates multiple levels of learning in a manner such as this. It is the textual equivalent of a Neural Network with hundreds of hidden layers running on the latest NVIDIA GPU (if that comparison is lost on you, don’t worry; it’ll all make sense by the time you finish the book).

One of the underlying (though understated) themes in the book is the importance of using visual aids where appropriate to gauge the performance of the algorithms you’re using as well as to understand exactly what is going on behind the scenes, so to speak. If you’re a novice user of the Matplotlib graphics library for Python, this book will greatly improve your visualization skills by the time you’re done, which I found to be an added bonus.

Another underlying theme is basic optimization using the NumPy library. This is reinforced throughout the book in the examples that you code by hand. Ditto for the Pandas library. If you are brand-new to Python, you may not fully appreciate this aspect of the book until you gain some more experience and have gone through the book a few times. For more experienced users, the examples provide an amazing amount of insight into simple ways to make your code more efficient. Indeed, “Best Practices” abound in Python Machine Learning.

As a final general thought, Sebastian is an active contributor to Scikit-learn, something I do not believe to be the case with the authors of the other books that I’ve read. In order to effectively demonstrate and communicate the power of the Scikit-learn library, you really need to be familiar with it at a fundamental level. Sebastian has this knowledge in spades, and that becomes readily apparent as you progress through the book. He makes no assumption about the knowledge base of the reader (he doesn’t have to) because the book incorporates learning styles appropriate for differing skill levels (see below).

FOR BEGINNING USERS: You may have some experience with Scikit-learn and Python, though not necessarily enough to have developed some of the “best practices” I mentioned above; you’re still getting comfortable using the library and the Python environment. This book is definitely for you! The best way to learn this subject is by coding examples. You could not ask for a finer book on the subject; for those just starting their ML journey, you’re in good hands with Sebastian. You’ll get an excellent, hands-on education using some of the most important ML algorithms in use today in the most popular ML library used in Python. You’ll begin to develop good habits and you’ll see, from a basic level, how to actually create algorithms on your own, outside of Scikit-learn! Then, after you’ve had the experience of coding the algorithm by hand, you’ll move to Scikit-learn and get even more hands-on experience. You will learn about the tried-and-true algorithms that have been around for decades as well as concepts that are still in their infancy and are considered the current state of the art. And as I said, you’ll learn about them by actually using them to build ML models. You’ll see how you take a concept for a project and turn it into reality using some really fantastic algorithms.
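
For a flavor of what coding an algorithm by hand can look like, here is a minimal perceptron sketch in plain Python. It is not taken from the book: the function names (train_perceptron, predict), the learning rate eta, and the toy AND dataset are all assumptions chosen for illustration.

```python
def predict(x, weights, bias):
    """Return the class label (+1 or -1) for one sample."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation >= 0.0 else -1


def train_perceptron(samples, labels, eta=1.0, epochs=10):
    """Fit weights and bias with the classic perceptron learning rule.

    samples: list of feature tuples; labels: list of +1 / -1 targets.
    eta and epochs are illustrative defaults, not tuned values.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            # The update is zero when the prediction is already correct.
            update = eta * (target - predict(x, weights, bias))
            weights = [w + update * xi for w, xi in zip(weights, x)]
            bias += update
    return weights, bias


# Toy, linearly separable data: the logical AND of two inputs.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]

weights, bias = train_perceptron(samples, labels, eta=1.0, epochs=10)
predictions = [predict(x, weights, bias) for x in samples]
```

The perceptron only converges when the classes are linearly separable, which is why the toy data here is an AND gate rather than something like XOR.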

FOR INTERMEDIATE USERS: You are comfortable using Python and Scikit-learn and have participated (or you are considering participating) in one or more Kaggle competitions. You may have some good habits/best practices formed but you’re looking to take the next step; you may know how to take a project from the data gathering and cleaning stage to a final model, but you may not have actually done it or you aren’t sure how to properly evaluate the model you have created in the end stages; you want to gain a thorough understanding of which situations are appropriate for each of the algorithms and, more importantly, which situations are NOT appropriate for each of the algorithms; you want to gain a firm knowledge of how the algorithms work and you’re curious about what the state-of-the-art concepts are. Good news! This book is DEFINITELY for you.

There comes a time in every Data Scientist’s life when you have read everything you can find on how to structure and complete projects and you feel confident that you’re ready. Then you start and you realize during the course of a project that you suddenly have a dozen more questions:

What should I do with all of these missing values?

Should I use PCA and/or other dimensionality reduction techniques?

How many folds should I use in my Cross Validation?

Should I use Nested Cross Validation or will simple K-Fold Cross Validation suffice?

Do I need to standardize my data in order to run a Logistic Regression algorithm?

How about with a Random Forest?

What performance metric is most appropriate for my model?

What are L1 and L2 regularization (again) and when should I use them?

If you have ever asked yourself any of these questions, rest assured this book will become your go-to reference for these questions as well as questions that you haven’t even thought of yet. Sebastian will fill in the gaps in your knowledge and you’ll gain the confidence to tackle the projects you have been looking forward to working on all this time.

FOR ADVANCED USERS: Much of the information in this book may be familiar to you; however, the mathematical concepts behind the algorithms may not be. You may be interested in reading the seminal research on each of the concepts presented in the book. Sebastian has you covered as well. He provides symbolic mathematical proofs for those so inclined, as well as a multitude of citations for where you can find the research that supports and/or explores the concepts more thoroughly. The book is well researched and cited, and the concepts are given very thorough treatment.

TL;DR (SUMMARY): I realize the experience levels described above are subjective. They are present merely to serve as reference points for the readers and to underscore my belief that Python Machine Learning has something for virtually every skill level. I cannot recommend this book more highly!

BONUS - Topics/Algorithms Covered Throughout the Book (there are a TON!):

Perceptron

Adaline

Stochastic Gradient Descent (SGD)

Support Vector Machines (SVM)

Logistic Regression

Difference between L1 and L2 regularization (with excellent graphics showing the difference)

Out-of-core/online learning (truly Big Data)

The Kernel Trick

Parameter Optimization

Random Forest Classifier

Parametric vs. Non-parametric (and which are which)

K-Nearest Neighbors (KNN)

Bias/Variance Trade-off (great graphics showing the difference)

Decision Trees

Standardization (Data Preprocessing)

Scaling (Data Preprocessing)

Correct Mapping of data types for use in Scikit-learn algorithms

Over/Underfitting

Sequential Forward Selection (Feature Selection)

Sequential Backward Selection (Feature Selection)

Feature Importances using Random Forests (Feature Selection)

Common pitfalls (“gotchas”) that can arise with use of Random Forests

Principal Component Analysis (PCA)

Linear Discriminant Analysis (LDA; this topic is almost never covered in similar books)

Kernel PCA + caveats for its use

Use of Pipelines in Scikit-learn for streamlining the modeling process (this never gets coverage and is a big efficiency boost)

Cross Validation (K Fold)

Nested Cross Validation

Common Metrics for Model Evaluation and how to graph each to gauge performance

Ensemble Methods (Majority Voting Classifier)

Plotting Decision Boundaries (important for gaining insight into ensemble performance)

Bootstrap Aggregating (Bagging)

Boosting

Sentiment Analysis using bag-of-words model

Sentiment Analysis using SGD Classifier and Out-of-Core learning to analyze large document datasets via streaming/mini-batching for Data that is too large to fit in memory at once

Embedding Machine Learning algorithms into web applications using the web framework called Flask (this is a hot skill to have in the job market)

Regression Analysis for Continuous Target Variables

Aesthetic adjustments/extensions to Matplotlib graphs using the Seaborn library

RANSAC Regression

Dealing with non-linear relationships in the context of Regression (Data Transformations)

Clustering (K-Means, Agglomerative, Divisive)

Visualizing clusters

Hard vs. Soft clustering

The “Elbow” method for clustering

DBSCAN clustering

Common “gotchas” to be aware of when using clustering algorithms

Artificial Neural Networks (ANN)

Multi-Layer Perceptron (MLP) Neural Net

Forward Propagation

Backward Propagation

Using the Theano library to run Neural Networks on Graphics Processing Units (GPU) (this is an extremely hot topic and demonstrates the timelessness of many of these algorithms)

This is a fantastic book, even for a relative beginner to machine learning such as myself. The first thing that comes to mind after reading this book is that it was the perfect blend (for me at least) of theory and practice, as well as breadth and depth.

Let’s face it, we know that machine learning isn’t an easy subject. You need theory…but you also need practice in the form of some serious coding before you really start understanding it. And this is one area where Sebastian’s book shines: it contains a plethora of really good code examples that are illuminating and well explained, and which cover a very wide range of different machine learning algorithms. And, speaking of code, as another reviewer has pointed out, another huge plus is that, in many places, Sebastian shows you how to gauge the performance of your code and make it more efficient.

For me, the best measure of any book such as this is how many “ah ha!” moments I had while reading it. And I had more than a few while reading Sebastian’s book. One such “ah ha!” moment came while reading chapter 12 (and this also illustrates that nice blend of theory and practice I already mentioned above). In this particular chapter, he discusses training artificial neural networks for image recognition. At the heart of this approach is back propagation, which is pretty much THE bread and butter behind multilayered neural networks. He presents a detailed discussion of back propagation in two separate pieces: one that is intuitive and “top-down”; the other a more mathematical, “bottom-up” approach that goes through the algorithm step by step, showing how the gradients are computed and the weights updated. His treatment of back propagation was one of the better explanations I’ve seen and really cleared things up for me.
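
To make the "gradients computed, weights updated" idea concrete, here is a rough sketch of back propagation for a tiny network with one hidden layer, written in plain Python. It is not the book's implementation: the function names, the squared-error loss, the sigmoid activations, the omission of bias terms, and the example weights are all assumptions made to keep the sketch short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward pass: inputs -> hidden activations -> single output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(x, y, w_hidden, w_out):
    """Squared error for one training sample."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - y) ** 2


def backprop(x, y, w_hidden, w_out):
    """Return the loss gradients with respect to both weight sets."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - y) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # Propagate the error signal back to each hidden unit.
    delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out


def sgd_step(x, y, w_hidden, w_out, lr=0.5):
    """One gradient descent update; returns new weights, inputs untouched."""
    grad_hidden, grad_out = backprop(x, y, w_hidden, w_out)
    new_hidden = [[w - lr * g for w, g in zip(row, g_row)]
                  for row, g_row in zip(w_hidden, grad_hidden)]
    new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out


# Hypothetical starting weights and a single training sample.
x, y = [1.0, 0.5], 1.0
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.5]

loss_before = loss(x, y, w_hidden, w_out)
w_hidden_new, w_out_new = sgd_step(x, y, w_hidden, w_out)
loss_after = loss(x, y, w_hidden_new, w_out_new)
```

A common sanity check for code like this is to compare the analytic gradients against finite-difference estimates of the loss, which catches most sign and indexing mistakes.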

One last thing I must mention: at the time of release, this was the first machine learning book for Python (to my knowledge) that has an entire chapter devoted to Theano, which he uses to parallelize neural network training. For those who don’t know, Theano is a particularly nice (not to mention very powerful) Python library for doing machine learning, most especially if you can utilize the power of GPU computing. In addition, that particular chapter (13) also introduces the brand new Python library named Keras, which is built on top of Theano and is a really nice library for the rapid building and prototyping of neural networks (in the spirit of Torch). Being a brand new library, his treatment of Keras was necessarily brief, but it was a great starting point.

In conclusion, I am very confident that if you do pick up this book, you won’t be at all disappointed. And be sure to grab the accompanying code for the book from his GitHub repository (just look for “python-machine-learning-book” on github.com/rasbt). His code is top notch and I’ve yet to encounter any problems with it.

The book is very well organized, well written and has good on-line resources for downloading quality examples. It starts with simple, basic examples and does not make many assumptions of previous knowledge of this topic.

This book has a lot to offer both in the way of theory and code. It is an advanced book assuming you have knowledge of Python and some advanced mathematics and statistics. About half of the chapters were difficult for me, but I did not mind as it gave me some places to research more.

I am researching word analysis and there was a dedicated chapter for it (Chapter 8, Applying Machine Learning to Sentiment Analysis). I went through this chapter multiple times in depth and it was rewarding for the project I am working on.

There are a lot of useful links and other resources scattered throughout the book, and the author lets you know where to go if you need more information. I found a lot of these asides in the book to be useful.

The writing style was well polished and there were many graphs or diagrams for the more confusing concepts.

This book was excellent. I was an experienced Python user and knew a lot of the machine learning concepts beforehand but had never used them in the real world or with Python. The book does a great job of first explaining a method and walking you through the math so you understand what is occurring. Then it dives into 'lower level' code to show you the construction of the algorithms, and finishes off with what you would actually use in practice. This made it very easy to follow and understand. I would recommend this to anyone looking to develop an advanced understanding of machine learning!