How to Create a Supervised Learning Model with Logistic Regression

After you build your first predictive classification model, creating more models like it is a straightforward task in scikit-learn. The main difference from one model to the next is that each algorithm has its own parameters, which you may have to tune.
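To illustrate how little changes between models, here is a minimal sketch that trains three scikit-learn classifiers on the same split of the iris data. The choice of SVC and KNeighborsClassifier, the split settings (test_size=0.10, random_state=111), and every parameter value are illustrative assumptions, not settings taken from this article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

iris = load_iris()
# Illustrative split: hold out 10% of the samples, reproducibly.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.10, random_state=111
)

# Every classifier exposes the same fit/score interface; only the
# constructor arguments (the parameters you tune) differ between algorithms.
classifiers = {
    "logistic regression": LogisticRegression(C=150, max_iter=1000),
    "support vector machine": SVC(C=1.0, kernel="rbf"),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: test accuracy {scores[name]:.2f}")
```

The loop body never changes; swapping algorithms only means editing the dictionary.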

How to load your data

This code listing will load the iris dataset into your session:

>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
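To check what was loaded, you can inspect the object that load_iris returns. This short sketch prints the shape of the feature matrix and the names of the features and classes; the print statements are illustrative additions, not part of the original listing.

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three species the model will predict
print(iris.target[:5])     # class labels are integers 0, 1, 2
```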

How to create an instance of the classifier

Creating an instance of the classifier takes two lines of code. The first line imports logistic regression from scikit-learn, and the second line creates an instance of the logistic regression algorithm.

Notice the C parameter (the regularization parameter) in the constructor. It is used to prevent overfitting. In scikit-learn, C is the inverse of the regularization strength: smaller values regularize more strongly, while larger values let the model fit the training data more closely. The parameter isn't strictly necessary; the constructor works fine without it because C defaults to 1. Creating a logistic regression classifier with C=150, however, produces a better-looking decision surface. You can see both plots below.
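A minimal sketch of the two-line pattern, assuming the LogisticRegression class from sklearn.linear_model. The variable names default_classifier and tuned_classifier are hypothetical; the sketch simply creates one classifier with the default C and one with C=150 for comparison.

```python
from sklearn.linear_model import LogisticRegression

# Default constructor: C (inverse of regularization strength) is 1.0.
default_classifier = LogisticRegression()

# Much weaker regularization: the model can follow the training data more closely.
tuned_classifier = LogisticRegression(C=150)

print(default_classifier.C, tuned_classifier.C)
```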

How to run the training data

You’ll need to split the dataset into training and test sets before you can train the logistic regression classifier. This step takes three lines of code.

The first line imports the function that splits the dataset into two parts.

The second line calls that function to split the dataset and assigns the now-divided data to two pairs of variables: the training features and labels, and the test features and labels.

The third line takes the instance of the logistic regression classifier you just created and calls its fit method to train the model on the training dataset.
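The three steps just described can be sketched as follows, assuming a current scikit-learn release where train_test_split lives in sklearn.model_selection. So that the block runs on its own, it also reloads the data and creates the classifier. The name log_classifier, test_size=0.10, random_state=111, and max_iter=1000 (raised so the solver converges with weak regularization) are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split  # step 1: import the splitter

iris = load_iris()
log_classifier = LogisticRegression(C=150, max_iter=1000)

# Step 2: split features and labels; 10% of the samples become the test set.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.10, random_state=111
)

# Step 3: train the classifier on the training portion only.
log_classifier.fit(X_train, y_train)
```

Fixing random_state makes the split, and therefore the results, reproducible from run to run.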

How to visualize the classifier

Looking at the decision surface of the default model (C=1), it appears that some tuning is needed. Near the middle of the plot, many of the data points that belong to the middle region (Versicolor) lie in the region on the right (Virginica).

This image shows the decision surface with a C value of 150. It looks visually better, so this setting seems appropriate for your logistic regression model.
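A decision surface like the ones described can be drawn with matplotlib. The sketch below is an illustration under stated assumptions, not the article's plotting code: it refits the model on only two features (petal length and petal width) so the surface is two-dimensional, and it saves both panels (C=1 and C=150) to a hypothetical file named decision_surfaces.png.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 2:4]  # assumption: petal length and petal width only
y = iris.target


def plot_decision_surface(ax, C):
    """Fit logistic regression with the given C and shade its decision regions."""
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # Grid covering the data with a small margin.
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02),
                         np.arange(y_min, y_max, 0.02))
    # Predict a class for every grid point, then reshape back to the grid.
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    ax.contourf(xx, yy, zz, alpha=0.3)
    ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")
    ax.set_xlabel(iris.feature_names[2])
    ax.set_ylabel(iris.feature_names[3])
    ax.set_title(f"C = {C}")
    return model


fig, axes = plt.subplots(1, 2, figsize=(10, 4))
models = [plot_decision_surface(ax, C) for ax, C in zip(axes, (1, 150))]
fig.savefig("decision_surfaces.png")
```

Drawing the two panels side by side makes it easier to compare how the boundary between Versicolor and Virginica shifts as C grows.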

How to run the test data

To run the test data, feed the test dataset to the trained model and then display the predicted output.
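A self-contained sketch of this step, assuming the same illustrative split and classifier settings as the training sketch (test_size=0.10, random_state=111, C=150). The accuracy check with accuracy_score is an extra step added here for illustration; no specific predictions are claimed.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.10, random_state=111
)
log_classifier = LogisticRegression(C=150, max_iter=1000).fit(X_train, y_train)

# Feed the held-out test features to the trained model.
predicted = log_classifier.predict(X_test)

# Display the predicted labels (0 = setosa, 1 = versicolor, 2 = virginica).
print(predicted)

# Compare the predictions with the true test labels.
accuracy = accuracy_score(y_test, predicted)
print("accuracy:", accuracy)
```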