In the previous series of posts we discussed simple and multivariate linear regression, which are used to predict target features with continuous values. Other prediction problems have categorical target features, and for those we want to train a model that can predict the class of unseen data. Logistic regression is one such model.

Classification Problem

Imagine we have a tumor dataset that contains the size of each tumor and whether the tumor is malignant. Assume that the class of a tumor can be determined from its size: it is malignant if its size is larger than a certain value. In this example, an instance is a positive case if the tumor is malignant. We would like to find a model that classifies tumors according to their size, as shown in the figure below:

Since we are trying to classify the tumor by its size, ideally we would have a model like this:
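A step-function model of this kind can be written as follows. This is a sketch under assumptions: the threshold is placed at w·x = 0, and the name M_w is borrowed from the notation used later in this post.

```latex
M_{\mathbf{w}}(\mathbf{x}) =
\begin{cases}
1 & \text{if } \mathbf{w} \cdot \mathbf{x} \ge 0 \\
0 & \text{otherwise}
\end{cases}
```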

Where x is the input feature vector, in this case the size of the tumor. We can then use the model to separate the different classes of tumor:
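The thresholding idea can be sketched in a few lines of Python. This is only an illustration: the function name `step_model`, the weights, and the tumor sizes are hypothetical, chosen so that the boundary falls at size 3, and placing the threshold at w·x = 0 is an assumption.

```python
def step_model(w, x):
    """Return 1 (positive class) if the weighted sum w·x is >= 0, else 0."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if weighted_sum >= 0 else 0

# Hypothetical weights. With x = [1, size], the boundary is where
# -3 + 1 * size = 0, i.e. size = 3 (units are illustrative only).
w = [-3.0, 1.0]

small_tumor = [1.0, 2.0]   # x0 = 1 (bias term), x1 = size 2
large_tumor = [1.0, 4.5]   # x0 = 1 (bias term), x1 = size 4.5

print(step_model(w, small_tumor))  # below the boundary
print(step_model(w, large_tumor))  # above the boundary
```

The output flips from 0 to 1 the moment the size crosses the boundary, with nothing in between.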

Logistic Function

One problem with this model is finding the weights w: the model's output, as shown in the previous figure, jumps abruptly at the decision boundary, so it is discontinuous and we cannot use gradient descent to improve the model. We want a function that is continuous, so that we can use the same optimization method as in linear regression, and that behaves similarly to the previous model in separating the two classes. The logistic function, also called the sigmoid function, serves this purpose. The sigmoid function g(x) is given as:
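In LaTeX notation, the standard definition of the logistic (sigmoid) function is:

```latex
g(x) = \frac{1}{1 + e^{-x}}
```

Its output always lies strictly between 0 and 1, and g(0) = 0.5.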

If we plot the sigmoid function against x, we can see that it behaves very similarly to the previous model. Using the logistic function as the regression model is what gives logistic regression its name.
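To see this behavior without a plot, the sigmoid can be evaluated at a few points. The sketch below uses only the Python standard library. The sample points are arbitrary, and the two-branch form is one common way to avoid overflow for large negative inputs, not something the definition requires.

```python
import math

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)

# Arbitrary sample points on both sides of zero.
for x in (-6, -2, 0, 2, 6):
    print(f"g({x:>2}) = {sigmoid(x):.4f}")
```

The values approach 0 for large negative inputs and 1 for large positive inputs, passing smoothly through 0.5 at x = 0, which is the continuous counterpart of the step-like model.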

Remember that our goal is to find the weight vector w so that, given the descriptive features x of an instance, we can decide whether it belongs to the class of interest. We therefore replace x in the sigmoid function with w·x, and the model then represents the probability that y=1 given a particular input x.
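Written out, with M_w denoting the model as in the example that follows, the substitution gives:

```latex
M_{\mathbf{w}}(\mathbf{x}) = g(\mathbf{w} \cdot \mathbf{x})
= \frac{1}{1 + e^{-\mathbf{w} \cdot \mathbf{x}}}
= P(y = 1 \mid \mathbf{x}; \mathbf{w})
```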

For example, for a sample instance with x0=1 and x1 equal to the tumor size, Mw(x)=0.7 means that the probability of the tumor being malignant is 70%.
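A minimal sketch of this calculation in Python follows. The weights and the tumor size are hypothetical values picked so that the result lands near 0.7. They are not trained values, and the 0.5 cut-off used in `classify` is a common convention rather than something fixed by the model.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(w, x):
    """Return Mw(x) = g(w·x), read as the probability that y = 1."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def classify(w, x, threshold=0.5):
    """Label an instance 1 if its predicted probability reaches the threshold."""
    return 1 if predict_probability(w, x) >= threshold else 0

# Hypothetical weights and instance (illustrative only, not trained values).
w = [-3.0, 1.0]    # w0 multiplies x0 = 1, w1 multiplies the tumor size
x = [1.0, 3.85]    # x0 = 1, x1 = tumor size

p = predict_probability(w, x)
print(f"P(malignant) = {p:.2f}, predicted class = {classify(w, x)}")
```

With a threshold of 0.5, the predicted class is 1 exactly when w·x is non-negative, so the decision boundary is the same as that of the step-like model while the output itself stays continuous.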

Equipped with the above model, we can try to predict the class of a given instance. To get there, we first need to train the model. Moreover, in many real-life cases the classes cannot be separated by a simple straight line as in the tumor example above. We shall discuss these two topics in the coming posts.