Data science, statistics or machine learning in broken English

I think a lot of people love logistic regression because it's light and fast. But we know it's just a linear classifier: in principle it only works for linearly separable patterns, not for linearly non-separable ones.

Its primary idea is simple: fit a binomial dependent variable with the logit function. Its advantages are great, even though it mainly works only for linearly separable patterns.
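For reference, here is the standard two-class form of the model in ESL-style notation. This is a sketch: p(x) denotes the probability that a sample with feature vector x belongs to class 1, and β₀, β are the coefficients. The log-odds are linear in x, and solving for p(x) gives the logistic (sigmoid) function:

$$
\log\frac{p(x)}{1 - p(x)} = \beta_0 + \beta^T x
\quad\Longleftrightarrow\quad
p(x) = \frac{1}{1 + \exp\!\left(-(\beta_0 + \beta^T x)\right)}
$$

The decision boundary p(x) = 0.5 is the hyperplane β₀ + βᵀx = 0, which is why the plain model is a linear classifier.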

A brief trial on a short version of the MNIST dataset

First, I tried the vglm() function of the {VGAM} package on the short MNIST dataset.

Algorithm summary

I should have cited this well-known and excellent textbook, ESL (The Elements of Statistical Learning), in the previous post. To understand the algorithm of logistic regression, just see Section 4.4.1 of the textbook.

In short, it's merely maximum likelihood fitting with a binomial (or multinomial) distribution. If you want to implement it from scratch, you have to understand how the iteratively reweighted least squares (IRLS) algorithm works.
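For the two-class case with labels yᵢ ∈ {0, 1}, the quantity being maximized is the binomial log-likelihood. The sketch below follows ESL's convention of folding the intercept into β by adding a constant 1 to each xᵢ, so p(xᵢ; β) is the fitted probability for sample i. Setting the gradient (the score) to zero gives the equations that IRLS solves:

$$
\ell(\beta) = \sum_{i=1}^{N} \left\{ y_i\, \beta^T x_i - \log\!\left(1 + e^{\beta^T x_i}\right) \right\},
\qquad
\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{N} x_i \left( y_i - p(x_i;\beta) \right) = 0
$$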

After some calculation, we finally arrive at two important steps for estimating the parameters β.
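In ESL's matrix notation, the two steps are written as follows. X is the N×(p+1) design matrix with a leading column of ones, y is the vector of 0/1 labels, and p is the vector of fitted probabilities p(xᵢ; β^old). W is the N×N diagonal matrix with entries p(xᵢ; β^old)(1 − p(xᵢ; β^old)), and z is the "adjusted response":

$$
\beta^{\text{new}} = \left(X^T W X\right)^{-1} X^T W z,
\qquad
z = X\beta^{\text{old}} + W^{-1}\left(y - p\right)
$$

This is a rearranged Newton-Raphson step, β^new = β^old + (XᵀWX)⁻¹Xᵀ(y − p). Starting from β = 0 is the usual choice.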

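Through these steps, the fitted probabilities p, the weights W and the adjusted response z change at each iteration. Each iteration solves a weighted least squares problem, and repeating the steps until convergence gives the maximum likelihood estimate.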
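Written as an optimization problem, each iteration is the weighted least squares fit below. Here z is the adjusted response and W is the diagonal weight matrix, both computed from the current estimate β^old:

$$
\beta^{\text{new}} \leftarrow \arg\min_{\beta}\; (z - X\beta)^T W (z - X\beta)
$$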

This algorithm is simple enough that anyone can implement it, e.g. in Python, Java or any other language. You can implement it in R too, but I guess it would be slow :P
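Here is a minimal NumPy sketch of the two steps. It assumes a design matrix X whose first column is all ones and a vector y of 0/1 labels. The function name fit_logistic_irls and the simulated data are hypothetical, not from the original post. The sketch forms XᵀW by broadcasting instead of building the N×N matrix W. It also floors the weights at a tiny value, which is an added guard against division by zero.

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=25, tol=1e-8):
    """Binary logistic regression by IRLS (sketch following ESL section 4.4.1).

    X: (N, p) design matrix whose first column is all ones (intercept).
    y: (N,) array of 0/1 labels.
    """
    beta = np.zeros(X.shape[1])                    # beta = 0 as the starting value
    for _ in range(max_iter):
        eta = X @ beta                             # linear predictor X beta
        p = 1.0 / (1.0 + np.exp(-eta))            # fitted probabilities
        w = np.maximum(p * (1.0 - p), 1e-10)       # diagonal of W, floored to avoid /0
        z = eta + (y - p) / w                      # adjusted response z = X beta + W^-1 (y - p)
        XtW = X.T * w                              # X^T W without forming the N x N matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares step
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Usage on hypothetical simulated data with known coefficients.
rng = np.random.default_rng(42)
N = 5000
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat = fit_logistic_irls(X, y)
print("estimated coefficients:", beta_hat)
```

The estimates should land close to true_beta, and the score Xᵀ(y − p) should be essentially zero at the returned β. On perfectly separable data this loop would not settle, because the maximum likelihood estimate does not exist there.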

How it works on XOR patterns and linearly separable patterns

XOR patterns

In principle, logistic regression doesn't work for linearly non-separable patterns. Please download "xor_simple.txt" and "xor_complex.txt" from my GitHub repository and fit them in R.

Even so, in both cases the fitted model classifies the samples well. This is just a specific case, but it's important to remember this characteristic of logistic regression.
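The post's R code and XOR files are not reproduced here. As a rough stand-in, this sketch uses hypothetical noisy XOR data (four Gaussian blobs with an arbitrary seed) and fits it twice: once with main effects only, and once with an added x1*x2 interaction term, which is the idea behind the interaction terms mentioned in the conclusions. The IRLS loop is repeated so that the block runs on its own. The names fit_logistic_irls and accuracy are hypothetical helpers.

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=50, tol=1e-8):
    # IRLS / Newton-Raphson for binary logistic regression
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = np.maximum(p * (1.0 - p), 1e-10)       # diagonal of W, floored to avoid /0
        z = eta + (y - p) / w                      # adjusted response
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def accuracy(X, y, beta):
    # predict class 1 when the linear predictor is positive (p > 0.5)
    return np.mean(((X @ beta) > 0) == (y == 1))

# Hypothetical noisy XOR data: class 0 around (1,1) and (-1,-1), class 1 around (1,-1) and (-1,1).
rng = np.random.default_rng(0)
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
labels = np.array([0, 0, 1, 1], dtype=float)
n = 200
pts = np.vstack([c + 0.5 * rng.standard_normal((n, 2)) for c in centers])
y = np.repeat(labels, n)
x1, x2 = pts[:, 0], pts[:, 1]
ones = np.ones(len(y))

X_linear = np.column_stack([ones, x1, x2])            # main effects only
X_inter = np.column_stack([ones, x1, x2, x1 * x2])    # plus the x1*x2 interaction

acc_linear = accuracy(X_linear, y, fit_logistic_irls(X_linear, y))
acc_inter = accuracy(X_inter, y, fit_logistic_irls(X_inter, y))
print(f"without interaction: {acc_linear:.3f}")
print(f"with interaction:    {acc_inter:.3f}")
```

With the interaction term, the decision boundary β₀ + β₁x1 + β₂x2 + β₃x1x2 = 0 is a hyperbola when β₃ ≠ 0 (or, in a degenerate case, two crossing lines) rather than a single straight line. That is why it can follow the XOR layout, while the main-effects-only fit should stay near chance level.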

Linearly separable patterns

We know logistic regression obviously works well for linearly separable patterns, but it's important to see how well it really works. Please download "linear_bi.txt" and "linear_multi.txt" from my GitHub repository and import them.

All perfect! :) Please compare this with how the decision tree worked on the same linearly separable patterns.
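One caveat worth knowing about "all perfect" results: when the classes are perfectly separable, the maximum likelihood estimate does not exist, and the coefficients grow without bound as the fit proceeds. In this situation R's glm() typically warns that fitted probabilities numerically 0 or 1 occurred. The sketch below illustrates this on hypothetical grid data, not the author's linear_bi.txt. It uses plain gradient ascent with a fixed step budget instead of IRLS, because the Newton steps would blow up here. Classification is still perfect, but the coefficient norm keeps increasing.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Hypothetical toy data: a symmetric grid on [-3, 3]^2, keeping only points with
# |x + y| >= 1, so the classes are separated by a margin around the line x + y = 0.
g = np.linspace(-3.0, 3.0, 13)
gx, gy = np.meshgrid(g, g)
pts = np.column_stack([gx.ravel(), gy.ravel()])
pts = pts[np.abs(pts[:, 0] + pts[:, 1]) >= 1.0]
y = (pts[:, 0] + pts[:, 1] > 0).astype(float)
X = np.column_stack([np.ones(len(pts)), pts])   # intercept, x, y

# Gradient ascent on the mean log-likelihood with a fixed number of steps,
# since on separable data there is no finite maximizer to converge to.
beta = np.zeros(3)
lr = 0.1
norms = []
for _ in range(500):
    p = sigmoid(X @ beta)
    beta += lr * X.T @ (y - p) / len(y)
    norms.append(np.linalg.norm(beta))

accuracy = np.mean((sigmoid(X @ beta) > 0.5) == (y == 1))
print("coefficients:", beta)
print("training accuracy:", accuracy)
```

Running more iterations would push the coefficients even higher without changing the predictions. In practice, people usually stop early or add a penalty in this situation.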

Conclusions

The lessons of this post are:

Logistic regression is one of the best classifiers for linearly separable patterns

But in some limited cases it also works for linearly non-separable patterns when interaction terms are added

Actually, I don't know exactly how interaction terms work in logistic regression (in particular their theoretical properties), but I think they can help improve the performance of logistic regression as a classifier.