Support Vector Machine (SVM) classification is a machine learning technique for binary prediction — that is, prediction where the thing-to-predict can take just one of two possible values. For example, you might want to predict whether a person is male (-1) or female (+1) based on predictor variables such as age and annual income.

SVM classification was popular in the late 1990s, but it isn't used very much anymore; these days neural networks are used far more often.

SVMs require you to specify a kernel function (such as the radial basis function, RBF), any parameters the kernel function needs (gamma, in the case of RBF), and a value for C, which controls how much misclassification error the SVM will tolerate during training. SVMs tend to be very sensitive to the values of these hyperparameters, so training an SVM can be a pain.
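To make the kernel idea concrete, here is a minimal sketch of the RBF kernel, assuming the common definition K(x, z) = exp(-gamma * ||x - z||^2). This is not from the demo; it just shows what gamma controls.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Larger gamma means the kernel value drops off faster with distance,
    # so each training point influences a smaller neighborhood.
    diff = np.array(x) - np.array(z)
    return np.exp(-gamma * np.dot(diff, diff))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))       # identical points -> 1.0
print(rbf_kernel([1.0, 2.0], [3.0, 5.0], 0.5))  # farther apart -> near 0.0
```

The kernel acts as a similarity measure: 1.0 for identical points, approaching 0.0 as points get farther apart, with gamma setting how quickly that happens.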

Researchers were getting all excited about SVMs in the 1990s because the math is very elegant. But realistically, much of that research work was a solution in search of a problem.

Coding an SVM implementation from scratch is very challenging and took me over a day, but I did pick up some valuable coding tricks and learned several new things about machine learning.

In the demo below, the data has a circular geometry, so I used an RBF kernel with gamma = 1.0 and set C = 10.0. The demo model correctly predicts all 21 of the data points.
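If you just want the same kind of setup without coding from scratch, a library version takes a few lines. This sketch is not the demo's actual code; it uses scikit-learn's SVC with the same hyperparameters, and assumes synthetic circular data from make_circles in place of the demo's 21 points (labels come out 0/1 here rather than -1/+1).

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# 21 synthetic points with circular geometry (two concentric rings)
X, y = make_circles(n_samples=21, noise=0.05, factor=0.4, random_state=0)

# Same hyperparameters as the demo: RBF kernel, gamma = 1.0, C = 10.0
model = SVC(kernel="rbf", gamma=1.0, C=10.0)
model.fit(X, y)

acc = model.score(X, y)  # accuracy on the training data
print(f"training accuracy = {acc:0.4f}")
```

On data this cleanly separated, the RBF model should classify essentially all training points correctly, which mirrors the 21-for-21 result in the demo.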

Moral of the story: You really don’t want to code an SVM classifier from scratch unless you absolutely have to, or unless you really want to understand SVMs. And for binary classification in general, you’re probably better off using a neural network.

1.) based on my experience, NNs are easier to train than SVMs, mostly because you have to guess which kernel to use for SVMs (but this could be related to the types of data I work with)
2.) NNs are usually quicker to train for very large datasets
3.) NNs integrate with other software systems better than SVMs (again, could be just for the types of problems I work with)
4.) most of my colleagues understand NNs well, but few understand SVMs well, so working in a group is easier with NNs
5.) theoretically NNs are universal function approximators — not sure if this is true for SVMs