Support Vector Machines

SVMs attempt to find a hyperplane $\{x : w \cdot x + b = 0\}$ that separates the data points (meaning that all points in a given class lie on the same side of the plane), corresponding to a decision rule $f(x) = \mathrm{sign}(w \cdot x + b)$.
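MathSVM itself is a Mathematica package, so as a language-neutral illustration, here is a minimal NumPy sketch of this linear decision rule. The weight vector `w` and bias `b` below are made-up values chosen for the example, not output of MathSVM:

```python
import numpy as np

# Hypothetical hyperplane parameters (for illustration only).
w = np.array([2.0, -1.0])  # weight vector
b = 0.5                    # bias

def decide(x, w, b):
    """Linear decision rule: sign(w . x + b)."""
    return np.sign(np.dot(w, x) + b)

print(decide(np.array([1.0, 1.0]), w, b))   # 2 - 1 + 0.5 = 1.5  -> class +1
print(decide(np.array([-1.0, 1.0]), w, b))  # -2 - 1 + 0.5 = -2.5 -> class -1
```

Points on either side of the plane $w \cdot x + b = 0$ thus receive opposite labels.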

In SVM literature, $w$ is often referred to as the weight vector; $b$ is called the bias (a term adopted from neural networks). This idea is not new; it dates back at least to R.A. Fisher and the theory of linear discriminants [6]. The novelty of SVMs lies in how this plane is determined: SVMs choose the separating hyperplane that is furthest away from the data points $x_i$, that is, the one that has maximal margin (Figure 1). The underlying idea is that a hyperplane far from any observed data point should minimize the risk of making wrong decisions when classifying new data. To be precise, in SVMs we maximize the distance to the closest data points. We solve

$$\max_{w,\,b}\ \min_i\ d(x_i; w, b), \tag{1}$$

where $d(x_i; w, b) = \dfrac{|w \cdot x_i + b|}{\|w\|}$ is the distance between data point $x_i$ and the plane $\{x : w \cdot x + b = 0\}$, subject to the constraint that this plane still separates the classes. The plane that solves (1) is called the optimal separating hyperplane and is unique [5]. MathSVM provides algorithms for determining this plane from data.
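The objective of (1) can be made concrete with a short NumPy sketch that evaluates the geometric margin of a candidate hyperplane, i.e. the distance from the plane to the closest data point. The data and the parameters `w`, `b` below are invented for illustration; MathSVM's own optimization routines are not reproduced here:

```python
import numpy as np

def distances(X, w, b):
    """Distance of each row of X to the hyperplane {x : w . x + b = 0}:
    |w . x_i + b| / ||w||."""
    return np.abs(X @ w + b) / np.linalg.norm(w)

def geometric_margin(X, w, b):
    """Inner objective of (1): distance from the plane to the closest point."""
    return distances(X, w, b).min()

# Toy data and a candidate hyperplane (illustrative values only).
X = np.array([[1.0, 2.0], [2.0, 3.0], [-1.0, -1.0]])
w = np.array([1.0, 1.0])
b = -1.0

print(geometric_margin(X, w, b))  # distance to the closest of the three points
```

An SVM solver searches over $w$ and $b$ to maximize this quantity while keeping the two classes on opposite sides of the plane.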

Figure 1. Two-class data (black and grey dots), their optimal separating hyperplane (solid line), and support vectors (circled in blue). This is an example output of the SVMPlot function in MathSVM. The width of the "corridor" defined by the two dotted lines through the support vectors is the margin of the optimal separating hyperplane.