
LDA (linear discriminant analysis), SVMs with a linear kernel, and perceptrons are linear classifiers. Is there any other relationship between them, e.g.:

1. Every decision boundary that can be found by LDA can be found by a linear SVM.

2. Every decision boundary that can be found by a linear SVM can be found by LDA.

3. Every decision boundary that can be found by LDA can be found by a perceptron.

4. Every decision boundary that can be found by a linear SVM can be found by a perceptron.

5. Every decision boundary that can be found by a perceptron can be found by LDA.

6. Every decision boundary that can be found by a perceptron can be found by an SVM with a linear kernel.

Always on the same data, of course.

For example, I think the linear SVM can find more decision boundaries than a perceptron due to its slack variables. While the perceptron finds just some arbitrary hyperplane that separates the data linearly (if such a hyperplane exists), the linear SVM will always find the same hyperplane due to its optimality criterion.
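As a quick numerical check of this intuition (a sketch, assuming scikit-learn is available; the dataset is arbitrary): refitting a perceptron with different shuffling seeds usually yields different hyperplanes, while refitting a linear SVM always yields the same one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=0)

# Two perceptron runs that differ only in how the examples are shuffled.
p1 = Perceptron(shuffle=True, random_state=1).fit(X, y)
p2 = Perceptron(shuffle=True, random_state=2).fit(X, y)

# Two linear SVM runs; the convex objective has a unique optimum.
s1 = SVC(kernel='linear').fit(X, y)
s2 = SVC(kernel='linear').fit(X, y)

unit = lambda w: w / np.linalg.norm(w)       # boundary direction
print(unit(p1.coef_[0]), unit(p2.coef_[0]))  # usually different
print(unit(s1.coef_[0]), unit(s2.coef_[0]))  # identical
```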

2 Answers

3, 4. Yes, but only if the LDA or SVM hyperplane actually separates the classes and you are extremely lucky. Otherwise no.

5, 6. No, because SVM and LDA each find exactly one solution, while the perceptron can find many.

Now let me explain.

Decision boundaries of the classical SVM and of LDA are calculated offline, from the whole training sample at once. Thus, SVM and LDA will indeed always find exactly one hyperplane each, regardless of the order of the learning examples.

But these two hyperplanes are generally different. As its name suggests, the SVM solution is based only on the support vectors, which usually constitute a small fraction of the learning sample. The LDA solution, however, is sensitive to every learning example, because it is explicitly based on the classwise means. Thus, you can shift a single non-support point a little, and the SVM solution will not change, but the LDA solution will. This means you cannot expect SVM and LDA to find the same boundary.
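A minimal sketch of this shifted-point argument (again assuming scikit-learn; the dataset and the size of the shift are arbitrary choices):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[-3, 0], size=(50, 2)),
               rng.normal(loc=[+3, 0], size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel='linear', C=1e6).fit(X, y)  # very large C ~ hard margin
lda = LinearDiscriminantAnalysis().fit(X, y)

# Pick a class-0 point that is NOT a support vector and push it
# further away from the boundary, deeper into its own class.
support = set(svm.support_)
idx = next(i for i in range(50) if i not in support)
X2 = X.copy()
X2[idx] += np.array([-5.0, 0.0])

svm2 = SVC(kernel='linear', C=1e6).fit(X2, y)
lda2 = LinearDiscriminantAnalysis().fit(X2, y)

print(np.allclose(svm.coef_, svm2.coef_))  # expected True: SVM unchanged
print(np.allclose(lda.coef_, lda2.coef_))  # expected False: LDA moved
```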

The perceptron, in contrast, is updated online, so its decision boundary depends on the order of the learning examples. Moreover, its solution depends on the initialization of the coefficients. Namely, if $w_n$ is the vector of coefficients after seeing $n$ learning examples, then:
$$
w_n = w_0 + \lambda \sum_{i=1}^{n} e_i x_i
$$
where $e_i$ equals $1$ if the $i$-th example was a false negative, $-1$ if it was a false positive, and $0$ otherwise, and $\lambda$ is the learning rate. By varying $w_0$ and the order in which the $x_i$ are fed to the perceptron, you can obtain many different hyperplanes.
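A minimal from-scratch implementation of exactly this update rule (a sketch: labels are assumed to be 0/1, and there is no separate bias term, so append a constant-1 feature to the data if you need one):

```python
import numpy as np

def perceptron(X, y, w0, lam=1.0, epochs=100):
    """Online perceptron: w <- w + lam * e_i * x_i, with e_i in {-1, 0, +1}."""
    w = w0.astype(float).copy()
    for _ in range(epochs):
        updated = False
        for x_i, y_i in zip(X, y):
            pred = 1 if w @ x_i > 0 else 0
            e_i = y_i - pred  # +1 false negative, -1 false positive, 0 otherwise
            if e_i != 0:
                w = w + lam * e_i * x_i
                updated = True
        if not updated:       # converged: every example classified correctly
            break
    return w
```

Feeding the same examples in a different order, or starting from a different `w0`, generally yields a different final `w`.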

For example, if (by luck!) you set $w_0$ equal to the solution of the SVM or of LDA, and this solution linearly separates your classes, then the perceptron will never change it, so its solution will equal that of the SVM or LDA.

However, if the solution of the SVM or LDA does not fully separate the classes (for LDA this may be the case even when the classes are separable), then the perceptron will update it at the next misclassified example, so its solution will diverge from that of the SVM or LDA.
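Using the `perceptron()` sketch above, this is easy to see on a toy example (the numbers here are arbitrary): a $w_0$ that already separates the data is returned untouched, while a $w_0$ that does not gets updated away from its starting point.

```python
import numpy as np

# Constant-1 bias feature first, then one informative feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([0, 0, 1, 1])

w0 = np.array([0.0, 1.0])    # already separates: sign of second feature
print(perceptron(X, y, w0))  # unchanged: [0. 1.]

w0 = np.array([0.0, -1.0])   # initially misclassifies everything
print(perceptron(X, y, w0))  # updated until it separates
```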

Every decision boundary that can be found by a linear SVM can be found by a perceptron.

Concerning 4: In the case where your classes are linearly separable:

There are usually infinitely many decision boundaries that achieve perfect classification. The Perceptron is guaranteed to find one of them, but it is virtually impossible to predict which one (that will depend on the initial weights and the learning rate).

The SVM (without slack variables) is also guaranteed to find a perfect decision boundary, but not an arbitrary one: it finds the boundary with maximal distance to the closest observations.

You can think of the SVM (without slack variables) as picking the 'best' decision boundary among those that the Perceptron could find.
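A quick way to see this numerically (a sketch, assuming scikit-learn; a very large C approximates the no-slack SVM): compute the margin each model achieves on separable data. The perceptron reaches some positive margin, the SVM the largest possible one.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import Perceptron
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=0)

def margin(w, b, X, y):
    """Smallest signed distance of any point to the hyperplane w.x + b = 0."""
    signs = np.where(y == 1, 1, -1)
    return np.min(signs * (X @ w + b)) / np.linalg.norm(w)

perc = Perceptron(random_state=1).fit(X, y)
svm = SVC(kernel='linear', C=1e6).fit(X, y)  # very large C ~ hard margin

print(margin(perc.coef_[0], perc.intercept_[0], X, y))  # some positive margin
print(margin(svm.coef_[0], svm.intercept_[0], X, y))    # at least as large
```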