Which classifier has the best performance?

This question was posted on one of our LinkedIn groups. The author wrote:

In practice, given a wide range of classifiers, we often have to choose one based on performance comparison through validation. Research literature shows that there is no classifier that performs universally best in all contexts for all problems. The following paper applied the 8 most popular classifier families in machine learning (SVM, neural networks, ensembles, KNN, decision trees, logistic regression, discriminant analysis, and naive Bayes) to a problem currently confronting financial institutions such as banks, insurers, and asset managers in their derivative valuation and risk management. The paper shows that when properly parameterized (which the paper discusses in detail), the performances are either consistent with or contrary to some classic studies in the area. The paper is available at SSRN.

I replied, saying that a better question is which classifier performs worst, as the answer is simpler. Candidates that come to mind are classifiers based on discriminant analysis or naive Bayes. For instance, linear discriminant analysis can only detect clusters that are linearly separable. A blend of various classifiers usually works better than any single one, and sometimes just transforming your data (e.g., to a log scale) provides a substantial improvement.
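A quick way to see why blending classifiers tends to beat any single one is a majority vote: if three classifiers make independent errors at the same rate, the vote is right more often than each individual. This is a minimal numpy sketch (not from the discussion; the 70% accuracy figure is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000          # number of simulated test cases
p_correct = 0.7      # each individual classifier is right 70% of the time

# Simulate three classifiers whose errors are independent.
votes = rng.random((3, n)) < p_correct      # True = correct prediction

individual = votes.mean(axis=1)             # accuracy of each single classifier
majority = (votes.sum(axis=0) >= 2).mean()  # majority-vote accuracy

print(f"single classifiers: {individual.round(3)}")
# Theory: 0.7^3 + 3 * 0.7^2 * 0.3 = 0.784
print(f"majority vote:      {majority:.3f}")
```

The gain only holds to the extent that the classifiers' errors are decorrelated, which is exactly why ensembles of diverse models (the "blend" above) tend to help.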

However, this being an important question, I'd like to ask our members how they choose a classifier. Even defining performance is not straightforward here.
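The usual way to put "performance" on a common footing is k-fold cross-validation: hold out each fold in turn, fit on the rest, and average the held-out accuracy. Here is a self-contained sketch on hypothetical Gaussian toy data, using a simple nearest-centroid classifier so no external library is needed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-class data: two Gaussian blobs in 5 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (100, 5)),
               rng.normal(1.5, 1.0, (100, 5))])
y = np.array([0] * 100 + [1] * 100)

def nearest_centroid_accuracy(X_tr, y_tr, X_te, y_te):
    # Classify each test point by the closer class centroid.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return (pred == y_te).mean()

# 5-fold cross-validation: shuffle, split, score each held-out fold.
idx = rng.permutation(len(y))
folds = np.array_split(idx, 5)
scores = []
for k in range(5):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(5) if j != k])
    scores.append(nearest_centroid_accuracy(X[train], y[train], X[test], y[test]))

print(f"CV accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

Swapping in any other classifier for `nearest_centroid_accuracy` gives a like-for-like comparison, though accuracy is only one possible definition of performance (AUC, log loss, or a cost-weighted metric may matter more depending on the application).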

Logistic regression enables the researcher to explicitly parameterize and estimate a theoretical model. The other techniques to varying degrees put more emphasis on purely empirical estimation from the data.

Whether this is advantageous or disadvantageous will of course depend on the situation but as we move to an increasingly Big Data world I believe that theory-based research will have an inherent advantage: with thousands of potential explanatory variables available, the potential for noise to overwhelm the signal increases, unless we have theoretical models to suggest which variables are more likely to convey signal.

It's similar to how computers can beat human chess (and now Go) players -- but the strongest players in the world are expert human players combined with computers. Pure machine learning gets stronger every day, but still benefits from human input.

I expect KNN to fail because of the curse of dimensionality, so it should be put aside at the outset.
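The reason KNN suffers is distance concentration: in high dimensions, the nearest and farthest neighbors of a query point end up almost equally far away, so "nearest" carries little information. A small numpy demonstration (illustrative dimensions and sample size chosen by me):

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim, n=500):
    # Ratio of farthest to nearest neighbor distance from a random query point.
    points = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return d.max() / d.min()

low = distance_contrast(2)      # plenty of contrast in 2 dimensions
high = distance_contrast(1000)  # distances concentrate in 1000 dimensions
print(f"max/min distance ratio: dim=2 -> {low:.1f}, dim=1000 -> {high:.2f}")
```

As the ratio approaches 1, every point is effectively equidistant from the query, and KNN's vote among "nearest" neighbors becomes close to a vote among random points.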

Exclude LDA next: Elements of Statistical Learning (Hastie, Tibshirani, Friedman) makes a persuasive case that several of these methods are equivalent. LDA is not better, by any standard, than logistic regression, so rule it out. SVM can be seen as a special case of logistic regression with regularized parameters; Hastie's website has slides on this (https://web.stanford.edu/~hastie/Papers/svmtalk.pdf). These all fall into a family of classifiers that do well.
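One way to see why SVM and regularized logistic regression land in the same family is to compare their loss functions as functions of the margin m = y·f(x). This short numpy sketch tabulates both (my own illustration, not from the slides):

```python
import numpy as np

# Margins m = y * f(x): negative = misclassified, large positive = confident.
m = np.linspace(-2, 3, 11)

hinge = np.maximum(0, 1 - m)       # SVM hinge loss
logistic = np.log1p(np.exp(-m))    # logistic-regression loss (log loss)

for mi, h, l in zip(m, hinge, logistic):
    print(f"m={mi:+.1f}  hinge={h:.3f}  logistic={l:.3f}")
```

Both losses penalize negative margins roughly linearly and shrink toward zero for confident correct classifications (hinge reaches exactly zero past m = 1, logistic only asymptotically), so with the same L2 penalty the two optimizations favor very similar decision boundaries.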

To me, with many correlated observables, the random forest seems most persuasive. The article below claims that a variant of SVM does about as well as the random forest, and is faster: J. Wainer, "Comparison of 14 different families of classification algorithms on 115 binary datasets", https://arxiv.org/pdf/1606.00930.

All of those benchmarks are 2-category classification problems, and I'm not entirely persuaded that the results carry over to multi-category classification.

As indicated in our paper, the original objective of our research was a real-world solution to a real-world need: finding a proxy for a type of financial market feature variable (in this case, Credit Default Swap, or CDS, curves) for illiquid corporates, i.e., those that don't have liquid quotes. Classifier performance comparison turned out to be a natural extension of that research.

We followed the existing literature (two classic studies are cited in the paper) to find the best (in terms of optimal parameterization choices) of the best (across the 8 most popular classifier families). In the end we compared 156 classifiers (we actually ran many more, but had to cut the paper down to its current size). The paper is here: https://ssrn.com/abstract=2967184