Which machine learning algorithm should I use?

A typical question beginners ask when facing the wide variety of machine learning algorithms is “Which algorithm should I use?” The answer depends on many factors, including:

The size, quality, and nature of the data.

The available computational time.

The urgency of the task.

What you want to do with the data.

Even an experienced data scientist cannot tell which algorithm will perform best without trying several. We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first, depending on a few clear factors.
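In practice, "trying different algorithms" usually means comparing candidates on held-out data. The sketch below does this with k-fold cross-validation in plain Python. It assumes synthetic two-class data with two numeric features. The three candidates (a majority-class baseline, nearest centroid, and 3-nearest neighbors), the helper names, and the data generator are illustrative choices, not algorithms or code taken from the article.

```python
import random
from statistics import mean


def make_data(n_per_class=50, seed=0):
    """Hypothetical synthetic data: two classes, two numeric features each."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in ((0, (0.0, 0.0)), (1, (2.0, 2.0))):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def majority_class(train):
    """Baseline: always predict the most frequent label in the training fold."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def nearest_centroid(train):
    """Predict the class whose mean feature vector is closest to x."""
    centroids = {}
    for label in {y for _, y in train}:
        points = [x for x, y in train if y == label]
        centroids[label] = tuple(mean(col) for col in zip(*points))
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def k_nearest_neighbors(train, n_neighbors=3):
    """Predict the majority label among the n_neighbors closest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda row: sq_dist(x, row[0]))[:n_neighbors]
        votes = [y for _, y in nearest]
        return max(set(votes), key=votes.count)
    return predict


def k_fold_accuracy(fit, data, k=5):
    """Mean held-out accuracy of fit(train) -> predict over k interleaved folds."""
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the i-th, then score on the held-out fold.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        predict = fit(train)
        scores.append(sum(predict(x) == y for x, y in test) / len(test))
    return mean(scores)


if __name__ == "__main__":
    data = make_data()
    candidates = {
        "majority-class baseline": majority_class,
        "nearest centroid": nearest_centroid,
        "3-nearest neighbors": k_nearest_neighbors,
    }
    for name, fit in candidates.items():
        print(f"{name}: {k_fold_accuracy(fit, data):.2f}")
```

Libraries such as scikit-learn provide this workflow ready-made (for example `sklearn.model_selection.cross_val_score`). The hand-written version only makes the steps visible: fit each candidate on the training folds, score it on the held-out fold, and compare the averages against a trivial baseline.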


Thanks to Dr. Li for attempting this. Obviously, it's a complex topic. I would be interested in her thoughts on how often the choice of algorithm matters. Which is more productive: improving features (variables), tuning Algorithm X, or trying Algorithm Y?

I also noticed what appeared to be several bugs in the text. For example,

"When most dependent variables are numeric, logistic regression and SVM should be the first try for classification. These models are easy to implement, their parameters easy to tune, and the performances are also pretty good. So these models are appropriate for beginners."

I think this should say "when most independent variables..." With continuous dependent variables, you cannot even use logistic regression. (I was unable to put this comment on the source page.)
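The correction turns on which variable is which. Logistic regression models the probability of a categorical (here binary) dependent variable from numeric independent variables. The minimal pure-Python sketch below fits it by batch gradient descent on the log-loss to make that split concrete. The function names, learning rate, epoch count, and toy data are hypothetical. The guard that rejects a continuous dependent variable reflects this sketch's binary-label assumption, not the behavior of any particular library.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss.

    X: rows of numeric independent variables; y: binary (0/1) dependent variable.
    """
    # Assumption of this sketch: it models P(y = 1 | x), so a continuous y is rejected.
    if not set(y) <= {0, 1}:
        raise ValueError("logistic regression needs a binary (0/1) dependent variable")
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 if the predicted probability of class 1 is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


if __name__ == "__main__":
    # Hypothetical toy data: numeric independent variables, binary dependent variable.
    X = [[1.0, 1.5], [1.5, 1.0], [2.0, 2.0], [-1.0, -1.5], [-1.5, -1.0], [-2.0, -2.0]]
    y = [1, 1, 1, 0, 0, 0]
    w, b = fit_logistic_regression(X, y)
    print([predict(w, b, x) for x in X])
    try:
        # A continuous dependent variable does not fit this model.
        fit_logistic_regression(X, [3.2, 4.1, 5.0, 0.7, 1.3, 0.2])
    except ValueError as exc:
        print("rejected:", exc)
```

In practice a library implementation (for example scikit-learn's `LogisticRegression`) would normally be used. The hand-written version only shows where the numeric inputs and the binary target enter the model.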