Data Science Stack Exchange is a question and answer site for Data science professionals, Machine Learning specialists, and those interested in learning more about the field. It only takes a minute to sign up.

5 Answers
5

It's hard to say without knowing a little more about your dataset, and how separable your dataset is based on your feature vector, but I would probably suggest using extreme random forest over standard random forests because of your relatively small sample set.

Extreme random forests are pretty similar to standard random forests with the one exception that instead of optimizing splits on trees, extreme random forest makes splits at random. Initially this would seem like a negative, but it generally means that you have significantly better generalization and speed, though the AUC on your training set is likely to be a little worse.

Logistic regression is also a pretty solid bet for these kinds of tasks, though with your relatively low dimensionality and small sample size I would be worried about overfitting. You might want to check out using K-Nearest Neighbors since it often performs very will with low dimensionalities, but it doesn't usually handle categorical variables very well.

If I had to pick one without knowing more about the problem I would certainly place my bets on extreme random forest, as it's very likely to give you good generalization on this kind of dataset, and it also handles a mix of numerical and categorical data better than most other methods.