Outline

Introduction

Several approaches are used to derive classification rules for the diagnosis of patients. Most belong to one of three classes: regression models, classification trees, or neural networks. Within each class, a variety of modifications has been proposed in the literature; furthermore, several approaches for aggregating classifiers have been suggested to improve classification rules. Issues such as complexity, (in)stability, and interpretability are the subject of controversy, and a general assessment of the advantages and disadvantages of specific approaches is difficult.

Subject and methods

Using data from diagnostic studies, we will compare classifiers that differentiate between two groups. We will compare error rates, the Brier score, and other criteria for classifiers derived from logistic regression models, classification trees, boosted trees, and random forests. We will also discuss issues of interpretability and practical usefulness.
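For readers unfamiliar with the two named criteria, the following is a minimal sketch of how an error rate and a Brier score can be computed for a two-group classifier. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions and are not taken from the studies compared here.

```python
def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def error_rate(y_true, p_pred, threshold=0.5):
    """Share of cases misclassified when probabilities are cut at `threshold`.

    The threshold of 0.5 is an assumption for illustration only.
    """
    if len(y_true) != len(p_pred):
        raise ValueError("y_true and p_pred must have the same length")
    predicted = [1 if p >= threshold else 0 for p in p_pred]
    return sum(int(yh != y) for y, yh in zip(y_true, predicted)) / len(y_true)


# Hypothetical toy data: true group membership and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.4, 0.6]

print(brier_score(y, p))
print(error_rate(y, p))
```

The error rate depends only on which side of the threshold each prediction falls, whereas the Brier score also penalizes predicted probabilities that are far from the observed outcome, which is one reason the two criteria can rank classifiers differently.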

Results

With respect to error rates and other statistical criteria, differences between the approaches are small. Trees can be improved by ensemble methods. With respect to interpretability and practical usefulness, regression models that include the strong factors have several advantages.

Discussion

Ensemble methods can be used to improve classifiers based on trees. However, when interpretability, transportability, and practical usefulness are considered important criteria, regression models that include the strong factors remain the method of choice.