Imbalanced Classes In SVM

20 Dec 2017

In support vector machines, $C$ is a hyperparameter determining the penalty for misclassifying an observation. One method for handling imbalanced classes in support vector machines is to weight $C$ by classes, so that

$$C_j = C \cdot w_j$$

where $C$ is the penalty for misclassification, $w_j$ is a weight inversely proportional to class $j$’s frequency, and $C_j$ is the $C$ value for class $j$. The general idea is to increase the penalty for misclassifying minority classes to prevent them from being “overwhelmed” by the majority class.
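As a minimal sketch, assume the weights $w_j$ are already known. The per-class penalty is then the base $C$ scaled by each class's weight. The helper `per_class_penalty` and the weight values below are hypothetical and chosen only for illustration.

```python
def per_class_penalty(C, weights):
    """Return C_j = C * w_j for every class j.

    `weights` maps a class label to its (hypothetical) weight w_j.
    """
    return {label: C * w for label, w in weights.items()}


# Hypothetical weights: the minority class (1) gets the larger weight,
# so misclassifying one of its points is penalized more heavily.
weights = {0: 0.5, 1: 5.0}
penalties = per_class_penalty(C=2.0, weights=weights)
print(penalties)  # {0: 1.0, 1: 10.0}
```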

In scikit-learn, when using `SVC`, we can set the values of $C_j$ automatically by setting `class_weight='balanced'`.
The `'balanced'` argument automatically weights classes such that:

$$w_j = \frac{n}{kn_{j}}$$

where $w_j$ is the weight for class $j$, $n$ is the number of observations, $n_j$ is the number of observations in class $j$, and $k$ is the total number of classes.
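The snippet below is a sketch on hypothetical toy data. The data-generating code and the helper `balanced_weights` are illustrative and not part of scikit-learn. It computes $w_j = n / (k n_j)$ by hand, then fits an `SVC` with `class_weight='balanced'` and checks the hand-computed values against the fitted model's `class_weight_` attribute, which holds the per-class multipliers of $C$.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical imbalanced toy data: 90 points in class 0, 10 in class 1.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(90, 2)),
    rng.normal(2.0, 1.0, size=(10, 2)),
])
y = np.array([0] * 90 + [1] * 10)


def balanced_weights(y):
    """Return (classes, w) with w_j = n / (k * n_j), classes in sorted order."""
    classes, counts = np.unique(y, return_counts=True)
    n, k = len(y), len(classes)
    return classes, n / (k * counts)


classes, w = balanced_weights(y)
print(dict(zip(classes.tolist(), np.round(w, 3).tolist())))  # {0: 0.556, 1: 5.0}

# Let scikit-learn apply the same weighting to C.
clf = SVC(kernel="linear", C=1.0, class_weight="balanced")
clf.fit(X, y)

# class_weight_ holds w_j per class; multiplying by C gives the effective C_j.
print(clf.class_weight_)
print(clf.C * clf.class_weight_)
```

With 90 and 10 observations in two classes, the minority class receives nine times the weight of the majority class ($5.0 / 0.556 \approx 9 = 90 / 10$), so misclassifying one of its points costs nine times as much.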