Knowledge Dump: Machine Learning Scoring & Evaluation Metrics

I’ve decided to start a small series of knowledge dumps, where I post my notes from study sessions I have in my free time. Below you’ll find my thought process when learning new information, particularly around machine learning. I find myself asking the same questions over and over, so I write my answers down. Let me know if this has helped you as well!

NOTE: Oftentimes I copy and paste directly from other websites, so if you find an entire polished paragraph on something, it likely came from another source.

R-squared (coefficient of determination)

R² measures how much of the variance in the response a regression model accounts for. For example, a model with R² = 0.380 accounts for 38.0% of the variance, while one with R² = 0.874 accounts for 87.4%. The more variance that is accounted for by the regression model, the closer the data points will fall to the fitted regression line. Theoretically, if a model could explain 100% of the variance, the fitted values would always equal the observed values and, therefore, all the data points would fall on the fitted regression line.
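A minimal sketch of that calculation, both by hand and with scikit-learn. The observed and fitted values here are made-up illustration data, not tied to any model above:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])   # observed values
y_pred = np.array([2.8, 5.4, 7.0, 9.6, 10.7])   # fitted values from some regression

# R^2 = 1 - SS_residual / SS_total
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual)                 # fraction of variance explained
print(r2_score(y_true, y_pred))  # same value via scikit-learn
```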

Metrics computed from a confusion matrix

A confusion matrix gives you a more complete picture of how your classifier is performing and lets you compute various classification metrics, which help guide model selection. It’s also useful for multi-class problems.
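For example, a quick sketch of building one with scikit-learn; the labels below are toy data invented for illustration:

```python
from sklearn.metrics import confusion_matrix

y_actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# With labels=[0, 1], rows are actual classes and columns are predicted:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_actual, y_predicted, labels=[0, 1])
print(cm)
```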

Sensitivity: When the actual value is positive, how often is the prediction correct?

How “sensitive” is the classifier to detecting positive instances?

AKA: “True positive rate” or “Recall”

Specificity: When the actual value is negative, how often is the prediction correct?

How “specific” (selective) is the classifier in predicting positive instances?

False Positive Rate: When the actual value is negative, how often is the prediction incorrect? (Equal to 1 - specificity.)

Precision: When a positive value is predicted, how often is the prediction correct?

How “precise” is the classifier when predicting positive instances?

Precision = TP / (TP + FP), i.e., # relevant found / # found
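All four metrics above fall straight out of the confusion-matrix counts. A minimal sketch, assuming scikit-learn’s [[TN, FP], [FN, TP]] layout and the same toy labels as before:

```python
from sklearn.metrics import confusion_matrix

y_actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_predicted = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted, labels=[0, 1]).ravel()

sensitivity = tp / (tp + fn)          # recall / true positive rate
specificity = tn / (tn + fp)
false_positive_rate = fp / (fp + tn)  # = 1 - specificity
precision = tp / (tp + fp)            # relevant found / found

print(sensitivity, specificity, false_positive_rate, precision)
```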

How do you choose which metrics to focus on?

Depends on business objective.

Spam Filter: Optimize for precision or specificity, because false negatives (spam that reaches the inbox) are more acceptable than false positives (legitimate mail sent to the spam folder).

Fraudulent transactions: Optimize for sensitivity, because false positives (normal transactions flagged for review) are more acceptable than false negatives (fraud that goes undetected).

Changing the classification threshold from the default value of 0.5 affects sensitivity and specificity: lowering the threshold increases sensitivity but lowers specificity, and raising it does the reverse.
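A small sketch of that trade-off, using made-up predicted probabilities; in practice they would come from something like model.predict_proba(X)[:, 1]:

```python
import numpy as np

y_actual = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_scores = np.array([0.1, 0.3, 0.45, 0.6, 0.35, 0.55, 0.7, 0.9])

for threshold in (0.5, 0.3):
    y_pred = (y_scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_actual == 1))
    fn = np.sum((y_pred == 0) & (y_actual == 1))
    tn = np.sum((y_pred == 0) & (y_actual == 0))
    fp = np.sum((y_pred == 1) & (y_actual == 0))
    print(threshold,
          "sensitivity:", tp / (tp + fn),
          "specificity:", tn / (tn + fp))
```

On this toy data, dropping the threshold from 0.5 to 0.3 pushes sensitivity up and specificity down, exactly the trade-off described above.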

Metrics to assist with binary classification

What is an ROC curve?

An ROC (Receiver Operating Characteristic) curve plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at every possible threshold. It’s the most commonly used way to visualize the performance of a binary classifier.

It can help you choose a threshold that balances sensitivity and specificity for your context.
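A hedged sketch of drawing one with scikit-learn and matplotlib, again on invented scores standing in for predicted probabilities of the positive class:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_actual = [0, 0, 0, 0, 1, 1, 1, 1]
y_scores = [0.1, 0.3, 0.45, 0.6, 0.35, 0.55, 0.7, 0.9]

# fpr = 1 - specificity, tpr = sensitivity; one point per threshold
fpr, tpr, thresholds = roc_curve(y_actual, y_scores)

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")  # chance line
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (sensitivity)")
plt.title(f"ROC curve (AUC = {roc_auc_score(y_actual, y_scores):.2f})")
plt.show()
```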