Day 36-37: Concerned DALEX

by Danielle Navarro, 02 Jun 2018

I was working on a longer post continuing the metaprogramming series, and realised I wasn’t going to get it done this evening. But it’s been a couple of days since I tried out something new, so I resorted to the twitters to find inspiration. As always, the wonderful twitter rstats folks rose to the occasion:

Ooh. What is this DALEX package? I am curious. I hope it’s as kind and lovely as the concerned dalek:

TO THE VULNERABLE: CONCERNED DALEK LOVES AND OFFERS COMFORT TO YOU. TO THE VOICELESS: CONCERNED DALEK LOVES AND LISTENS TO YOU. TO THE TIRED, THE BELEAGUERED, THE DEPRESSED, THE MOURNING, THE ANXIOUS, THE FRIGHTENED: YOU ARE LOVED AND HAVE A PLACE HERE!

A very brief investigation!

I don’t have a lot of time this evening, but upon checking out the homepage, I discover that DALEX is short for Descriptive mAchine Learning EXplanations. This sounds very lovely, and I think Concerned Dalek would be very concerned to know that DALEX is working hard to help the humans understand what the machine learners are doing. Concerned Dalek would not want us to be worried about these things. All I have time for is a quick run through for two of the examples, but they are nice!

library("breakDown")
library("DALEX")

## Welcome to DALEX (version: 0.3.0).
## This is a plain DALEX. Use 'install_dependencies()' to get all required packages.

First we run a linear regression model predicting wine quality as a function of pH, sugar, sulphates and alcohol. Then we imagine seeing a new bottle of wine with known properties, and want to use this regression model (imaginatively named mod) to make a prediction about whether this new wine will be any good:
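A minimal sketch of that step, assuming the `wine` data frame that ships with breakDown and its column names (`pH`, `residual.sugar`, `sulphates`, `alcohol`, `quality`). The name `new_wine` and the values given to the new bottle are made up for illustration and are not taken from the post:

```r
library("breakDown")  # assumed source of the `wine` data frame

# Linear regression predicting quality from the four properties named above
mod <- lm(quality ~ pH + residual.sugar + sulphates + alcohol, data = wine)

# A hypothetical new bottle; these values are invented for illustration
new_wine <- data.frame(
  pH             = 3.2,
  residual.sugar = 6,
  sulphates      = 0.5,
  alcohol        = 11
)

# Predicted quality for the new bottle
pred <- predict(mod, newdata = new_wine)
pred
```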

That’s nice but it isn’t immediately obvious why the model has made that prediction. To help with this, the explain function in DALEX lets me create an explainer object, ex, and then use it to learn something about the prediction.
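Roughly, that might look like the sketch below. It assumes DALEX 0.3.0, where the per-prediction function is `prediction_breakdown()`; newer DALEX releases appear to expose this through `predict_parts()` instead, so treat the call as version-specific. The `new_wine` values are again hypothetical:

```r
library("breakDown")  # assumed source of the `wine` data frame
library("DALEX")      # sketch written against DALEX 0.3.0

mod <- lm(quality ~ pH + residual.sugar + sulphates + alcohol, data = wine)

# Hypothetical new bottle; values invented for illustration
new_wine <- data.frame(pH = 3.2, residual.sugar = 6, sulphates = 0.5, alcohol = 11)

# Wrap the model and the data it was fitted on in an explainer object
ex <- explain(mod, data = wine, y = wine$quality)

# Break the single prediction down into per-variable contributions
# (prediction_breakdown() is the 0.3.0-era name; an assumption for other versions)
bd <- prediction_breakdown(ex, observation = new_wine)
plot(bd)
```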

If I’ve understood this correctly, what the figure is showing me is what happens to the model prediction as I add the predictors in one by one. With just alcohol included, the prediction is just under 6.6. It goes up a little when pH is included, a little more when sulphates are added, and then goes down when residual sugar is considered.
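For a linear model, one way to reproduce this kind of running total by hand is to start from the average prediction and add each predictor's coefficient times the new bottle's distance from that predictor's mean. The sketch below does this in base R on simulated stand-in data, so the numbers are not the wine results and this is not necessarily how breakDown computes its plot:

```r
set.seed(36)  # simulated stand-in data, NOT the real wine data
n <- 200
sim <- data.frame(
  pH             = rnorm(n, mean = 3.2, sd = 0.15),
  residual.sugar = rexp(n, rate = 1 / 6),
  sulphates      = rnorm(n, mean = 0.5, sd = 0.1),
  alcohol        = rnorm(n, mean = 10.5, sd = 1.2)
)
sim$quality <- 2 + 0.3 * sim$alcohol + 0.5 * sim$pH + rnorm(n, sd = 0.7)

mod <- lm(quality ~ pH + residual.sugar + sulphates + alcohol, data = sim)

# Hypothetical new bottle; values invented for illustration
new_wine <- data.frame(pH = 3.3, residual.sugar = 5, sulphates = 0.6, alcohol = 12)

# Baseline: the average prediction over the data
baseline <- mean(predict(mod, newdata = sim))

# Contribution of each predictor: coefficient times distance from its mean
vars <- c("alcohol", "pH", "sulphates", "residual.sugar")
contrib <- sapply(vars, function(v) {
  coef(mod)[[v]] * (new_wine[[v]] - mean(sim[[v]]))
})

# Running prediction as the predictors are added one at a time
running <- baseline + cumsum(contrib)
running
```

The last element of `running` matches `predict(mod, newdata = new_wine)`, because the contributions of a linear model simply add up; the order of `vars` changes the intermediate steps but not the end point.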

It is kind of nice

For linear regression models, I could probably have done this myself with only slightly more effort, but most machine learning models are harder to interpret and I always feel very wary of trusting models whose behaviour is opaque. It’s helpful, then, that DALEX also lets you do this for things like random forest models:
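A sketch of the random forest version, under the same assumptions as before: the `wine` data from breakDown, the DALEX 0.3.0 function `prediction_breakdown()`, and a made-up `new_wine`. It also assumes the randomForest package is installed:

```r
library("breakDown")     # assumed source of the `wine` data frame
library("DALEX")         # sketch written against DALEX 0.3.0
library("randomForest")

set.seed(37)  # random forests are stochastic; fix the seed for repeatability
rf <- randomForest(quality ~ pH + residual.sugar + sulphates + alcohol, data = wine)

# Hypothetical new bottle; values invented for illustration
new_wine <- data.frame(pH = 3.2, residual.sugar = 6, sulphates = 0.5, alcohol = 11)

# Same two steps as for the linear model: build an explainer, then break down one prediction
ex_rf <- explain(rf, data = wine, y = wine$quality)
bd_rf <- prediction_breakdown(ex_rf, observation = new_wine)
plot(bd_rf)
```

Unlike the linear case, the contributions here cannot be read off from coefficients, so the breakdown has to be estimated from the model's predictions on the data.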

Yay. I like this. At some stage I’d like to have the time to look into this properly, but for now that will have to do.

YOUR COMMUNITY MAY NOT BE EASY TO FIND, BUT CONCERNED DALEK KNOWS THERE ARE PEOPLE OUT THERE LIKE YOU, WHO WILL SUPPORT AND LOVE YOU THROUGH YOUR CHALLENGES! SEEK THEM OUT! THEY NEED YOU! AND YOU NEED THEM!