Survival Analysis

I just posted a simple implementation of WTTE-RNNs in Keras on GitHub: Keras Weibull Time-to-event Recurrent Neural Networks. I'll let you read up on the details in the linked information, but suffice it to say that this is a specific type of neural net that handles time-to-event prediction in a super intuitive way. If you're thinking of building a model to predict (rather than understand) churn, I'd definitely consider giving this a shot. And with Keras, implementing the model is pretty darn easy.

As proof of the model's effectiveness, here's my demo model (with absolutely no optimization) predicting the remaining useful life of jet engines. It's not perfect by any means, but it's definitely giving a pass to engines that are in the clear, and flagging ones that are more likely to fail (plus a few false positives). I have no idea how much better it could do with some tweaking:

Also, if anybody's curious as to why I've been in a bit of a post desert lately, my wife and I recently had a baby and I haven't been giving as much thought to the blog. However, I have some ideas brewing!

Every survival analysis method I've talked about so far in this series has had one thing in common: we've only looked at one event in a customer lifetime (churn). In many cases, that's a perfectly fine way to go about things... we want our customers to stick with us, so churn is the event of interest. So why would we ever need to think about competing risks?

You know, competing risks. Will you die by tornado, or by shark?

There's actually a critical assumption undergirding most survival analysis methods for right-censored data - that censored individuals have the same likelihood of experiencing the event of interest as individuals that never got censored. If this assumption ever gets violated, things like Kaplan-Meier estimators can become wildly inaccurate. (If you need a refresher on Kaplan-Meier curves and other concepts, take a look at my earlier post on basic survival analysis.)

If you've ever done churn analysis using cox regression with time-dependent covariates, you know that the hardest part of doing that type of research is building your base data set. You have to divide each customer's lifetime into "chunks" where the changing values of a host of different predictor variables apply. I've coded this in SQL before, and it gets ugly. Fast.

Merge, everybody, merge!

Thankfully, R has a solution for this that makes life significantly more simple. It's the "tmerge" function, and it's in the survival package. Today, we'll walk through how to use it.

There are numerous strategies for dealing with non-proportional hazards in cox regression analysis. You can stratify your data based on a problematic variable. You can chuck the cox model and create "pseudo-observations" to analyze the gains (or losses) in lifetime within a certain period associated with changes in a variable. If age is a problem (unlikely for customer churn, but it happens a lot in medical contexts), you can use age rather than time in the cohort as your time scale. The list goes on.

But this is statistics! We're supposed to be modeling things!

Statistics. Almost as cool as Sparta.

Well, as it turns out, it's actually possible to directly model how the effects of a variable change with time, so that you can not only handle the proportional hazards problem, but also get a reliable estimate of how hazard ratios for a given variable change with time. The way to do this is actually incredibly simple... we introduce an interaction term between the variable of interest and time. Let's get started!

My previous series of guides on survival analysis and customer churn has become by far the most popular content on this blog, so I'm coming back around to introduce some more advanced techniques...

When you're using cox regression to model customer churn, you're often interested in the effects of variables that change throughout a customer's lifetime. For instance, you might be interested in knowing how many times that customer has contacted support, how many times they've logged in during the last 30 days, or what web browser(s) they use. If you have, say, 3 years of historical customer data and you set up a cox regression on that data using covariate values that are applicable to customers right now, you'll essentially be regressing customer's churn hazards from months or years ago on their current characteristics. Your model will be allowing the future to predict the past. Not terribly defensible.

In the classic double-slit experiment, past events are seemingly affected by current conditions. But unless you're a quantum physicist or Marty McFly, you're probably not going to see causality working this way.

In this post, we'll walk through how to set up a cox regression using "time-dependent covariates," which will allow us to model historical hazard rates on variables whose values were applicable at the time.

My company has a subscription-based business model, which means we spend a lot of time analyzing customer churn. We wanted to include Kaplan-Meier survival curves in some of our executive dashboards, but neither our database (Redshift) nor any of our commonly used dashboarding tools (Tableau, Periscope, etc.) provided the necessary functionality. We could, of course, have pulled data out of the warehouse, analyzed it in R or Python, and pushed it back up, but that's pretty complicated. So we went looking for a better solution.

Based on all available evidence, the survival curve for aliens singing disco music hits 0 at about 43 seconds (N = 1).

As you likely guessed from the title of this post, that better solution involved writing our own code for calculating a Kaplan-Meier estimator in SQL. In this post, I'll be walking through our strategy step-by-step, including the SQL code for calculating the estimators and making that code reusable in Periscope. Let's do this!

In a past post on analyzing churn in the subscription or Software as a Service business, I talked about two different ways to quantify the dollar cost of churn. You could use 1 / churn as an estimation of mean customer lifetime (though this simple method makes a lot of assumptions). Or, you could use “pseudo-observations” to calculate the dollar value of certain groups of customers during a particular time period (which doesn’t let you quantify the full lifetime value of a customer).

But what if there was another way? What if we took our Kaplan-Meier best estimate of our churn curve, fit a linear model to that model, and then projected it out?

A model within a model, if you will. Churnception.

Well, as it turns out, we’d get a reasonable estimation of our lifetime churn curve, which would let us estimate average customer lifetime, and customer lifetime value. Let’s get started.

Imagine for a moment that you’ve pulled together the mother of all churn data sets. You’ve got customer lifetime data, demographic data, and usage information. You know how many support tickets your customers have submitted, what those tickets were about, and whether they were happy with the customer service they received. You know what some of your customers had for breakfast this morning. OK, maybe not the breakfast thing. But it’s a lot of data.

Excited about your work, you pop all of this into a cox regression model, and the proportional hazards test blows up. Majorly. You take most of the variables out of the model, and you’re left with some analysis that doesn’t violate any key assumptions, but that also doesn’t tell you much of anything. What do you do?

You, sir, are stuck! Or are you?

One of the easiest ways to tackle these challenges is to create “pseudo-observations” from your survival data. These pseudo-observations can be plugged into regular statistical models that don’t have a proportional hazards assumption. It’s a great way out of a tight spot.

Sometimes, taking action to reduce customer churn costs money. In those instances, it can be helpful to know how much revenue churn is costing you… and how much of it you could recapture. Lucky for us, there’s a stat for that! It’s called “Restricted Mean Survival Time,” and it allows us to easily quantify the monetary impact of changes in customer churn. Let’s think about putting it to use!

Last week, we discussed using Kaplan-Meier estimators, survival curves, and the log-rank test to start analyzing customer churn data. We plotted survival curves for a customer base, then bifurcated them by gender, and confirmed that the difference between the gender curves was statistically significant. Of course, these methods can only get us so far… What if you want to use multiple variables to predict churn? Will you create 5, 12, 80 survival curves and try to spot differences between them? Or, what if you want to predict churn based on a continuous variable like age? Surely you wouldn’t want to create a curve for each individual age. That’s preposterous!

Don’t do this. This isn’t useful analysis. This is an etch-a-sketch gone horribly wrong.

Luckily, statisticians (once again, primarily in the medical and engineering fields) are way ahead of us here. A technique called cox regression lets us do everything we just mentioned in a statistically accurate and user-friendly fashion. In fact, because the technique is powerful, rigorous, and easy to interpret, cox regression has largely become the “gold standard” for statistical survival analysis. Sound cool? Let’s get started.