2 December 2010

A couple of weeks ago I shared some early results from our calling program that showed how very high-scoring alumni (for propensity to give by phone) can be counted on to give, and give generously, even after multiple attempts to reach them. If they have a high score, keep calling them! Yes, contact rates will decline, for sure. But these prospects are still likely to give if you can get them on the phone, making the extra effort worthwhile.

For the other three-quarters of your prospects, it’s a different story. You may still want to call them, but keeping those phones ringing all year long is not going to pay off, even if you have the luxury of being able to do so.

This is ground I’ve already covered, but I think it bears repeating, and I’ve created some charts that illustrate the point in a different way. Have a look at this chart, which shows pledge percentage rates for the 6th, 7th, 8th, 9th and 10th decile score, at four stages of call attempts:

This chart is based on data from more than 6,600 phone conversations. How are we to interpret it? Let’s start with the top line, in blue, which represents prospects in the top 10% (decile) of alumni for propensity to give by phone, as determined by the predictive model:

Almost 38% of 10th-decile alumni who were contacted on the very first call attempt made a pledge.

Moving to the next dot on the blue line, we see that almost 37% of the 10th-decile alumni who were contacted on the 2nd or 3rd attempt made a pledge.

The pledge rate slips a little more, to 36%, if the prospect picked up the phone on attempts 4 through 7.

And finally, almost 26% of them pledged if it took more than 7 attempts to reach them.

That’s the first line. The other lines take different paths. The 9s and 8s start much lower than the 10s, but pledge percentages actually rise with the number of call attempts. They will fall back to earth — just not yet! As for the lower deciles, the 7s and 6s, they start relatively low and dwindle to zero.

So what does all this tell me? I am less interested in how each decile ranks at the start of calling (one or two attempts), because it’s no surprise to me that the 10th decile gives at twice the rate as the 9th decile, and that pledge rates fall with each step down in the score. I’ve seen that before.

What really interests me is what happens when we’ve made many repeated attempts to call. That the 8s and 9s have pledge rates that increase with the number of call attempts is pretty strange stuff, but the fact is: 26 alumni with a score of 9 made a pledge only after we called them 8 or 9 or maybe 15 times.

Whether it’s worth it to make that many call attempts is up to you. It depends on contact rates, and what it costs to make all those calls. But one thing is certain: If I’m going to call repeatedly, I’d better be calling the top three deciles, because if I keep on flogging the segments with a score of 6, I’m not going to do very well.

So what about contact rates?

Here’s another chart that shows what percentage of each score decile’s prospects have been successfully reached at the same four levels of call attempts. (Click on chart for full size.)

What does it mean? Compare the lowest decile called so far (Decile 6) with the highest decile (10). About 14% of 6s answered the phone on the first try, compared with about 19% of the 10s. That’s not a big difference: In fact, contact rates are similar across the scores for the first attempt. But the similarity ends there. After the first attempt, the lower scoring alumni have steadily decreasing rates of contact. The same is true of the higher-scoring alumni, but the difference is that some of them are still answering their phones on the 8th call. More than 4% of 10s were reached on the 8th call or greater.

The bottom line is, the propensity score is your biggest asset in setting appropriate call attempt limits. Yes, Renewal prospects are more likely to give than Acquisition prospects. But that’s not enough to go by. Are you going to call every last Renewal prospect before thinking about acquiring new donors? I wouldn’t recommend it — not if you care about long-term growth and not just this year’s totals. And because contact rates decline as attempts increase (regardless of score), you’re going to end up making a LOT of phone calls to find those gifts that will make up your goal.

My earlier post on the same subject is here. I am spending a lot of time on this, because I don’t see any of this being written about by the well-known experts in Phonathon fundraising. Why that is, I do not know.

18 October 2010

In planning which predictive models I was going to create for this fall’s Annual Giving appeals, I had one main idea in mind: Donors are not equally receptive to both mail and phone solicitation. I knew from previous experience that I could build a good model trained on all Annual Fund giving regardless of source, but that it would not be optimal because it would fail to take “preferred mode of solicitation” into account. Great donors with high propensity-to-give scores will hang up on your Phonathon callers if their preferred mode of giving is by mail! (See Preventing hangups and rudeness in your Phonathon program.)

The question is: Is it realistic to think that you can predict which mode is the preferred? If you can’t, why bother trying? Let’s explore that question.

I created two separate models, to answer separate questions:

The Mail Model answers the question, “Who is most likely to give in response to a mailed solicitation?” The outcome variable was defined as each individual’s total giving in response to mail solicitation.

The Phone Model answers the question, “Who is most likely to give in response to phone solicitation?” The outcome variable was defined as each individual’s total giving in response to phone solicitation.

The question is: Is this really worth the bother? It’s difficult to say what, if anything, separates mail-receptive donors from phone-receptive donors, although it’s always been my sense that people who prefer mail over phone tend to be older. A look at my data shows that there is indeed an age difference:

Alumni who have given by phone but not by mail have a median age of 47.

Alumni who have given by mail but not by phone have a median age of 55.

Alumni who have given via both channels are closer in median age to the by-mail-only group, at 56.

The giving data is potentially biased: Perhaps young alumni tended to be solicited by phone rather than by mail, therefore that’s how they’ve given. However, it’s my understanding that in this case, no special consideration was given in the past to emphasizing one mode of solicitation over another based on any criteria such as age. Therefore, I’ve assumed that the age difference at least is real.

But what besides age could possibly differentiate between the two? Nothing intuitive comes to mind. So it’s fair to ask whether the two models are different enough from each other to warrant the extra work. I can think of only one way to test it: Compare how each individual is scored in both models. If most people hold the same rank from one to the other, then both models essentially predict the same thing, i.e. giving of any sort.

In both models, the same 87,000 alumni were ranked in deciles according to their score, with Decile 10 being the highest possible ranking. The table below shows the results when I subtract each alum’s Phone Model decile from their Mail Model decile. When I compared Phone scores with Mail scores, I found that a little more than 20% of alumni had the same decile score in both models (the zero bar, in the centre):

Another 27% of alumni have scores that differ by one decile (plus 1 or minus 1). Because it’s easy for an individual to get bumped from one rank to another, I will discount that difference as insignificant. Added to the alumni who didn’t move at all, a total of 48% of alumni have basically the same score from model to model.

That leaves a slight majority (52%) who differ significantly from model to model. About one-quarter of alumni are predicted to have a relative preference for mail, and one quarter have a relative preference for phone. Curiously, these two quarters do NOT differ significantly in their median age — so how the two groups DO differ from each other is still rather mysterious.

This exercise does not prove that two models are better than one — but I think it DOES show that they are predicting different things. I will have to be content with that for now. If it sounds as though I’m not certain about the benefits of modeling for channel preference, well — I’m not. I think more work needs to be done. As usual, your thoughts are welcome.

26 August 2010

At the APRA Conference in Anaheim a month ago, a session attendee was troubled by something he saw during a presentation given by David Robertson of Syracuse. The attendee was focused on the “constant” value in David’s example of a multiple linear regression model for propensity to give. This constant, which I will talk about shortly, was some significant figure, say “50”. Because the Y value (i.e., the dependent variable, or outcome variable, or predicted value) was expressed in dollars (of giving), then this seemed to indicate that the “floor” for giving, the minimum value someone could be predicted to give, was $50.

How do you figure that?, this attendee wanted to know. It’s a reasonable question, for which I will try to provide my own answer. (More knowledgeable stats people may wish to weigh in; it would be appreciated.) There are implications here for how we interpret the predicted value of “Y”.

When you do a regression analysis, your software will automatically calculate this “constant,” which is simply the first term (“a”) in the regression equation:

In other words, if all your predictor variables (X’s) calculate out to zero, then Y will equal ‘a’. The part of this that the attendee found hard to swallow was that the minimum possible amount an alum could donate, as predicted by the model, was something greater than zero dollars. It seemed nonsensical.

Well, yes and no. First of all, the constant is no such thing. If you were to plot a regression line, that straight line has to cross the Y axis somewhere. The value of Y when the sum of X’s is zero is that crossing point (a.k.a. the Y-intercept). But that doesn’t mean it’s the minimum. Y does equal zero at a point: When the sum of predictors is negative — that is, when the regression line passes to the left of the Y axis and down, and crosses the X axis (the X-intercept).

So, really, you’re not learning much by looking at the constant. It’s a mathematical necessity — it describes an important aspect of what any line looks like when plotted — but that’s all. While the constant is always present in our regression analysis for predictive modeling, we tend to ignore it.

But all this is leading me to an even more fundamental question, the one posed in the headline: What is Y?

In David’s example, the one that so perplexed my fellow conference attendee, Y was expressed in real dollars. This is valid modeling practice. However, I have never looked at Y in real units (i.e., dollars), due to difficulty in interpreting the result. For example, the output of multiple linear regression can be negative: Does that mean the prospect is going to take money from us? As well, when we work with a transformed version of the DV (such as the natural log, which is very common), the output will need to be transformed back in order to make sense.

I sidestep issues of interpretation by simply assuming that the predicted value is meaningless in itself. What I am primarily interested in is relative probability, and where a value ranks in comparison with the values predicted for other individuals in the sample. In other words, is a prospect in the top 10% of alumni? Or the top 0.5%? Or the bottom 20%? The closer an individual is to the top of the heap, the more likely he or she is to give, and at higher levels.

I rank everyone in the sample by their predicted values, and then chop the sample up into deciles and percentiles. Percentiles, I am careful to explain, are not the same thing as probabilities: Someone in the 99th percentile is not 99% likely to make a gift. They might be 60% likely — it depends. The important thing is that someone in the 98th percentile will be slightly less likely to give, and someone in the 50th percentile will be MUCH less likely to give.

This highlights an important difference between multiple linear regression, which I’m talking about here, and binary logistic regression. The output of the latter form of regression is “probability”; very useful, and not so difficult to interpret. Not so with multiple linear regression — the output in this case is something different, which we may interpret in various ways but which will not directly give us values for probability.

Fortunately, fundraisers are already very familiar with the idea of ranking prospects in descending order by likelihood (or capacity, or inclination, or preferably some intelligent combination of these). Most people can readily understand what a percentile score means. For us data modelers, though, getting from “raw Y” to a neat score takes a little extra work.

9 March 2010

Whether you’ve got an alumni database or strictly a donor database, as a fundraiser you face the same unknown: Which donors in your database are most likely to give at higher levels? Who should you focus your time and attention on? Who is ready to be asked to give more? Maybe much more?

Unless you’re really digging into your database, the answer will remain shrouded in darkness. For example, if your primary means of segmenting your phonathon prospects is still by giving history (eg. LYBUNTs, SYBUNTs, whatever-BUNTs), then I’m telling you that you’re not learning anything you don’t already know.

Predictive analytics is all about revealing patterns and relationships that we could not otherwise have known, and a predictive model is the only tool capable of distinguishing between the $20-a-year donor who might be ready to give $100, and the $20-a-year donor who isn’t. Why? Because a predictive model is based on variables that are wholly or partially independent of giving, but correlated with giving.

Last year I created a predictive model for our Phonathon program, which ranked every one of our living alumni by their propensity-to-give score, and sorted them into deciles. The idea was, the top decile (i.e., the top 10%) of scores would be expected to have highest rates of participation, and highest average pledges. As our calling season winds down, I can confirm that both are overwhelmingly true. (This outcome has long since ceased to be a surprise, I should add.)

But here’s something: Can our Phonathon model predict which donors are ready to boost their pledges?

Recently I pulled some data to identify which alumni had given in the Phonathon program in 2008-09, and then went on to make a bigger pledge this year. The chart below shows how they break down by score decile:

Donors with a score from 1 to 7 account for about 19% of “increasers.”

Donors who score an 8 or 9 account for another 30%.

Donors with a score of 10 account for more than HALF of all increasers.

So there’s yet another reason, if you needed it, for getting focused on the high-scorers in your program: They’re the ones most likely to increase their pledge year over year.

The implications are exciting. Could this knowledge not inform the way you construct formulas for target asks? I think so. At least you would have a sense of the relative probability that a increased ask will be successful. (For donors with lower scores, you might want to just maintain them at the giving level they’re at.)

I haven’t done this, but it would not be hard to build a model that specifically predicts propensity to increase. This would be especially valuable for non-university nonprofits whose databases are made up exclusively of donors. I can think of a number of ways to define the dependent variable for such a model, but a good start would be to go back a few years in order to identify a critical mass of donors who have exhibited the behaviour, and train the model on those.

In an earlier post I talked about what you need to get started to build an ‘event attendance likelihood’ model. Today I want to provide some evidence to back up my claim that yes, you can identify which segment of your alumni population is most likely to attend your future event.

To recap: Every living, addressable alumnus/na in our database is scored according to how likely he or she is to attend an event, whether it be a President’s Reception or Homecoming, whether they’ve ever attended an event or not.

The scores can be used to answer these types of questions:

What’s the top 30% of alumni living in Toronto who should be mailed a paper invite to the President’s Reception?

Who are the 50 members of the Class of 2005 who are most likely to come to Homecoming for their 5th-year reunion?

I built our first event-attendance model last summer. As I always do, I divided all our alumni into deciles by the predicted values that are produced by the regression analysis (the ‘raw score’). The result is that all alumni were ranked from a high score of 10 (most likely to attend an event) to 1 (least likely).

At that time, alumni were sending in their RSVPs for that fall’s Homecoming event. Because I use only actual-attendance data in my models, these RSVPs were not used as a source of data. … That made Homecoming 2009 an excellent test of the predictive strength of the new model.

Have a look at this chart, which shows how much each decile score contributed to total attendance for Homecoming 2009. The horizontal axis is Decile Score, and the vertical axis is Percentage of Attendees. Almost 45% of all alumni attendees had a score of 10 (the bar highlighted in red).

(A little over 4% of alumni attendees had no score. Most of these would have been classified as ‘lost’ when the model was created, and therefore were excluded at that time. In the chart, they are given a score of zero.)

To put it another way, almost three-quarters of all alumni attendees have a score of 8 or higher. But those 10 scores are the ones who really stand out.

Let me anticipate an objection you might have: Those high-scoring alumni are just the folks who have shown up for events in the past. You might say that the model is just predicting that past attendees are going to attend again.

Not quite. In fact, a sizable percentage of the 10-scores who attended Homecoming had never attended an event before: 23.1%.

The chart below shows the number of events previously attended by the 10-scored alumni who were at Homecoming in 2009. The newbies are highlighted in red.

The majority of high-scoring attendees had indeed attended previous events (a handful had attended 10 or more!). But that one-quarter hadn’t – and were still identified as extremely likely to attend in future.

That’s what predictive modeling excels at: Zeroing in on the characteristics of individuals who have exhibited a desired behaviour, and flagging other individuals from the otherwise undifferentiated masses who share those characteristics.

Think of any ‘desired behaviour’ (giving to the annual fund, giving at a higher level than before, attending events, getting involved as an alumni volunteer), then ensure you’ve got the historical behavioural data to build your model on. Then start building.

15 December 2009

The predictive modeling method I use (multiple regression) results in a “raw score” that is great for very fine ranking, because it will probably produce almost as many score levels as there are individuals in your sample. But it doesn’t work at all for other purposes.

For example, you can’t use ‘raw score’ to observe how a person’s or a group’s propensity to give changes from year to year. Your model changes over time, and so will the output. What does it mean if Joe’s raw score goes from 6349 to 9032? Not much. The value of the score itself has no practical meaning.

Because the values are not easy to explain to end-users, and because they change so much from year to year, you need to provide a more intuitive scoring system.

If you take everyone in the sample and divide them up into groups of roughly equal numbers, by their raw score, you produce a much more useful ranking.

Equal quarters are quartiles, equal fifths are quintiles, and so on. For our needs, equal tenths (deciles) and equal hundredths (percentiles) are the most useful.

For example, if Joe goes from the 60th percentile last year to the 93rd percentile this year, that’s a meaningful change.

But usually we’re not as interested in the score of a single individual as we are in getting a handle on a whole segment of a population. If your annual giving coordinator knows that the 9th and 10th deciles are always where the money is, regardless of how your model changes in any given year, you’ve taken a big step towards clarity. If your results aren’t clear, no one will embrace them.

Which type of predictive score you would use, deciles or percentiles, depends on how selective you need to be:

An Annual Giving manager trying to prioritize groups for the Telethon campaign might focus on the top one, two or three deciles. That represents thousands of alumni whose raw propensity-to-give scores place them in the top 10% to 30% of the population.

A Planned Giving Officer trying to zero in on the best prospects might focus on no more than the top 1-5% of the population. For that person, percentiles will provide a much sharper knife.

When I produce a set of scores, I usually provide all three types, because one can’t always anticipate needs. The screen in Banner that holds the scores (APAEXRS) is able to accommodate three ‘flavours’ of scores, so I usually upload raw, decile, and percentile scores for each model.

Unfortunately, the output scores of a regression model are messy! They have to be worked on a bit in order to whip them into shape before you upload them to the database. Here’s what they often look like in their ‘unprocessed’ state:

The Banner field I upload scores to is able to contain four digits. So for ‘raw score’, I create a derived variable in Data Desk that multiplies these values by a thousand and rounds to the nearest whole number.

Deciles and percentiles are not exactly available at the push of a button, either. In future posts I will describe the methods I’ve been taught to produce a nice, clean set of scores for upload.