PRIZM Clusters Not as Predictive as Behavior

Q: I am on an interesting project (and my first DB Mktg one): the client has a large loyalty program, and loves his PRIZM clusters. However, when I told him a little more about Recency and suggested that we spread all members out based on it, he was surprised to see that his PRIZM segments were not a predictive indicator at all!

A: Yes, and here is something many people don’t realize about PRIZM and other geo-demo programs, including census-driven ones: they were developed for site location. Where should I put my Burger King, where should I put my mall? They are incredibly useful for this. However, think about all the sample size discussions for web analytics in the Yahoo Web Analytics Group related to A/B testing, and now imagine what your PRIZM cluster looks like.

In most cases, you are talking about 1 or maybe 2 records in a geo location – what is the likelihood these households reflect the overall “label” of the PRIZM cluster? Combine this with the fact that for customer analysis, demographics are generally descriptive or suggestive but not nearly as predictive as behavior and you have a bit of a mess.

Here’s a test for you. It only requires rough knowledge of your neighbors, so it should not be very difficult (for most people!).

1. What is your “demographic”?
2. If you were to walk around the block and knock on doors, how many households would you find that are “in your demographic”?

Right. Maybe a handful, unless you live in a brand new housing development or other special situation. Now think about walking your zip code, or walking out 10 blocks or so from your house in any direction, and knocking on doors. Do you find most of these people are in the same demographic as you are? Did you ever find the “cluster average” neighbor?

We certainly know from web analytics that dealing with “averages” can be very dangerous indeed. So too with taking a demographic “average” of a zip or other area and tying it to a specific household. The model falls apart at the household level of granularity.

So now what do you think of all those websites and services that claim to know demographics based on a zip code they captured?

Now, if you think about an e-commerce database, with most records being one of a very few in a zip or cluster, you can see how the cluster demos would really break down at the household level.

Again, nothing wrong with using these geo-demo programs for what they were intended to be used for. When you are looking for a mall location or doing urban planning they can be very helpful. But the match rates at the individual household level are poor.

Couple this with the fact that e-commerce folks are usually looking for behavior from customers, and the fact that demographics are not generally predictive of behavior by themselves, and you have yourself an analytical stew.

Better than nothing? Absolutely, and for customer acquisition, sometimes all you can get. Best you can be? Not if you have the behavioral records of customers. In fact, what we often see is a skew in the demographics being called “predictive” when the underlying behaviorals are driving action.

In other words, let’s say a series of campaigns generates buyers with a particular demo skew. A high percentage of these Recent responders then respond to the next promotion. If you look just at the demos, you would see a trend and declare the demos are “predictive” of response, even though they are incidental to the underlying Recency behavior.
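A toy tabulation makes the point concrete. The counts below are invented for illustration (they are not from any client), and the names `cells` and `rate_by` are hypothetical. Response is set up to depend only on Recency, yet demo A is over-represented among Recent buyers, so the demo looks predictive until you split by Recency.

```python
from collections import defaultdict

# Hypothetical counts: (demo, is_recent) -> (customers, responders).
# Response rate is 20% for Recent and 2% for non-Recent, regardless of demo.
cells = {
    ("A", True): (800, 160),
    ("A", False): (200, 4),
    ("B", True): (200, 40),
    ("B", False): (800, 16),
}


def rate_by(cells, key):
    """Aggregate the cells by key(demo, is_recent) and return response rates."""
    totals = defaultdict(lambda: [0, 0])
    for (demo, is_recent), (customers, responders) in cells.items():
        group = key(demo, is_recent)
        totals[group][0] += customers
        totals[group][1] += responders
    return {group: r / c for group, (c, r) in totals.items()}


# Demo alone: A appears to respond far better than B.
by_demo = rate_by(cells, lambda demo, is_recent: demo)

# Demo within Recency: the apparent demo effect disappears.
by_cell = rate_by(cells, lambda demo, is_recent: (demo, is_recent))

for group, rate in sorted(by_demo.items()):
    print(group, f"{rate:.1%}")
for group, rate in sorted(by_cell.items()):
    print(group, f"{rate:.1%}")
```

Looking only at the demo, A responds at roughly three times the rate of B (16.4% vs. 5.6%). Within each Recency group, the two demos respond at exactly the same rate, which is the "incidental" pattern described above.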

I suspect something like this was going on with your client. Not looking at behavior, over time the client becomes convinced that the PRIZM clusters are predictive, when for some reason they are simply coincident in a way with the greater power of the behavioral metrics. Given the client has behavioral data, that should be the first line of segmentation.

Q: After reading you for some years, I now understand how one must be very careful with psycho-demographics.

A: Well, at least one person is listening! And now you have seen how this works right before your very own eyes.

I think this situation is really a function of Marketers in general being “brought up” in the world of branding / customer acquisition. Most Marketers come up through the ranks “buying media” or some other marketing activity that focuses on demographics to describe the customer. And most of the college courses and reading material available focus on this function, so even the IT-oriented folks in online marketing end up learning that demographics are really important. And they can be, when you don’t know anything about your target.

Then the world flips upside down on you, and now people are looking at customer marketing, and that’s a whole different ballgame. The desired outcome is “action” that can be measured and the “individual” is the source of that outcome, as opposed to “impressions” and “audience”.

In the past, if your tried and true weapon of choice for targeting was demographics, that is what you reach for as you enter into the customer marketing battle. Problem is, it’s just not the best weapon for that particular marketing engagement.

11 thoughts on “PRIZM Clusters Not as Predictive as Behavior”

I completely agree with you that Prizm and/or any other neighborhood-based (demographic) system won’t give you the information you need to fuel meaningful analysis. As a long-time database marketer with very specific experience in understanding available data, I will say that in my experience Prizm and similar offerings are not used very widely anymore, especially if they’re used alone (without complementing with other types of data). Demos alone definitely cannot predict an outcome, I believe.

Thanks for the comments folks! I did get a couple of e-mail nastigrams from vendors of this type of service telling me how I was all wrong, but they missed the point, outlined well by the folks at Diamond and Suzanne.

I don’t have anything against geo-demographics, they simply are what they are. But after they gained momentum in the early 90’s there was an effort to sell them for every application as a “magic bullet” and they simply do not work well for certain applications. I know this idea is incredibly attractive to the many IT-centric marketers on the web, so thought I would provide some background.

I’m still leery of using geo-dems as a “match back” to provide color for a customer database due to the fragmentation issue – the penetration of customer accounts in any one geographic location is usually so small the margin of error could be quite large. The “You will need to augment it with good market research” comment by Diamond addresses this issue.
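A rough way to size that margin of error, assuming simple random sampling and the usual normal approximation for a proportion. That approximation is itself shaky at very small counts, so treat the small-n figure as indicative only. The function name `margin_of_error` and the 50% share are illustrative assumptions, not anything from a geo-demo vendor.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p estimated from n records."""
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Assume half the households in an area truly fit the cluster label,
# then vary how many customer records you actually hold in that area.
for n in (2, 20, 400):
    print(n, round(margin_of_error(0.5, n), 3))
```

With 2 records the margin is about plus or minus 69 points, with 20 records about 22 points, and it takes roughly 400 records in one location to get under 5 points.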

You can see it in the numbers too. I find that it’s rare to be able to build a model on non-behavioral data (even when it includes geocodings) that has a gini coefficient higher than about 40%, whereas models built on behavioral data frequently get gini values of 70-85%, especially in credit risk.

Our genes, our location and so on all tell part of the story — maybe more than we’d like — but it remains the case that you get a much better understanding by looking at what people do, than at who they are.
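For readers who have not met the measure: assuming the Gini quoted above is the one commonly used in credit scoring, it relates to the area under the ROC curve (AUC) by Gini = 2 × AUC − 1. On that reading, a Gini of 40% corresponds to an AUC of 0.70, and 70-85% to an AUC of 0.85 to 0.925. Here is a minimal sketch with made-up labels and scores; the function names are hypothetical.

```python
def auc(labels, scores):
    """Chance that a random responder outscores a random non-responder (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(labels, scores):
    """Gini as used in scoring models: 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


# Made-up example: 1 = responded, 0 = did not; higher score = predicted more likely.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.2, 0.1]
print(gini(labels, scores))  # 0.5
```

A model that ranks every responder above every non-responder scores 1.0, and a model no better than a coin flip scores around 0.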

Thanks for the comment Nick. It’s a pretty simple idea (behavior predicts behavior better than demographics) but I think most marketing folks are so used to relying on demographics – often because they have nothing else – that it’s just assumed the demos are drivers. But what people do and “who they are” from a demo perspective can be widely divergent.

Demos can be useful to classify groups of folks when no behavioral data is present, as in many offline media buying situations. But if the end goal is a behavior of some kind rather than an “impression” against a nameless, faceless demo, behavior is what you need to model.

I would toss survey data (what they say) into the demo bucket as well as far as being predictive of behavior, but this post has already ruffled a lot of feathers. Let’s just say post survey tracking of “what I said I would do” versus “what I actually did” is critical. I have seen enough inverse correlation between the two to be very wary of acting on survey data alone. Part of the puzzle, not the answer, often the wrong answer.

I’m not sure if I disagree or not, but let’s not be the ones to make a case on bad data. Most PRIZM clusters are defined based on household level information not anonymous zip groupings. There are several ways to match databases to PRIZM clusters and doing so at a zip level is ignorance on the part of the marketer. You can identify down to the HH level what PRIZM cluster an individual belongs to.

Like I said, I’m not sure I disagree – but at least we should get the facts right.

In the larger sense, if you know that I’ve been to automotive sites, that will be much more predictive that I’m in the market for a vehicle than the fact that I’m in a PRIZM segment that often buys new vehicles. However, what you may not know is what I may be most interested in: new or used? SUVs, et al.? That is where PRIZM can enhance behavioral data. Then again, you may know I’ve been to Chevy looking at Silverados, and that would be a pretty good predictor that I’m looking for a new truck.

PRIZM is not a replacement for actual behavior but sure is useful as another measure—even more so when you don’t know anything else or have so many behavioral items that you need some means to aggregate. Think about it—what does someone do with thousands of click stream points? How do brand managers align that with their overall marketing, branding, and promotion strategies?

“Better than nothing? Absolutely, and for customer acquisition, sometimes all you can get. Best you can be? Not if you have the behavioral records of customers.”

What I’m addressing here is the tendency of some Marketing folks to use PRIZM clusters in an inappropriate way, that is, to take a geographic profile (say zip-level) and apply it to a household.

If you are marketing to the entire zip, this makes sense in terms of a model. But if the classification was based on a geo-set and you are now marketing to a single household (a customer or two), the classification is bound to be off.

In other words, it matters where the cluster came from. If it was created at the household level, then it will be accurate, and if that is useful, fine. If the cluster was created at some level of geography and then applied to the household, chances are it won’t be accurate.

Either way, if you have behavioral data, and behavior is what you seek, whatever model you are constructing should be based on this data before even considering PRIZM.

Yes, agreed. As Prizm is an HH-level model, it’s really dependent on what data an internet marketer has at their disposal to use. Any PII with address info can be matched to a Zip+6 level assignment, the PII stripped, and the segment then associated with the cookie for linkage. The biggest issue is, to your point, internet marketers using a higher-level geo append and applying it to an HH. An example is using IP-to-Zip appends, which mostly append the ISP hub zip, and then using this to append segments, demos, etc. based on that Zip. This is a totally inaccurate way of targeting, and misleading, because of how the IP-to-Zip append process works.