Engaging respondents with interesting measurement tasks.
Involving clients with visualizations of actionable findings.
Challenging marketing research to provide a theoretical basis for its measurement procedures.

Monday, August 20, 2012

Halo Effects and Multicollinearity: Separating the General from the Specific

In the last post, The Relative Importance of Predictors, I showed how difficult it can be to assess the independent contribution that each predictor makes to the overall R-squared when the predictors are highly correlated. We spent some time looking at one example where the predictors were ratings from an airline satisfaction study. As is common in such studies, all the pairwise correlations tended to be sizable and suggested the presence of a strong first principal component, what some might call a halo effect.

Perception is Reality

Unfortunately, the term "halo effect" has too often been associated with measurement bias. Of course, it is a bias in the sense that perceptions do not reflect actual behavior, as was first noted by Thorndike in 1920. But it does reflect how customers truly feel about the brands they buy and use. Human perceptions are more consistent than behavior, whether they are person perceptions or brand perceptions. As they say, perception is reality.

In his book Thinking, Fast and Slow, Daniel Kahneman argues that there are two systems underlying human thinking: a relatively fast, intuitive, and associative System 1 and a slower, more deliberative, and effortful System 2. He then uses the interplay between these two systems to explain the heuristics and biases that have been uncovered in cognitive psychology (e.g., framing, anchoring, and substitution) and behavioral economics (e.g., prospect theory).

"Exaggerated emotional coherence" is the term Kahneman uses for the halo effect. It is part of every customer satisfaction rating. It is not simply "measurement bias" because it impacts not only the ratings but the actual purchase behavior of real customer in the marketplace. Loyal customers, for example, may well be "biased" in that they perceive that their brand delivers more consistent value than it actual does. But that "bias" produces revenue for the company and is encouraged at every touch point.

Structural Ambiguity

We are taught that rating items should be written to measure one and only one thing. This is how we avoid ambiguity. For example, we ought to be able to write an item that measures only whether an airline's ticket prices are seen as reasonable and nothing else. But price perception is more complex than looking at ticket prices from different airlines. True, there is a price-sensitive segment going online to flight booking sites and selecting the airline with the lowest-priced ticket. But what about frequent fliers who go directly to the airline website and so love their reward programs that they discount pricing differences and overlook defects? We cannot assume that respondents know much about the pricing practices of competing airlines that they never fly. Nor can we ignore the effects of cognitive dissonance that make our memories more positive than our experiences.

In the end, we do not have single ratings but a complex associative network of brand beliefs that serve brand purchase and usage, not reality. Perhaps if we looked beyond the individual item to the response patterns across multiple items, we could discover a statistical technique that separates generalized impression from the more specific features of products and services.

If you run a web search for the term "bifactor model," you will find references to confirmatory factor analysis (for which we can use the R package lavaan), to factor rotations (GPArotation), to exploratory factor analysis (psych), and to multidimensional item response theory (mirt). Historically, the name "bifactor" was introduced to distinguish the technique from those seeking a simple structure. A factor structure is simple when every item loads on one and only one factor.

However, I prefer the name "multifaceted" because even seemingly unidimensional constructs may be composed of several highly interconnected, but conceptually distinct, facets. What is customer loyalty? It is satisfaction and recommendation and retention and willingness to pay more and so on. Often, we will see satisfaction, retention, and recommendation combined into a single loyalty measure. But, as we saw in my previous post, Network Visualization of Key Driver Analysis, these three measures have different drivers. Hence, customer loyalty is a multifaceted construct - a set of interpretable subscales and a total score with a different meaning. [If you want to read more about the underlying psychometric theory, check out the writings of Steven P. Reise. "Bifactor Models and Rotations" is a good place to start.]

We will restrict ourselves for now to exploratory factor analyses of the airline satisfaction data. The next post will outline how to run and interpret the confirmatory bifactor model.

The figure below shows a traditional principal axis factor analysis with oblique rotation. The R code using the package psych is shown in the appendix at the end of this post. The boxes are observed ratings, and the circles are latent variables. Each rating loads on only one factor (simple structure), as indicated by the lack of multiple arrows to any of the boxes. And the factors are correlated, as shown by the arcs between the circles. Thus, PA1 is our service factor, and the correlations among the first four boxes are due to the service factor. However, there is also a sizable, but not as large, correlation between Courtesy and Overhead Storage. We can see this by following the path from PA1 to Courtesy (0.71), the path between PA1 and PA3 (0.8), and the path from PA3 to Overhead Storage (0.88).
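
The path tracing described above can be checked with a few lines of arithmetic. This is a minimal sketch in plain Python, not the psych code used to draw the figure; it takes the three coefficients quoted in the text and multiplies them, which is how a simple-structure oblique model implies a correlation between two ratings that load on different factors. The variable names are illustrative only.

```python
# Coefficients quoted in the text for the oblique solution.
loading_courtesy_on_pa1 = 0.71   # path from PA1 to Courtesy
corr_pa1_pa3 = 0.80              # arc between PA1 and PA3
loading_overhead_on_pa3 = 0.88   # path from PA3 to Overhead Storage

# Under simple structure, two ratings on different factors are connected
# only through the factor correlation, so the model-implied correlation
# is the product of the coefficients along that single path.
implied_r = loading_courtesy_on_pa1 * corr_pa1_pa3 * loading_overhead_on_pa3
print(round(implied_r, 2))
```

The product is about 0.50, which is what "sizable, but not as large" looks like next to the within-factor correlations.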

One should remember that a factor model is a hypothetical structure with factor loadings and factor correlations estimated in order to reproduce the observed correlation matrix. If the only sizable correlations were among ratings loading on the same factor, the factors would not be correlated. But as we showed in the previous post, all the ratings are correlated, with even the smallest pairwise correlation above 0.40. Thus, in order to maintain a simple structure with each rating loading on only one factor, the factors must be correlated. This is the price we pay for simple structure.
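
In matrix form, the argument can be written as follows. The notation is the standard common factor notation and is assumed here rather than taken from the post: \Lambda is the matrix of loadings, \Phi the matrix of factor correlations, \Psi the diagonal matrix of unique variances, and f(i) the single factor on which rating i loads.

```latex
\hat{R} = \Lambda \Phi \Lambda^{\top} + \Psi,
\qquad
\hat{r}_{ij} = \lambda_i \, \phi_{f(i)\,f(j)} \, \lambda_j \quad (i \neq j)
```

With simple structure and uncorrelated factors (\Phi = I), the implied correlation is zero whenever f(i) and f(j) differ. Since every observed correlation exceeds 0.40, the off-diagonal entries of \Phi have to be nonzero.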

Why are the factors correlated? Well, if the items are correlated because they all measure the same underlying latent variable, then the factors must be correlated because they all measure the same higher-order underlying construct. And thus, we have a hierarchical factor model, which looks like the following diagram.

The circle with "g" is the higher-order factor responsible for the lower-order factor correlations. Originally the letter "g" was selected to stand for general intelligence. The oblique and hierarchical factor models are essentially the same, but now the correlations among the first-order factors are due to g rather than being unexplained correlations. Unfortunately, g can be difficult to explain in the hierarchical model because it is restricted to impact the items only through the lower-order factors. It might make more sense to have g directly impact the items, but then we would have a bifactor model.
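
One way to see this restriction is the Schmid-Leiman transformation, which rewrites a higher-order solution as direct loadings on g plus loadings on residualized group factors. The sketch below is plain Python with made-up numbers: only Courtesy's 0.71 comes from the text, while the other first-order loadings, the second-order loadings, and the assignment of items to factors are hypothetical. It illustrates that in the hierarchical model every item under the same first-order factor has the same ratio of g loading to specific loading, a proportionality that the bifactor model does not impose.

```python
import math

# item -> (first-order factor, first-order loading)
# Only Courtesy's 0.71 is from the post; the rest is hypothetical.
first_order = {
    "Courtesy": ("service", 0.71),
    "Helpfulness": ("service", 0.85),
    "Flight Options": ("ticketing", 0.70),
    "Ticket Prices": ("ticketing", 0.60),
}

# Hypothetical loadings of the first-order factors on g.
second_order = {"service": 0.90, "ticketing": 0.80}


def schmid_leiman(first_order, second_order):
    """Return item -> (factor, direct g loading, residualized specific loading)."""
    result = {}
    for item, (factor, lam) in first_order.items():
        gamma = second_order[factor]
        g_loading = lam * gamma                       # g reaches the item only via its factor
        specific = lam * math.sqrt(1.0 - gamma ** 2)  # what is left for the group factor
        result[item] = (factor, g_loading, specific)
    return result


result = schmid_leiman(first_order, second_order)
for item, (factor, g_loading, specific) in result.items():
    ratio = g_loading / specific
    print(f"{item:15s} g={g_loading:.3f} {factor}={specific:.3f} ratio={ratio:.3f}")
```

Within each factor the printed ratio is constant, because it depends only on the second-order loading and not on the item.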

So we are ready for our multifaceted bifactor model. First, we abandon simple structure. Our observed ratings are multifaceted. We want them to load on more than one factor. In return we get orthogonal factors. Here is the diagram.

We have generalized impression g contributing to variation in all the observed ratings, more so for Helpfulness than Flight Options. This is a common pattern that I have found repeatedly in customer satisfaction data. The loading for g is an inverse function of the specificity of the rating. We see that here to some extent with g decreasing as we move from Helpfulness and Service to Flight Options and Ticket Prices, although all the items have substantial loadings on g.

In addition to g, we see the more specific feature factors that we found earlier in the oblique and hierarchical models. All the ratings are intercorrelated to some degree because they tap a generalized impression. But, in addition to this baseline correlation, some ratings are even more highly interrelated because they also measure one of the three more specific feature factors: aircraft, service, or ticketing.

Finally, there are no arrows or arcs connecting any of the circles because all the latent variables are independent. If the bifactor model "fits" the data, then our estimated loadings shown in the above diagram will reproduce the correlation matrix among the ratings without any covariation among the factors.
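
To make the idea of reproducing the correlation matrix without any covariation among the factors concrete, here is a minimal sketch in plain Python. The loadings are invented for illustration and are not the estimates from the diagram; the assignment of items to specific factors is likewise an assumption. With orthogonal factors, the implied correlation between two ratings is the product of their g loadings plus, when they share a specific factor, the product of their specific loadings.

```python
# item -> (g loading, specific factor, specific loading); all values hypothetical
loadings = {
    "Helpfulness": (0.80, "service", 0.40),
    "Courtesy": (0.75, "service", 0.45),
    "Flight Options": (0.60, "ticketing", 0.50),
    "Ticket Prices": (0.55, "ticketing", 0.55),
}


def implied_correlation(a, b):
    """Correlation between two different ratings implied by an orthogonal bifactor model."""
    g_a, factor_a, s_a = loadings[a]
    g_b, factor_b, s_b = loadings[b]
    r = g_a * g_b              # baseline correlation from the generalized impression
    if factor_a == factor_b:
        r += s_a * s_b         # extra correlation from the shared specific factor
    return r


items = list(loadings)
for i, a in enumerate(items):
    for b in items[i + 1:]:
        print(f"{a} ~ {b}: {implied_correlation(a, b):.3f}")
```

In this toy example, pairs of ratings from different specific factors are correlated only through g, while pairs from the same specific factor are correlated more highly.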

In this particular example, all the items have higher loadings on g than they have on their specific feature factors. This is not unexpected given that all the ratings were highly correlated and the first principal component accounted for 62% of the total variation. Of course, this will not always be the case. In general, when respondents can provide a rating by making a simple inference or association from their generalized impression, they will do so because it is easier than making the effort to retrieve specific memories and then taking the time to combine those memories into a response. [See Norbert Schwarz, Cognitive Aspects of Survey Methodology, for a review.]
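
For readers who want to see where a figure like "62% of the total variation" comes from, the share is the largest eigenvalue of the correlation matrix divided by the number of ratings. The sketch below is plain Python using power iteration on a hypothetical 4 x 4 matrix in which every pairwise correlation is 0.60; it is not the airline data, and the function name is made up for this example.

```python
def first_component_share(corr, iterations=200):
    """Share of total variance captured by the first principal component.

    Uses power iteration to approximate the largest eigenvalue of a
    correlation matrix, then divides by the number of variables
    (the trace of a correlation matrix).
    """
    p = len(corr)
    v = [1.0] + [0.0] * (p - 1)          # arbitrary starting vector
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        eigenvalue = norm                # converges to the largest eigenvalue
    return eigenvalue / p


# Hypothetical correlation matrix: four ratings, all pairwise r = 0.60.
r = 0.60
corr = [[1.0 if i == j else r for j in range(4)] for i in range(4)]
share = first_component_share(corr)
print(round(share, 3))
```

For an equicorrelated matrix the largest eigenvalue is 1 + (p - 1)r, so the result can be checked by hand: (1 + 3 x 0.60) / 4 = 0.70.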

Conclusions

I have suggested that the analysis of customer satisfaction data ought to begin with the realization that satisfaction is a multifaceted construct. I am not speaking of an overall satisfaction rating, such as, "Overall, how satisfied are you with your ____?" This is an ambiguous question, which is why you get a different response if you ask it before or after a battery of more detailed satisfaction questions. When I speak of generalized impression, I am referring to the general factor that emerges from a factoring of all the rating items in a detailed battery of specific product and service features. It is the pattern of responses to all the rating items that will tell you something about both the generalized impression and the specific feature components. Generalized impression is not obtained by asking the respondent to make an inference or generate a summary judgment. Instead, it is derived from the correlations among all the satisfaction ratings across the breadth of the customer's interaction with the product or service.

On the one hand, customer satisfaction has well-defined components. That is, the product or service can be decomposed into its parts or factors, and each part has its own distinguishable subscale score. On the other hand, customer satisfaction is a separate construct, not simply a summary measure, but its own entity. In the marketing literature, we tend to speak of this view of customer satisfaction as brand attachment or brand relationship. I have argued that the "halo effect" or "exaggerated emotional coherence" is not measurement bias, but a real entity that can be thought of as an orienting response, a predisposition toward avoidance or approach, or an initial affective response. It deserves its own score. In the bifactor model that score is g.

To be clear, I am not arguing that measurement bias does not exist and does not impact the way that a scale might be used. There are cultural and individual differences in scale usage. Scale usage heterogeneity is real, but it cannot explain g in the bifactor model. Halo effects are ubiquitous and robust. You find them with ratings, with rankings, with selection of best or top three, with behaviorally anchored scales, with ordered scales, with categorical scales, with scales created using item response theory, and hopefully you get the point. Method variation is not so robust an effect.

In contrast, method variation is what you observe when you try to measure the same trait using different methods and fail to find consistency (e.g., multitrait-multimethod analysis). Method variation is responsible for the fact that an overall satisfaction rating is more highly correlated with a battery of more specific satisfaction ratings when overall satisfaction is asked after the battery than when it is asked before the battery. Method variation is the factor structure that you see when you group your items together in a questionnaire and ask them all at the same time.

Order effects, however, are as strong as halo effects. But order effects are not method variation either. They are also due to the interplay of System 1 and 2 thinking (e.g., priming or context effects). And, like the halo effect, order effects impact marketplace decisions. They are not measurement artifacts, but the real thing that companies use to make money by discounting and controlling the retail shelf. No matter how hard I try to resist, that 50% discount looks like a great deal even though I know that they never intended to sell it at its "original" price. Similarly, I cannot bring myself to pay the extra money for the national brand when it sits on the shelf next to the discount store brand. Order effects, like halo effects, operate both in the marketplace and in our surveys. Neither should be dismissed, for both contain valuable information.

Overview of Next Post

In order to realize the full potential of the bifactor model, we will need a confirmatory factor model. In the next post, I will show how a structural equation model can be used with a battery of satisfaction ratings and outcome measures like overall satisfaction, retention, and recommendation. It is a type of key driver analysis, although the goal is not to identify the most important rating item but to better understand the data generation process. The bifactor model can be fit, and the estimates can tell us the relative contribution that the general factor g and the specific product/service factors make to the outcome measures.

Appendix with R code to create the three diagrams

I used William Revelle's psych package to create the diagrams. He has done all the work; all I needed to do was add a main title. This is another must-visit website. Revelle provides a comprehensive introduction to R, an unfinished psychometric book with a number of well-written chapters (see chapter 6 for bifactor), and a detailed application of bifactor models in How Important is the General Factor of Personality?