The three articles presented in this session all, in one way or another, address the issue of the importance of product attributes for consumers. Despite this commonality in overall purpose, however, each article uses a vastly different methodology, and each finds different results with different implications for consumer research. The comments below represent an attempt to point out possible shortcomings of the procedures and certain conceptual limitations. An effort is also made to provide suggestions for improving the research and its interpretation. Nevertheless, the reader should note that the articles represent pioneering efforts for assessing attributes, and their strengths far outweigh any limitations that may be singled out.

BEHAVIORAL MEASUREMENT OF THE ... BY JOHN A. QUELCH

This is a thorough, well-thought-out piece of research dealing with an important area in consumer information processing, namely, the relative importance of attributes in the choice process. There are at least two broad contributions of the research that deserve mention. First, the degree of methodological control and the effort taken to discuss the meaningfulness of the research design and general problem are exemplary. The author has operationalized the information display methodology in a manner achieving a good balance between internal and external validity. Overall, the methodology attains a reasonable degree of control of stimuli and external factors while introducing a certain degree of realism in the study. A second contribution can be found in the findings. Specifically, the results provide an interesting descriptive picture of the acquisition of information in terms of the number of attribute dimensions used and the order of information elicitation. Further, the effect of providing a physical sample on information use is ascertained.

Because the strengths of the research can be readily seen on a reading of the article, I will focus on a number of shortcomings and limitations in the study. Further, comments will be made on ways to improve the study or better address the general problem area.

A first question to ask is whether the study can really be viewed as one investigating information processing in the context of initial trial and repeat buying. Although all subjects were recent purchasers and consumers of cold breakfast cereal, a possible confound or artifact exists with the second treatment group. This group--which was exposed to the physical product--was told that the brands were not available in their area. Yet, because the brands were, in fact, real, well-known, and leading sellers in the respondents' market, it is likely that many would recognize the brands or at least some of them. This would most likely occur differentially across people in the second sample. Moreover, given the fact that the stimuli are so easily recognized (e.g., Corn Flakes, Rice Chex, Raisin Bran), it is also likely that some of the respondents tried to guess the manufacturer or brand or drew parallels to their own consumption experiences. Both of the above factors could introduce bias in the sense that the inferences made would contain unintended confounding information. At a minimum, a manipulation check could have been performed to see if the respondents received contaminated information and believed the ruse. Further, it seems incorrect to label this sample the repeat purchase condition. What evidence is there that the subjects all regarded the second stimulus as a familiar one?

A second problem entails a possible demand characteristic. Given the particular information display design, it may have been the case that respondents were forced to adopt a lexicographic-like strategy in such a way as to conceive of the experimental task as a game. That is, they may have viewed the task as requiring that they "guess the best or favorite brand using the least amount of information." Notice that it is possible that the "best" brand need not be best relative to their own criteria for decision making but rather might reflect what they think the experimenter, their family, or society expects of them. Evaluation apprehension and demand characteristics seem likely and should have been investigated, perhaps with a manipulation check. Notice further that had the respondents perceived the task as entailing choosing a brand using the least amount of information possible (a reasonable normative expectation people might form when asked to perform in an experiment), then their information processing might reflect a tendency to select and focus on cues offering the greatest potential for a fast judgment. These cues may or may not be the most important to them with respect to consumption of cereal for their own use. Thus, both the internal and external validity of the study may be less than desirable.

The data might support a rival hypothesis in still another sense. Rather than being an indicator of relative importance, the rank order elicitation might merely reflect an "ease of information processing factor." In particular, people might evaluate alternatives by scrutinizing the easiest, least costly, or least time-consuming attributes first. The ease or cost of evaluation may not be related at all to the importance of the attributes.

One of the assumptions stated in the study was that the most important attributes would be elicited early "with a view to minimizing the duration of the decision-making process." Alternatively, if the respondent's decision rule were one of accuracy or minimizing risk, it might be expected that the most important attributes would be selected first but with time not considered to be a salient consideration. In any case, the rationale for why the most important attributes would be selected in any order has multiple interpretations and needs further development.

Closely related to this point is the role played by the compensation. Ostensibly, the coupons were intended to motivate the subjects to behave in a manner similar to natural purchase occasions. However, it can just as easily be argued that the coupons motivated subjects to choose unknown or new information and products. That is, some subjects might have viewed the coupons as an incentive to take a chance on a new product--to see if it is any good. The induced motivation may also have been one of trying a product that is new because it costs less or is a "free good." In addition, when looking across subjects within each treatment, decision-making differences might have been a function of individual variance in venturesomeness, habit, or other factors. Hence, conclusions made across subjects within treatments must be regarded with suspicion.

Other questions to answer are: What is the construct validity of the importance measures? Were the differences in decision making across treatment groups statistically significant? Is the experiment a test of the relative importance of attributes? It seems that it is really a test of classes of information or bundles of undifferentiated attributes. Moreover, the design appears to test only the effect of "the opportunity to elicit physical samples," since this is the only difference between the two treatment groups. What does such a manipulation mean, and how do "attributes" conceptualized in this way relate to past research on the role of attributes in decision making?

To place things in a proper perspective, a number of comments deserve mention. First, the study deserves recognition as an innovative piece of research. Overall, it was handled with considerable care and foresight. Second, despite the criticisms that can be made, the study does achieve a certain degree of internal validity with some realism provided through the operationalization of stimuli and the general design. The shortcomings that have been pointed out are primarily a function of the attempt to achieve both internal and external validity in a single study. Such a goal will invariably involve tradeoffs and produce pros and cons as part of its very nature. Finally, it should be noted that the article by Professor Quelch represents a well-organized and articulated investigation that should be read by anyone seriously interested in conducting research into consumer decision making.

A COMPARISON OF TWO METHODS... BY JAMES H. MYERS

The problem of determining the optimal level of product attributes is an important one, and Professor Myers presents an interesting study comparing two alternative methodologies. Perhaps the most significant issue in the research to note is the finding that the direct method for determining the ideal point (which is based on subjective ratings) provides consistently lower values than the indirect method (which is based on a cross tabulation of beliefs versus evaluations). The question arises as to whether one method is more accurate than the other or, for that matter, whether either is a valid procedure.

The question of accuracy or validity is not an easy one to address. Because the research begins with measurements of concepts at the level of individual behavior and then aggregates to the group level, it is important to scrutinize the entire process of measurement and aggregation. For example, consider the method for determining direct ideal points. The respondents were asked to rate their ideal using a six-point scale on each of eleven attributes for each of four products. For each attribute, an ideal point score was determined by first obtaining an average for each individual (based on the sum over the four products) and then averaging across people. The resulting score is thus an ideal rating for the attribute based on all respondents. What is the construct validity of this score? Notice that measurement error enters in at least two places: (1) at the point of averaging across products and (2) at the point of averaging across respondents. Notice further that whatever measurement error is present at the first stage is added to or confounded with that found at the second. Because measurement error is not taken into account or modeled explicitly, it is possible that the ideal point construct is highly fallible. In any case, given the procedure, it is not possible to assess construct validity.
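The two-stage averaging described above can be sketched in a few lines. This is only an illustration under stated assumptions: the ratings are hypothetical and not taken from the study, a single attribute is shown, and the names `ratings`, `person_means`, and `aggregate_ideal` are introduced here for convenience.

```python
from statistics import mean, pstdev

# Hypothetical six-point ideal ratings for ONE attribute.
# Each list holds one respondent's ratings over the four products.
ratings = {
    "r1": [2, 3, 2, 3],
    "r2": [5, 6, 5, 6],
    "r3": [1, 6, 1, 6],
}

# Stage 1: average over the four products within each respondent.
person_means = {person: mean(values) for person, values in ratings.items()}

# Stage 2: average the respondent means to get one aggregate ideal score.
aggregate_ideal = mean(list(person_means.values()))

# Dispersion that the single aggregate score no longer shows.
within_spread = {person: pstdev(values) for person, values in ratings.items()}
between_spread = pstdev(list(person_means.values()))

print("respondent means:", person_means)
print("aggregate ideal:", round(aggregate_ideal, 2))
print("within-respondent spread:", within_spread)
print("between-respondent spread:", round(between_spread, 2))
```

In this made-up example, respondent r3 alternates between the extremes of the scale yet contributes a mid-scale mean, which is one way the aggregate score can conceal variability arising at each stage.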

One way to approach the question of construct validity might be the following. First, the validity of the ideal point model could be evaluated at the level of the individual responses. Three components of construct validity need to be addressed: namely, convergent, discriminant, and nomological validity (for a model and methodology for determining construct validity, see Bagozzi (1977, 1978a, 1978b)). The assessment of convergent and discriminant validity requires that multiple measures and methods be employed. The evaluation of nomological validity requires that another concept be employed in a predictive context along with the measures of the ideal point. Second, after establishing the accuracy of measurements and validity of the ideal point ratings at the individual level, the scores can then be aggregated to form a distribution for the market conditions. It is important that the sample of individuals used to summarize the aggregate be a representative one, perhaps drawn randomly from the potential market. Notice that the results obtained by either of the two methods used by Professor Myers are directly dependent on the quality of the sample chosen. In addition to the internal consistency of ideal point measures at the individual level, the validity of the optimal level of attributes obtained in the aggregate depends also on the representativeness of the distribution of responses across people and ratings.

One further problem with the procedures used by Professor Myers deserves mention. The data are derived by observing responses to four products which have fixed values of the eleven attributes. A better procedure would be to obtain descriptive and evaluative ratings on each attribute for each of a number of levels. For example, the response of each person to each of five levels of saltiness could be obtained. Although this increases the time for data collection considerably, some time can be saved if only five or six attributes are investigated. Most products probably possess only five or six salient attributes anyway rather than eleven as used in the present study. Alternatively, an experimental design with blocks, such as the Latin square, can be used to achieve the desired economies.
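As a minimal sketch of the blocking idea, a cyclic Latin square can be generated as follows. The function name `latin_square`, the five level labels, and the reading of rows as respondent groups and columns as presentation positions are assumptions made here for illustration; they do not describe a design used in the study.

```python
def latin_square(levels):
    """Return a cyclic Latin square: each level appears exactly once
    in every row and in every column."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical levels of saltiness (labels are illustrative only).
levels = ["very low", "low", "medium", "high", "very high"]
square = latin_square(levels)

# Rows: respondent groups; columns: presentation positions.
for group, row in enumerate(square, start=1):
    print(f"group {group}: {row}")
```

A cyclic square is the simplest construction; in practice one would usually randomize rows, columns, and labels before assigning respondents.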

It should also be noted that the derived methodology may not go far enough in that the cross-tabulations may overlook complex interactions among attributes. For example, no relationship may be evident between characteristic B and evaluations when viewed in isolation, but a strong relationship might exist when characteristic E is taken into account. To represent interaction effects, a multivariate analysis is required such as provided by the log-linear model. For instance, a model to test for all interactions among three salient attributes (A, B, and C) would be

L_{ijk} = m + m^{A}_{i} + m^{B}_{j} + m^{C}_{k} + m^{AB}_{ij} + m^{AC}_{ik} + m^{BC}_{jk} + m^{ABC}_{ijk}

where L_{ijk} is the natural logarithm of the expected frequency in cell (i, j, k) of the cross-classification of A, B, and C, and the m's represent main and interaction effects analogous to the terms in the ANOVA model. A procedure such as Goodman's (1970) ECTA methodology can be used to estimate parameters and test hypotheses. The derived methodology rests on the degree of association between attributes and evaluations. Thus, these relationships must be ascertained before confidence in any overall rating can be gained.
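The point about hidden interactions can be illustrated with a small sketch. The counts below are hypothetical and deliberately constructed so that characteristic B shows no association with evaluations in the two-way table, yet a strong one within each level of characteristic E. The names `counts`, `odds_ratio`, and `collapse` are assumptions introduced for this example; it uses simple odds ratios and does not fit a log-linear model.

```python
# Hypothetical counts: counts[E level][B level] = (favorable, unfavorable)
counts = {
    "E_low":  {"B_low": (30, 10), "B_high": (10, 30)},
    "E_high": {"B_low": (10, 30), "B_high": (30, 10)},
}


def odds_ratio(low, high):
    """Odds of a favorable evaluation at high B relative to low B."""
    fav_low, unfav_low = low
    fav_high, unfav_high = high
    return (fav_high * unfav_low) / (unfav_high * fav_low)


def collapse(table, b_level):
    """Sum the counts for one level of B over all levels of E."""
    fav = sum(table[e][b_level][0] for e in table)
    unfav = sum(table[e][b_level][1] for e in table)
    return fav, unfav


# Two-way view (E ignored): what a simple cross-tabulation would show.
marginal_or = odds_ratio(collapse(counts, "B_low"), collapse(counts, "B_high"))

# Three-way view: the B-evaluation association within each level of E.
conditional_or = {
    e: odds_ratio(counts[e]["B_low"], counts[e]["B_high"]) for e in counts
}

print("marginal odds ratio:", marginal_or)
print("conditional odds ratios:", conditional_or)
```

In log-linear terms, conditional odds ratios that differ across levels of E correspond to a nonzero three-factor term, which a two-way cross-tabulation cannot reveal.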

Finally, the nature of the relationship between attribute belief judgments and evaluations needs to be examined. In the cross-sectional survey, the relationship may be purely correlational, causal, or spurious. Yet, the cross-tabulation cannot identify which of these alternatives holds. An experimental methodology can at least ascertain whether the relationship is a causal one from beliefs to evaluation. The latter is necessary if management is to have faith in its research and decisions to fix product attribute levels.

THE STABILITY OF RESPONSES... BY JERRY C. OLSON AND AYDIN MUDERRISOGLU

This article represents an important contribution for at least two reasons. First, the operationalization of the free elicitation procedure itself is significant in that it demonstrates that such a method can yield meaningful data on cognitive responses. Because it does this in a relatively unobtrusive and unrestrictive way, it provides a promising means for measuring a variety of cognitive concepts central to contemporary theories of human decision making. Second, the test-retest results suggest that the overall methodology may yield reliable measures for at least some concepts of interest in the free elicitation context. With these strengths of the study in mind, the following comments will focus on a number of caveats and suggestions for extending the research.

A first issue to note is the meaning of the concept of stability. The authors state that "stability means that a probe cue...presented to consumers at two points in time elicited similar responses" (as reflected by the test-retest correlation). It is important to note that stability as the authors define and use it is a "floating" measure of change in that it depicts the change in each individual respondent's position in the distribution of responses relative to other individuals. Thus, interest is restricted to the slope of responses between time periods rather than a change in intercept which might reflect an absolute, main-effect change in the group as a whole. The authors' definition of stability is an important one, but it should be mentioned that it does not address stability in the sense of a systematic influence on responses experienced by all respondents between measurements. Rather, it in effect assumes that no such shock occurred to any or all individuals. If some respondents experienced an extraneous influence between the times of measurement, then this would contaminate the test-retest correlation. Given the present research, there is no way to assess this possibility one way or another.
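The distinction between slope and intercept can be made concrete with a short sketch. The scores are hypothetical, and the helper `pearson` is written out here only to keep the example self-contained. It shows that adding the same constant to every respondent's second measurement leaves the test-retest correlation unchanged while the group mean moves.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical elicitation scores for six respondents at two occasions.
time1 = [3, 5, 4, 7, 6, 8]
time2 = [4, 5, 5, 8, 6, 9]

# A uniform shock: every respondent's second score rises by two units.
time2_shifted = [score + 2 for score in time2]

r_plain = pearson(time1, time2)
r_shifted = pearson(time1, time2_shifted)
mean_shift = mean(time2_shifted) - mean(time2)

print("test-retest r without shock:", round(r_plain, 3))
print("test-retest r with uniform shock:", round(r_shifted, 3))
print("change in group mean due to shock:", mean_shift)
```

A shock that affects only some respondents would, by contrast, alter relative positions and therefore the correlation itself.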

The use of the test-retest correlation as a measure of stability is fine as far as it goes, but a number of limitations of the procedure deserve mention. First, because situational factors may systematically affect people, the correlation between measures could be a function of both the characteristics of people (which the measurement is intended to capture) and the external influence. Yet, the usual test-retest procedure cannot separate these different factors. Second, because the cognitive states of people change over time due to fatigue, maturation, learning, differences in motivation, and so on, the error in measurement from one time to another can also change systematically, producing biased results. The test-retest correlation does not allow one to discriminate between such effects and the true stability over time. Third, demand characteristics, evaluation apprehension, and memory effects tend to be present at each measurement occasion, causing a correlation between errors in measurement and a corresponding bias in the observed correlation between measures.

One way to better represent stability--given that one is restricted to single operationalizations of each concept at each point in time--is to employ the simplex model or a variation thereof. Although the model suggested by Heise (1969) makes rather restrictive assumptions as the authors note, the one proposed by Wiley and Wiley (1970) is somewhat less restrictive. Both models require that measures be taken at three points in time. The Heise (1969) and Wiley and Wiley (1970) models begin with the following structure:

y1 = a1T1 + u1

y2 = a2T2 + u2

y3 = a3T3 + u3

T2 = b1T1 + q1

T3 = b2T2 + q2

where the yi are measures of a relevant concept; the Ti are the corresponding true scores; the ai and bj are parameters relating measures to true scores and true scores to past true scores, respectively; and each error term is uncorrelated with all other error terms and with the explanatory variables. Given this model and the following assumptions, it is possible to calculate the reliability of the concept under consideration. With a1 = a2 = a3 imposed as a constraint, Heise (1969) shows that it is possible to compute the stability parameters, b1 and b2. In effect, this assumes that the contemporaneous reliability of each concept is equal at each point in time (although stability may change). Wiley and Wiley (1970) demonstrate that one need not assume that a1 = a2 = a3 if it is assumed that the error variances are homogeneous (i.e., var(u1) = var(u2) = var(u3)). Werts, Joreskog, and Linn (1971) further demonstrate that neither homogeneity in contemporaneous reliabilities nor homogeneity in error variances need be assumed if one has measurements at four points in time. Their model, however, does constrain the contemporaneous reliability parameters to equal one. Further, their model generalizes to multiple wave panels with the caveat that "error variances, true score variances, and unstandardized regression weights between corresponding true scores are identified for all but the first and last measures" (Werts et al., 1971, p. 111). The procedures developed by Werts et al. offer two advantages over the traditional test-retest correlation: they model measurement error explicitly, and they represent stability as an association between true scores.
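For the three-wave case, the estimates implied by the Heise (1969) assumptions can be sketched as follows. The sketch assumes standardized variables, equal reliability at each wave, and a lag-one process in the true scores, so that r12 = (reliability)(s12), r23 = (reliability)(s23), and r13 = (reliability)(s12)(s23). The function name `heise_estimates` and the input correlations are hypothetical and introduced only for illustration.

```python
def heise_estimates(r12, r23, r13):
    """Reliability and stability coefficients from three-wave correlations,
    under the equal-reliability, lag-one assumptions of the simplex model."""
    if min(r12, r23, r13) <= 0:
        raise ValueError("the sketch requires positive correlations")
    reliability = r12 * r23 / r13   # common reliability at each wave
    s12 = r13 / r23                 # stability of true scores, wave 1 to 2
    s23 = r13 / r12                 # stability of true scores, wave 2 to 3
    s13 = s12 * s23                 # stability across the full interval
    return {"reliability": reliability, "s12": s12, "s23": s23, "s13": s13}


# Hypothetical observed test-retest correlations among three waves.
estimates = heise_estimates(r12=0.60, r23=0.60, r13=0.45)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With these made-up inputs the observed test-retest correlation of 0.60 separates into a reliability of about 0.80 and a true-score stability of about 0.75, which illustrates how an uncorrected correlation can understate stability.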

If multiple measures are available for a variable of interest for at least two points in time, then the general analysis of covariance structures approach can be used to assess concurrent reliability and stability simultaneously (cf. Bagozzi, 1978b, c). For example, given two measures of a variable at two points in time, the structural equation model permitting the determination of concurrent reliability and stability can be written as:

x = ΛT + u

where x is the vector of the four observed measures, T is the vector of the two true scores, Λ is the matrix of coefficients relating measures to true scores, u is the vector of errors, and it is assumed E(x) = E(T) = E(u) = 0, E(Tu') = 0, E(TT') = Φ, E(uu') = Ψ (a diagonal matrix of error variances), and E(xx') = Σ. Using a maximum-likelihood estimation procedure, the stability coefficient may be estimated as the appropriate covariance in Φ. Similarly, contemporaneous estimates of the individual and composite reliabilities can be computed from the following formulae, respectively:

where n = the number of measures at time j. For a derivation of the above procedures and an illustration, see Bagozzi (1978b, c). The stability coefficient yielded by the procedures is analogous to those corrected for attenuation. Further, the confounding of measurement error is avoided.
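As a hedged sketch, and not a reproduction of the formulae in Bagozzi (1978b, c), individual and composite reliabilities of this kind are commonly written as follows for a factor-analytic measurement model. The symbols are assumptions introduced here for illustration: \lambda_i is the coefficient relating measure i to the true score at time j, \phi_{jj} is the variance of that true score, \psi_{ii} is the error variance of measure i, and n is the number of measures at time j.

```latex
% Reliability of an individual measure i of the true score at time j
\rho_i = \frac{\lambda_i^{2}\,\phi_{jj}}{\lambda_i^{2}\,\phi_{jj} + \psi_{ii}}

% Reliability of the unweighted composite of the n measures at time j
\rho_c = \frac{\left(\sum_{i=1}^{n} \lambda_i\right)^{2} \phi_{jj}}
              {\left(\sum_{i=1}^{n} \lambda_i\right)^{2} \phi_{jj} + \sum_{i=1}^{n} \psi_{ii}}
```

Both expressions are ratios of true-score variance to total variance, so each lies between zero and one, and the composite value rises as more measures of comparable quality are added.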

One final point to note is that the stability of responses derived from a free elicitation procedure can be no more meaningful than the validity and significance of the concepts measured. While such notions as the "number of elicited concepts," "time required for elicitation," and "rate of elicitation" are useful as far as they go, it would be much more meaningful to measure concepts with more substantive content. For instance, in the area of cognitive research, concepts such as the nature and intensity of counterarguments, source derogations, or support arguments might be more interesting than merely their rates or frequencies.