The three papers in this session illustrate quite dramatically the wide diversity of research in consumer behavior. All papers use laboratory experiments to investigate advertising; yet the papers address vastly different questions and approach their problems in quite different manners. The first paper by Joel Saegert attempts to apply a cognitive information processing concept to advertising recall. The second paper by Ed Hackleman and Subhash Jain, which is quite atheoretical in nature, contrasts consumer perceptions of comparison and non-comparison advertisements. The last paper by Peter Webb explores the factors which determine advertising "clutter" as measured by viewer perceptions of the amount of advertising in programming.

A DEMONSTRATION OF LEVELS-OF-PROCESSING IN ADVERTISING RECALL

Saegert has presented one of the first empirical tests in a consumer behavior context of the "levels of processing" theory of Craik and his colleagues (Craik and Lockhart, 1972; Craik and Tulving, 1975; Lockhart, Craik, and Jacoby, 1975). This theory, which focuses on the processes rather than the structures of learning and memory, has had a large impact on cognitive psychology and is very likely to lead to important research in consumer behavior. Saegert's research is important since it introduces many of us to this important area. (An excellent review of this area is available in Olson, 1977.)

The goal of Saegert's research was to assess whether the results of experiments reviewed by Craik and Lockhart (1972) and/or conducted by Craik and Tulving (1975) could be replicated with advertisements instead of meaningful words as stimuli. Although his experiment seems to have been carefully conducted, I am a little uncertain whether he sufficiently replicated the procedures reported by Craik and his colleagues. An exact replication is not vital per se; however, I wonder exactly what was manipulated by Saegert's procedures. It may be that Saegert's hypotheses were confirmed but that the differences in recall and recognition were due to something other than his cited reason. At the very least, I worry that Saegert's experiment may lead to the misleading perception that the levels of processing notion is a rather simple one with accordingly limited implications for consumer behavior research.

Saegert mentioned two similar paradigms of past research. The first, by Hyde and Jenkins (1969, 1973), manipulated the type of orienting task given to experimental groups. Some groups were instructed to intentionally learn and remember the words; others were asked to process the words in a certain way but were not told that there would be a subsequent learning test. Within the incidental learning condition, subjects were asked either to check whether the letter "E" appeared in the word, to count the number of letters in the word, or to rate the pleasantness of the words. The former two conditions were interpreted by Craik and Lockhart to be shallower than the latter. Another set of processing manipulations was reported by Craik and Tulving (1975). They instructed different groups of subjects to process at presumably increasingly deeper levels by either structural ("Is the word in capital letters?"), phonemic ("Does the word rhyme with WEIGH?"), categorical ("Is the word a type of fish?"), or sentence ("Would the word fit the sentence ____________?") instructions. The first two question types were assumed to result in structural and phonemic processing, respectively, whereas the last two led to increasingly semantic processing.

In his experiment, Saegert asked subjects to process the brand name in a print advertisement either structurally or semantically. The structural manipulation very closely resembled previous ones. However, I am less certain of the result of the semantic questions. Saegert mentioned two examples: "Do you have this brand in your home?" and "Have you ever bought this brand?" Unlike Craik and Tulving's instructions (which were otherwise most similar to Saegert's), Saegert's questions did not allow the experimenter to anticipate whether the answer would be "yes" or "no." This may be an important difference. Craik and Tulving found higher recall and recognition for "yes" than for "no" answers in semantic processing modes such as judging whether a blank in a sentence could be filled by the target word. Also, more complex semantic processing instructions led to greater recall for questions with positive answers but not for those with negative ones.

This is important to the theoretical distinction between depth and spread or elaboration of processing. Craik and Tulving hypothesized that positive responses enable the subject to form a unified and more meaningful image of the complete cue. A congruent "yes" response may help to also recall other associations with the target stimulus.

Because this research is assessing the replicability of past cognitive psychology experiments, it might have been more appropriate to follow one particular procedure more exactly. For example, questions analogous to those used by Craik and Tulving might include "Is the advertised brand a physical product (as opposed to a service)?" or "Is the advertised brand something people eat?" Such questions would offer the advantage of allowing the experimenter to anticipate a "yes" or "no" and thus to control for congruity and the coinciding greater elaboration in processing. Then the effects of positive and negative answers could be examined for similarities to past results. There may also be some theoretical importance to distinguishing between ego-oriented processing (e.g., "Is this brand in your home?") and more objective processing. I'd expect that the former would lead to more elaborative processing by evoking more personal connections (Krugman, 1965). The greater recall and recognition in the deeper processing condition in Saegert's experiment may have been due to the personal nature of his questions and the resulting ego-oriented processing rather than, or in addition to, actually deeper processing. The two possibilities cannot be separated because they are confounded.

Saegert's "yes" and "no" answers could be compared for differences in subsequent brand name recall or recognition. Of course, with the subjective questions used by Saegert, any greater recall for positive answers could alternatively be interpreted as the result of greater initial familiarity. As Olson (1977) has emphasized, initial familiarity is a necessary condition for deeper, more semantic processing. For extensive semantic processing of a stimulus to occur, a subject must have a previously developed semantic structure of knowledge about that stimulus. Therefore, it was probably important that Saegert used adults as subjects, who were more apt than, say, college students to be familiar with the various advertised brands. It might be interesting to manipulate the type of processing of ads but to analyze the effects of processing within levels of familiarity or usage. One would expect the effect of deeper processing to be greater for initially familiar brands than for unfamiliar ones, because the more elaborate reactions to deep processing instructions can be evoked only in the former condition.

Saegert's data may be analyzed in other ways to offer suggestions about future research. Certainly, as Saegert pointed out, it is very important for future research to ascertain whether and, if so, how advertising copy or formats can be designed to evoke deeper processing. It might be useful to include in an analysis of variance a term for ads which could test whether some brand names are better recalled than others. If so (which is likely), a post-hoc analysis of ad characteristics (e.g., long vs. short copy, questioning headline or not, etc.) in addition to brand familiarity might suggest how to manipulate recall. A similar procedure by Ray and Sawyer (1971) was somewhat productive in finding factors that interacted with repetition. Of course, recall differences may be due to factors other than induced processing, but any insights at this early research stage would be welcome. I do not know if the five seconds per ad allowed by Saegert was sufficient for recall to be sensitive to different ad types. Certainly, the achieved recall levels (2 to 5 out of 40) were not very high.

Hopefully, processing can be manipulated by factors other than verbal instructions. Some positive evidence that this can be done was offered by Marslen-Wilson and Tyler (1976), who manipulated the level to which spoken prose passages could be processed by presenting either normal sentences; sentences which, although grammatically correct, were difficult to process semantically; or sentences with neither syntactic nor semantic information. Recall of the sentences increased with the level of available processing. However, to be useful for applied purposes, more subtle manipulations within the verbal communication must prove effective.

A great deal of consumer behavior research has involved the process by which exposed information becomes transposed into attitudes and behavior. Levels of processing theory may suggest more efficient theoretical explanations of this process and point out profitable avenues for future research. For example, other than due to message characteristics, processing may vary on either a situational or individual basis. One difference that may be either individual or situational is involvement. The notion of consumer involvement has been much discussed (e.g., Krugman, 1965) but with very little theoretical base (see Houston and Rothschild, 1978). It seems obvious that Krugman's claim of low ad recall with low involvement can be explained by shallow or superficial processing.

Processing level might also be related to effects other than learning. Certainly the greater the extent of initial processing, the greater the opportunity for a consumer to compare new information with currently stored information. Cognitive responses, which are a possible measure of processing level, have been shown to be related to both immediate and delayed measures of attitude (Sawyer, in press). There is some evidence that negative cognitive thoughts are more persistent over time (Sawyer and Ward, 1977). Perhaps negative thoughts require more extensive processing and thus tend to be better retained. Cacioppo and Petty (in press) found that a message advocating a counter-attitudinal position elicited more thoughts relevant to the topic than a proattitudinal message. This result coincided with greater recall of the arguments in the counter condition than in the pro condition. However, some method must be found to distinguish different intervening causes of greater recall. For example, greater retention of negative thoughts could be due to greater distinctiveness from most other thoughts instead of being due to deeper processing.

Certainly, as Saegert observes, repetition at least offers the opportunity of more elaborate processing (see Wyer, 1974). If repetition thus leads to greater learning, such learning may be related to affect change (see Stang, 1975). Grush (1976) has shown that the positive or negative nature of generated associations appears to mediate the direction of resulting affect. However, repeated meaningful communications may yield learning and affective reactions that are much more independent than for less meaningful stimuli such as nonsense syllables. There is evidence that, although there are similar patterns of learning and affect as the result of repetition, the two effects are not related to each other (Cacioppo and Petty, in press). Olson (1977) suggested a relationship in which learning and belief structure influence each other over time. Further research may help establish a more direct link between recall and attitudes by means of levels-of-processing.

CONSUMER ATTITUDES TOWARD COMPARISON ADVERTISING

Hackleman and Jain report the results of an experiment about comparison advertising. As they point out, this topic is the subject of much concern by decision makers in advertising management and public policy.

The authors have obviously taken a tremendous amount of care in implementing their study. Ads were tested for twelve different products; subject subsamples from three different population centers were used; and extensive pretesting [I assume the authors included the product with the brand (i.e., Fantastic Camera); otherwise, the pretesting of the meaningfulness of the brand names would have made no sense to the subjects. Also, the bipolar adjective scale format of the meaningfulness scale more resembles a semantic differential than a Likert scale.] of the fictitious brand names was conducted. The measurement procedure randomized the order of the ten rating scales and whether the positive or negative pole of each adjective pair came first. A complex experimental design was used which controlled for many factors (e.g., the balancing of brand names between the first and second position in the comparison ad copy). Finally, although I have trouble understanding how two separate ads could be combined into one comparative ad which in turn was similar to a third ad which was non-comparative, it seems likely that the ads were well constructed and quite comparable to real advertisements.

My major concern with this study involves its overall purpose. Hackleman and Jain state that the focus is on "effectiveness of comparison advertising from the viewpoint of consumers." Why is what consumers think about the advertisements relevant? Although Leavitt's (1970) work has argued that various consumer perception factors may yield useful diagnostics, I know of no positive evidence that consumers' ad perceptions are a valid measure of effectiveness. Nor do I know of any advertisers or public policy makers who are primarily concerned with such measures. It seems to me that the key issues involve more "traditional" measures of effectiveness such as ad copy comprehension and recall, consumer use of comparative information in forming beliefs about compared product attributes, and brand preference.

In examining Hackleman and Jain's measures of consumers' attitudes toward the ads, I wonder what is actually being measured. It is admirable that the authors were explicitly concerned with reliability and construct validity, something most of us consider only when critiquing other people's research. However, I do not know how they assessed reliability and validity from the factor analysis of the pretest. Whatever the quality of the individual scales, the combination of all ten scales (presumably by summing) into one univariate scale makes no sense at all (e.g., Osgood, Suci, and Tannenbaum, 1957) and permits no detailed insights about the effects of comparison ads. I would advise a factor analysis of the ten scales followed by a multivariate analysis of variance of the resulting factor scores. Subsequent univariate analyses of variance could isolate which factors were significantly affected if the multivariate analysis indicates some significant overall effects.
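The two-stage analysis recommended here -- factor-analyze the ten scales, then run a multivariate analysis of variance on the factor scores with univariate follow-ups -- can be sketched as follows. This is an illustrative reconstruction on simulated data, not the authors' analysis; the number of factors, the sample size, and all variable names are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.decomposition import FactorAnalysis
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 120  # hypothetical number of subject-by-ad observations
ad_type = np.repeat(["comparative", "noncomparative"], n // 2)

# Simulated responses on ten bipolar rating scales (1-7).
ratings = rng.integers(1, 8, size=(n, 10)).astype(float)

# Stage 1: reduce the ten scales to a few factors instead of one summed score.
fa = FactorAnalysis(n_components=3, random_state=0)
scores = fa.fit_transform(ratings)

df = pd.DataFrame(scores, columns=["f1", "f2", "f3"])
df["ad_type"] = ad_type

# Stage 2: MANOVA on the factor scores across ad types.
manova = MANOVA.from_formula("f1 + f2 + f3 ~ ad_type", data=df)
print(manova.mv_test())

# Stage 3: univariate follow-ups isolate which factors differ.
for f in ["f1", "f2", "f3"]:
    fit = smf.ols(f"{f} ~ ad_type", data=df).fit()
    print(f, fit.f_pvalue)
```

With real data, the number of factors would be chosen from the pretest factor analysis, and the model would include the other experimental factors (product and subject location) rather than ad type alone.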

Let me give an example of the potential usefulness of less ambiguous measures of ad perception. Wilkie and Farris (1975) hypothesized that comparison ads will be judged more informative and more interesting. Isolation of individual scales or factors that measure these dimensions could test Wilkie and Farris's hypothesis. Similarly, perceptions of confusion and believability might be of diagnostic value. However, as stated above, I question the relevance of even unambiguous measures of consumer attitudes towards ads.

It probably is a matter of personal taste, but I would prefer that subjects not be exposed to both comparative and non-comparative ads for the same product. Such a design may overly focus subjects' attention on differences between the two ads -- a big problem with the Ogilvy and Mather (1975) research. Arguments in favor of the within-subjects design include control of individual subject differences, as Hackleman and Jain mentioned, and more external validity if the eventual use of comparative ads for a given brand is combined with non-comparative ones (see Greenwald, 1976).

Finally, I thoroughly disagree with the conclusions of the authors about the effectiveness of comparative ads and about the differences by product type. Concerning the latter, I have always wondered about the operational distinction between shopping and specialty goods. How much inter-judge reliability is there in rating products into these two categories? Why are pianos or foreign liqueurs specialty goods and not shopping goods? Couldn't a camera be a specialty good? (Similarly, I cannot agree with the McCarthy quote about specialty goods; people do not merely explore to see if an overseas vacation or foreign sports car is available.) Certainly Hackleman and Jain cannot be faulted for using a long-used product classification scheme. However, I wish they had done a more formal statistical analysis on the product types. The authors included product type with twelve levels as one of three experimental factors in an analysis of variance. The significance of this factor means only that there were differences among the twelve ad pairs. Only by analyzing type of "goods" (convenience, shopping, or specialty) and products nested within type of goods as separate factors could the claimed interaction of type of goods and ad copy (comparison or not) be isolated and statistically tested. The post-hoc analysis of Table 5 cannot be considered conclusive evidence; at best, it is suggestive.
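The nested analysis proposed here -- type of goods crossed with ad copy, plus products nested within type of goods -- can be written as a linear-model formula. The following is a minimal sketch on simulated data, assuming (hypothetically) four products per goods type and ten subjects per cell; statsmodels' formula interface is used only for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)

# Hypothetical balanced design: 3 goods types x 4 products each
# x 2 ad copies x 10 subjects per cell.
rows = []
for goods in ["convenience", "shopping", "specialty"]:
    for product in range(4):  # product labels reused, so nesting is explicit
        for copy in ["comparison", "noncomparison"]:
            for _ in range(10):
                rows.append({"goods": goods, "product": product,
                             "copy": copy, "rating": rng.normal(4.0, 1.0)})
df = pd.DataFrame(rows)

# C(goods):C(product) is the products-nested-within-goods term; the
# goods x copy interaction is the effect of interest.
model = smf.ols("rating ~ C(goods) * C(copy) + C(goods):C(product)",
                data=df).fit()
table = anova_lm(model)
print(table)
```

In a full treatment, the goods-type effect would be tested against the products-within-goods mean square rather than the residual; the table above supplies the sums of squares needed to form that ratio.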

Most important, Hackleman and Jain's conclusion that their "findings do not support the FTC's claim that comparison ads provide consumers with information they desire before making a purchase" has absolutely no foundation. Nor is there any evidence in support of Ogilvy and Mather's cited results about persuasion, brand identification, claim believability, awareness of competitors, or message confusion. How can any conclusions be made when most of the effects in the authors' conclusions were not even measured in this study? Inclusion of less ambiguous and more relevant measures that could have been the bases for such conclusions would have enabled the painstaking research of Hackleman and Jain to be much more useful.

There are many interesting questions to examine in future research about comparative advertising. An excellent research agenda has already been proposed by Wilkie and Farris (1975), who listed thirteen tactical issues and another thirteen hypotheses about the effects of comparison advertisements. I'll not repeat their ideas here. Rather than discuss many additional specific research directions, I would like to suggest what I believe are appropriate types of research questions and a preferred process of generating research ideas.

I was impressed with two aspects of the Wilkie and Farris paper. First, there was an attempt to conceptualize likely sources of differences, such as types of ads and products, from various behavioral science and marketing theories. Second, they concentrated their thinking on more traditionally acceptable measures of advertising effectiveness such as ad attention and recall, correct brand identification, perceived brand position, cognitive responses, beliefs about a brand's performance on product attributes, evaluation of attributes, and brand preference.

Prasad (1976) exemplified the type of research approach advocated by Wilkie and Farris. He looked at a tactical issue of managerial importance -- whether explicitly named competitor brands or the more traditional "Brand X" should be used in comparison ads. In addition, he tested a market factor of likely importance -- whether or not the advocated brand was most preferred by the audience. The tested hypotheses were generated from several concepts from social psychology such as indexing, selective learning and recall, source credibility, and attitude-discrepant communication. Measured communication effects included brand and claim recall (both immediate and one week delayed), claim believability, and perceived brand position. I mention Prasad's study not because of any special theories, designs, or measurement techniques but because it agreed with my bias toward Ray's (1978) advocated process of taking a managerially relevant problem, borrowing relevant concepts or micro-theoretical notions from behavioral or economic theory, and measuring appropriate communication effects.

Certainly other behavioral concepts such as attribution theory, refutational appeals, order effects of two-sided appeals, and assimilation-contrast theory could be used to generate other hypotheses. The value of using theory to generate hypotheses in comparison advertising research is that it helps to get away from the type of "Ad A versus Ad B" research common in applied research (see Ray, 1975) and thus may be able to make a contribution beyond the immediate question(s) about a particular type of comparison ad.

I am currently interested in the efficacy of a particular type of comparison advertising in which explicit comparisons are made on more than one product attribute. This strategy, suggested by Boyd, Ray and Strong (1972), attempts to reduce the determinance of an important or highly evaluated attribute by a comparison which suggests that all brands are equal in performance in that attribute. Such a strategy would allow a brand's superiority on another dimension to have a larger effect on brand preferences. Such research would also assess the wisdom of adding a third (not highly evaluated) attribute in which the advocated brand was not superior. While such research would test a specific comparative advertising tactic, it could also explore a more global communication problem of whether it is possible to change the contribution of an attribute to brand preference (e.g., Lutz, 1975).

PERCEPTION OF ADVERTISING CLUTTER

Like Hackleman and Jain, Webb has investigated a problem of great interest to both advertisers and public policy makers. In his study, he has focused on a very interesting variable -- perception of time. Time perception and whether and how it may be manipulated should be of great interest to consumer behavior researchers in other areas as well as advertising.

There are some obvious and important strengths to Webb's studies. They certainly rate high in external validity. "Real" people were used as subjects; actual television commercials were used, and exposure was as natural as possible.

Webb's measures of perceived time length and number of commercials remind me of a similar venture of mine in my dissertation research (Sawyer, 1971). While studying the effects of repeated advertising exposures, I assumed that one effect of repetition is irritation and annoyance. I thought that a good measure of this effect might be the extent to which subjects overestimated the number of exposures as a function of the actual number and, perhaps, the type of ad that was repeated. As it turned out, the measure was subject to so much individual variance that it was not a sensitive measure of annoyance.

Webb has hypothesized that the degree of over- or under-estimation of the length and number of commercials will vary as a function of their "clutter." My major concern with his paper involves my confusion about what clutter is. I've always thought of clutter as the number of ads per commercial break or the total number of ads in a program (Maneloveg, 1971). Webb agrees that the definition is not obvious and suggests an additional possible measure -- the number of commercial interruptions. Unfortunately, Webb's experiments have not helped me to decide which is the most appropriate definition.

Study one appears to define clutter as either the total number of ads or the number of ads per interruption. "Low" clutter was represented by two groups of two 30-second ads (four in total), and "high" clutter was represented by two groups of four ads (a total of eight ads). After being convinced that high clutter, not low, was the problem, I was somewhat surprised that low clutter resulted in greater over-estimation of the number and time duration of the ads.

Study two unconfounded the number of ads per interruption and the total number of ads by controlling for the latter. Since I've always thought of increased clutter as the stringing together of more and more ads per interruption, I assumed that the experimental conditions of six sets of two 30-second ads, three sets of four ads, and two sets of six ads represented increasing levels of clutter. However, perhaps due to his uncertainty as to a proper definition of clutter, Webb chose not to label the experimental conditions as high or low clutter. As in the first study, the second showed that the lower the number of commercials per interruption, the greater the over-estimation of the total number and duration of the ads. Is less clutter "worse"?

In his description of the results of study two, Webb suggested that the total number of interruptions is a better measure of clutter, and that, given this definition, more clutter is worse. [Although the perceived length per commercial was not calculated in study two, it showed a trend similar to the other measures, with estimates of 71, 64, and 57 seconds per commercial in the 6 X 2, 3 X 4, and 2 X 6 conditions, respectively.] To support the view that the number of interruptions is the key source of over-estimation, an admirable attempt is made to conceptualize the underlying reasons for the over-estimates. Webb suggests that complexity, differences from the ad sequences that could be expected from past television viewing, or both could explain the observed direct relationship between the number of interruptions and the amount of over-estimation.

I find the complexity analogy difficult to accept. Why should six breaks of two ads be more complex than the other conditions? The rationale that "a string of unrelated non-program elements is likely to be judged more complex than an integrated program segment" doesn't make much sense since none of the conditions had a totally integrated segment. Since all elements -- ads and program -- appeared in each condition, the judged complexity of the whole half-hour sequence should not differ.

The expectation explanation seems more promising. If the number of 30-second commercials per break averages four in prime time, then the study one results would be consistent with an explanation that the greater over-estimation in the "low" clutter condition was due to subjects' tendency to assume that approximately four (not the actual two) ads were presented each break. Similarly, I'd expect that over-estimates in study two would be greatest where the number of ads per break was less than the expected number and lowest where the number of ads per break exceeded the expected number. My rationale is based on my expectation that subjects can most easily remember the number of breaks and, in calculating the total number of ads, would multiply the number of breaks by the "normal" or expected number of ads per break. If my speculation is correct, this process should lead to differences in the estimates of the total number of ads. However, there are few differences in the estimated numbers, and the direction of the slight differences shows the highest estimate in the 2 X 6 condition instead of the 6 X 2 condition.
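The mental-arithmetic account sketched in this paragraph -- subjects remember the number of breaks and multiply by the expected number of ads per break -- yields concrete predictions for Webb's three schedules. A minimal illustration, assuming an expected norm of four 30-second ads per break:

```python
# Expectancy model: estimated total ads = breaks remembered x expected ads/break.
EXPECTED_ADS_PER_BREAK = 4  # assumed prime-time norm

schedules = {"6 x 2": (6, 2), "3 x 4": (3, 4), "2 x 6": (2, 6)}

biases = {}
for label, (breaks, ads_per_break) in schedules.items():
    actual = breaks * ads_per_break          # 12 ads in every condition
    predicted = breaks * EXPECTED_ADS_PER_BREAK
    biases[label] = predicted - actual       # over-estimation if positive
    print(f"{label}: actual {actual}, predicted {predicted}, "
          f"bias {biases[label]:+d}")
```

Under this assumed norm, the model predicts the strongest over-estimation in the 6 X 2 condition and under-estimation in the 2 X 6 condition, which is what makes the near-equal estimates Webb actually observed informative.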

Another way to assess the predictive validity of an expectation explanation would involve the manipulated program environment of either a prime time comedy show or a daytime serial. Webb states that the total number of commercials (twelve) is typical of daytime and exceeds the normal total for prime time. Therefore, due to departures from expectations, over-estimates should have been higher for the prime time than for the daytime program environment. However, virtually no differences were found. Moreover, if, as Webb seems to have decided, the number of commercial interruptions is the key factor in determining clutter, then the expectancy explanation would predict an interactive effect between program environment and commercial schedule. The interruption pattern of 6 X 2 is typical of daytime, whereas three interruptions typify prime time. Thus, it would seem that over-estimates of the 6 X 2 schedule would be greater for the prime time environment than for the daytime, whereas an opposite effect would be predicted for the 3 X 4 schedule. Since the 2 X 6 schedule is not representative of either environment, there should be no differences in the effects of that schedule between the two program types. Unfortunately, Webb did not analyze the interactive effects since, for some reason, he used one-way analyses of variance instead of a two-way analysis of variance.

I realize that this study of time perception was an exploratory one and was not the prime purpose of Webb's dissertation. Future research ought to try to resolve the question of the proper definition of clutter. I expect that one problem is that advertisers' definition might differ from that of public policy makers, which might in turn differ from that of consumers. It seems to me that Webb has decided that clutter be defined from the latter perspective. However, if clutter is defined by whatever condition leads to the highest estimates of the duration and number of commercials, any test of this relationship is tainted by the circular definition.

Several issues are evident involving the measurement of time perception. Should it be the estimate of total time devoted to commercials, the total number of commercials, or the ratio of one to the other? How do subjects arrive at these estimates? Do they, as I speculated earlier, do some form of mental arithmetic, multiplying the number of ads per break (actual or expected) by the number of breaks? Although it may not be easy to obtain valid results (e.g., Nisbett and Wilson, 1977), subjects could be asked to reconstruct how they arrived at their estimates. It seems reasonable that the number of commercials is more easily perceived and remembered than their length; this was supported by Webb's results. Perhaps estimates could be made more accurate if the questions broke down the memory task into the number of interruptions, the number of ads per interruption, and the length per ad.

Other measurement issues include questions about the construct validity of commercial time and/or number estimates as a measure of consumer annoyance or lack of enjoyment. Do perceptions of greater amounts of television time devoted to advertising lead to measurably greater irritation, and is such annoyance related in turn to less television viewing? Convergent validity could be assessed by examining the correlations between the time and number perceptions and measures of either the advertisements or the total programming on such attributes as "complex," "annoying," "enjoyable," "too many commercials," "too many interruptions," "commercials are too long," "more commercials than usual," etc. Discriminant validity could be assessed by correlations with measures of ads and programming not expected to relate to perceived duration and number of commercials. Such additional measures would allow better tests of the theoretical explanations of the effects of ad schedules. The expectancy explanation might also be explored by examining individual differences. Webb tried to do this by measuring estimates of total program length but was thwarted by people apparently relying more on past experience than on actual perception. Perhaps analysis of only those who accurately estimated the program length -- whatever the reason -- would be fruitful. If these people were more likely to rely on expectation, then they might also be more apt to confirm hypotheses based on expectancy. Alternatively, subjects might be grouped according to their television viewing habits or by their answers to questions, in a pretest unconnected to the experimental session, about their knowledge of break patterns and the length of commercials per break.

It may be that the greatest value of this exploratory research is to raise issues about the underlying processes and how to manipulate time perceptions. Such a dependent variable has been usefully studied in other consumer behavior contexts such as time or distance to retail locations (e.g., McKay and Olshavsky, 1975). Further understanding of time perception may help in other areas. For example, negative attitudes about mass transit may be the result of misperceptions about the total time involved in comparison to driving in one's own car. Shoppers' hesitation to more carefully monitor product labels and prices may similarly be caused by over-estimates of the extra time necessary.

CONCLUSION

I enjoyed reading these three pieces of research about advertising, and I strongly commend the authors for their efforts. Many of my questions or criticisms may be due to my inability to understand from the brief descriptions exactly what was done rather than to any actual deficiencies. I have tried to raise questions about the papers in hopes of isolating conceptual and technical issues that can be addressed in future research. I am sure that the authors agree with me that any constructive discussion of the issues that they and I have raised will make their considerable research time and expense worthwhile.

REFERENCES

M. L. Ray, "The Advertising Pretest as Part of a Multi-measure, Multimethod, Multisituation Validation and Application Research System," Advances in Consumer Research, 2 (1975), 577-87.

M. L. Ray, "The Present and Potential Linkages Between the Microtheoretical Notions of Behavioral Science and the Problems of Advertising: A Proposal for a Research System," in A. J. Silk and H. L. Davis (eds.), Behavioral and Management Science in Marketing, New York: Wiley, 1978, 99-141.