Correspondence Analysis Marketing Research Help

Correspondence analysis is an MDS technique for scaling qualitative data in marketing research. It is an exploratory technique designed to study two-way and multi-way tables containing some measure of correspondence between the rows and columns. The measure of correspondence can show the similarity, affinity, confusion, association, or interaction among row and column variables. The primary purpose of the technique is to produce a simplified representation of the information contained in a large frequency table, so that a large qualitative data set can be explored through a dimensional map. For example, if we want to assess the qualitative association between different brands and personality traits, we can use correspondence analysis (CA) to graphically represent the association among brands and traits. In CA, the input data are in the form of a contingency table indicating a qualitative association between the rows and columns.

CA scales the row and column objects in corresponding units, so that each can be displayed graphically in the same low-dimensional space. The spatial map provides insights into (i) similarities and differences within the rows with respect to a given column category, (ii) similarities and differences within the column categories with respect to a given row category, and (iii) relationships among the row and column categories.

This method is considered an extension of MDS with the characteristics of principal-component analysis, as it transforms nonmetric data into metric form, like MDS, and executes data reduction, like principal-component analysis (Chapter 19). It is primarily based on the principle of decomposing the overall Chi-Square statistic of a contingency table by identifying a small number of dimensions in which the deviations from the expected values can be represented. This is similar to the goal of principal-component analysis, where the total variance is decomposed so as to arrive at a lower-dimensional representation of the variables. CA provides a graphical representation of row and column objects and results in the grouping of categories (activities, brands, or other stimuli) found within the contingency table, just as principal-component analysis involves the grouping of variables. The results are interpreted in terms of proximity among the row and column objects of the contingency table. Objects that are closer together are more similar in the underlying structure.
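The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the common formulation of CA as a singular value decomposition of the standardized residuals; the function name and the small table of counts are hypothetical and are not taken from the study in the text.

```python
import numpy as np

def correspondence_analysis(table):
    """Basic CA of a two-way frequency table via SVD of standardized residuals."""
    N = np.asarray(table, dtype=float)
    n = N.sum()
    P = N / n                        # correspondence matrix (relative frequencies)
    r = P.sum(axis=1)                # row masses
    c = P.sum(axis=0)                # column masses
    E = np.outer(r, c)               # expected proportions under independence
    S = (P - E) / np.sqrt(E)         # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1             # maximum number of dimensions
    U, sv, Vt = U[:, :k], sv[:k], Vt[:k, :]
    inertia = sv ** 2                # inertia accounted for by each dimension
    row_coords = (U * sv) / np.sqrt(r)[:, None]      # row scores
    col_coords = (Vt.T * sv) / np.sqrt(c)[:, None]   # column scores
    return {"n": n, "row_mass": r, "col_mass": c, "inertia": inertia,
            "row_coords": row_coords, "col_coords": col_coords}

# Hypothetical counts of "yes" responses: 3 brands (rows) x 4 traits (columns).
counts = [[30, 10, 5, 15],
          [8, 25, 12, 5],
          [6, 9, 28, 7]]
ca = correspondence_analysis(counts)

# The total inertia multiplied by the sample size equals the Chi-Square statistic.
chi_square = ca["n"] * ca["inertia"].sum()
```

The last line makes the link to the Chi-Square statistic explicit: the inertias of the dimensions add up to the Chi-Square value divided by the table total.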

The advantage of CA, as compared to other multidimensional scaling techniques, is that it reduces the data-collection demands imposed on the respondents, as only binary or categorical data are obtained. The respondents are merely asked to check the association between the two sets of variables, like which personality trait(s) is (are) associated with each of several brands. The input data are the number of “yes” responses for each personality trait on each brand. Brands and personality traits are then displayed in the same multidimensional space showing association among them. This method can be used in a variety of exploratory studies like brand positioning, brand association, primary-consumer segmentation, new product development, advertising research, and choice of brand name. In fact, there is no limit to the number of marketing applications of this technique.
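As a rough sketch of how the tick-box answers become input data, the snippet below counts the "yes" responses for each brand and trait. The brand and trait names come from the example in the text, but the three responses and the function name are invented purely for illustration.

```python
from collections import Counter

def build_correspondence_table(responses, brands, traits):
    """Count how many respondents ticked each trait for each brand."""
    counts = Counter()
    for answer in responses:                  # one dict per respondent
        for brand, ticked in answer.items():
            for trait in ticked:
                counts[(brand, trait)] += 1
    # Rows are brands, columns are traits.
    return [[counts[(b, t)] for t in traits] for b in brands]

brands = ["Pulsar", "Splendor"]
traits = ["dominating", "economical"]

# Hypothetical answers from three respondents.
responses = [
    {"Pulsar": ["dominating"], "Splendor": ["economical"]},
    {"Pulsar": ["dominating"], "Splendor": ["economical", "dominating"]},
    {"Pulsar": [], "Splendor": ["economical"]},
]

table = build_correspondence_table(responses, brands, traits)
# table == [[2, 0], [1, 3]]
```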

The disadvantage of this method is that between-set (i.e., between column and row) distances cannot be meaningfully interpreted. While conducting CA, we must also remember that this method is just an exploratory technique and is not suitable for hypothesis testing.

Statistics Associated with Correspondence Analysis

The following statistics and statistical terms are associated with correspondence analysis:

Correspondence table. A correspondence table is the cross-tabulation of row and column objects. It also includes the row and column marginal totals. This table serves as the input for correspondence analysis.

Contribution of point to the inertia of dimension. The contribution of a point to the inertia of a dimension is the weighted squared distance from the projected point to the origin, divided by the inertia of the dimension. It is the proportion of the inertia of a dimension that is explained by a point. Just as factor loadings are used in factor analysis to interpret the meaning of factors, the contributions of points to the inertia of dimensions are used to indicate the meaning of correspondence dimensions.
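Written as a formula, the definition above takes the following form. The symbols are an assumed notation rather than the source's own: r_i is the mass (weight) of point i, f_ik is its score on dimension k, and lambda_k is the inertia of dimension k.

```latex
\mathrm{ctr}_{ik} = \frac{r_i \, f_{ik}^{2}}{\lambda_k},
\qquad
\sum_{i} \mathrm{ctr}_{ik} = 1
```

Because the inertia of a dimension is the sum of the weighted squared scores of all points on it, the contributions of the points to any one dimension add up to 1.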

Conducting a Correspondence Analysis

Figure 21.12 shows the steps involved in conducting CA. The steps consist of problem formulation, obtaining input data, calculation of row and column profiles, calculation of row and column overviews, analyzing the summary output, and finally, interpreting the dimensional joint plot. The researcher must formulate the problem carefully, as a variety of data may be used as input for CA and the result of CA depends on the selected row and column categories. The researcher must also assess the quality of the CA result.

Formulate the Problem

Problem formulation requires a researcher to specify the purpose for which the CA is to be used and to select the variables and their categories accordingly. Problem formulation involves exploring the association among variables. To illustrate CA, suppose a marketing manager wants to explore the relationship among motorbike brands and personality traits and also wants to observe how each motorbike brand is associated with different personality traits among Indian consumers. For this problem, the selection of both brands and personality traits should be made carefully, as the selected items affect the nature of their dimensional representations. In the present illustration, we have taken the executive segment of motorbike brands. Had we also included the other segments of motorbike brands in this study, the dimensional representation would have differed from the current result. So, one must be careful in deciding the categories of variables.

Steps Involved in Correspondence Analysis

Correspondence Table

The next step involves the calculation of row and column overviews, presented in the result (Table 21.8). These row and column overviews report the mass, the score in each dimension, the inertia, the contribution of point to the inertia of dimension, and the contribution of dimension to the inertia of point. Scores of row and column objects in each dimension are the coordinates of brand and personality-trait points on the plot. For example, the scores of the Unicorn brand on Dimension 1 and Dimension 2 are -0.1945 and 0.2821, respectively. These scores are the coordinates of the Unicorn brand on the plot. Each brand and personality-trait item contributes to the inertia of each dimension. Similar to factor loadings in factor analysis, in CA the contributions of points to the inertia of dimensions are used to indicate the meaning of each dimension. Row and column points that contribute considerably to the inertia of a dimension are central to that dimension and provide a basis for the interpretation of the solution. In our example, Pulsar's contribution to the inertia of the 2nd dimension (0.541) is high, while Splendor's contribution to the inertia of the 1st dimension (0.383) is high. Splendor and Pulsar contribute about 38% and 54% of the inertia of the 1st and 2nd dimensions, respectively. This means that, among the brands, Splendor and Pulsar dominate Dimension 1 and Dimension 2, respectively. Similarly, among the personality traits, economical (0.570) and dominating (0.346) dominate the 1st and 2nd dimensions, respectively. Just as each point contributes to the inertia of the dimensions, each dimension contributes to the inertia of the points; this is called the contribution of dimension to the inertia of point.

This contribution conveys the quality of representation of row and column points on the map. In our example, the contribution of the first two dimensions to the inertia of the Unicorn brand is only 7.4%. This indicates that 92.6% of its inertia is not accounted for by the first two dimensions, which implies that the Unicorn brand is poorly represented in a two-dimensional CA solution. On the other hand, for the Pulsar brand 96.7% of the inertia is contributed by the first two dimensions. This indicates a good representation of the Pulsar brand in a two-dimensional CA solution.
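The two kinds of contribution discussed above can be computed directly from the row masses and scores. The sketch below is one possible way to do it with NumPy, assuming the SVD formulation of CA; the function name and the 4 x 4 table of counts are hypothetical and do not reproduce Table 21.8.

```python
import numpy as np

def point_diagnostics(table, n_dims=2):
    """Row scores, contributions to each dimension, and quality of representation."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)          # row and column masses
    E = np.outer(r, c)
    S = (P - E) / np.sqrt(E)                     # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    k = min(N.shape) - 1                         # maximum number of dimensions
    F = (U[:, :k] * sv[:k]) / np.sqrt(r)[:, None]    # row scores on all k dimensions

    # Contribution of each point to the inertia of each dimension:
    # mass times squared score, divided by the inertia of the dimension.
    point_to_dim = r[:, None] * F ** 2 / sv[:k] ** 2

    # Contribution of each dimension to the inertia of each point:
    # squared score divided by the point's squared distance from the origin.
    dim_to_point = F ** 2 / (F ** 2).sum(axis=1, keepdims=True)

    # Quality of representation in the retained dimensions.
    quality = dim_to_point[:, :n_dims].sum(axis=1)
    return F[:, :n_dims], point_to_dim[:, :n_dims], quality

# Hypothetical counts: 4 brands (rows) x 4 traits (columns).
table = [[30, 10, 5, 15],
         [8, 25, 12, 5],
         [6, 9, 28, 7],
         [12, 11, 10, 20]]
scores, contributions, quality = point_diagnostics(table)
```

A row whose `quality` value is close to 1 is well represented on the two-dimensional map, as Pulsar is in the text. A value close to 0 signals a poorly represented point, as with Unicorn.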

Analyse Summary Output

When analyzing the summary of the CA result, the first step in interpreting the output is to look at the Chi-Square value to test the statistical significance of the association between the two variables in the study (brands and personality traits). The results reported in Table 21.8 indicate that the two variables are associated (Chi-Square = 197.14; p < 0.001).
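The significance check can be reproduced roughly as follows. This is a sketch in plain Python; the function names are hypothetical. The degrees of freedom follow from the 7 brands and 12 traits in the example, and the p-value helper uses a closed form that holds only when the degrees of freedom are even, which happens to be the case here.

```python
import math

def chi_square_statistic(table):
    """Pearson Chi-Square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

def p_value_even_df(chi2, df):
    """Upper-tail Chi-Square probability; this closed form needs an even df."""
    if df % 2 != 0:
        raise ValueError("the closed form used here requires an even df")
    half = chi2 / 2.0
    term, total = math.exp(-half), 0.0
    for j in range(df // 2):
        total += term                # adds exp(-half) * half**j / j!
        term *= half / (j + 1)
    return total

df = (7 - 1) * (12 - 1)              # 7 brands x 12 traits, as in the text
p = p_value_even_df(197.14, df)      # Chi-Square value reported in the text
```

With these inputs `p` is far below 0.001, in line with the result quoted above. For tables with an odd number of degrees of freedom, a statistics library would be the more practical choice.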

Result Output of Correspondence Analysis

Ideally, we desire a CA solution that represents the relationship among the row and column categories in as few dimensions as possible. But it is better to look at the maximum number of dimensions first, to check the relative contribution of each dimension to the solution. The maximum number of dimensions for CA is the number of active rows minus 1 or the number of active columns minus 1, whichever is less. In our example, the maximum number of dimensions is six (the minimum of 7 brands minus 1 and 12 personality traits minus 1). However, we are always concerned about the optimum number of dimensions. To identify it, we check the inertia accounted for by each dimension. In our illustration, there are 7 rows and 12 columns; thus, if there is no dependency among the data, the average dimension must account for 100 divided by the number of rows minus 1 percent of the total inertia in terms of rows (100/(7 - 1) = 16.67%), and similarly for 100 divided by the number of columns minus 1 percent of the total inertia in terms of columns (100/(12 - 1) = 9.09%). Therefore, any dimension that contributes more than the larger of these two percentages should be regarded as significant and included in the final solution. In our example, any dimension contributing more than 16.67% of the inertia should be included in the solution. As shown in Table 21.8, Dimension 1 and Dimension 2 contribute 46.64% and 18.25%, respectively, of the total inertia. Hence, the relationship among bike brands and personality traits can be optimally displayed in two dimensions, and the first two dimensions cumulatively explain 64.89% of the total inertia.
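The selection rule above amounts to a short calculation, sketched here in Python. The function name is hypothetical; the inputs are the 7 rows, 12 columns, and the two inertia percentages quoted in the text (the percentages of the remaining dimensions are not given there, so they are left out).

```python
def select_dimensions(n_rows, n_cols, inertia_pct):
    """Apply the average-inertia rule of thumb to choose dimensions to retain."""
    max_dims = min(n_rows - 1, n_cols - 1)
    # Average share of inertia per dimension if rows (or columns) were independent.
    threshold = max(100.0 / (n_rows - 1), 100.0 / (n_cols - 1))
    kept = [k + 1 for k, pct in enumerate(inertia_pct[:max_dims]) if pct > threshold]
    return max_dims, threshold, kept

# Percentages of inertia for Dimension 1 and Dimension 2, as reported in the text.
max_dims, threshold, kept = select_dimensions(7, 12, [46.64, 18.25])
cumulative = sum([46.64, 18.25])
# max_dims == 6, threshold is about 16.67, kept == [1, 2], cumulative is about 64.89
```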

Interpret Plot

The interpretation of the plot is fairly straightforward. Figure 21.13 illustrates a two-dimensional representation of the association among the motorbike brands and personality traits, called a "biplot." Points that are close to the origin in the plot have an undifferentiated profile. In Figure 21.13, the Unicorn brand and the contemporary personality trait are close to the origin. This indicates that Unicorn is the most undifferentiated among the bike brands with respect to personality traits, and contemporary is the most undifferentiated personality trait with regard to all the bike brands. In the plot, row points that are close to one another and situated away from the origin have similar profiles. In Figure 21.13, among the brands, Apache and CBZ have similar profiles; Splendor and Discover have similar profiles, while Pulsar has a unique profile based on the personality-trait association. The joint display of brands and personality traits shows the relationship among bike brands and personality traits. This relationship can be assessed by drawing a line from the origin to each column point (personality trait) and then drawing a perpendicular line from the row points (brands) to this line. The distance from the intersection of the two lines to the column point (personality trait) indicates how objects of the two variables are related [16, 18]. For example, as illustrated in Figure 21.14, the line from the "dominating" personality trait aids in explaining the relationship between the "dominating" trait and the different brands. It can be seen that the Pulsar brand is closest to the "dominating" personality trait, followed by CBZ, Discover, Unicorn, Apache, Passion, and Splendor. Similarly, the Apache brand is closest to the "fancy" personality trait, followed by CBZ, Unicorn, Passion, Splendor, Discover, and Pulsar. A similar interpretation can be made for the association between each personality trait and the different brands.
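The perpendicular-line reading of the plot can be mimicked numerically by projecting each brand point onto the line from the origin through a trait point. The sketch below assumes that a projection lying farther along the line toward the trait means a closer association, which agrees with the distance reading above as long as no projection overshoots the trait point. Unicorn's coordinates are the scores quoted in the text; every other coordinate, and the function name, is invented for illustration.

```python
import math

def rank_rows_along_column(row_points, column_point):
    """Order row points by their projection on the origin-to-column-point line."""
    cx, cy = column_point
    length = math.hypot(cx, cy)          # distance of the column point from origin
    projections = {
        name: (x * cx + y * cy) / length     # signed position of the perpendicular foot
        for name, (x, y) in row_points.items()
    }
    return sorted(projections, key=projections.get, reverse=True)

# Hypothetical plot coordinates, except Unicorn (scores quoted in the text).
brand_points = {
    "Pulsar": (0.10, 0.80),
    "CBZ": (0.30, 0.20),
    "Unicorn": (-0.1945, 0.2821),
    "Splendor": (-0.60, -0.30),
}
dominating = (0.20, 0.90)                # hypothetical position of the trait

order = rank_rows_along_column(brand_points, dominating)
# order == ["Pulsar", "CBZ", "Unicorn", "Splendor"]
```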

In this way, a large qualitative data set of associations among the categories of variables, contained in the frequency table, can be represented in a low-dimensional space. However, this method of data reduction is very sensitive to outliers in the data. If the contingency table contains one or more outliers in the rows and/or columns, such outliers tend to dominate the interpretation of one or more of the dimensions. As a result, the remaining row and/or column points tend to be closely clustered in the plot, making it difficult to interpret. To avoid this, we should always remove the outliers before conducting the CA.

Biplot of Motorbike Brands and Personality Traits

Association of Dominating Personality Trait with Different Motorbike Brands

Sample Question (Part of the Questionnaire)

Please tick the personality trait(s) associated with the following motorbike brands: