Canonical Correlation Analysis | SAS Data Analysis Examples

Version info: Code for this page was tested in SAS 9.3.

Canonical correlation analysis is used to
identify and measure the associations between two sets of variables.
Canonical correlation is appropriate in the same situations where multiple
regression would be, but where there are multiple intercorrelated outcome
variables. Canonical correlation analysis determines a set of canonical variates,
orthogonal linear combinations of the variables within each set that best
explain the variability both within and between sets.

Please Note: The purpose of this page is to show how to use various data analysis commands.
It does not cover all aspects of the research process that researchers are expected to carry out. In
particular, it does not cover data cleaning and checking, verification of assumptions, model
diagnostics and potential follow-up analyses.

Examples of canonical correlation analysis

Example 1. A researcher has collected data on three psychological variables, four academic variables
(standardized test scores) and gender for 600 college freshmen. She is interested in
how the set of psychological variables relates to the academic variables and gender. In
particular, the researcher is interested in how many dimensions (canonical
variables) are necessary to understand
the association between the two sets of variables.

Example 2. A researcher is interested in exploring associations among factors from two multidimensional
personality tests, the MMPI and the NEO. She is interested in what dimensions
are common between the tests and how much
variance is shared between them. She is specifically interested in finding
whether the neuroticism dimension from the NEO can account for a substantial amount of shared variance
between the two tests.

Description of the Data

Let’s pursue Example 1 from above.

We have included the data file, which can be obtained by clicking on
mmreg.sas7bdat.
The dataset has 600 observations on eight variables.
The psychological variables are locus of control, self-concept and
motivation. The academic variables are standardized tests in
reading, writing, math and science. Additionally,
the variable female is a zero-one indicator variable
with the one indicating a female student.
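
Before running the analysis, it can help to confirm that the file matches this description. The following is a minimal sketch, assuming mmreg.sas7bdat has been saved to a local folder; the path C:\data and the libref mylib are placeholders and are not part of the original example.

```sas
* Assumption: mmreg.sas7bdat was downloaded to C:\data (hypothetical path);
libname mylib "C:\data";

* List the variables and their attributes;
proc contents data=mylib.mmreg;
run;

* Summary statistics for every numeric variable;
proc means data=mylib.mmreg n mean std min max;
run;
```

If the file is the one described above, PROC CONTENTS should report 600 observations and eight variables.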

We did not include correlations among the variables at this point because we will get them
later as part of the canonical correlation analysis.

Analysis methods you might consider

Before we show how you can analyze this with a canonical correlation analysis, let’s
consider some other methods that you might use.

Canonical correlation analysis, the focus of this page.

Separate OLS Regressions – You could analyze these data using separate OLS regression
analyses for each variable in one set. OLS regressions
do not produce multivariate results and do not report information
concerning dimensionality.

Multivariate multiple regression is a reasonable option if you have
no interest in dimensionality.
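
For comparison, the regression alternatives listed above might be sketched with PROC REG as follows. This is an illustration only: the path and libref are placeholders, the variable names other than read and female are assumptions about how the columns are named in the file, and treating the psychological variables as the outcomes is one possible choice.

```sas
* Assumption: mmreg.sas7bdat was downloaded to C:\data (hypothetical path);
libname mylib "C:\data";

* Assumption: variable names other than read and female are guesses;
proc reg data=mylib.mmreg;
  * Several outcomes in one MODEL statement give one OLS fit per outcome;
  model locus_of_control self_concept motivation =
        read write math science female;
  * MTEST with no arguments jointly tests all slopes across all outcomes;
  mtest;
run;
quit;
```

The per-outcome tables correspond to the separate OLS regressions, while the MTEST output supplies the multivariate tests; neither reports the number of canonical dimensions.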

Canonical correlation analysis

Due to the length of the output, we will be making comments in several places along
the way.
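
The analysis discussed in the rest of this section can be requested with PROC CANCORR. The call below is a sketch rather than the exact code used for this page: the path and libref are placeholders, the variable names other than read and female are assumed, and the VNAME= and WNAME= labels are optional.

```sas
* Assumption: mmreg.sas7bdat was downloaded to C:\data (hypothetical path);
libname mylib "C:\data";

proc cancorr data=mylib.mmreg
             vname="Psychological" wname="Academic";
  * Set 1: the psychological variables (names assumed);
  var  locus_of_control self_concept motivation;
  * Set 2: the academic variables plus the female indicator;
  with read write math science female;
run;
```

The academic variables and female are placed on the WITH statement so that they form the second set, matching the references to set 2 in the discussion of the coefficients.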

The output below gives the three canonical correlations and the multivariate
tests of the dimensions. These results show that the first two of the three
canonical correlations are statistically significant. The output also includes
the four multivariate criteria and the F approximations.

In general, the number of canonical dimensions is
equal to the number of variables in the smaller set; however, the number of significant
dimensions may be even smaller. Canonical dimensions, also known as
canonical variates, are similar to the latent variables found in factor analysis,
except that canonical variates also maximize the correlation between the two
sets of variables. For this particular model there are three canonical dimensions, of which only the first
two are statistically significant. The first test of dimensions tests whether all three
dimensions combined are significant (F = 11.72); the next test tests whether dimensions 2 and 3
combined are significant (F = 2.94). Finally, the last test tests whether dimension
3, by itself, is significant (F = 2.16). The first two tests are significant and the last is not, so dimensions 1 and 2 are each
significant while the third dimension is not.
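
The sequential tests described above are built from the canonical correlations themselves. As a sketch in standard notation (the symbols are introduced here and do not appear in the output): with s canonical correlations ordered from largest to smallest, the likelihood ratio (Wilks' lambda) statistic for the hypothesis that dimensions k through s are all zero is

```latex
\Lambda_k \;=\; \prod_{i=k}^{s} \left(1 - \rho_i^{2}\right), \qquad k = 1, \dots, s .
```

Here s = 3, so the first test uses all three squared canonical correlations, the second uses the last two, and the third uses only the last. Each statistic is then converted to an approximate F value; values of the statistic close to 1 indicate that the remaining dimensions add little.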

Next, the raw canonical coefficients are shown below. The raw canonical coefficients are interpreted in a manner analogous to interpreting
regression coefficients. For example, for the variable read, a one-unit increase in reading leads to a
.0446 increase in the first canonical variate of set 2 when all of
the other variables are held constant. Here is another example: being female leads to
a .6321 increase in dimension 1 for set 2 with the other predictors held constant.
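
Because each canonical variate is a linear combination of the variables in its set, the scores on the variates can be saved and inspected directly. The following sketch assumes the same placeholder path, libref, and variable names as elsewhere; the output data set name canscores is also hypothetical.

```sas
* Assumption: mmreg.sas7bdat was downloaded to C:\data (hypothetical path);
libname mylib "C:\data";

* OUT= saves the original variables plus the canonical variate scores;
proc cancorr data=mylib.mmreg out=work.canscores noprint;
  var  locus_of_control self_concept motivation;
  with read write math science female;
run;

* With the default prefixes, V1-V3 score set 1 and W1-W3 score set 2;
proc corr data=work.canscores;
  var  v1 v2 v3;
  with w1 w2 w3;
run;
```

In the resulting correlation table, the entries pairing V1 with W1, V2 with W2, and V3 with W3 should reproduce the three canonical correlations, and the remaining cross-correlations should be essentially zero.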

The raw coefficients are followed by the standardized canonical coefficients
shown below. When the variables in the model have very different standard
deviations, the standardized coefficients allow for easier comparisons among the
variables. The standardized canonical coefficients are interpreted in a manner analogous to
interpreting standardized regression coefficients. For example, consider the
variable read: a one
standard deviation increase in reading leads to a 0.45 standard deviation increase in the
score on the first canonical variate for set 2 when the other variables in the model are
held constant.
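
The two kinds of coefficients are linked by the standard deviation of each variable. As a sketch in notation introduced here (not taken from the output): if s_j is the sample standard deviation of variable j, and the canonical variate is scaled to unit variance, then

```latex
b_j^{\text{std}} \;=\; b_j^{\text{raw}} \, s_j
\qquad\Longleftrightarrow\qquad
s_j \;=\; \frac{b_j^{\text{std}}}{b_j^{\text{raw}}} .
```

Applied to the values quoted above for read, 0.45 / .0446 is roughly 10, which is the standard deviation of read implied by the two coefficients, up to rounding.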

Things to consider

As in the case of multivariate regression, MANOVA and so on, valid inference in canonical correlation analysis requires the assumptions of multivariate normality and homogeneity of variance.
Canonical correlation analysis assumes a linear relationship between the canonical variates and each set of variables.
Similar to multivariate regression, canonical correlation analysis requires a large sample size.