Outline of Use

Principal Component Analysis

One important type of analysis performed by the
FACTOR procedure is principal component analysis.
The statements

proc factor;
run;

result in a principal component analysis.
The output includes all the eigenvalues, together with the
pattern matrix for the components whose eigenvalues are greater than one.

Most applications require additional output.
For example, you may want to compute principal component
scores for use in subsequent analyses or obtain a
graphical aid to help decide how many components to keep.
You can save the results of the analysis in a
permanent SAS data library by using the OUTSTAT= option.
(Refer to the SAS Language Reference: Dictionary
for more information on permanent SAS data libraries and librefs.)
Assuming that your SAS data library has the libref
save and that the data are in a SAS data set called raw,
you could do a principal component analysis as follows:
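
A sketch of these statements, assuming the libref save and
the data set raw named above and the options described in
the next paragraph. METHOD=PRINCIPAL is written out only
for clarity; it is the default.

/* Assumed form: every option comes from the surrounding text */
proc factor data=raw method=principal scree mineigen=0 score
            outstat=save.fact_all;
run;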

The SCREE option produces a plot of the eigenvalues
that is helpful in deciding how many components to use.
The MINEIGEN=0 option causes all components
with variance greater than zero to be retained.
The SCORE option requests that scoring coefficients be computed.
The OUTSTAT= option saves the results
in a specially structured SAS data set.
The name of the data set, in this case fact_all, is arbitrary.
To compute principal component scores, use the SCORE procedure.

proc score data=raw score=save.fact_all out=save.scores;
run;

The SCORE procedure uses the data and the scoring
coefficients that are saved in save.fact_all
to compute principal component scores.
The component scores are placed in variables
named Factor1, Factor2, ..., Factorn, where n is the
number of retained components, and are saved in the data set save.scores.
If you know ahead of time how many principal components
you want to use, you can obtain the scores directly from
PROC FACTOR by specifying the NFACTORS= and OUT= options.
To get scores from three principal components, specify NFACTORS=3
and name an OUT= data set in the PROC FACTOR statement.
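
A minimal sketch of such a step, assuming the raw data set
and save libref introduced earlier. Reusing the name
save.scores for the OUT= data set is an assumption, and
METHOD=PRINCIPAL is again written out although it is the default.

/* OUT= needs raw data as input and an explicit NFACTORS= value */
proc factor data=raw method=principal nfactors=3
            out=save.scores;   /* save.scores: assumed name */
run;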

To plot the scores for the first three
components, use the PLOT procedure.

proc plot data=save.scores;
   plot factor2*factor1 factor3*factor1 factor3*factor2;
run;

Principal Factor Analysis

The simplest and computationally most efficient method
of common factor analysis is principal factor analysis,
which is obtained the same way as principal component
analysis except for the use of the PRIORS= option.
The usual form of the initial analysis adds PRIORS=SMC to the
PROC FACTOR statement used for the principal component analysis.
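
A sketch of that initial run, assuming the same raw data set
and save libref as before. The OUTSTAT= name save.fact_all
is reused here as an assumption; any name would do.

/* Principal factor analysis: same statement as before plus PRIORS=SMC */
proc factor data=raw method=principal scree mineigen=0
            priors=smc outstat=save.fact_all;
run;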

The squared multiple correlations (SMC) of each variable with all
the other variables are used as the prior communality estimates.
If your correlation matrix is singular, you
should specify PRIORS=MAX instead of PRIORS=SMC.
The SCREE and MINEIGEN= options serve the same
purpose as in the preceding principal component analysis.
Saving the results with the OUTSTAT= option enables you
to examine the eigenvalues and scree plot before deciding
how many factors to rotate and to try several different
rotations without re-extracting the factors.
The OUTSTAT= data set is automatically marked TYPE=FACTOR,
so the FACTOR procedure recognizes that it contains
statistics from a previous analysis instead of raw data.

After looking at the eigenvalues to estimate the
number of factors, you can try some rotations.
Two and three factors can be rotated
with the following statements:
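
A sketch of the two runs, built from the options described
in the next paragraph. The input name save.fact_all and the
output names save.fact_2 and save.fact_3 are assumptions
chosen for illustration, as is the SCORE option, which is
included so that scoring coefficients are available later.

/* fact_2 and fact_3 are illustrative data set names */
proc factor data=save.fact_all method=principal n=2
            rotate=promax reorder score outstat=save.fact_2;
run;
proc factor data=save.fact_all method=principal n=3
            rotate=promax reorder score outstat=save.fact_3;
run;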

The output data set from the previous
run is used as input for these analyses.
The options N=2 and N=3 (N= is an alias for NFACTORS=)
specify the number of factors to be rotated.
The specification ROTATE=PROMAX requests a promax rotation,
which has the advantage of providing both orthogonal and
oblique rotations with only one invocation of PROC FACTOR.
The REORDER option causes
the variables to be reordered in the output so that variables
associated with the same factor appear next to each other.

You can now compute and plot factor scores for
the two-factor promax-rotated solution as follows:
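
A sketch of these two steps, assuming that the two-factor
solution and its scoring coefficients were saved in a data
set named save.fact_2 (a hypothetical name) and that the
scores are written to save.scores.

/* save.fact_2 is an assumed name for the two-factor OUTSTAT= data set */
proc score data=raw score=save.fact_2 out=save.scores;
run;
proc plot data=save.scores;
   plot factor2*factor1;
run;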

Maximum-Likelihood Factor Analysis

Although principal factor analysis is perhaps the
most commonly used method of common factor analysis,
most statisticians prefer maximum-likelihood (ML)
factor analysis (Lawley and Maxwell 1971).
The ML method of estimation has desirable asymptotic properties
(Bickel and Doksum 1977) and produces better estimates
than principal factor analysis in large samples.
You can test hypotheses about the number
of common factors using the ML method.

The ML solution is equivalent to Rao's (1955) canonical
factor solution and Howe's solution maximizing the
determinant of the partial correlation matrix (Morrison 1976).
Thus, as a descriptive method, ML factor analysis
does not require a multivariate normal distribution.
The validity of Bartlett's test for the number
of factors does require approximate normality plus
additional regularity conditions that are usually
satisfied in practice (Geweke and Singleton 1980).

The ML method is more computationally demanding
than principal factor analysis for two reasons.
First, the communalities are estimated iteratively,
and each iteration takes about as much computer
time as principal factor analysis.
The number of iterations typically
ranges from about five to twenty.
Second, if you want to extract different numbers
of factors, as is often the case, you must run the
FACTOR procedure once for each number of factors.
Therefore, an ML analysis can take on the order of 100 times
as long as a principal factor analysis (for example, twenty
iterations for each of five different numbers of factors).

You can use principal factor analysis to get a rough idea
of the number of factors before doing an ML analysis.
If you think that there are between one and three factors,
you can use the following statements for the ML analysis:
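
A sketch of such a sequence, with one run for each number of
factors. The OUTSTAT= names save.ml_1, save.ml_2, and
save.ml_3 are hypothetical, and no rotation is requested
here; a ROTATE= option could be added to the two- and
three-factor runs.

/* ml_1, ml_2, ml_3 are illustrative data set names */
proc factor data=raw method=ml n=1 outstat=save.ml_1;
run;
proc factor data=raw method=ml n=2 outstat=save.ml_2;
run;
proc factor data=raw method=ml n=3 outstat=save.ml_3;
run;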

The output data sets can be used for trying different rotations,
computing scoring coefficients, or restarting the procedure in
case it does not converge within the allotted number of iterations.

The ML method cannot be used with a singular correlation
matrix, and it is especially prone to Heywood cases.
(See the section "Heywood Cases and Other Anomalies" for a discussion of Heywood cases.)
If you have problems with ML, the best alternative is to use
the METHOD=ULS option for unweighted least-squares factor analysis.
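
A minimal sketch of the ULS alternative, assuming the same
raw data set and save libref. The choice of two factors and
the name save.uls_2 are purely illustrative.

/* n=2 and uls_2 are assumptions for illustration */
proc factor data=raw method=uls n=2 outstat=save.uls_2;
run;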