The main objective of this thesis is to propose new techniques to simplify the interpretation
of newly formed `variables' or components, while reducing the dimensionality
of multivariate data. Most attention is given to the interpretation of principal components,
although one chapter is devoted to the interpretation of factors in factor analysis. Sparse
principal components are proposed, in which some of the component loadings are made
exactly zero. One approach makes use of the idea of correlation biplots, in which an
orthogonal matrix of sparse loadings is obtained by computing the biplot factors of
the product of the principal component loading matrix and functions of the component variances.
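The starting matrix of the biplot approach can be sketched as follows. This is a minimal sketch assuming that the "function of the variances" is their square root, as in the standard correlation biplot; the subsequent step that turns this matrix into an orthogonal matrix of sparse loadings is not reproduced here. For a correlation matrix $R = V\Lambda V^\top$, the matrix $A = V\Lambda^{1/2}$ holds the correlations between variables and components and satisfies $AA^\top = R$. The function name `correlation_biplot_markers` is hypothetical.

```python
import numpy as np

def correlation_biplot_markers(R):
    """Return A = V diag(sqrt(lam)) for a correlation matrix R.

    V holds the principal component loadings (unit-length eigenvectors) and
    lam the component variances (eigenvalues), sorted in decreasing order.
    Assumption: the 'function of the variances' is the square root.
    """
    lam, V = np.linalg.eigh(R)          # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]       # reorder to decreasing variance
    lam, V = lam[order], V[:, order]
    lam = np.clip(lam, 0.0, None)       # guard against tiny negative round-off
    return V * np.sqrt(lam)             # scales column j of V by sqrt(lam[j])

# Small illustrative correlation matrix (made up for this sketch).
R_example = np.array([[1.0, 0.6, 0.3],
                      [0.6, 1.0, 0.2],
                      [0.3, 0.2, 1.0]])
A = correlation_biplot_markers(R_example)
print(np.allclose(A @ A.T, R_example))  # A reproduces R when all components are kept
```

Each row of `A` has unit squared length because the diagonal of a correlation matrix is one; keeping only the first few columns gives the usual two- or three-dimensional biplot markers.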
Other approaches involve clustering of variables as a pre-processing step, so that sparse
components are computed from the data or correlation matrix of each cluster. New
clustering techniques are proposed for this purpose. In addition, a penalized varimax
approach is proposed for simplifying the interpretation of factors in factor analysis,
especially for factor solutions whose factors have considerably different sums of squares. This is done
by adding a penalty term to the ordinary varimax criterion.
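The cluster-based idea described above can be illustrated with a minimal sketch. It assumes the variable clusters are already given as labels (the new clustering techniques themselves are not reproduced) and takes, for each cluster, the leading eigenvector of that cluster's correlation sub-matrix, padded with zeros for all other variables. The function name `cluster_sparse_loadings` and the example matrix are hypothetical.

```python
import numpy as np

def cluster_sparse_loadings(R, labels):
    """One sparse component per variable cluster (illustrative sketch).

    For each cluster, take the leading eigenvector of the correlation
    sub-matrix of its variables and place it in a length-p vector that is
    zero for every variable outside the cluster.
    """
    R = np.asarray(R, dtype=float)
    labels = np.asarray(labels)
    p = R.shape[0]
    clusters = np.unique(labels)
    L = np.zeros((p, len(clusters)))
    variances = np.zeros(len(clusters))
    for k, c in enumerate(clusters):
        idx = np.flatnonzero(labels == c)
        lam, vecs = np.linalg.eigh(R[np.ix_(idx, idx)])
        v = vecs[:, -1]                  # eigenvector of the largest eigenvalue
        if v.sum() < 0:                  # fix the arbitrary sign for readability
            v = -v
        L[idx, k] = v
        variances[k] = lam[-1]           # variance of this sparse component
    order = np.argsort(variances)[::-1]  # order components by decreasing variance
    return L[:, order], variances[order]

# Hypothetical example: variables {0, 1} and {2, 3} form two clusters.
R_example = np.array([[1.0, 0.8, 0.1, 0.0],
                      [0.8, 1.0, 0.0, 0.1],
                      [0.1, 0.0, 1.0, 0.7],
                      [0.0, 0.1, 0.7, 1.0]])
loadings, comp_var = cluster_sparse_loadings(R_example, [0, 0, 1, 1])
print(loadings.round(3))
print(comp_var.round(3))
```

Because the loading vectors have disjoint supports they are orthogonal, but the resulting components are generally correlated whenever variables in different clusters are correlated.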
Data sets of varying sizes, both synthetic and real, are used to illustrate the proposed
methods, and the results are compared with those of existing ones. In the case
of principal component analysis, the resulting sparse components are found to be more
interpretable (sparser) and to explain a higher cumulative percentage of adjusted variance
than their counterparts from other techniques. The penalized varimax approach
helps to find factor solutions with simple structures that are not
revealed by the standard varimax solution.
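Since sparse components are usually correlated, the cumulative percentage of adjusted variance credits each component only with the variance not already explained by the preceding ones. A minimal sketch, assuming the common QR-based definition used in the sparse PCA literature (the definition used in the thesis may differ): for scores $Z = XB = QR$, the $j$-th adjusted variance is $R_{jj}^2$. With standardized data the same diagonal comes from the Cholesky factor of $B^\top C B$, where $C$ is the correlation matrix. The function name and example values are hypothetical.

```python
import numpy as np

def cumulative_adjusted_variance(corr, B):
    """Cumulative percentage of adjusted variance for loadings B (p x k).

    Scores of standardized data have covariance B^T corr B. Writing the
    scores as Z = QR, the j-th adjusted variance is R[j, j]**2, the variance
    of component j after removing the part explained by components 1..j-1.
    The Cholesky factor of B^T corr B has the same diagonal (in variance
    units), so the raw data are not needed. Results depend on column order.
    """
    cov = B.T @ corr @ B
    chol = np.linalg.cholesky(cov)      # lower triangular, cov = chol @ chol.T
    adjusted = np.diag(chol) ** 2
    return 100.0 * np.cumsum(adjusted) / np.trace(corr)

# Hypothetical example: two block-sparse loading vectors.
corr_example = np.array([[1.0, 0.8, 0.1, 0.0],
                         [0.8, 1.0, 0.0, 0.1],
                         [0.1, 0.0, 1.0, 0.7],
                         [0.0, 0.1, 0.7, 1.0]])
a = 2 ** -0.5
B_example = np.array([[a, 0.0], [a, 0.0], [0.0, a], [0.0, a]])
cum_pct = cumulative_adjusted_variance(corr_example, B_example)
print(cum_pct.round(2))
```

For ordinary principal component loadings the components are uncorrelated, so the adjusted variances reduce to the eigenvalues and the measure coincides with the usual cumulative percentage of explained variance.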
The proposed methods are simple to understand and rely on faster algorithms
than some of the existing methods. They make the components obtained when
reducing the dimensionality of multivariate data considerably easier to
interpret.