
factoextra is an R package that makes it easy to extract and visualize the output of exploratory multivariate data analyses, including:

Principal Component Analysis (PCA), which is used to summarize the information contained in continuous (i.e., quantitative) multivariate data by reducing the dimensionality of the data without losing important information.

Correspondence Analysis (CA), which is an extension of principal component analysis suited to analysing a large contingency table formed by two qualitative variables (or categorical data).

There are a number of R packages implementing principal component methods. These packages include: FactoMineR, ade4, stats, ca, MASS and ExPosition.

However, the results are presented differently depending on the package used. To help with the interpretation and visualization of multivariate analyses, such as cluster analysis and dimensionality reduction, we developed an easy-to-use R package named factoextra.

The R package factoextra provides flexible, easy-to-use methods to quickly extract the analysis results from the packages mentioned above in a standard, human-readable data format.

It produces elegant ggplot2-based data visualizations with less typing.

It also contains many functions that facilitate clustering analysis and visualization.
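As a minimal sketch of the clustering helpers, the example below runs k-means from the stats package and plots the partition with fviz_cluster. The USArrests data set (shipped with base R), the choice of four clusters and the object names are illustrative assumptions, not recommendations from this README.

```r
# Assumes factoextra is installed from CRAN.
library(factoextra)

# Standardize the variables so that no single unit dominates the distances.
df <- scale(USArrests)

# k-means with 4 clusters; the seed only makes the run reproducible.
set.seed(123)
km <- kmeans(df, centers = 4, nstart = 25)

# Plot the clusters; the data is passed explicitly so that the points
# can be projected onto the first two principal components.
p.clust <- fviz_cluster(km, data = df)
print(p.clust)
```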

We'll use i) the FactoMineR package (Sebastien Le et al., 2008) to compute PCA, (M)CA, FAMD, MFA and HCPC; and ii) the factoextra package for extracting and visualizing the results.
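The two-step workflow can be sketched as follows: FactoMineR computes the PCA and factoextra extracts the results. The USArrests data set (shipped with base R) and the object names are assumptions chosen for illustration.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

# Step 1: compute the PCA on standardized variables, without FactoMineR's own plots.
res.pca <- PCA(USArrests, scale.unit = TRUE, graph = FALSE)

# Step 2: extract the results with factoextra.
eig <- get_eigenvalue(res.pca)  # eigenvalues and percentages of variance
var <- get_pca_var(res.pca)     # coordinates, cos2 and contributions of variables
ind <- get_pca_ind(res.pca)     # coordinates, cos2 and contributions of individuals

print(eig)
head(var$coord)
head(ind$coord)
```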

Why use factoextra?

The factoextra R package can handle the results of PCA, CA, MCA, MFA, FAMD and HMFA from several packages, for extracting and visualizing the most important information contained in your data.

After PCA, CA, MCA, MFA, FAMD and HMFA, the most important row/column elements can be highlighted using:

their cos2 values, which measure their quality of representation on the factor map

their contributions to the definition of the principal dimensions.

If you want to do this, the factoextra package provides a convenient solution.
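A minimal sketch of both highlighting approaches for the variables of a PCA is given below. The data set (USArrests, from base R), the colour palette and the object names are illustrative assumptions.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

res.pca <- PCA(USArrests, graph = FALSE)
var <- get_pca_var(res.pca)

# Numeric view: quality of representation and contributions (in %)
# of each variable on the first two dimensions.
round(var$cos2[, 1:2], 3)
round(var$contrib[, 1:2], 1)

# Bar plots of cos2 (dimensions 1 and 2 combined) and of the contributions to dimension 1.
p.cos2 <- fviz_cos2(res.pca, choice = "var", axes = 1:2)
p.contrib <- fviz_contrib(res.pca, choice = "var", axes = 1)

# Factor map where the colour of each variable reflects its contribution.
p.map <- fviz_pca_var(res.pca, col.var = "contrib",
                      gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
                      repel = TRUE)
print(p.map)
```

Passing choice = "ind" to fviz_cos2 and fviz_contrib gives the same views for the individuals.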

PCA and (M)CA are sometimes used for prediction problems: one can predict the coordinates of new supplementary variables (quantitative and qualitative) and supplementary individuals using the information provided by a previously performed PCA or (M)CA. This can be done easily using the FactoMineR package.

If you want to make predictions with PCA/MCA and visualize the position of the supplementary variables/individuals on the factor map using ggplot2, then factoextra can help you. It's quick: write less and do more.
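The supplementary-element workflow can be sketched as follows. The example assumes the decathlon2 demo data set that ships with factoextra, and the row/column indices used to mark supplementary elements are specific to that data set; they are not a general rule.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

data(decathlon2)

# Rows 24-27 are treated as supplementary individuals, columns 11-12 as
# supplementary quantitative variables and column 13 as a supplementary
# qualitative variable. They do not influence the principal dimensions;
# their coordinates are predicted from the active elements.
res.pca <- PCA(decathlon2, ind.sup = 24:27,
               quanti.sup = 11:12, quali.sup = 13, graph = FALSE)

# Predicted coordinates of the supplementary elements.
res.pca$ind.sup$coord
res.pca$quanti.sup$coord

# Factor maps; supplementary elements are drawn together with the active ones.
p.ind <- fviz_pca_ind(res.pca, repel = TRUE)
p.var <- fviz_pca_var(res.pca, repel = TRUE)
print(p.ind)
```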

Several functions from different packages (FactoMineR, ade4, ExPosition, stats) are available in R for performing PCA, CA or MCA. However, the components of the output vary from package to package.

No matter which package you decide to use, factoextra can give you human-readable output.
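To illustrate the point, the sketch below runs the same standardized PCA with three different functions and extracts the eigenvalues through a single factoextra call. The data set (USArrests) and the object names are assumptions made for this example.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(factoextra)

# Three implementations of a PCA on the correlation matrix.
res <- list(
  prcomp     = prcomp(USArrests, scale. = TRUE),          # stats
  princomp   = princomp(USArrests, cor = TRUE),           # stats
  FactoMineR = FactoMineR::PCA(USArrests, graph = FALSE)  # FactoMineR
)

# The native objects store their results under different names and structures,
# but get_eigenvalue() returns the same table layout for each of them.
eigs <- lapply(res, get_eigenvalue)
lapply(eigs, colnames)
eigs$prcomp
```

The same idea applies to get_pca_var and get_pca_ind, which return coordinates, cos2 and contributions under the same names for each supported class.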

Main functions in the factoextra package

Visualizing dimension reduction analysis outputs

| Functions | Description |
|---|---|
| fviz_eig (or fviz_eigenvalue) | Extract and visualize the eigenvalues/variances of dimensions. |
| fviz_pca | Graph of individuals/variables from the output of Principal Component Analysis (PCA). |
| fviz_ca | Graph of column/row variables from the output of Correspondence Analysis (CA). |
| fviz_mca | Graph of individuals/variables from the output of Multiple Correspondence Analysis (MCA). |
| fviz_mfa | Graph of individuals/variables from the output of Multiple Factor Analysis (MFA). |
| fviz_famd | Graph of individuals/variables from the output of Factor Analysis of Mixed Data (FAMD). |
| fviz_hmfa | Graph of individuals/variables from the output of Hierarchical Multiple Factor Analysis (HMFA). |
| fviz_ellipses | Draw confidence ellipses around the categories. |
| fviz_cos2 | Visualize the quality of representation of the row/column variable from the results of PCA, CA, MCA functions. |
| fviz_contrib | Visualize the contributions of row/column elements from the results of PCA, CA, MCA functions. |
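A short sketch of a few of these functions applied to a PCA is given below. It assumes the iris data set from base R, with the Species column used only to colour the individuals; the object names are illustrative.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

# PCA on the four numeric measurements of iris.
res.pca <- PCA(iris[, 1:4], graph = FALSE)

# Scree plot: percentage of variance explained by each dimension.
p.eig <- fviz_eig(res.pca, addlabels = TRUE)

# Individuals coloured by species, with one ellipse per group.
p.ind <- fviz_pca_ind(res.pca, label = "none",
                      habillage = iris$Species, addEllipses = TRUE)

# Variables on the correlation circle.
p.var <- fviz_pca_var(res.pca, repel = TRUE)

print(p.eig)
print(p.ind)
print(p.var)
```

Each call returns a ggplot object, so the usual ggplot2 layers and themes can be added with `+`.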

Dimension reduction and factoextra

As depicted in the figure below, the type of analysis to be performed depends on the format and structure of the data set.

In this section, we start by illustrating classical methods, such as PCA, CA and MCA, for analyzing a data set containing continuous variables, a contingency table and qualitative variables, respectively.
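As a minimal sketch of the contingency-table case, the example below runs a CA and extracts the row and column results. It assumes the housetasks demo data set that ships with factoextra; the object names are illustrative.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

# housetasks is a contingency table: household tasks in rows,
# and who performs them in columns.
data(housetasks)
res.ca <- CA(housetasks, graph = FALSE)

eig    <- get_eigenvalue(res.ca)  # variance retained by each dimension
ca.row <- get_ca_row(res.ca)      # coordinates, cos2, contributions of rows
ca.col <- get_ca_col(res.ca)      # coordinates, cos2, contributions of columns

print(eig)

# Biplot showing rows and columns on the same factor map.
p.ca <- fviz_ca_biplot(res.ca, repel = TRUE)
print(p.ca)
```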

We continue by discussing advanced methods, such as FAMD, MFA and HMFA, for analyzing a data set containing a mix of qualitative and quantitative variables, organized into groups or not.
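For mixed data without a group structure, a FAMD can be sketched as follows. The example assumes the iris data set from base R (four quantitative variables and one qualitative variable) purely for illustration; the object names are hypothetical.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

# iris mixes numeric measurements with the qualitative Species column.
res.famd <- FAMD(iris, graph = FALSE)

# Variance retained by each dimension.
eig <- get_eigenvalue(res.famd)
print(eig)

# Variables (quantitative and qualitative) and individuals on the factor maps.
p.var <- fviz_famd_var(res.famd, repel = TRUE)
p.ind <- fviz_famd_ind(res.famd, geom = "point")
print(p.var)
```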

Finally, we show how to perform hierarchical clustering on principal components (HCPC), which is useful for clustering a data set containing only qualitative variables or a mix of qualitative and quantitative variables.
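The HCPC step can be sketched as follows. For simplicity the example starts from a PCA on continuous data (USArrests, from base R); with qualitative or mixed data, the PCA call would be replaced by MCA or FAMD. The number of retained dimensions and the automatic tree cut are assumptions made for this sketch.

```r
# Assumes FactoMineR and factoextra are installed from CRAN.
library(FactoMineR)
library(factoextra)

# Step 1: dimension reduction, keeping the first 3 principal components.
res.pca <- PCA(USArrests, ncp = 3, graph = FALSE)

# Step 2: hierarchical clustering on the retained components.
# nb.clust = -1 asks HCPC to cut the tree at its suggested level.
res.hcpc <- HCPC(res.pca, nb.clust = -1, graph = FALSE)

# Original data with the cluster assignment appended in the `clust` column.
head(res.hcpc$data.clust)

# Dendrogram and factor map of the clusters.
p.dend <- fviz_dend(res.hcpc)
p.clust <- fviz_cluster(res.hcpc, repel = TRUE)
print(p.clust)
```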