Alzheimer patients and the lexis of space

The data set presented here was compiled by Frédérique Gayet, a psychomotor therapist whose research I supervised in 2013. Gayet (2013) focused on spatial prepositions in French: à côté de « next to », en dessous de « below », au-dessus de « on top of », à gauche/droite de « to the left/right of », etc.

Psychomotor therapists interact physically with their patients. This physical interaction is cued by verbal references to space. The loss of spatial vocabulary by Alzheimer patients is damaging to this interaction. It is also an indication of the stage of the disease. Studying the spatial lexicon of Alzheimer patients is therefore important with respect to both the therapist-patient interaction and the patient’s diagnosis and treatment.

The data file for this case study is alzheimer.rds. Although not based on a corpus extraction, this data set was selected because of its ability to illustrate how principal component analysis works.

Principal Component Analysis

Principal component analysis (henceforth PCA) takes as input a table of data T of i individuals or observations (rows) and j variables (columns). PCA handles continuous and nominal data. The continuous data may consist of means, reaction times, formant frequencies, etc. The categorical/nominal data are used to tag the observations. I do not dwell upon the basics of PCA here. These are covered in Baayen (2008: Section 5.1.1) and Desagulier (2017: Section 10.2), among many other references.

The experimental setup and the data

The data consist of 20 observations: 10 Alzheimer patients and 10 control patients. Five tests were run: two topological relationship tests (TRT), one comprehension test, one geronto-psychomotor examination (GPE), and one mini-mental-state examination (MMSE). The TRT was originally designed for children aged 4 to 6. Drawings of objects are shown to each patient. The patient must describe the objects and their respective positions. Two TRT tests were conducted: one with verbal prompts (e.g. “where is the spoon with respect to the cup?”) and another without such verbal prompts. Each TRT test is graded on a scale from 0 to 50. The geronto-psychomotor test is graded on a scale from 0 to 30. The comprehension test is graded on a scale from 0 to 25. The MMSE is used to assess the stage of the disease (Folstein, Folstein, and McHugh 1975).

There is a total of eight variables. The qualitative categories of the AGE variable are read in as integers. They must be converted into a factor so that AGE is not treated as a quantitative variable.

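data <- readRDS("alzheimer.rds")  # assumes the file is in the working directory
data$AGE <- as.factor(data$AGE)   # treat AGE as categorical, not numeric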
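To see why the conversion matters, here is a minimal sketch on a toy vector. The values of age_codes are invented for illustration and are not taken from alzheimer.rds.

```r
# Hypothetical integer codes standing in for AGE categories (invented values)
age_codes <- c(1L, 2L, 2L, 3L, 1L)

# As integers, R treats the codes as quantities: a mean can be computed,
# even though it is meaningless for category labels
is.numeric(age_codes)
mean(age_codes)

# As a factor, the codes become labels drawn from a fixed set of levels
age_factor <- as.factor(age_codes)
is.numeric(age_factor)
levels(age_factor)
table(age_factor)
```

Once AGE is a factor, it can no longer be mistaken for a measurement when the quantitative variables are selected for the analysis.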

R implementation

Before we run PCA(), we should consider standardizing the variables (i.e. centering and scaling). Because our table contains measurements in different units, standardizing the variables is compulsory. The PCA() function standardizes the variables by default (scale.unit=TRUE). When a table contains measurements in the same unit, standardizing the variables is optional. To prevent standardizing, set the scale.unit argument to FALSE (scale.unit=FALSE).
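What standardizing does can be sketched in base R with scale(). The table below is invented (two hypothetical tests graded on different scales) and is not the alzheimer.rds data. Note that scale() divides by the sample standard deviation; FactoMineR's internal standardization may use a different denominator, which is an assumption on my part and does not affect the reasoning.

```r
# Toy table (invented values): two scores graded on different scales
scores <- data.frame(
  test_a = c(12, 30, 45, 50, 28),  # hypothetical test graded from 0 to 50
  test_b = c(8, 15, 22, 25, 14)    # hypothetical test graded from 0 to 25
)

# Without standardizing, the variable with the larger variance
# would weigh more heavily on the components
sapply(scores, var)

# Centering subtracts each column mean; scaling divides by the
# column standard deviation
z <- scale(scores, center = TRUE, scale = TRUE)

round(colMeans(z), 10)  # each column now has mean 0
apply(z, 2, sd)         # and standard deviation 1
```

After standardization, both variables are expressed in the same unit (standard deviations from the mean), so neither dominates simply because of its grading scale.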

We load the FactoMineR package and run the PCA with the PCA() function. Because the variables AGE, STATUS and STAGE are categorical, they are declared as supplementary (with the quali.sup argument): they are projected onto the components but do not contribute to building them.
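The call itself is not reproduced here. The following is a minimal sketch of what it might look like, run on an invented table so that it is self-contained. Everything about the toy data frame is an assumption: the values are simulated, and the column names and order are hypothetical apart from AGE, STATUS, STAGE and TRT_WITH_PROMPT, which the text mentions. The sketch requires the FactoMineR package to be installed.

```r
library(FactoMineR)

set.seed(1)  # reproducible invented scores
n <- 20
status <- rep(c("ALZHEIMER", "CONTROL"), each = n / 2)
# Hypothetical latent ability: higher for controls, lower for patients
level <- ifelse(status == "CONTROL", 1, -1) + rnorm(n, sd = 0.5)

toy <- data.frame(
  AGE = factor(rep(1:3, length.out = n)),
  STATUS = factor(status),
  TRT_WITH_PROMPT = 30 + 8 * level + rnorm(n, sd = 2),
  TRT_WITHOUT_PROMPT = 28 + 8 * level + rnorm(n, sd = 2),
  COMPREHENSION = 15 + 4 * level + rnorm(n, sd = 1),
  GPE = 18 + 5 * level + rnorm(n, sd = 1.5),
  MMSE = 20 + 4 * level + rnorm(n, sd = 1),
  STAGE = factor(c(
    rep(c("MILD", "MODERATE", "MODERATE_SEVERE", "SEVERE"), length.out = n / 2),
    rep("SANE", n / 2)
  ))
)

# Columns 1, 2 and 8 are categorical in this toy layout, so they are
# declared supplementary; the five score columns are the active variables
pca.toy <- PCA(toy, scale.unit = TRUE, quali.sup = c(1, 2, 8), graph = FALSE)

round(pca.toy$eig, 2)  # eigenvalues and percentages of variance
```

With the real data, the indices passed to quali.sup would have to match the positions of AGE, STATUS and STAGE in the table.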

Next, we inspect the eigenvalues and the percentage of variance explained by each principal component.

round(pca.object$eig, 2)

The first major discontinuity occurs after the first component. This component alone accounts for 80.08% of the variance, which, given the ratio between the number of observations and the number of columns, expresses a significant structure of the data in PCA. The first two components together account for most of the variance of the table (89.95%).
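The percentages discussed above derive directly from the eigenvalues: each eigenvalue is divided by the sum of all eigenvalues. A minimal base R sketch on an invented table (not the alzheimer.rds data) shows the arithmetic. It relies on the fact that, with standardized variables, PCA amounts to an eigendecomposition of the correlation matrix.

```r
# Invented scores for 6 individuals on 3 hypothetical tests
X <- data.frame(
  t1 = c(10, 20, 30, 40, 45, 50),
  t2 = c(5, 9, 14, 18, 22, 24),
  t3 = c(12, 11, 20, 19, 27, 30)
)

# Eigenvalues of the correlation matrix, in decreasing order
eigenvalues <- eigen(cor(X))$values

pct <- 100 * eigenvalues / sum(eigenvalues)  # percentage of variance
cum_pct <- cumsum(pct)                       # cumulative percentage

round(cbind(eigenvalue = eigenvalues, pct, cum_pct), 2)
```

With standardized variables the eigenvalues sum to the number of active variables, so an eigenvalue above 1 means that the component captures more variance than any single original variable.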

We plot the variables separately by running the plot.PCA() function on the PCA object and setting the choix argument to var.

plot.PCA(pca.object, choix="var")

The resulting plot serves as a guide to interpret the graph of individuals and categories.

PCA plot of variables

Each variable is represented as an arrow. The circle is known as the circle of correlations. The closer the end of an arrow is to the circle (and the farther it is from where the axes intersect at the center of the graph) the better the corresponding variable is captured by the two components, and the more important the components are with respect to this variable.
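The geometry of the circle of correlations can be checked numerically. In the sketch below, which uses base R's prcomp() on an invented table rather than FactoMineR on the real data, the coordinates of a variable are its correlations with the components, and the squared length of its arrow in the plane measures how well the first two components capture it.

```r
# Invented scores for 6 individuals on 3 hypothetical tests
X <- data.frame(
  t1 = c(10, 20, 30, 40, 45, 50),
  t2 = c(5, 9, 14, 18, 22, 24),
  t3 = c(12, 11, 20, 19, 27, 30)
)

pc <- prcomp(X, center = TRUE, scale. = TRUE)

# Coordinates of each variable: its correlation with each component
coords <- cor(X, pc$x)
round(coords[, 1:2], 2)

# Squared distance from the origin in the plane of components 1 and 2
quality <- rowSums(coords[, 1:2]^2)
round(quality, 2)

# Over all components the squared correlations add up to 1, which is
# why no arrow can extend beyond the circle of radius 1
rowSums(coords^2)
```

A variable whose quality is close to 1 has an arrow that nearly touches the circle; a short arrow signals that the variable is mostly expressed on components that are not displayed.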

In this case, all the variables point to the right, which means that:

all the quantitative variables are positively correlated with the first component;

individuals that appear in this part of the plot correspond to patients who obtained higher scores on the tests;

we can expect patients with Alzheimer symptoms to cluster in the opposite part of the plot, because we have reasons to believe that they obtained lower scores.

The positive correlations between all the quantitative variables and the first component are confirmed when we inspect the PCA object with dimdesc().

Only the TRT_WITH_PROMPT variable is positively correlated with the second component.
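For quantitative variables, my understanding is that dimdesc() reports the correlation between each variable and a given component, together with a p-value. The sketch below reproduces that idea in base R on an invented table. It is not dimdesc() itself, and the helper describe_dim() is hypothetical.

```r
# Invented scores for 6 individuals on 3 hypothetical tests
X <- data.frame(
  t1 = c(10, 20, 30, 40, 45, 50),
  t2 = c(5, 9, 14, 18, 22, 24),
  t3 = c(12, 11, 20, 19, 27, 30)
)

pc <- prcomp(X, center = TRUE, scale. = TRUE)
dim1 <- pc$x[, 1]  # coordinates of the individuals on the first component

# Hypothetical helper: correlation of one variable with the component,
# with the p-value of the corresponding correlation test
describe_dim <- function(v) {
  test <- cor.test(v, dim1)
  c(correlation = unname(test$estimate), p.value = test$p.value)
}

desc <- t(sapply(X, describe_dim))

# Sort by decreasing absolute correlation; the sign of a component is
# arbitrary, so only the agreement of signs across variables matters
desc[order(-abs(desc[, "correlation"])), ]
```

When all the correlations with a component share the same sign, as in the plot of variables discussed above, the component can be read as an overall score gradient.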

It is now time to plot individuals and categories. Again, we use the plot.PCA() function. To facilitate interpretation, we add arguments, some of which have already been introduced (shadow and autoLab). The factors of the AGE and STATUS variables are colored in brown with col.quali. Each factor of the eighth variable (STAGE) gets its own color (habillage=8).
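A sketch of such a call is given below, again on an invented table so that it runs on its own. The simulated values and the column layout are assumptions, except that STAGE is placed in the eighth column as in the text. I spell the shadow argument shadowtext, which is the name I know from the plot.PCA() documentation; treat the exact argument list as an approximation of the original call rather than a copy of it.

```r
library(FactoMineR)

set.seed(1)  # reproducible invented scores
n <- 20
status <- rep(c("ALZHEIMER", "CONTROL"), each = n / 2)
level <- ifelse(status == "CONTROL", 1, -1) + rnorm(n, sd = 0.5)

toy <- data.frame(
  AGE = factor(rep(1:3, length.out = n)),
  STATUS = factor(status),
  TRT_WITH_PROMPT = 30 + 8 * level + rnorm(n, sd = 2),
  TRT_WITHOUT_PROMPT = 28 + 8 * level + rnorm(n, sd = 2),
  COMPREHENSION = 15 + 4 * level + rnorm(n, sd = 1),
  GPE = 18 + 5 * level + rnorm(n, sd = 1.5),
  MMSE = 20 + 4 * level + rnorm(n, sd = 1),
  STAGE = factor(c(
    rep(c("MILD", "MODERATE", "MODERATE_SEVERE", "SEVERE"), length.out = n / 2),
    rep("SANE", n / 2)
  ))
)

pca.toy <- PCA(toy, quali.sup = c(1, 2, 8), graph = FALSE)

# Individuals and supplementary categories on components 1 and 2:
# habillage = 8 colors the individuals by the eighth column (STAGE),
# col.quali sets the color of the supplementary categories
plot.PCA(pca.toy, choix = "ind", habillage = 8, col.quali = "brown",
         autoLab = "yes", shadowtext = TRUE)
```

Because the toy data frame has default row names, the individuals are labeled with integers from 1 to 20, as in the plot interpreted below.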

Interpreting the results

The individuals (the patients) appear as integers ranging from 1 to 20. As expected, we find a clear divide between sane, control patients in the right part of the plot, and patients with a diagnosed Alzheimer condition in the left part of the plot. More surprising is the distribution of patients along the first component. We would expect a cline from right to left based on the increasing severity of the Alzheimer condition:

SANE > MILD > MODERATE > MODERATE_SEVERE > SEVERE

This is not the case. From right to left, we have:

SANE > MODERATE > SEVERE > MODERATE_SEVERE > MILD

This implies that some patients have conditions that do not match what is revealed by the conventional tests used to assess the severity of the Alzheimer condition. For example, the profile of patient 3, who has moderate symptoms, is close to the profiles of patients 5 and 6, who have severe symptoms.

PCA is very useful when it comes to identifying atypical profiles. Such is the case of:

patient 1 (mild), in the bottom-left part,

patient 4 (severe), in the far left corner,

patient 7 (severe), in the upper left corner,

patient 8 (moderate–severe), in the upper left corner.

From a practitioner’s perspective, these patients deserve special attention because their actual condition is somehow at odds with what the conventional diagnostic testing for Alzheimer’s disease and dementia reveals.