There are many techniques in ecological statistics for exploratory data analysis of multidimensional data. These are called 'ordination' techniques. Many are the same as, or closely related to, common techniques used elsewhere in statistics. Perhaps the prototypical example is principal components analysis (PCA). Ecologists might use PCA and related techniques to explore 'gradients'. (I am not entirely clear what a gradient is, but I've been reading a little about it.)

On this page, the last item under Principal Components Analysis (PCA) reads:

PCA has a serious problem for vegetation data: the horseshoe effect. This is caused by the curvilinearity of species distributions along gradients. Since species response curves are typically unimodal (i.e. very strongly curvilinear), horseshoe effects are common.

Further down the page, under Correspondence Analysis or Reciprocal Averaging (RA), it refers to "the arch effect":

RA has a problem: the arch effect. It is also caused by nonlinearity of distributions along gradients.

The arch is not as serious as the horseshoe effect of PCA, because the ends of the gradient are not convoluted.

Can someone explain this? I have recently seen this phenomenon in plots that re-represent data in a lower dimensional space (viz., correspondence analysis and factor analysis).

What would a "gradient" correspond to more generally (i.e., in a non-ecological context)?

If this happens with your data, is it a "problem" ("serious problem")? For what?

How should one interpret output where a horseshoe / arch shows up?

Does a remedy need to be applied? What? Would transformations of the original data help? What if the data are ordinal ratings?

The answers may exist in other pages on that site (e.g., for PCA, CA, and DCA). I have been trying to work through those. But the discussions are couched in sufficiently unfamiliar ecological terminology and examples that it is harder to understand the issue.

(+1) I found a reasonably clear answer at ordination.okstate.edu/PCA.htm. The "curvilinearity" explanation in your quotation is totally wrong, which is what makes it so confusing.
– whuber♦ Jun 24 '15 at 21:41

I've tried to answer your questions, but I'm not sure how well I've achieved that, seeing as I am an ecologist and gradients are how I think of these things.
– Gavin Simpson Jun 24 '15 at 22:50

@whuber: The quoted "curvilinearity" explanation might be confusing and not very clear, but I don't think it's "totally wrong". If the species' abundances as a function of position along the true "gradient" (using an example from your link) were all linear (perhaps corrupted by some noise), then the cloud of points would be (approximately) 1-dimensional and PCA would find it. The cloud of points becomes bent/curved because the functions are not linear. A special case of shifted Gaussians leads to a horseshoe.
– amoeba Jun 27 '15 at 19:41

@Amoeba Nevertheless, the horseshoe effect does not result from curvilinearity of the species gradients: it arises from nonlinearities in the distribution ratios. The quotation, in attributing the effect to the shapes of the gradients themselves, does not identify the cause of the phenomenon correctly.
– whuber♦ Jun 27 '15 at 19:54

1 Answer

Q1

Ecologists talk of gradients all the time. There are lots of kinds of gradients, but it may be best to think of a gradient as some combination of whatever variable(s) are important for the response. So a gradient could be time, or space, or soil acidity, or nutrients, or something more complex, such as a linear combination of a range of variables required by the response in some way.

We talk about gradients because we observe species in space or time and a whole host of things vary with that space or time.

Q2

I have come to the conclusion that in many cases the horseshoe in PCA is not a serious problem if you understand how it arises and don't do silly things like taking only PC1 when the "gradient" is actually represented by PC1 and PC2 together (it is also split into higher PCs, but hopefully a 2-d representation is OK).
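To see this arise, here is a minimal simulation (a Python/NumPy sketch; the gradient, species optima, and tolerance values are all made up for illustration): unimodal species responses along a single gradient produce an arched PC1–PC2 configuration, with PC1 tracking the gradient and PC2 being a fold of it.

```python
import numpy as np

# 60 sites along a single hypothetical gradient, 20 species with
# unimodal (Gaussian) responses whose optima span that gradient
gradient = np.linspace(0, 10, 60)
optima = np.linspace(0, 10, 20)
tolerance = 2.0
Y = np.exp(-(gradient[:, None] - optima[None, :]) ** 2 / (2 * tolerance**2))

# PCA via SVD of the column-centred data matrix
Yc = Y - Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
scores = U * s  # site scores on the principal components

# PC1 follows the gradient closely ...
print(abs(np.corrcoef(gradient, scores[:, 0])[0, 1]))  # close to 1

# ... while PC2 is arch-shaped: both ends of the gradient get the
# same sign on PC2, opposite to the sign in the middle
print(np.sign(scores[0, 1]) == np.sign(scores[-1, 1]))  # True
print(np.sign(scores[0, 1]) != np.sign(scores[30, 1]))  # True
```

The point is exactly the one made above: the single gradient is represented by the (PC1, PC2) pair jointly, so taking PC1 alone throws away part of the gradient.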

In CA I guess I think much the same (now having been forced to think a bit about it). The solution can form an arch when there is no strong second dimension in the data; a folded version of the first axis, which satisfies the orthogonality requirement of the CA axes, then explains more "inertia" than any other direction in the data. This may be more serious, because it is made-up structure, whereas with PCA the arch is just a way to represent species abundances at sites along a single dominant gradient.
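The same fold can be shown in CA itself. Below is a hedged sketch (Python/NumPy; the simulated community data are invented) that computes correspondence analysis via the usual SVD of chi-square standardized residuals; axis 2 comes out as a fold of axis 1 even though there is only one underlying gradient.

```python
import numpy as np

# Simulated community data: 60 sites on one gradient, 20 species
# with unimodal responses (all entries positive, as CA requires)
gradient = np.linspace(0, 10, 60)
optima = np.linspace(0, 10, 20)
Y = np.exp(-(gradient[:, None] - optima[None, :]) ** 2 / (2 * 2.0**2))

# Correspondence analysis via SVD of the standardized residuals
P = Y / Y.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, s, Vt = np.linalg.svd(S, full_matrices=False)
site_scores = (U / np.sqrt(r)[:, None]) * s  # principal coordinates for sites

# CA axis 1 recovers the ordering along the gradient ...
ax1, ax2 = site_scores[:, 0], site_scores[:, 1]
print(abs(np.corrcoef(gradient, ax1)[0, 1]) > 0.9)  # True

# ... while axis 2 is an arch (ends share a sign, middle is opposite),
# a made-up fold of axis 1 rather than a genuine second direction
print(np.sign(ax2[0]) == np.sign(ax2[-1]))  # True
print(np.sign(ax2[0]) != np.sign(ax2[30]))  # True
```

Subtracting `np.outer(r, c)` removes the trivial CA solution, so the first singular vector pair is the first CA axis.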

I've never quite understood why people worry so much about the wrong ordering along PC1 with a strong horseshoe. I would counter that you shouldn't take just PC1 in such cases; then the problem goes away, because the pairs of coordinates on PC1 and PC2 remove the reversals seen on either axis alone.

Q3

If I saw the horseshoe in a PCA biplot, I would interpret the data as having a single dominant gradient or direction of variation.

If I saw the arch, I would probably conclude the same, but I would be very wary of trying to explain CA axis 2 at all.

I would not apply DCA: it just twists the arch away (in the best circumstances) so that you don't see the oddities in 2-d plots, but in many cases it produces other spurious structures, such as diamond or trumpet shapes in the arrangement of samples in the DCA space. For example:

We see a typical fanning out of sample points towards the left of the plot.

Q4

I would suggest that the answer to this question depends on the aims of your analysis. If the arch/horseshoe were due to a single dominant gradient, then rather than having to represent this as $m$ PCA axes, it would be beneficial if we could estimate a single variable that represents the positions of sites/samples along the gradient.

This would suggest finding a nonlinear direction in the high-dimensional space of the data. One such method is the principal curve of Hastie & Stuetzle, but other nonlinear manifold methods are available that might suffice.
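Principal curves are not in scikit-learn (in R, the princurve package implements Hastie & Stuetzle's algorithm), but as a sketch of the same idea, any nonlinear 1-d manifold method can try to recover positions along the gradient. Here is an illustrative example using Isomap on simulated unimodal species data; the data and all parameter choices are made up:

```python
import numpy as np
from sklearn.manifold import Isomap

# Simulated species abundances: the sites trace a curve (a one-
# dimensional manifold) through the 20-dimensional species space
gradient = np.linspace(0, 10, 60)
optima = np.linspace(0, 10, 20)
Y = np.exp(-(gradient[:, None] - optima[None, :]) ** 2 / (2 * 2.0**2))

# Unroll the curve: a 1-component Isomap embedding approximates
# arc length along the curve, i.e. position along the gradient
embedding = Isomap(n_neighbors=10, n_components=1).fit_transform(Y)

# The single recovered coordinate tracks the true gradient,
# which neither PC1 nor PC2 alone need do for strongly arched data
print(abs(np.corrcoef(gradient, embedding[:, 0])[0, 1]) > 0.9)  # True
```

Isomap is a stand-in here for whichever nonlinear method you prefer; the principal curve in the answer plays the same role.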

For example, for some pathological data

We see a strong horseshoe. The principal curve tries to recover this underlying gradient or arrangement/ordering of samples via a smooth curve in the $m$ dimensions of the data. The figure below shows how the iterative algorithm converges on something approximating the underlying gradient. (I think it wanders away from the data at the top of the plot partly so as to be closer to the data in higher dimensions, and partly because of the self-consistency criterion for a curve to be declared a principal curve.)

I have more details, including code, in the blog post from which I took those images. But the main point here is that the principal curve easily recovers the known ordering of samples, whereas PC1 or PC2 on its own does not.

In the PCA case, it is common in ecology to apply transformations. Popular transformations are those that can be thought of as returning some non-Euclidean distance when the Euclidean distance is computed on the transformed data. For example, the Hellinger distance is

$$D_{\mathrm{Hellinger}}(\mathbf{x}_1, \mathbf{x}_2) = \sqrt{\sum_{j=1}^{p} \left( \sqrt{\frac{y_{1j}}{y_{1+}}} - \sqrt{\frac{y_{2j}}{y_{2+}}} \right)^2}$$

where $y_{ij}$ is the abundance of the $j$th species in sample $i$, and $y_{i+}$ is the sum of the abundances of all species in the $i$th sample. If we convert the data to proportions and apply a square-root transformation, then the Euclidean-distance-preserving PCA will represent the Hellinger distances in the original data.
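A minimal sketch of that transformation (Python/NumPy; the toy abundance matrix is made up):

```python
import numpy as np

def hellinger_transform(Y):
    """Row-wise proportions followed by a square root."""
    return np.sqrt(Y / Y.sum(axis=1, keepdims=True))

# Toy abundances: 2 samples (rows) x 3 species (columns)
Y = np.array([[10.0, 5.0, 5.0],
              [2.0, 8.0, 10.0]])
H = hellinger_transform(Y)

# Euclidean distance on the transformed data ...
d_euclid = np.linalg.norm(H[0] - H[1])

# ... equals the Hellinger distance on the raw abundances
p1, p2 = Y[0] / Y[0].sum(), Y[1] / Y[1].sum()
d_hellinger = np.sqrt(np.sum((np.sqrt(p1) - np.sqrt(p2)) ** 2))
print(np.isclose(d_euclid, d_hellinger))  # True
```

Running PCA on `H` then represents Hellinger distances among the samples, which downweights the double-zero problem that contributes to the horseshoe.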

The horseshoe has been known and studied for a long time in ecology; it is treated both in the early literature on ordination and in more modern work.

Thanks, Gavin. Consider ordinal ratings 1:5 from a dataset w/ questions like: "I like my doctor", & "I feel like my doctor cares about me as a person". These are not meaningfully distributed across either space or time. What would be the 'gradient' here?
– gung♦ Jun 25 '15 at 15:58

W/ a 5x5 table & high N, one way to visualize the data is w/ CA. The data are ordinal, but CA doesn't recognize that; so we can check to see if adjacent rows / columns are closer than ones further apart. Both sets of points fall along a clear line in the appropriate order, but the line curves such that the extremes are closer to each other than the midpoint in 2D space. How should that be interpreted?
– gung♦ Jun 25 '15 at 15:59

CA finds an ordering for both the rows (samples) and variables (cols) that maximises the dispersion of the sample "scores". It finds a latent variable (a linear combination of the variables) that maximises that dispersion. We call that latent variable a gradient.
– Gavin Simpson Jun 25 '15 at 16:07

Re the compression, do you mean closer to one another on CA axis 1, or closer to one another in terms of Euclidean distance in the scale of the biplot? Either way, this is really an issue in the projection of the data to a low-dimensional space. DCA tries to undo this effect by pulling apart samples at the ends of the detrended DCA axis 1 and compressing the samples near the origin. So yes, it's a problem, but it is due to the inflexibility of the method to capture the underlying gradient appropriately. We can live with it or use a more flexible approach (in ecology at least).
– Gavin Simpson Jun 25 '15 at 16:11


If you looked at this in more dimensions, the problem would go away. I think this is just a limit of the method; it does OK in many cases but fails in others.
– Gavin Simpson Jun 25 '15 at 16:23