Sebastian Sauer Stats Blog

Latest Posts

At a recent psychology conference I had the impression that psychologists keep preferring to show mean values, but appear less interested in more detailed plots such as the boxplot. Plots like the boxplot are richer in information, but not more difficult to perceive.

For those who would like an easy start on drawing more informative plots (more than mean bars), here is a suggestion:

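library(ggplot2)
library(dplyr)
data(Fair, package = "Ecdat")  # assumption: Fair is the affairs data set shipped with the Ecdat package
Fair %>%
  filter(nbaffairs != 0) %>%
  ggplot(aes(x = sex, y = nbaffairs)) +
  ggtitle("Difference in extramarital affairs between sexes") +
  geom_boxplot() +
  geom_jitter(alpha = .5, color = "firebrick") +
  theme_minimal()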

As can be seen, the distribution information reveals some more insight than bare means: There appear to be three distinct groups of “side lookers” (persons having extramarital relations).

Interestingly, a strong voice of German scientists uttered their concerns about being scooped if/when sharing their data (during the official meeting of the society). This being said (sad), the German research foundation (DFG) has updated its guidelines, now stressing (more strongly) that publicly funded projects should share their data, with the rationale that the data do not belong to the individual scientist but to the public, as the public funded it (I find that convincing). Finally, Brian Nosek gave a keynote talk, where he vividly argued in favor of open science; I found the talk very inspiring.

Anyway, I do not want to discuss the “why” and “if” but rather the “how” in this post, giving some recommendations. Or more precisely, what I personally do at the moment to live up to open science. The following list is by no means innovative or impressive; it just presents my current thinking and doing:

Hopefully, we will not drift astray in emotional discussions on “research parasites”, “methodological terrorists”, or “cyberbullies”, but focus on what is constructive for our field. One could argue that the only question that counts is “How do I contribute?”.

Likert scales are psychologists’ bread-and-butter tool. Literally, thousands (!) of such “scales” (as they are called, rightfully or not) do exist. To get a feeling: The APA links to this database where 25,000 tests are listed (as stated by the website)! That is indeed an enormous number.

Given their widespread use, the question how useful such tests are has arisen many times; see here, here, or here.

For example, Carifio and Perla 2007 assume that underlying each response format ranging from, e.g., “agree” to “disagree” there must be a metric attribute. Thus, they hold the philosophical view that each (psychological) attribute must be metric. They do not present any grounds for that stark claim. Similarly, they assume that each of such scales maps an empirical quantitative attribute. And they assume without any consideration that some given items (they call it a scale) automatically measure the same underlying quantity (if existing). Besides their stark language, I strongly disagree with many points they are raising. Sadly, they fail to mention even the most basic aspects of measurement theory (see here for a nice introduction; read the work of Michell for a more in-depth reasoning).

For example, one proponent of the view that Likert scales generally do exhibit interval (metric) level is Labovitz, e.g., in his 1970 paper (paywalled).

However, other scholars have insisted that Likert scales do not (generally) possess metric level, and that demonstrating metric level is quite a delicate job. Maybe the most pronounced critic is Joel Michell, see e.g. this paper.

As a matter of fact, measurement theory is not so easy, if measured by the sheer weight of “foundational” textbooks, most notably the three volumes by Krantz et al.

Reporting the basics of measurement theory is beyond the scope of this post, but let’s briefly mention that, at least, if a variable is to be taken as quantitative (here the same as metric), then it should

be ordinal

have equal distances between adjacent values (equidistance).

The latter property can be called “additivity”, or at least defines some necessary parts of additivity. (Let’s take ordering (ordinal level) for granted here.)

Equidistance

What is meant by equidistance? It means that, for example, the difference in weight between 1 kg and 2 kg should be equal to the difference in weight between 2 kg and 3 kg. If so (and for many other values 3, 4, 5, … kg as well), we are inclined to say that this variable “kg” exhibits equidistance.

Note that we are not interested in “weight” per se, but in “kg”, or, more precisely, in our measurement device (maybe some old-day balance apparatus) and its claims about weight in kg.

That quite directly leads to a problem of Likert scales: Is the difference between “do not agree at all” and “rather not agree” the same as between “rather not agree” and “strongly agree”?

A similar discussion has been around for school grades.

I think the short answer is: We cannot take it for granted that the distances are equal; why should they be? If we are ignorant or neutral, it appears far more likely that the distances are not equal, as it appears more likely that any two numbers are different rather than equal.

Sprinters’ example

Let’s look at a practical example. Suppose 10 sprinters are running the 100 meters, with different times:

Enough values = metric level?

Some say (sorry, did not find a citation, but one of my teachers said so!) that if there are “enough” values, the variable becomes “automagically” metric. I cannot see why this must necessarily happen. Suppose we would not have 10 but 100 sprinters; the picture and the argument would in essence remain the same. Equidistance will not necessarily pop out. It may by chance occur, but it is not a necessity (by far not).

Now, if we were to have many items, can we then infer that equidistance will necessarily come out? Let’s put it this way in our sprinter example: Suppose we had not watched one race, but many (say, 8). Assume that the measurement error is negligible (to make things easier but without loss of generality). If measurement error is negligible, then the actual performance, i.e., the sprint times, can be taken to be the “real” (“latent”) sprinting time of the person. Measurement error is not confined to, e.g., imprecision when taking the time, but includes local particularities such as wind speed, mood disturbances, bad hair days, etc.

Then, again, I think, the argument remains the same: We would have more data, but in this toy example, the average time values of the sprinters would remain the same as in the example above. So the implications also remain the same. Equidistance is no free lunch.
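To make the sprinter argument a bit more tangible, here is a minimal R sketch. The finishing times are invented for illustration (they are not data from this post), and the names `times`, `ranks`, `rank_gaps`, and `time_gaps` are my own. It shows that the gaps between adjacent ranks are equal by construction, whereas the gaps between the corresponding times are not.

```r
# Hypothetical finishing times (in seconds) of 10 sprinters; values are invented
times <- c(9.8, 9.9, 10.4, 10.5, 10.6, 11.3, 11.4, 12.0, 12.9, 13.1)

# Ranks keep only the order of arrival (1 = fastest)
ranks <- rank(times)

# Gaps between adjacent ranks are all 1 by construction ...
rank_gaps <- diff(sort(ranks))

# ... whereas the gaps between the corresponding times vary
time_gaps <- diff(sort(times))

print(rank_gaps)
print(round(time_gaps, 1))
```

Treating the ranks as if they were metric would amount to pretending that all entries of `time_gaps` are the same, which they are not in this toy example.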

If ordinal and metric association measures are similar, then why the fuss?

Some argue in this way: Ordinal association measures (such as Kendall’s tau) and metric association measures (such as Pearson’s r) are often similar.

Hence, it is inferred that it does not matter much if we take the ranks as metric variables. I disagree with that argument.

What follows is inspired by Gigerenzer’s 1981 book (available in German only). Let’s first come up with some data (taken from Gigerenzer’s book, p. 303):

As can be seen in the diagram above, the lines are not intersecting. So the ordinal association measures should be (close to) 1. Actually, we have some ties; that’s why our measures (Spearman, Kendall) are not exactly 1. Note that the raw values are depicted.

Now let’s visualize Pearson correlation. Pearson’s r can be seen as a function of the z-values, so let’s depict z-values of a perfect correlation as a first step.

The lines are far from being horizontal. The z-values are quite different between the two variables, as can be seen in the diagram. But still, Pearson’s r is very high. We must infer that a strong ordinal association is enough to get r really high.

We see that a high r does not guarantee that the z-values between the two variables are similar. Similarly, if both the ordinal association measure (Spearman, Kendall) and the metric association measure (Pearson r) are high, we cannot infer that the metric values and the ranks are identical or very similar.

That’s why I emphatically insist that, e.g., Labovitz 1970 is outright wrong when he argues that r and rho yield similar numbers and that, hence, ordinal can be taken as metric.
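The point can also be checked with a small R sketch. Note that this does not use Gigerenzer’s data; the vectors `x` and `y` below are invented, and `y` is simply a monotone but nonlinear function of `x`. Under that assumption, the ordinal measures are perfect, Pearson’s r is still high (roughly 0.93), and yet the z-values of the two variables differ noticeably.

```r
# Invented toy data: y is a strictly increasing, nonlinear function of x
x <- 1:10
y <- x^3

# Ordinal association measures only look at the order of the values
rho <- cor(x, y, method = "spearman")
tau <- cor(x, y, method = "kendall")

# Pearson's r works on the metric values (or, equivalently, on z-values)
r <- cor(x, y, method = "pearson")

# z-values of both variables, and the largest gap between them
z_x <- as.numeric(scale(x))
z_y <- as.numeric(scale(y))
max_z_gap <- max(abs(z_x - z_y))

print(c(spearman = rho, kendall = tau, pearson = r))
print(round(cbind(z_x, z_y), 2))
print(max_z_gap)
```

So even when rho, tau, and r are all high, the metric values need not behave like the ranks.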

It may seem bewildering that the standard deviation (sd) of a vector X is
(generally) unequal to the mean absolute deviation from the mean (MAD) of X, i.e.,

$$sd(X) \ne MAD(X).$$

One could now argue this way: well, sd(X) involves computing the mean of the squared
deviations, then taking the square root of this mean, thereby “coming back” to the initial size
or dimension of x (i.e., first squaring, then taking the square root). And MAD(X)
is nothing else than the mean deviation from the mean. So both quantities are
very similar, right? So one could expect that both statistics yield the same number, given they operate on the same input vector X.

However, this reasoning is flawed. As a matter of fact, sd(X) will almost
certainly differ from MAD(X).
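A quick numeric check in R may help here. This is only a sketch: the helper names `sd_n` and `mean_abs_dev` and the example vector are my own. I define both statistics by hand, because base R’s `sd()` divides by n - 1 rather than n, and base R’s `mad()` computes the median (not the mean) absolute deviation.

```r
# Standard deviation with 1/n: square root of the mean squared deviation
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

# Mean absolute deviation from the mean
# (base R's mad() is the *median* absolute deviation, so it is not used here)
mean_abs_dev <- function(x) mean(abs(x - mean(x)))

# Invented example vector
x <- c(1, 2, 3, 4, 10)

print(sd_n(x))          # square root of 10, about 3.16
print(mean_abs_dev(x))  # 2.4
```

For this vector the two numbers clearly differ, with sd being the larger one.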

This post tries to give some intuitive understanding to this matter.

Well, we could of course sit back and ask why, for heaven’s sake, the two
formulas (sd and MAD) should yield the same number. Different computations are
involved, so different numbers should pop out. This would shift the burden of
proof to the opposite party (showing that there are no differences). However, this answer is not really appealing if one
tries to understand why things are the way they are. So let’s try to make some sense of it.

Looking at the formulas

The formula above can be written out as

$$\sqrt{\frac{1}{n}\sum x_i^2} \ne \frac{1}{n}\sum |x_i|,$$

where $X$ is a vector of some numeric values. For the sake of simplicity, $x_i$
refers to the difference of some value to its mean.

Looking at the formula above, our question may be more poignantly formulated as
“why does the left hand side, where we first square and then take the opposite
operation, the square root, not yield the same number as the right hand side?”.
Or similarly, why does the square root not “neutralize” or “un-do” the squaring?

If we suspect that the squaring-square-rooting is the culprit, let’s simplify the last equation a bit, and kick out the $\frac{1}{n}$ part. But note that we are in fact changing the equation here.

For generality, let’s drop the notion that $x_i$ necessarily stands for the
difference of some value of a vector to the mean of the vector. We just say now
that $x_i$ is some numeric value whatsoever (but positive and real, to make
life easier).

Then we have:

$$\sqrt{\sum x_i^2} \ne \sum x_i$$

This equation is much nicer in the sense that it shows the problem more clearly. It
is instructive to now square both sides:

$$\sum x_i^2 \ne \left(\sum x_i\right)^2$$

In words, our problem is now “Why is the sum of squares different from the square of the sum?”. This problem may sound familiar and can be found in a number of applications (e.g., some transformation of the
variance).

Let’s further simplify (but without breaking rules at this point), and limit our
reasoning to a vector X of two values only, a and b:

$$a^2 + b^2 \ne (a + b)^2$$

Oh, even more familiar. We clearly see a binomial expression here. And clearly:

$$a^2 + b^2 \ne a^2 + 2ab + b^2$$

Visualization

A helpful visualization is this:

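[Figure: a square with side length a + b, split into the areas $a^2$ and $b^2$ and two green rectangles of area $ab$ each.]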

This scheme makes clear that the difference between the left hand side and the
right hand side is the two green marked areas. Both are $ab$, so $2ab$ in
total. $2ab$ is the difference between the two sides of the equation.

Going back to the average (1/n)

Remember that above, we deliberately changed the initial equation (the initial
problem). That is, we changed the equation in a non-admissible way in order to
render the problem more comprehensible and more focused. Some may argue that we should come back to the initial problem, where not sums but averages are to be computed. This yields a similar, but slightly more complicated reasoning.

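Let us again stick to a vector X with two values (a and b) only. Then, the
initial equation becomes:

$$\sqrt{\frac{1}{2}\left(a^2 + b^2\right)} \ne \frac{1}{2}(a + b)$$

Squaring both sides yields

$$\frac{1}{2}\left(a^2 + b^2\right) \ne \left(\frac{1}{2}(a + b)\right)^2$$

This can be simplified (factorized) as

$$\frac{1}{2}\left(a^2 + b^2\right) \ne \frac{1}{4}\left(a^2 + 2ab + b^2\right).$$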

Now we have again a similar situation as above. The difference is that on the left hand side (1/2) is factored out; on the right hand side (1/4) is factored out. As the formulas are different (and similar to our reasoning above), we could stop and argue that it is unlikely that both sides yield the same value.

Visualization 2

As a final step, let’s visualize the thoughts of the previous lines.

What this amazing forest of crossing lines wants to tell you is the following. For the left hand side, the diagonal lines divide $a^2$ and $b^2$ into two parts of equal size, i.e., $\frac{a^2}{2}$ and $\frac{b^2}{2}$.

For the right hand side, a similar idea applies. But the double-crossed (“x-type”) lines indicate that each of the four forms is divided into 4 equal parts, i.e., $\frac{a^2}{4}$, $\frac{b^2}{4}$, and two times $\frac{ab}{4}$.

From this sketch, again it appears unlikely that both sides would yield the same number. We have not proven that it is impossible, but our reasoning suggests that it would be highly unlikely to see the same number on both sides of the equation.
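As a small addition to the sketch, the gap between the two squared sides can be written out for the two-value case. This is my own worked step, assuming as above that $a$ and $b$ are positive real numbers:

```latex
\begin{align*}
\frac{1}{2}\left(a^2 + b^2\right) - \frac{1}{4}\left(a + b\right)^2
  &= \frac{2a^2 + 2b^2 - a^2 - 2ab - b^2}{4} \\
  &= \frac{a^2 - 2ab + b^2}{4} \\
  &= \frac{(a - b)^2}{4} \;\ge\; 0
\end{align*}
```

So, in this two-value case, both sides coincide only if $a = b$; as soon as the two values differ, the left hand side is the larger one.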