Hi,

you've already got a completely satisfying explanation. A right skew of
the p value histogram (p values piling up towards 1) is in fact a
typical sign of a covariate for which you have not controlled.
A quick example to demonstrate. Let's simulate 1000 realizations, each
consisting of four samples drawn from the same normal distribution:
y <- cbind(
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 20, 4 ) )
The first two are supposed to be control, the third and fourth
treatment, and they all have the same mean, i.e., the treatment has no
effect.
Doing a t test on each realization gives us nicely uniform p values:
library(genefilter)
hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )
Now, assume that one of the two control samples and one of the two
treatment samples have an elevated mean:
y <- cbind(
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ) )
In this case, you get right-skewed p values (too many large ones, too
few small ones), because the t test is not informed of the extra effect
present in one sample of each of the two groups and hence counts it as
within-group variance:
hist( rowttests( y, factor( c( "C", "C", "T", "T" ) ) )$p.value )
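For comparison, here is a minimal sketch of what happens once the test
is told about the extra effect. It uses only base R (lm, not
genefilter), and the factor "batch" is a hypothetical label I am
introducing to mark which samples carry the elevated mean; in real data
you would need to know or guess that covariate. With four samples and
two covariates, only one residual degree of freedom is left, so this
illustrates the principle rather than being a recommended analysis:

```r
set.seed(1)   # arbitrary seed, only so that the sketch is reproducible

# Same simulation as above: samples 2 and 4 have an elevated mean
y <- cbind(
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ),
   rnorm( 1000, 20, 4 ),
   rnorm( 1000, 30, 4 ) )

group <- factor( c( "C", "C", "T", "T" ) )
batch <- factor( c( "a", "b", "a", "b" ) )   # hypothetical covariate label

# For each realization, fit a linear model with both factors and
# take the p value of the treatment coefficient
pAdj <- apply( y, 1, function(v)
   summary( lm( v ~ batch + group ) )$coefficients[ "groupT", "Pr(>|t|)" ] )

hist( pAdj )
```

Since there is again no treatment effect, and the model now accounts
for the elevated mean, the histogram of pAdj should look roughly flat.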
Simon