edgeR's cpm function has an argument called prior.count. Based on my understanding of the documentation, it is supposed to add a fixed number to each sample, proportional to that sample's library size, such that the average of the numbers added across all samples equals prior.count.
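To make that reading concrete, here is a minimal base R sketch of what I would expect. It assumes the amount added per sample is prior.count times the library size divided by the mean library size. The library sizes and the names lib.size and expected.prior are made up for illustration and are not taken from edgeR.

```r
# Hypothetical library sizes for three samples (illustration only)
lib.size <- c(s1 = 1e6, s2 = 2e6, s3 = 3e6)
prior.count <- 0.5

# Assumed scaling: proportional to library size, averaging to prior.count
expected.prior <- prior.count * lib.size / mean(lib.size)

print(expected.prior)        # one value per sample
print(mean(expected.prior))  # should equal prior.count under this assumption
```

Under this assumption, larger libraries get a larger added value, and the mean of the added values is prior.count by construction.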

However, looking at actual data, this does not seem to be the case. Given a made-up data frame of

df = data.frame(a = c(1,2,3,0),c = c(1,2,3,0)*3, b = c(1,2,3,0)*2)

We can try to calculate log cpms by doing

logCPM = cpm(df,log = TRUE,prior.count = .5)

We can also calculate regular cpms by doing

CPM = cpm(df)

If we wanted to see what number is added to the CPMs before they are logged, we could do
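One way to do that, assuming the goal is to undo the log2 transform and subtract the plain CPMs (a sketch only, not necessarily the exact expression intended here; the name added is my own):

```r
library(edgeR)

# Same toy data frame as above
df = data.frame(a = c(1,2,3,0), c = c(1,2,3,0)*3, b = c(1,2,3,0)*2)

logCPM = cpm(df, log = TRUE, prior.count = .5)
CPM = cpm(df)

# Back-transform the log2 values and compare with the unlogged CPMs.
# If a constant were simply added to each CPM, every column of this
# matrix would be constant down its rows.
added = 2^logCPM - CPM
print(added)
```

The last row of df is all zeros, so its entries in added are just the back-transformed log CPMs for a zero count.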

1 Answer

The prior count ends up getting scaled by the ratio of each library size to the average library size. That scaled value is added to the counts, and twice that value is added to each library size (I'm sure there's a good reason for the doubling, but I don't know what it is). Using the example df data frame in your post, let's walk step-by-step through what cpm() is doing: