I am trying to find differentially expressed genes, between placebo P and drug M, as well as within placebo and drug groups at different time points. My experimental design is multi-factorial where some subjects receive placebo P and others receive drug M. I have gene expression data from samples collected for both drug and placebo, at three time points (1,2,3). Following edgeR manual section 3.3.1, I have generated the following targets data frame.

In order to identify DEGs between groups of my interest, I have made contrasts that I want to incorporate in the glmQLF line of code. However, I also want to carry out a paired-analysis mentioned in edgeR manual section 3.4.2 in order to adjust for baseline differences between the subjects. Hence, using a suggestion from this post (differential expressed genes, paired samples and multiple factors) I have built a design matrix using

design <- model.matrix(~0+ treatment:time + subjects)

The columns of this design matrix contain columns for subjects and treatmentM:time1, treatmentP:time1, treatmentM:time2, treatmentP:time2, treatmentM:time3 and treatmentP:time3. However, when I get the following error when I estimate gene dispersion using

It's pretty much as it says, your design matrix is not of full rank. You can't estimate the treatment:time effects and account for the subject-level effects at the same time. Consider, for example, a gene that increases in expression in all P.1 samples compared to the M.1 samples. You could attribute this to a P vs M effect, or to a subject-specific effect that only affects samples 1, 11 and 12; there's no way to mathematically distinguish these two possibilities if subject is included as a factor in your design matrix.

There are multiple solutions to this problem. The most appropriate is to switch to voom with limma, and use duplicateCorrelation with a one-way layout of treatment:time factors and subjects as the batch. This will effectively treat subjects as a random effect, avoiding dependencies with the treatment:time factor while still making use of all samples in the data set:

If you want to continue using edgeR, you will need to drop some coefficients to eliminate the linear dependencies. I would suggest dropping all coefficients with time1 in the name. This means that the remaining treatmentX:timeY coefficients will represent the log-fold change of time Y over time 1 in treatment X. Each of these can be dropped to detect DE between time points in the same treatment group. However, if you want to compare the expression of samples between treatment groups, you will need to subset the data set to only contain the relevant samples, i.e., only one sample from each level of subjects. This ensures that all samples are independent, as there will no longer be dependencies between samples from the same subject.