I realise that looks like a paired-sample design; it's just for simplicity.

design <- model.matrix(~0 + as.factor(treatment) + as.factor(batch))

When I run the above code, I get a design matrix with 5 columns: all three of my treatments, but only two of my batches. I'm not sure why this is happening. I assumed that adding zero at the beginning of my model formula stops terms from being absorbed into the intercept. Can anyone shed any light on why I'm only seeing 5 terms?

I realise I could add in the batch as an "extra" variable using:

design <- model.matrix(~0 + as.factor(treatment), as.factor(batch))

but ultimately, I'd like to include multiple variables that could account for technical variation.

The other batch is absorbed into the treatments. You can think of the coefficients in your model as 'adjustments' that make things comparable. The treatment coefficients are the means of each treatment in batch i, and the batch coefficients are the differences between, say, batch ii and batch i. The model then adjusts the data for batch ii (by subtracting out the mean difference between batch ii and batch i) so that you can compare data between batches.

That is, you capture (and remove) the mean difference between the two batches in the second coefficient, to allow for inter-batch comparisons. The same thing is true for batch iii.

Edit: This doesn't necessarily allow for inter-batch comparisons. Instead, it is 'controlling for the batch effect', thus allowing you to make comparisons between treatments without having to worry that the batch effect is biasing your results.
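To make the interpretation of the coefficients concrete, here is a minimal sketch in base R. The layout (three treatments a, b, c crossed with three batches i, ii, iii) and the simulated values are assumptions chosen for illustration, not the original poster's data.

```r
# Hypothetical toy layout: 3 treatments crossed with 3 batches, one sample each.
treatment <- factor(rep(c("a", "b", "c"), times = 3))
batch <- factor(rep(c("i", "ii", "iii"), each = 3),
                levels = c("i", "ii", "iii"))

design <- model.matrix(~0 + treatment + batch)

# Three treatment columns, then batch ii and batch iii only.
# Batch i is the reference level, so it is absorbed into the treatment columns.
colnames(design)

# Assumed noise-free data: treatment means 10, 12, 15 in batch i,
# with batch ii shifted by +2 and batch iii shifted by -1.
y <- c(a = 10, b = 12, c = 15)[as.character(treatment)] +
  c(i = 0, ii = 2, iii = -1)[as.character(batch)]

# Least-squares fit: the treatment coefficients recover the batch-i means,
# and the batch coefficients recover the shifts relative to batch i.
fit <- lm.fit(design, y)
round(fit$coefficients, 6)
```

With these made-up numbers the treatment coefficients come back as the batch i means and the two batch coefficients as the differences from batch i, which is the 'adjustment' reading described above.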

There are linear dependencies between your columns in this design matrix. More specifically, the row sums of the first 3 columns in bad.design give the same vector (all-ones) as the row sums of the last 3 columns. This means that you'd have infinitely many sets of coefficient estimates that produce the same fitted values; you could just increase the estimates of the first 3 coefficients by any value, so long as you decrease the estimates of the last 3 by the same value. Obviously, this is not ideal, as we want one solution.
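The dependency can be checked directly. The sketch below rebuilds an assumed version of bad.design (the real one isn't reproduced here) by binding full indicator columns for both factors, using a hypothetical 3 x 3 layout; the coefficient values and the shift are arbitrary.

```r
# Hypothetical toy layout: 3 treatments crossed with 3 batches.
treatment <- factor(rep(c("a", "b", "c"), times = 3))
batch <- factor(rep(c("i", "ii", "iii"), each = 3),
                levels = c("i", "ii", "iii"))

# Assumed reconstruction: one indicator column per level of each factor.
bad.design <- cbind(model.matrix(~0 + treatment), model.matrix(~0 + batch))

# Both blocks of columns sum to the all-ones vector.
all(rowSums(bad.design[, 1:3]) == 1)
all(rowSums(bad.design[, 4:6]) == 1)

# The matrix has 6 columns but its rank is one lower.
ncol(bad.design)
qr(bad.design)$rank

# Shift the treatment coefficients up and the batch coefficients down
# by the same amount: the fitted values do not change.
beta <- c(10, 12, 15, 0, 2, -1)   # arbitrary illustrative values
shift <- 7
beta.shifted <- beta + c(rep(shift, 3), rep(-shift, 3))
all.equal(drop(bad.design %*% beta), drop(bad.design %*% beta.shifted))
```

Dropping one batch column, as model.matrix does by default for the second factor, removes the dependency and leaves a full-rank design.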

James, Aaron, thanks for your explanations. That helps clear things up. So if I were to make contrasts and do an lm fit, as per below, that should look at 'a' against 'b', but control for the batch effect?