I am planning an experiment on gene expression. I have 6 conditions and 3 replicates in each condition. I would like to use ERCC spike-in mixes for quality checks: lower limit of detection, dynamic range, and whether the fold changes are accurate. For example, if I know that spike-in sequence xyz should be 4-fold higher in condition 2 than in condition 1, is it actually detected as a 4-fold change in my analysis? To assess the fold changes, I need to use 2 different mixes, and only one mix should be used per sample. The manual suggests either adding mix 1 to a reference sample and mix 2 to all other samples, or randomly assigning mixes 1 and 2 among samples. The latter is not very clear to me. Should I use the same mix for all replicates within a condition, or should I really randomise among all the samples? Or can I randomise among conditions (for example, conditions 1, 3, 4 get mix 1 and conditions 2, 5, 6 get mix 2)? So far, I intend to do the latter, i.e. use mix 1 in conditions 1, 3, 4 and mix 2 in conditions 2, 5, 6. This would also be cheaper, because I can only buy equal volumes of mix 1 and mix 2. Does anyone have advice on this?


If including more biological replicates has a low experimental cost, you should use 6 replicates per condition (and fewer reads per sample, i.e. keeping the total number of reads the same) in order to pick up the majority of differentially expressed genes. For that, I'd recommend using 3 lanes of a flow cell, with 12 dual-indexed barcodes (with the same index on each end) in each lane. Dual indexing gives some protection against index switching and makes the results more robust.
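The arithmetic behind that layout, as a small R sketch (variable names are illustrative; the comparison with 3 replicates assumes the original design from the question and a fixed total read budget):

```r
# Suggested design: 6 conditions x 6 replicates spread over 3 lanes
n_cond  <- 6
n_rep   <- 6   # suggested replicates per condition (instead of 3)
n_lanes <- 3

n_samples        <- n_cond * n_rep       # 36 libraries in total
samples_per_lane <- n_samples / n_lanes  # 12 barcodes per lane

# With the total number of reads held fixed, each library's share of reads
# relative to the original 3-replicate design (18 libraries):
read_share_vs_3_reps <- (n_cond * 3) / n_samples  # 0.5, i.e. half the reads per sample

cat(n_samples, "samples,", samples_per_lane, "per lane,",
    read_share_vs_3_reps, "x reads per sample\n")
```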

> randomly assigning mixes 1 and 2 among samples. The latter is not very clear to me. Should I use the same mix for all replicates within a condition or should I randomise really among all the samples?

Randomly assign mixes among and within the conditions. Make sure that the mixes are equally represented (as much as possible) across the conditions, but that each condition has at least one of each mix. This can be done using something like the following in R:
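A minimal sketch of one way to do this, assuming the 6 conditions × 3 replicates from the question. The scheme (each condition gets two of one mix and one of the other, with each mix being the "majority" mix in exactly half the conditions) and the variable names are illustrative choices, not necessarily the code the answer originally had in mind:

```r
set.seed(2018)  # fixed seed so the assignment can be reproduced
n_cond <- 6     # conditions, as in the question
n_rep  <- 3     # replicates per condition (needs to be at least 2)

# Choose the "majority" mix for each condition; using each mix as the majority
# in exactly half of the conditions keeps the totals equal (9 samples per mix here).
majority <- sample(rep(c(1, 2), length.out = n_cond))

design <- do.call(rbind, lapply(seq_len(n_cond), function(i) {
  m <- majority[i]
  n_major <- ceiling(n_rep / 2)  # 2 of the majority mix when n_rep = 3
  mixes <- c(rep(m, n_major), rep(3 - m, n_rep - n_major))
  data.frame(condition = i,
             replicate = seq_len(n_rep),
             mix = sample(mixes))  # shuffle which replicate gets which mix
}))

# Cross-tabulate: every condition should contain both mixes
table(design$condition, design$mix)
```

Constraining the randomisation this way guarantees that no condition ends up with a single mix, so no manual switching is needed afterwards.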

To work out the level of fold change that is reliably replicated (or "statistically significant"), you can split the samples into a mix 1 group and a mix 2 group, then apply the same differential-expression algorithms to the ERCC transcripts as to the non-ERCC genes.
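A simplified sketch of that comparison in base R, assuming you already have a matrix of ERCC read counts (one row per spike-in, one column per sample) and each sample's total number of mapped reads. It compares mean log2 counts-per-million between the two mix groups rather than running a full differential-expression tool such as DESeq2 or edgeR, and the function name `ercc_fc_check` is hypothetical. The `expected` Mix 1 / Mix 2 ratios should be taken from the manufacturer's data sheet for the mixes:

```r
# Hypothetical helper: observed vs expected log2 fold change (mix 1 / mix 2) per spike-in
# counts:   ERCC read counts, rows = spike-ins, columns = samples
# lib_size: total mapped reads per sample, used for counts-per-million scaling
# mix:      1 or 2 for each sample (same order as the columns of counts)
# expected: expected mix 1 / mix 2 concentration ratio for each spike-in
# pseudo:   pseudocount added to CPM before taking log2 (avoids log2(0))
ercc_fc_check <- function(counts, lib_size, mix, expected, pseudo = 0.5) {
  cpm <- sweep(counts, 2, lib_size, "/") * 1e6
  log_cpm <- log2(cpm + pseudo)
  observed <- rowMeans(log_cpm[, mix == 1, drop = FALSE]) -
              rowMeans(log_cpm[, mix == 2, drop = FALSE])
  data.frame(expected_log2fc = log2(expected),
             observed_log2fc = observed,
             error           = observed - log2(expected))
}
```

The pseudocount compresses fold changes for spike-ins with very few reads, so expect larger errors at the low-concentration end.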

However, because ERCC spike-ins are typically added in proportion to the amount of RNA (or, if you're lucky and have cell-sorted populations, possibly the number of cells), all they really test is the accuracy of sample preparation.

Thank you for this very clear answer. If I randomise as you say, what happens if my pseudo-random number generator assigns the same mix to all the replicates of one condition? Should I just switch one of them to the other mix in that case? And for the fold-change analysis, should I just separate my samples into two groups, mix 1 and mix 2, and compare the two groups?
– charlesdarwin, Feb 16 '18 at 20:53


If all replicates of the same condition have the same mix, then switch one of them over, because each condition should have at least one of each mix. Even better, distribute the mixes as evenly as possible within each group.
– gringer, Feb 17 '18 at 0:02
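The "switch one over" rule from this thread can be sketched in R as a repair step applied after an unconstrained random assignment (a minimal sketch; the seed and variable names are illustrative). Flipping a replicate can leave the overall totals slightly unequal (for example 10 vs 8), which is why spreading the mixes evenly within each group from the start is preferable:

```r
set.seed(1)
n_cond <- 6
n_rep  <- 3
condition <- rep(seq_len(n_cond), each = n_rep)

# Unconstrained random assignment, balanced overall (9 samples per mix)
mix <- sample(rep(c(1, 2), length.out = length(condition)))

# Repair: if every replicate of a condition got the same mix,
# switch one randomly chosen replicate to the other mix
for (cond in seq_len(n_cond)) {
  idx <- which(condition == cond)
  if (length(unique(mix[idx])) == 1) {
    flip <- idx[sample.int(length(idx), 1)]
    mix[flip] <- 3 - mix[flip]
  }
}

table(condition, mix)  # every condition now has both mixes
```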