About

Background

Bulk segregant analysis (BSA) coupled to high-throughput sequencing
is a powerful method to map genomic regions associated with phenotypes
of interest. It relies on crossing two parents, one inferior and one
superior for a trait of interest. Segregants displaying the trait of
the superior parent are pooled, and their DNA is extracted and sequenced.
Genomic regions linked to the trait of interest are identified by
searching the pool for overrepresented alleles that normally originate
from the superior parent. BSA data analysis is non-trivial due to
sequencing, alignment and screening errors.
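
To make the idea of "overrepresented alleles" concrete, the sketch below
computes the fraction of reads that carry the superior parent's allele at
each marker site. It is only an illustration: the function name, the
(superior, inferior) count layout and the read counts are hypothetical and
not taken from the study. It also assumes a pool of haploid segregants, where
unlinked sites are expected to sit near a frequency of 0.5.

```python
def superior_allele_frequency(superior_reads, inferior_reads):
    """Fraction of reads at one marker carrying the superior parent's allele.

    Returns None when the marker has no coverage, since no frequency
    can be estimated from zero reads.
    """
    total = superior_reads + inferior_reads
    if total == 0:
        return None
    return superior_reads / total


# Hypothetical (superior, inferior) read counts at four markers.
markers = [(12, 11), (18, 5), (22, 1), (0, 0)]
frequencies = [superior_allele_frequency(s, i) for s, i in markers]

for (s, i), freq in zip(markers, frequencies):
    label = "no coverage" if freq is None else f"{freq:.2f}"
    print(f"superior={s:2d} inferior={i:2d} frequency={label}")
```

With low read depth, single-site frequencies like these fluctuate strongly,
which is one reason per-site thresholds alone separate true linkage from
noise poorly.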

Results

To increase the power of the BSA technology and obtain a better distinction
between spuriously and truly linked regions, we developed EXPLoRA (EXtraction
of over-rePresented aLleles in BSA), an algorithm for BSA data analysis that
explicitly models the dependency between neighboring marker sites by exploiting
the properties of linkage disequilibrium through a Hidden Markov Model (HMM).
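
The general idea of modeling neighboring markers jointly can be sketched with
a generic two-state HMM. This is not the EXPLoRA model itself: the two states
("unlinked", "linked"), the binomial emissions, the fixed parameters
p_unlinked, p_linked and stay, and the read counts are all assumptions chosen
for illustration. The sketch uses the forward-backward algorithm to compute,
for each marker, the posterior probability of the "linked" state.

```python
import math


def binom_logpmf(k, n, p):
    """Log-probability of k successes in n trials with success rate p."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def posterior_linked(counts, p_unlinked=0.5, p_linked=0.9, stay=0.95):
    """Posterior probability of the 'linked' state at each marker.

    counts: list of (superior_reads, total_reads), ordered along a chromosome.
    All parameter values are illustrative assumptions, not fitted values.
    """
    rates = (p_unlinked, p_linked)
    trans = ((stay, 1 - stay), (1 - stay, stay))
    emit = [[math.exp(binom_logpmf(k, n, p)) for p in rates] for k, n in counts]

    # Forward pass, rescaled at every step to avoid numerical underflow.
    prev = [0.5 * emit[0][0], 0.5 * emit[0][1]]
    scale = sum(prev)
    prev = [x / scale for x in prev]
    fwd = [prev]
    for e in emit[1:]:
        cur = [e[j] * sum(prev[i] * trans[i][j] for i in range(2))
               for j in range(2)]
        scale = sum(cur)
        cur = [x / scale for x in cur]
        fwd.append(cur)
        prev = cur

    # Backward pass, also rescaled at every step.
    bwd = [[1.0, 1.0]]
    for e in reversed(emit[1:]):
        nxt = bwd[0]
        cur = [sum(trans[i][j] * e[j] * nxt[j] for j in range(2))
               for i in range(2)]
        scale = sum(cur)
        bwd.insert(0, [x / scale for x in cur])

    # Combine both passes and normalize per marker.
    post = []
    for f, b in zip(fwd, bwd):
        unlinked, linked = f[0] * b[0], f[1] * b[1]
        post.append(linked / (unlinked + linked))
    return post


# Hypothetical counts: three central markers enriched for the superior allele.
counts = [(10, 20), (11, 20), (9, 20), (18, 20),
          (19, 20), (17, 20), (10, 20), (9, 20)]
posteriors = posterior_linked(counts)
```

Because the transition probabilities favor staying in the same state, a run
of moderately enriched neighboring markers supports linkage more strongly
than any one of them would on its own.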

Reanalyzing a BSA dataset for high ethanol tolerance in yeast allowed us to
reliably identify QTLs linked to this phenotype that could not be identified
with statistical significance in the original study. Experimental validation
of one of the least pronounced linked regions, by identifying its causative
gene VPS70, confirmed the potential of our method.

Conclusions

EXPLoRA performs at least as well as state-of-the-art methods and is robust
even at low signal-to-noise ratios, i.e. when the true linkage signal is
diluted by sampling or screening errors, or when few segregants are available.