The goal of eQTL studies (expression quantitative trait loci studies) is to identify genetic variants that regulate the expression of genes in different biological contexts and to quantify their effects. The outcome variables (typically on the order of 20,000) are molecular measurements of gene expression, and the predictors are genotypes (typically on the order of 1,000,000). The identification of regulatory variants in eQTL studies (Consortium et al. (2015); Ongen et al. (2015); Consortium et al. (2017)) proceeds in a hierarchical fashion: the first stage of the selection identifies promising genes, followed by a search for potential functional variants in a neighborhood around each discovered gene. Once promising candidates for functional variants have been detected, the logical next step is to estimate their effect sizes. However, obtaining samples of human tissue remains a costly endeavor. As a result, eQTL studies continue to be based on relatively small sample sizes, a limitation that is particularly serious for tissues such as brain and liver, the organs of most immediate medical relevance. Naive estimates that ignore the genome-wide selection preceding inference can lead to misleading conclusions about the magnitudes of the true underlying associations. Owing to the scarcity of biological samples, the problem of reliable effect size estimation is often deferred to future studies and is therefore inadequately addressed in the eQTL research community.
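To fix ideas, the two-stage hierarchical search described above can be sketched in code. This is a toy illustration on simulated data, not the pipeline of any particular consortium: the marginal correlation tests, the Bonferroni-style gene-level screen, and all names and thresholds (`gene_level_alpha`, `n_local_variants`) are illustrative assumptions.

```python
# Toy sketch of hierarchical eQTL selection: stage 1 flags promising
# genes, stage 2 keeps candidate variants near each flagged gene.
# All thresholds and dimensions below are illustrative, not from the talk.
import math
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes, n_local_variants = 100, 50, 20

# Simulated expression (one column per gene) and, for each gene,
# genotypes at variants in its local neighborhood (values 0/1/2).
expression = rng.normal(size=(n_samples, n_genes))
genotypes = rng.integers(0, 3, size=(n_samples, n_genes, n_local_variants)).astype(float)

def marginal_pvalues(y, X):
    """Two-sided p-values from per-variant correlation tests (normal approx.)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.clip(np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()), 1e-12, None)
    r = Xc.T @ yc / denom
    z = r * np.sqrt(len(y) - 2) / np.sqrt(np.clip(1 - r ** 2, 1e-12, None))
    # erfc(|z| / sqrt(2)) equals the two-sided normal p-value.
    return np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])

gene_level_alpha = 0.05
selected = {}
for g in range(n_genes):
    p = marginal_pvalues(expression[:, g], genotypes[:, g])
    # Stage 1: the gene is promising if its best local p-value survives
    # a Bonferroni correction over the local variants.
    if p.min() * n_local_variants < gene_level_alpha:
        # Stage 2: record the candidate variants within the selected gene.
        selected[g] = np.flatnonzero(p < gene_level_alpha / n_local_variants)
```

The point of the sketch is the data reuse problem it creates: the same `expression` and `genotypes` arrays that drove the selection would then be used to estimate effect sizes at the surviving variants, which is exactly the selection bias the talk addresses.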

In this talk, I will discuss a principled approach that allows the geneticist to use the available dataset both for discoveries and for follow-up estimation of the associated effect sizes, adjusted for the considerable amount of prior mining. Motivated by the goal of reporting these effect sizes as consistent point estimates and intervals with target coverage, my methods follow the conditional approach to selective inference introduced in Lee et al. (2016). The proposed procedure is based on a randomized hierarchical strategy that reflects state-of-the-art investigations and uses randomization, rather than data splitting, to make efficient use of the available data. I will describe the computational bottlenecks in performing randomized conditional inference, and present a novel set of techniques that overcome these hurdles while achieving higher inferential power than the prior selective inference work of Lee et al. (2016).
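The core idea of randomized conditional inference can be illustrated in one dimension, independently of the speaker's specific procedure. In this assumed setup we observe Z ~ N(mu, 1), select the parameter for follow-up only when Z + omega exceeds a threshold c, where omega ~ N(0, gamma^2) is injected noise, and then base inference on the law of Z conditional on selection. The conditional density is proportional to phi(z - mu) * Phi((z - c) / gamma), and inverting the resulting pivot gives a selection-adjusted interval; the grid-based numerics below are a deliberately simple sketch.

```python
# 1-D sketch of randomized conditional (selective) inference.
# Setup (an assumption for illustration): Z ~ N(mu, 1) is reported only if
# Z + omega > c with omega ~ N(0, gamma^2). Conditioning on selection gives
# density proportional to phi(z - mu) * Phi((z - c) / gamma).
import numpy as np
from math import erf, sqrt

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

Phi_vec = np.vectorize(Phi)

def selective_pivot(z_obs, mu, c, gamma):
    """P(Z <= z_obs | selected) under mean mu, by numerical integration."""
    grid = np.linspace(mu - 8.0, mu + 8.0, 2001)
    dens = np.exp(-0.5 * (grid - mu) ** 2) * Phi_vec((grid - c) / gamma)
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    return float(np.interp(z_obs, grid, cdf))

def selective_interval(z_obs, c, gamma, alpha=0.1):
    """Selection-adjusted interval: candidate means whose pivot is central."""
    mus = np.linspace(z_obs - 6.0, z_obs + 6.0, 241)
    keep = [m for m in mus
            if alpha / 2 <= selective_pivot(z_obs, m, c, gamma) <= 1 - alpha / 2]
    return min(keep), max(keep)

# Example: observed Z = 2.5, selection threshold c = 2, noise scale gamma = 1.
lo, hi = selective_interval(z_obs=2.5, c=2.0, gamma=1.0)
```

The noise scale `gamma` captures the trade-off the talk alludes to: as gamma grows, selection depends less on the data, the conditional law approaches the unconditional one, and intervals shrink toward the naive ones, at the price of a noisier selection stage. Data splitting can be viewed as one crude way of reserving such leftover information, which randomization exploits more efficiently.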