Blending Bayesian and frequentist methods according to the precision of prior information with applications to hypothesis testing

Description

The following zero-sum game between nature and a statistician blends Bayesian methods with frequentist methods such as p-values and confidence intervals. Nature chooses a posterior distribution consistent with a set of possible priors. At the same time, the statistician selects a parameter distribution for inference with the goal of maximizing the minimum Kullback-Leibler information gained over a confidence distribution or other benchmark distribution. In cases of hypothesis testing, the statistician reports a posterior probability of the hypothesis that is informed by both Bayesian and frequentist methodology, each weighted according to how well the prior is known.

As is generally acknowledged, the Bayesian approach is ideal given knowledge of a prior distribution that can be interpreted in terms of relative frequencies. On the other hand, frequentist methods such as confidence intervals and p-values have the advantage that they perform well without knowledge of such a distribution of the parameters. However, neither the Bayesian approach nor the frequentist approach is entirely satisfactory in situations involving partial knowledge of the prior distribution. The proposed procedure reduces to a Bayesian method given complete knowledge of the prior, to a frequentist method given complete ignorance about the prior, and to a blend of the two methods given partial knowledge of the prior. The blended approach resembles the Bayesian method rather than the frequentist method to the precise extent that the prior is known.

The proposed framework offers a simple solution to the enduring problem of testing a point null hypothesis. The blended probability that the null hypothesis is true is equal to the p-value or a lower bound of the unknown Bayesian posterior probability, whichever is greater. Thus, given total ignorance, represented by a lower bound of 0, the p-value is used instead of any Bayesian posterior probability.
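The point-null rule just described can be sketched in a few lines of Python. This is a minimal illustration of the stated rule only, assuming the p-value and the posterior lower bound are already available; the function name is hypothetical, not from the paper.

```python
def blended_null_probability(p_value, posterior_lower_bound):
    """Blended probability that the point null hypothesis is true.

    Per the rule described above: report the p-value or the lower
    bound of the (unknown) Bayesian posterior probability, whichever
    is greater.  A lower bound of 0 (total ignorance about the prior)
    recovers the p-value.
    """
    return max(p_value, posterior_lower_bound)

# Total ignorance about the prior: the p-value itself is reported.
print(blended_null_probability(0.04, 0.0))   # 0.04
# Partial knowledge: the posterior lower bound exceeds the p-value.
print(blended_null_probability(0.04, 0.25))  # 0.25
```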
At the opposite extreme of a known prior, the p-value is ignored. In the intermediate case, the possible Bayesian posterior probability closest to the p-value is used for inference. Thus, both the Bayesian method and the frequentist method influence the inferences made.

Similarly, blended inference may help resolve ongoing controversies in testing multiple hypotheses. Whereas the adjusted p-value is often considered the multiple comparison procedure (MCP) of choice for small numbers of hypotheses, large numbers of p-values enable accurate estimation of the local false discovery rate (LFDR), a physical posterior probability of hypothesis truth. Each blended posterior probability reduces to either the adjusted p-value or the LFDR estimate, effectively determining on a hypothesis-by-hypothesis basis whether the LFDR can be estimated with sufficient accuracy. This blended MCP is applied to both a microarray data set and a more conventional biostatistics data set to illustrate its generality.
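The behavior across all three regimes can be summarized by clipping: the possible posterior probability closest to the p-value is the p-value projected onto the interval of posteriors consistent with the set of priors. The sketch below, with hypothetical function and variable names and made-up illustrative numbers, applies this per hypothesis as in the blended MCP, using an adjusted p-value and an assumed interval of plausible LFDR values for each hypothesis.

```python
def blended_probability(p_value, post_lo, post_hi):
    """Possible posterior probability closest to the p-value.

    Clipping the p-value to [post_lo, post_hi], the interval of
    posterior probabilities consistent with the set of priors, yields:
      - the p-value itself under total ignorance ([0, 1]),
      - the known posterior when the prior is known (post_lo == post_hi),
      - the closest admissible posterior in the intermediate case.
    """
    return min(max(p_value, post_lo), post_hi)

# Per-hypothesis blend for multiple testing: clip each adjusted
# p-value to that hypothesis's interval of plausible LFDR values.
# (All numbers below are illustrative, not from the paper's data.)
adjusted_p = [0.02, 0.40, 0.90]
lfdr_bounds = [(0.10, 0.30), (0.35, 0.45), (0.00, 1.00)]
blended = [blended_probability(p, lo, hi)
           for p, (lo, hi) in zip(adjusted_p, lfdr_bounds)]
print(blended)  # [0.1, 0.4, 0.9]
```

When the LFDR interval is wide (last hypothesis), the adjusted p-value passes through unchanged; when it is narrow and excludes the p-value (first hypothesis), the nearest LFDR bound is reported instead.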