What does "Bayesian Inference" mean?

Definition of Bayesian Inference in the context of A/B testing (online controlled experiments).

What is Bayesian Inference?

Bayesian inference has no single agreed-upon definition, as different tribes of Bayesians (subjective, objective, reference/default, likelihoodists) continue to argue about the right one. A definition with which many would agree, though, is that it proceeds roughly as follows: an adequate model is formulated, a prior distribution over the unknown parameter(s) of the model is defined, some data x0 is observed, and Bayes' rule is applied to obtain a posterior distribution on the parameters of interest. This describes the solution of the so-called inverse problem: how to adjust a prior set of information based on a current piece of information. Contrast this with the forward problem: computing the probability of a piece of data based on a hypothetical.
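Written out, the update step described above takes the following form. The symbols θ (the unknown parameter) and p(·) (a probability density or mass function) are notation assumed here for illustration; only x0 comes from the text.

```latex
p(\theta \mid x_0)
  = \frac{p(x_0 \mid \theta)\, p(\theta)}{\int p(x_0 \mid \theta')\, p(\theta')\, d\theta'}
  \;\propto\; p(x_0 \mid \theta)\, p(\theta)
```

Here p(θ) is the prior, p(x0 | θ) is the likelihood of the observed data (the forward problem), and p(θ | x0) is the posterior (the inverse problem).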

The prior distribution is an initial state of belief, external information, or default information, depending on which kind of Bayesian analysis one is doing. The posterior is proportional to the product of the prior and the likelihood (the probability of the data given the hypothesis) and is the final state of belief, information, etc., expressed as a probability distribution over the parameter space. The posterior distribution can be used to compute predictive distributions, credible intervals (such as Highest Density Intervals), most probable values, and so on. When applied to decision theory, a Bayesian loss function is introduced which can influence the conclusions just as the prior does.
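As a minimal sketch of "posterior proportional to prior times likelihood", the following computes a posterior for a single conversion rate on a grid of candidate values. The binomial model, the uniform prior, the grid resolution, and the 30-out-of-200 data are all assumptions chosen for illustration, not values from any real test.

```python
def grid_posterior(conversions, visitors, prior_pdf, grid_size=1001):
    """Approximate the posterior of a conversion rate on an evenly spaced grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    failures = visitors - conversions
    unnormalized = []
    for p in grid:
        # Binomial likelihood kernel; the binomial coefficient cancels
        # when the weights are normalized below.
        likelihood = (p ** conversions) * ((1 - p) ** failures)
        unnormalized.append(prior_pdf(p) * likelihood)
    total = sum(unnormalized)
    weights = [u / total for u in unnormalized]
    return grid, weights


# Hypothetical data: 30 conversions out of 200 visitors, uniform prior on [0, 1].
grid, weights = grid_posterior(30, 200, prior_pdf=lambda p: 1.0)

grid_mean = sum(p * w for p, w in zip(grid, weights))
grid_mode = grid[max(range(len(grid)), key=lambda i: weights[i])]

print(f"posterior mean: {grid_mean:.4f}")
print(f"most probable value: {grid_mode:.4f}")
```

Passing a different `prior_pdf` changes the weights, and with them every summary derived from the posterior.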

As can be seen from the above explanation, a Bayesian posterior in effect mixes external information with the data. Unless you know and understand the prior used, that external information is impossible to remove if you want to run your own analysis based on another set of external information. In this sense Bayesian inference is more akin to a decision-making machinery than to a tool for learning from data. In a practical A/B testing setting this means that unless one knows and understands the prior, one cannot interpret the posterior or any of its derivatives (intervals, point estimates, etc.). A consequence of this is that unless the prior is communicated to stakeholders, they will be unable to understand the posterior.

The effects of a prior or loss function can range from trivial to dominating the data entirely in any finite-sample setting, which is why understanding their interaction with the data is crucial. When a subjective prior is posited, one should understand the reference class problem and how it applies to the case at hand. For example, an A/B testing vendor might apply a prior which is informed by the outcomes of your previous tests or even by the outcomes of other people's tests. Why the outcome of one test should be a good predictor of the outcome of another is a mystery. Even setting that aside, it is unclear what benefit to the precision of the inference is achieved by mixing such a prediction with the actual data from the experiment.
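To make the range "from trivial to dominating" concrete, here is a small sketch using the conjugate Beta-binomial model, where a Beta(alpha, beta) prior and binomial data give a posterior mean of (alpha + conversions) / (alpha + beta + visitors). The two priors and the two sample sizes are hypothetical and chosen only to show the contrast.

```python
def posterior_mean(conversions, visitors, alpha, beta):
    """Posterior mean of a conversion rate under a Beta(alpha, beta) prior."""
    return (alpha + conversions) / (alpha + beta + visitors)


# Hypothetical priors: a flat one, and a strong one centered on a 2% rate.
priors = {"flat Beta(1, 1)": (1, 1), "strong Beta(20, 980)": (20, 980)}
# Hypothetical data sets, both with an observed rate of 15%.
samples = {"small sample": (30, 200), "large sample": (3000, 20000)}

for sample_name, (conversions, visitors) in samples.items():
    for prior_name, (alpha, beta) in priors.items():
        mean = posterior_mean(conversions, visitors, alpha, beta)
        print(f"{sample_name}, {prior_name}: posterior mean = {mean:.4f}")
```

With the small sample the strong prior pulls the posterior mean down to roughly 4% despite an observed rate of 15%, while with the large sample the two priors give similar answers.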

The above should provide some clarity to conversion rate optimization practitioners when they face claims such as "Variant B has XX% chance to beat the control" or "there is XX% probability that the variant outperforms the control". What is left out when one is presented with such a number is "based on a certain prior probability π and assuming our statistical model is adequate".
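The dependence of a "chance to beat control" figure on the prior can be sketched with a short Monte Carlo simulation. The function name `chance_to_beat`, the independent Beta priors on each arm, and the conversion counts are all assumptions made for this illustration; they do not describe any particular vendor's method.

```python
import random


def chance_to_beat(control, variant, prior, draws=50_000, seed=42):
    """Monte Carlo estimate of P(variant rate > control rate | data, prior).

    control, variant: (conversions, visitors) tuples.
    prior: (alpha, beta) of a Beta prior applied independently to each arm.
    """
    rng = random.Random(seed)
    alpha, beta = prior
    (conv_a, n_a), (conv_b, n_b) = control, variant
    wins = 0
    for _ in range(draws):
        # Draw one plausible rate per arm from its Beta posterior.
        rate_a = rng.betavariate(alpha + conv_a, beta + n_a - conv_a)
        rate_b = rng.betavariate(alpha + conv_b, beta + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: 100/1000 conversions for control, 120/1000 for the variant.
control, variant = (100, 1000), (120, 1000)
flat = chance_to_beat(control, variant, prior=(1, 1))
informative = chance_to_beat(control, variant, prior=(100, 900))

print(f"flat prior:        {flat:.1%} chance to beat control")
print(f"informative prior: {informative:.1%} chance to beat control")
```

The same data yield two different headline percentages, which is why the number is uninterpretable without the prior that produced it.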

Bayesian-type causal inference is induction by enumeration, as it relies on large-sample approximations. Frequentist inference, by contrast, is model-based induction which relies on finite-sample sampling distributions. The two approaches share the same probabilistic assumptions, but only frequentist induction makes effective use of them to derive the sampling distribution of the estimator of the parameter of interest, which can be used for inference for any sample size larger than one. In contrast, efficiency in a Bayesian setting is considered in terms of optimality of a decision across all possible values, instead of relative to the true value. Thus, even when a Bayesian and a frequentist approach produce the same asymptotic result, only the latter can provide finite-sample guarantees.

Due to the "updating" property of Bayesian inference, some Bayesians believe in what is dubbed the "stopping rule principle", according to which one can simply evaluate more and more data as it accumulates without the need for the equivalent of a p-value adjustment for peeking. However, this is only true if the statistical model used to construct the prior is adequate vis-a-vis the data, meaning that it must also model the optional stopping behavior. This automatically bars the use of non-informative or default priors in sequential testing, which effectively invalidates any claims of superiority based on that principle, since a Bayesian analysis has to take measures similar to frequentist sequential testing if the reported statistics are to be of any value. The same argument counters claims that Bayesian methods can be applied by ignoring the sampling plan entirely.
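One way to see the concern is to simulate an A/A test (no true difference) that is stopped the first time a posterior probability crosses a threshold. The sketch below assumes normally distributed observations with known unit variance and a flat prior on the mean, in which case P(mean > 0 | data) equals the standard normal CDF evaluated at sum / sqrt(n). The function name, the number of looks, the batch size, and the 95% threshold are hypothetical choices.

```python
import random
from statistics import NormalDist


def simulate_peeking(n_sims=4000, looks=20, batch=100, threshold=0.95, seed=7):
    """Return (rate of crossing at any look, rate of crossing at the final look)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    crossed_any = 0
    crossed_final = 0
    for _ in range(n_sims):
        total, n = 0.0, 0
        crossed = False
        prob = 0.0
        for _ in range(looks):
            # The sum of `batch` independent N(0, 1) observations is N(0, batch).
            total += rng.gauss(0.0, batch ** 0.5)
            n += batch
            # Posterior P(mean > 0 | data) under a flat prior and unit variance.
            prob = std_normal.cdf(total / n ** 0.5)
            if prob > threshold:
                crossed = True
        crossed_any += crossed
        crossed_final += prob > threshold
    return crossed_any / n_sims, crossed_final / n_sims


any_look, final_look = simulate_peeking()
print(f"declared a winner at some look: {any_look:.1%}")
print(f"declared a winner at final look only: {final_look:.1%}")
```

Under these assumptions, evaluating only at the final look produces a false "winner" at about the rate implied by the threshold, while acting on the first crossing at any look does so considerably more often.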

A major non-intuitive property of Bayesian inference is that it forgoes the existence of a true state of nature and instead posits a probability distribution for it. The data is considered infallible and fixed, while the true state of nature is a random variable. In an A/B testing scenario this would be the assumption that the data are as they are, while the conversion rate can be anything from zero to one for any given time period t. This is opposed to frequentist inference, which considers the conversion rate at time period t fixed but unknown.

Bayesian methods are generally not concerned with type I or type II error rates. Such error rates are completely ignored and thus cannot be controlled in a Bayesian analysis. Instead of power analysis one can perform planning for precision, calculating the rate at which an analysis goal is achieved based on a prior distribution.
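Planning for precision can be sketched as a simulation: draw a plausible true rate from the prior, simulate a test of a given size, and record whether the analysis goal was met. In this sketch the goal is a 95% posterior interval no wider than a target width, with the interval approximated as mean ± 1.96 posterior standard deviations. The function name, the Beta(10, 90) prior, the target width, and the normal approximation are all assumptions made for illustration.

```python
import random


def goal_achievement_rate(n_visitors, prior_a, prior_b, target_width,
                          n_sims=300, seed=11):
    """Fraction of simulated tests whose approximate 95% interval is narrow enough."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        # Draw a "true" conversion rate from the Beta prior, then simulate data.
        true_rate = rng.betavariate(prior_a, prior_b)
        conversions = sum(rng.random() < true_rate for _ in range(n_visitors))
        # Conjugate update: the posterior is Beta(a, b).
        a = prior_a + conversions
        b = prior_b + n_visitors - conversions
        posterior_sd = (a * b / ((a + b) ** 2 * (a + b + 1))) ** 0.5
        # Normal approximation to the width of a 95% posterior interval.
        width = 2 * 1.96 * posterior_sd
        hits += width <= target_width
    return hits / n_sims


# Hypothetical goal: interval width of at most 3 percentage points.
rate_small = goal_achievement_rate(500, 10, 90, target_width=0.03)
rate_large = goal_achievement_rate(4000, 10, 90, target_width=0.03)
print(f"500 visitors:  goal met in {rate_small:.0%} of simulations")
print(f"4000 visitors: goal met in {rate_large:.0%} of simulations")
```

Note that the resulting rate is itself conditional on the prior used to draw the true rates, so a different prior gives a different plan.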