Since the objective function is unknown, the Bayesian strategy is to treat it as a random function and place a prior over it.
The prior captures our beliefs about the behaviour of the function.
The function evaluations gathered so far are treated as data and used to update the prior,
forming the posterior distribution over the objective function.
The posterior distribution, in turn, is used to construct
an acquisition function (often also referred to as an infill sampling criterion) that determines what the next query point should be.
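
The prior, posterior and acquisition steps described above can be sketched as a short loop. The following is a minimal illustration only, not a reference implementation: it assumes a one-dimensional search interval [0, 1], a zero-mean Gaussian process prior with a squared-exponential kernel, an upper confidence bound acquisition function maximized over a fixed grid, and maximization of the objective. All names (`rbf`, `solve`, `posterior`, `bayes_opt`, `objective`) and all parameter values are hypothetical choices made for the example.

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential kernel (assumed prior: nearby inputs give similar values)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def posterior(xs, ys, x, noise=1e-4):
    """Posterior mean and standard deviation at x, given evaluations (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, ys)          # K^{-1} y
    v = solve(K, k)               # K^{-1} k
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = rbf(x, x) - sum(ki * vi for ki, vi in zip(k, v))
    return mean, math.sqrt(max(var, 0.0))

def bayes_opt(f, n_iter=8, kappa=2.0):
    """Maximize f on [0, 1]; returns the best evaluated point and its value."""
    grid = [i / 200 for i in range(201)]
    xs = [0.0, 1.0]                      # initial evaluations (the first "data")
    ys = [f(x) for x in xs]
    for _ in range(n_iter):
        def ucb(x):
            m, s = posterior(xs, ys, x)  # posterior built from all data so far
            return m + kappa * s         # acquisition value at x
        x_next = max(grid, key=ucb)      # next query point
        xs.append(x_next)
        ys.append(f(x_next))             # the (expensive) function evaluation
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def objective(x):
    """Toy stand-in for an expensive black-box function."""
    return -(x - 0.3) ** 2

if __name__ == "__main__":
    print(bayes_opt(objective))
```

In this sketch the objective is evaluated only ten times in total (two initial points plus eight queries), while the comparatively cheap acquisition function is evaluated at every grid point in each iteration. A practical implementation would typically also fit the kernel hyperparameters to the data and use a numerically more careful solver such as a Cholesky factorization.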

Examples of acquisition functions include probability of improvement,
expected improvement, Bayesian expected losses, upper confidence bounds (UCB), Thompson sampling
and mixtures of these.[5] They all trade off exploration against exploitation so as to minimize the number of function queries. As such, Bayesian optimization is well suited to functions that are very expensive to evaluate.
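
To make the trade-off concrete, three of the acquisition functions listed above have simple closed forms when the posterior at a candidate point is Gaussian with mean `mu` and standard deviation `sigma`. The sketch below assumes a maximization problem, writes `best` for the best value observed so far, and uses illustrative parameters `xi` (an optional improvement margin) and `kappa` (the weight on uncertainty); the function and parameter names are hypothetical and not taken from any particular library.

```python
import math

def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_of_improvement(mu, sigma, best, xi=0.0):
    """P(f(x) > best + xi) under a Gaussian posterior."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    return norm_cdf((mu - best - xi) / sigma)

def expected_improvement(mu, sigma, best, xi=0.0):
    """E[max(f(x) - best - xi, 0)] under a Gaussian posterior."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    # First term rewards a high mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return gain * norm_cdf(z) + sigma * norm_pdf(z)

def upper_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate: posterior mean plus kappa standard deviations."""
    return mu + kappa * sigma
```

Under these assumptions, a candidate whose posterior mean lies slightly below the best observed value but whose uncertainty is large can score a higher expected improvement than a candidate whose mean equals the best value but whose uncertainty is small, which is how such criteria direct some queries toward unexplored regions.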