When the underlying setting for the random variables is complex projective space or projective Hilbert space, geometrized with the Fubini–Study metric, one obtains quantum mechanics and, more generally, quantum field theory. In these theories, the partition function is heavily exploited in the path integral formulation, with great success, leading to many formulas nearly identical to those reviewed here. However, because the underlying measure space is complex-valued, as opposed to the real-valued simplex of probability theory, an extra factor of $i$ appears in many formulas. Tracking this factor is troublesome, and is not done here. This article focuses primarily on classical probability theory, where the probabilities sum to one.

Given a set of random variables $X_i$ taking on values $x_i$, and some sort of potential function or Hamiltonian $H(x_1, x_2, \dots)$, the partition function is defined as

$$Z(\beta) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots)\right)$$

The function $H$ is understood to be a real-valued function on the space of states $\{X_1, X_2, \cdots\}$, while $\beta$ is a real-valued free parameter (conventionally, the inverse temperature). The sum over the $x_i$ is understood to be a sum over all possible values that each of the random variables $X_i$ may take. When the $X_i$ are continuous, rather than discrete, the sum is to be replaced by an integral, and one writes

$$Z(\beta) = \int \exp\left(-\beta H(x_1, x_2, \dots)\right) \, dx_1 \, dx_2 \cdots$$
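As a concrete sketch of the discrete sum, the partition function can be computed by brute-force enumeration of all configurations. The three binary variables and the Hamiltonian below are illustrative assumptions, not taken from the text:

```python
import itertools
import math

# A minimal sketch: three binary random variables X_i in {0, 1} with a
# made-up Hamiltonian H(x1, x2, x3) = x1 + x2 + x3 (an assumption for
# illustration only).
def H(x):
    return sum(x)

def partition_function(beta):
    # Z(beta) = sum over all configurations x of exp(-beta * H(x))
    return sum(math.exp(-beta * H(x))
               for x in itertools.product([0, 1], repeat=3))

# Since the bits are independent here, Z factorizes: Z = (1 + e**-beta)**3.
```

Because this toy Hamiltonian is a sum of independent terms, the result can be cross-checked against the closed-form product.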

When the state space is infinite-dimensional, so that $H$ is an operator, then, for the above notation to be valid, the argument must be trace class, that is, of a form such that the summation exists and is bounded.

The number of variables $X_i$ need not be countable, in which case the sums are to be replaced by functional integrals. Although there are many notations for functional integrals, a common one would be

$$Z(\beta) = \int \mathcal{D}\varphi \, \exp\left(-\beta H[\varphi]\right)$$

A common, useful modification to the partition function is to introduce auxiliary functions. This allows, for example, the partition function to be used as a generating function for correlation functions. This is discussed in greater detail below.

The role or meaning of the parameter $\beta$ can be understood in a variety of different ways. In classical thermodynamics, it is an inverse temperature. More generally, it is the variable that is conjugate to some (arbitrary) function $H$ of the random variables $X$. The word conjugate here is used in the sense of conjugate generalized coordinates in Lagrangian mechanics; thus, properly, $\beta$ is a Lagrange multiplier. It is sometimes called the generalized force. All of these concepts share the idea that one value is meant to be kept fixed, as others, interconnected in some complicated way, are allowed to vary. In the current case, the value to be kept fixed is the expectation value of $H$, even though many different probability distributions can give rise to exactly this same (fixed) value.

For the general case, one considers a set of functions $\{H_k(x_1, \cdots)\}$ that each depend on the random variables $X_i$. These functions are chosen because one wants to hold their expectation values constant, for one reason or another. To constrain the expectation values in this way, one applies the method of Lagrange multipliers. In the general case, maximum entropy methods illustrate the manner in which this is done.

Some specific examples are in order. In basic thermodynamics problems, when using the canonical ensemble, the use of just one parameter $\beta$ reflects the fact that there is only one expectation value that must be held constant: the average energy $\langle H \rangle$ (due to conservation of energy). For chemistry problems involving chemical reactions, the grand canonical ensemble provides the appropriate foundation, and there are two Lagrange multipliers. One is to hold the energy constant, and another, the chemical potential (related to the fugacity), is to hold the particle count constant (as chemical reactions involve the recombination of a fixed number of atoms). The general partition function is then

$$Z(\beta_1, \beta_2, \dots) = \sum_{x_i} \exp\left(-\sum_k \beta_k H_k(x_1, x_2, \dots)\right)$$

and the constrained expectation values are recovered as derivatives:

$$\langle H_k \rangle = \mathrm{E}[H_k] = -\frac{\partial}{\partial \beta_k} \log Z(\beta_1, \beta_2, \dots)$$

with the angle brackets $\langle H_k \rangle$ denoting the expected value of $H_k$, and $\mathrm{E}[\,\cdot\,]$ being a common alternative notation. A precise definition of this expectation value is given below.
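The relationship between the multiplier $\beta_k$ and the constrained expectation value can be checked numerically. In this sketch, the two constraint functions on two binary variables are made-up examples, not from the text:

```python
import itertools
import math

# Two illustrative constraint functions H_1, H_2 on two binary variables
# (both functions are assumptions, chosen only for this demonstration).
def H1(x):
    return x[0] + x[1]      # plays the role of an "energy"

def H2(x):
    return x[0] * x[1]      # plays the role of a "particle count"

def logZ(b1, b2):
    return math.log(sum(math.exp(-b1 * H1(x) - b2 * H2(x))
                        for x in itertools.product([0, 1], repeat=2)))

def expectation(f, b1, b2):
    Z = math.exp(logZ(b1, b2))
    return sum(f(x) * math.exp(-b1 * H1(x) - b2 * H2(x)) / Z
               for x in itertools.product([0, 1], repeat=2))

# <H_1> should equal -d(log Z)/d(beta_1); check by central differences.
eps = 1e-6
lhs = expectation(H1, 0.5, 0.3)
rhs = -(logZ(0.5 + eps, 0.3) - logZ(0.5 - eps, 0.3)) / (2 * eps)
```

The finite-difference derivative of $\log Z$ agrees with the directly computed expectation value to within numerical precision.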

Although the value of $\beta$ is commonly taken to be real, it need not be, in general; this is discussed in the section Normalization below. The values of $\beta$ can be understood to be the coordinates of points in a space; this space is in fact a manifold, as sketched below. The study of these spaces as manifolds constitutes the field of information geometry.

The potential function itself commonly takes the form of a sum

$$H(x_1, x_2, \dots) = \sum_s V(s)$$

where the sum over $s$ is a sum over some subset of the power set $P(X)$ of the set $X = \{x_1, x_2, \dots\}$. For example, in statistical mechanics, such as the Ising model, the sum is over pairs of nearest neighbors. In probability theory, such as Markov networks, the sum might be over the cliques of a graph; so, for the Ising model and other lattice models, the maximal cliques are edges.
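For instance, a nearest-neighbour Hamiltonian of Ising type on a small cycle graph (a hypothetical toy, with spins in {−1, +1}) can be written as a sum over the edges, i.e. the maximal cliques of this lattice:

```python
import itertools
import math

# A toy nearest-neighbour (Ising-style) Hamiltonian on a 4-cycle graph;
# the sum runs over the edges, the maximal cliques of this lattice.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def H(spins):
    # H = -sum over edges of s_i * s_j, with spins in {-1, +1}
    return -sum(spins[i] * spins[j] for i, j in edges)

def Z(beta):
    return sum(math.exp(-beta * H(s))
               for s in itertools.product([-1, 1], repeat=4))

# At beta = 0 every one of the 2**4 configurations has weight 1, so Z = 16.
```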

The fact that the potential function can be written as a sum usually reflects the fact that it is invariant under the action of a symmetry group, such as translational invariance. Such symmetries can be discrete or continuous; they materialize in the correlation functions for the random variables (discussed below). Thus a symmetry in the Hamiltonian becomes a symmetry of the correlation function (and vice versa).

This symmetry has a critically important interpretation in probability theory: it implies that the Gibbs measure has the Markov property; that is, it is independent of the random variables in a certain way, or, equivalently, the measure is identical on the equivalence classes of the symmetry. This leads to the widespread appearance of the partition function in problems with the Markov property, such as Hopfield networks.

The value of the expression

$$\exp\left(-\beta H(x_1, x_2, \dots)\right)$$

can be interpreted as a likelihood that a specific configuration of values $(x_1, x_2, \dots)$ occurs in the system. Thus, given a specific configuration $(x_1, x_2, \dots)$,

$$P(x_1, x_2, \dots) = \frac{1}{Z(\beta)} \exp\left(-\beta H(x_1, x_2, \dots)\right)$$

is the probability of the configuration $(x_1, x_2, \dots)$ occurring in the system, which is now properly normalized so that $0 \leq P(x_1, x_2, \dots) \leq 1$, and such that the sum over all configurations totals to one. As such, the partition function can be understood to provide a measure (a probability measure) on the probability space; formally, it is called the Gibbs measure. It generalizes the narrower concepts of the grand canonical ensemble and canonical ensemble in statistical mechanics.

There exists at least one configuration $(x_1, x_2, \dots)$ for which the probability is maximized; this configuration is conventionally called the ground state. If the configuration is unique, the ground state is said to be non-degenerate, and the system is said to be ergodic; otherwise the ground state is degenerate. The ground state may or may not commute with the generators of the symmetry; if it commutes, it is said to be an invariant measure. When it does not commute, the symmetry is said to be spontaneously broken.
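Enumerating a small example makes the notion of a degenerate ground state concrete; the 4-spin cycle Hamiltonian below is a hypothetical illustration:

```python
import itertools

# Enumerate all configurations of a 4-spin cycle (a toy Hamiltonian) to
# find the ground state(s): for beta > 0 the probability is maximized
# exactly where H is minimized.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

def H(s):
    return -sum(s[i] * s[j] for i, j in edges)

configs = list(itertools.product([-1, 1], repeat=4))
Hmin = min(H(s) for s in configs)
ground_states = [s for s in configs if H(s) == Hmin]
# Here the ground state is degenerate: all spins up, or all spins down.
```

The two-fold degeneracy reflects the spin-flip symmetry of the Hamiltonian, tying into the discussion of symmetry above.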

Conditions under which a ground state exists and is unique are given by the Karush–Kuhn–Tucker conditions; these conditions are commonly used to justify the use of the Gibbs measure in maximum-entropy problems.[citation needed]

The values taken by $\beta$ depend on the mathematical space over which the random field varies. Thus, real-valued random fields take values on a simplex: this is the geometrical way of saying that the sum of probabilities must total to one. For quantum mechanics, the random variables range over complex projective space (or complex-valued projective Hilbert space), where the random variables are interpreted as probability amplitudes. The emphasis here is on the word projective, as the amplitudes are still normalized to one. The normalization for the potential function is the Jacobian for the appropriate mathematical space: it is 1 for ordinary probabilities, and $i$ for Hilbert space; thus, in quantum field theory, one sees $itH$ in the exponential, rather than $\beta H$. The partition function is very heavily exploited in the path integral formulation of quantum field theory, to great effect. The theory there is very nearly identical to that presented here, aside from this difference, and the fact that it is usually formulated on four-dimensional space-time, rather than in a general way.

The partition function is commonly used as a generating function for expectation values of various functions of the random variables. So, for example, taking $\beta$ as an adjustable parameter, the derivative of $\log(Z(\beta))$ with respect to $\beta$ gives the average (expectation value) of $H$; in physics, this would be called the average energy of the system:

$$\mathrm{E}[H] = \langle H \rangle = -\frac{\partial \log Z(\beta)}{\partial \beta}$$

The above notation is strictly correct for a finite number of discrete random variables, but should be seen to be somewhat 'informal' for continuous variables; properly, the summations above should be replaced with the notations of the underlying sigma algebra used to define a probability space. That said, the identities continue to hold, when properly formulated on a measure space.

The points $\beta$ can be understood to form a space, and specifically, a manifold. Thus, it is reasonable to ask about the structure of this manifold; this is the task of information geometry.

Multiple derivatives with regard to the Lagrange multipliers give rise to a positive semi-definite covariance matrix

$$g_{ij} = \frac{\partial^2 \log Z}{\partial \beta_i \, \partial \beta_j} = \langle H_i H_j \rangle - \langle H_i \rangle \langle H_j \rangle$$

That is, the expectation value appearing above is precisely

$$\langle H_k \rangle = \mathrm{E}[H_k] = \sum_x H_k(x) \, P(x)$$

where we've written $P(x)$ for $P(x_1, x_2, \dots)$ and the summation is understood to be over all values of all random variables $X_k$. For continuous-valued random variables, the summations are replaced by integrals, of course.
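As a numerical sketch (using an illustrative Hamiltonian on three binary variables, an assumption for demonstration), the second derivative of $\log Z$ with respect to a single $\beta$ recovers the variance of $H$, one diagonal entry of the covariance matrix above:

```python
import itertools
import math

# Numerical sketch: the second derivative of log Z with respect to beta
# equals Var(H), a diagonal entry of the covariance matrix. The
# Hamiltonian H(x) = x1 + x2 + x3 is an illustrative assumption.
def H(x):
    return sum(x)

def logZ(beta):
    return math.log(sum(math.exp(-beta * H(x))
                        for x in itertools.product([0, 1], repeat=3)))

beta, eps = 0.7, 1e-4
second = (logZ(beta + eps) - 2 * logZ(beta) + logZ(beta - eps)) / eps**2

# Direct computation of Var(H) = <H^2> - <H>^2 for comparison.
Z = math.exp(logZ(beta))
states = list(itertools.product([0, 1], repeat=3))
probs = [math.exp(-beta * H(x)) / Z for x in states]
mean = sum(H(x) * p for x, p in zip(states, probs))
var = sum(H(x)**2 * p for x, p in zip(states, probs)) - mean**2
```

The variance is non-negative, consistent with the matrix being positive semi-definite.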

By introducing artificial auxiliary functions $J_k$ into the partition function, it can then be used to obtain the expectation value of the random variables. Thus, for example, by writing

$$Z(\beta, J) = Z(\beta, J_1, J_2, \dots) = \sum_{x_i} \exp\left(-\beta H(x_1, x_2, \dots) + \sum_k J_k x_k\right)$$

one then has

$$\langle x_k \rangle = \mathrm{E}[x_k] = \left.\frac{\partial}{\partial J_k} \log Z(\beta, J)\right|_{J=0}$$

as the expectation value of $x_k$.

Multiple differentiations lead to the connected correlation functions of the random variables. Thus the correlation function $C(x_j, x_k)$ between variables $x_j$ and $x_k$ is given by:

$$C(x_j, x_k) = \left.\frac{\partial}{\partial J_j} \frac{\partial}{\partial J_k} \log Z(\beta, J)\right|_{J=0}$$
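The source-term trick can be checked numerically on a toy two-variable system (the Hamiltonian here is a made-up example); the $J$-derivative of $\log Z$ at $J = 0$ reproduces the directly computed expectation value:

```python
import itertools
import math

# Sketch of the source-term trick: with auxiliary J_k added to the
# exponent, d(log Z)/dJ_k at J = 0 gives <x_k>. The two-variable
# Hamiltonian H(x) = x1 * x2 is an assumption for illustration.
def H(x):
    return x[0] * x[1]

def logZ(beta, J):
    return math.log(sum(
        math.exp(-beta * H(x) + sum(Jk * xk for Jk, xk in zip(J, x)))
        for x in itertools.product([0, 1], repeat=2)))

beta, eps = 1.0, 1e-6
mean_x0 = (logZ(beta, (eps, 0.0)) - logZ(beta, (-eps, 0.0))) / (2 * eps)

# Direct expectation value of x_0 for comparison.
Z = sum(math.exp(-beta * H(x)) for x in itertools.product([0, 1], repeat=2))
direct = sum(x[0] * math.exp(-beta * H(x))
             for x in itertools.product([0, 1], repeat=2)) / Z
```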

In the case where $H$ can be written as a quadratic form involving a differential operator, then the partition function can be understood to be a sum or integral over Gaussians. The correlation function $C(x_j, x_k)$ can be understood to be the Green's function for the differential operator (generally giving rise to Fredholm theory). In the quantum field theory setting, such functions are referred to as propagators; higher-order correlators are called n-point functions; working with them defines the effective action of a theory.