No, not gumballs

We often want to characterize probabilistic models in discrete situations. The Gumbel trick allows us to estimate the associated partition function $Z$ with relative ease. At a high level, computing $Z$ or even $\ln Z$ directly is very difficult; however, we can add some noise to the potentials and compute the maximum a posteriori (MAP) value more easily through approximation methods. If we repeat this process enough times, we get a reliable estimate of $Z$.

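1. For each $x \in \mathcal{X}$, add independent Gumbel noise $\gamma(x)$ to the potential $\phi(x)$.

2. Find the MAP of this perturbed value over all $x \in \mathcal{X}$, that is, $\max_{x \in \mathcal{X}} \{\phi(x) + \gamma(x)\}$. Call this value $z_i$.

3. Repeat steps 1 and 2 multiple times and take the mean of the $z_i$, which estimates $\ln Z$; exponentiating that mean gives $\hat{Z} \approx Z$.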
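The procedure can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the state space is small enough to enumerate, so a brute-force maximum stands in for an approximate MAP solver, the partition function is taken as $Z = \sum_x \exp(\phi(x))$, and the function name `gumbel_trick_log_z` and the example potentials are made up for this sketch. The sample mean of the perturbed maxima estimates $\ln Z$, which is then exponentiated to obtain $\hat{Z}$.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant c


def gumbel_trick_log_z(phi, num_samples=10_000, seed=0):
    """Estimate ln Z = ln sum_x exp(phi[x]) by averaging perturbed maxima.

    Hypothetical helper: a brute-force max replaces a real MAP solver.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    # Draw independent Gumbel(-c) noise for every state, once per repetition.
    gamma = rng.gumbel(loc=-EULER_GAMMA, scale=1.0, size=(num_samples, phi.size))
    # Maximum of the perturbed potentials in each repetition (the z_i).
    z = np.max(phi + gamma, axis=1)
    # The sample mean of the z_i estimates ln Z.
    return z.mean()


phi = np.array([0.5, -1.0, 2.0, 0.0])  # made-up potentials for illustration
log_z_hat = gumbel_trick_log_z(phi)
log_z_true = np.log(np.sum(np.exp(phi)))
z_hat = np.exp(log_z_hat)  # estimate of Z itself
print(log_z_hat, log_z_true, z_hat)
```

On a toy problem like this the exact $\ln Z$ is trivial to compute, which is what makes it a useful sanity check; the trick only pays off when the sum over $\mathcal{X}$ is intractable but a MAP solver is available.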

But why?

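We want to prove the Gumbel trick that the Perturb-and-MAP method relies on, specifically

$$\ln Z = \mathbb{E}_{\gamma}\left[\max_{x \in \mathcal{X}} \{\phi(x) + \gamma(x)\}\right]$$

where $\phi(x)$ are the potentials, $Z = \sum_{x \in \mathcal{X}} \exp(\phi(x))$ is the partition function, and each $\gamma(x) \sim \text{Gumbel}(-c)$ is drawn independently, with $c$ the Euler-Mascheroni constant.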

Because the mean of $\text{Gumbel}(\mu)$ is $\mu + c$, we can show that $\ln Z$ and then $Z$ are recoverable.
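Spelled out, and taking for granted the claim established in the proof section that the perturbed maximum is itself Gumbel-distributed with location $-c + \ln Z$ (where $Z = \sum_{x \in \mathcal{X}} \exp(\phi(x))$), the recovery is one line of arithmetic:

$$\mathbb{E}_{\gamma}\left[\max_{x \in \mathcal{X}} \{\phi(x) + \gamma(x)\}\right] = (-c + \ln Z) + c = \ln Z, \qquad Z = \exp\left(\mathbb{E}_{\gamma}\left[\max_{x \in \mathcal{X}} \{\phi(x) + \gamma(x)\}\right]\right)$$

The $-c$ shift in the noise is there precisely so that it cancels the $+c$ in the Gumbel mean.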

A brief Gumbel interlude

The Gumbel distribution is traditionally used to model the maxima of already extreme events. For example, what will be the worst earthquake in San Francisco next year, given measurements of the worst earthquake in each of the past 10 years?

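A variable $X$ drawn from $\text{Gumbel}(\mu)$ (with unit scale) has the cumulative distribution function

$$F(x) = \exp\left(-\exp(-(x - \mu))\right)$$

and the corresponding density

$$f(x) = \exp\left(-(x - \mu) - \exp(-(x - \mu))\right)$$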
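As a quick numerical illustration, the sketch below samples from $\text{Gumbel}(\mu)$ by inverting the unit-scale CDF $F(x) = \exp(-\exp(-(x - \mu)))$, which gives $x = \mu - \ln(-\ln u)$ for uniform $u$. It then compares the sample mean against $\mu + c$. The helper name `sample_gumbel` and the value of `mu` are assumptions made for the example.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant c


def sample_gumbel(mu, size, rng):
    """Inverse-CDF sampling from a unit-scale Gumbel(mu) (hypothetical helper)."""
    # Keep u strictly positive so that log(u) stays finite.
    u = rng.uniform(low=np.finfo(float).tiny, high=1.0, size=size)
    return mu - np.log(-np.log(u))


rng = np.random.default_rng(0)
mu = 1.5  # arbitrary location chosen for the example
samples = sample_gumbel(mu, size=200_000, rng=rng)

sample_mean = samples.mean()
expected_mean = mu + EULER_GAMMA
# Fraction of samples at or below mu; the CDF predicts exp(-1) there.
cdf_at_mu = (samples <= mu).mean()
print(sample_mean, expected_mean, cdf_at_mu, np.exp(-1.0))
```

NumPy's `Generator.gumbel(loc, scale, size)` draws from the same distribution directly; the inverse-CDF version is shown only because it makes the role of the CDF explicit.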

The actual proof

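We want the distribution of the maximum of $\phi(x) + \gamma(x)$ over $x$. Thinking in terms of the CDF, the maximum is at most $t$ exactly when every $x \in \mathcal{X}$ produces a value smaller than or equal to $t$. Writing $F$ for the CDF of $\text{Gumbel}(-c)$,

$$\begin{aligned}
P\left(\max_{x \in \mathcal{X}} \{\phi(x) + \gamma(x)\} \le t\right) &= \prod_{x \in \mathcal{X}} F(t - \phi(x)) \\
&= \prod_{x \in \mathcal{X}} \exp\left(-\exp(-(t - \phi(x) + c))\right) \\
&= \exp\left(-\exp(-(t + c)) \, Z\right) \\
&= \exp\left(-\exp(-(t + c - \ln Z))\right) \\
&= F_{\text{Gumbel}(-c + \ln Z)}(t)
\end{aligned}$$

The first equality follows from the independence of the $\gamma(x)$: we multiply the Gumbel CDFs $F(t - \phi(x)) = P(\gamma(x) \le t - \phi(x))$ over all possible values of $x$ to capture the maximum. The second equality comes from expanding out the Gumbel CDF. The third equality turns the product into a sum inside the exponent and consolidates the potential functions using $Z = \sum_{x \in \mathcal{X}} \exp(\phi(x))$. The fourth equality sticks the $\ln Z$ back into the $\exp$ function. The last equality compresses the probability back into the Gumbel CDF, except set at the different location $-c + \ln Z$. Since the mean of $\text{Gumbel}(\mu)$ is $\mu + c$, the expected maximum is $\ln Z$. $\blacksquare$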
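The conclusion of the proof can also be checked empirically. The sketch below, with made-up potentials and assuming $Z = \sum_x \exp(\phi(x))$, compares the empirical CDF of the perturbed maximum against the CDF of a Gumbel distribution located at $-c + \ln Z$ at a few arbitrary thresholds. The names `gumbel_cdf`, `ts`, and `max_gap` are illustrative only.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant c


def gumbel_cdf(t, mu):
    """CDF of a unit-scale Gumbel distribution with location mu."""
    return np.exp(-np.exp(-(t - mu)))


rng = np.random.default_rng(1)
phi = np.array([1.0, 0.0, -0.5])  # made-up potentials for illustration
log_z = np.log(np.sum(np.exp(phi)))

# Perturb every potential with independent Gumbel(-c) noise and take the max.
gamma = rng.gumbel(loc=-EULER_GAMMA, scale=1.0, size=(100_000, phi.size))
maxima = np.max(phi + gamma, axis=1)

ts = np.array([0.0, 1.0, 2.0, 3.0])  # arbitrary thresholds
empirical = np.array([(maxima <= t).mean() for t in ts])
predicted = gumbel_cdf(ts, mu=-EULER_GAMMA + log_z)
max_gap = np.max(np.abs(empirical - predicted))
print(empirical, predicted, max_gap)
```

If the derivation holds, the two sets of values should agree up to sampling noise, which shrinks as the number of repetitions grows.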