Statistical modeling addresses the problem of modeling the behavior of a random
process. In constructing this model, we typically have at our disposal a
sample of output from the process. The sample constitutes an incomplete state
of knowledge about the process; the modeling problem is to parlay this
knowledge into a succinct, accurate representation of the process. We can then
use this representation to make predictions of the future behavior of the
process.

Exponential models have proven themselves handy in this arena, and
for that reason have earned a place in the toolbox of every statistical
physicist since the turn of the century. Within the broad class of exponential
models exists a family of distributions, maximum entropy models, with some
interesting mathematical and philosophical properties. Though the concept of
maximum entropy can be traced back along multiple threads to Biblical
times, only recently have computers become powerful enough to permit the
widescale application of this concept to real world problems in statistical
estimation and pattern recognition.

The following pages discuss a method for statistical modeling based on maximum
entropy, with a particular focus on questions of interest in natural language
processing (NLP). Extensive results and benchmarks are provided, as well as a
number of practical algorithms for modeling conditional data using maxent. A
connection between conditional maxent models and Markov random fields--a
popular modeling technique in computer vision--is drawn in the final section.
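For concreteness, the conditional maxent models mentioned above take the following standard exponential form (the notation here is an assumption, since the symbols have not yet been introduced in the text): given binary-valued features f_i(x, y) and real-valued weights lambda_i,

$$
p_\lambda(y \mid x) \;=\; \frac{1}{Z_\lambda(x)} \exp\!\Big(\sum_i \lambda_i f_i(x, y)\Big),
\qquad
Z_\lambda(x) \;=\; \sum_{y'} \exp\!\Big(\sum_i \lambda_i f_i(x, y')\Big).
$$

The normalizer Z_lambda(x) ensures the conditional probabilities sum to one for each context x.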

Starting from a set of data, the algorithms discussed in the following pages
can automatically extract a set of relationships inherent in the data, and then
combine these relationships into a model of the data which is both accurate and
compact. For instance, starting from a corpus of English text with no
linguistic knowledge whatsoever, the algorithms can automatically induce a set
of rules for determining the appropriate meaning of a word in context. Since
this inductive learning procedure is computationally taxing, we are also
obliged to provide a set of heuristics to ease the computational burden.
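The procedure sketched above can be illustrated with a deliberately tiny example. The snippet below is a minimal, hypothetical sketch (the dataset, feature choice, and plain gradient ascent are all my assumptions, not the algorithms of this document): it extracts binary indicator features from a toy word-sense corpus and fits the weights of a conditional exponential model by ascending the conditional log-likelihood.

```python
import math

# Toy "word sense in context" data: (context, sense) pairs.
# Entirely hypothetical -- stands in for a real annotated corpus.
data = [("bank_river", "shore"), ("bank_money", "finance"),
        ("bank_money", "finance"), ("bank_river", "shore")]
labels = ["shore", "finance"]

# Extract binary indicator features f_i(x, y): one per (context, sense)
# pair observed in the data.
features = sorted({(x, y) for x, y in data})
weights = {f: 0.0 for f in features}

def scores(x):
    # Linear score sum_i lambda_i * f_i(x, y) for each candidate label y.
    return {y: sum(w for (fx, fy), w in weights.items()
                   if fx == x and fy == y)
            for y in labels}

def p(y, x):
    # Conditional exponential model: p(y|x) = exp(score) / Z(x).
    s = scores(x)
    z = sum(math.exp(v) for v in s.values())
    return math.exp(s[y]) / z

# Gradient ascent on the conditional log-likelihood. The gradient for
# each weight is (empirical feature count) - (expected count under model).
lr = 0.5
for _ in range(200):
    grad = {f: 0.0 for f in features}
    for x, y in data:
        for fx, fy in features:
            if fx == x:
                if fy == y:
                    grad[(fx, fy)] += 1.0   # empirical count
                grad[(fx, fy)] -= p(fy, x)  # model expectation
    for f in features:
        weights[f] += lr * grad[f]

print(p("finance", "bank_money"))
```

Real systems replace the brute-force gradient loop with iterative scaling or quasi-Newton methods, and the feature extraction step is where the heuristics mentioned above earn their keep.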

Though the theme of this discussion is NLP--trying to quantify the relations
among words in everyday human speech and writing--absolutely nothing in this
document is at all particular to NLP. Maxent models have been applied with
success in astrophysics and medicine, among other fields.