To study the process, we observe its behavior for some
time, collecting a large number of samples $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$. In the example we
have been considering, each sample would consist of a phrase $x$ containing the
words surrounding "in", together with the translation $y$ of "in" that
the process produced. For now we can imagine that these training samples have
been generated by a human expert who was presented with a number of random
phrases containing "in" and asked to choose a good translation for each.

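We can summarize the training sample in terms of its empirical probability
distribution $\tilde{p}$, defined by

$$\tilde{p}(x, y) \equiv \frac{1}{N} \times \text{number of times that } (x, y) \text{ occurs in the sample},$$

where $N$ is the total number of samples.
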
Typically, a particular pair $(x, y)$ will either not occur at all in the
sample, or will occur at most a few times.
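
As a rough illustration, the empirical distribution is simply a normalized count over the observed (context, translation) pairs. The sketch below assumes the samples are available as a list of tuples. The function name `empirical_distribution` and the toy phrases are invented for illustration and do not come from any real training set.

```python
from collections import Counter


def empirical_distribution(samples):
    """Map each observed (x, y) pair to (number of occurrences) / N."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    return {pair: count / n for pair, count in counts.items()}


# Hypothetical toy sample: x is a context phrase, y a French rendering of "in".
samples = [
    ("in April", "en"),
    ("in April", "en"),
    ("in the house", "dans"),
    ("in fact", "en"),
]

p_tilde = empirical_distribution(samples)

# Pairs that never occur in the sample are absent from the dict,
# which corresponds to an empirical probability of zero.
print(p_tilde.get(("in April", "dans"), 0.0))
```

Only observed pairs are stored, which reflects the sparsity noted above: for a realistic sample, nearly all possible pairs have zero count.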