The previous section showed how it is possible to learn an optimal
policy without knowing the models T(s,a,s') or R(s,a) and without
even learning those models en route. Although many of these methods
are guaranteed to find optimal policies eventually and use very little
computation time per experience, they make extremely inefficient use
of the data they gather and therefore often require a great deal of
experience to achieve good performance. In this section we again
begin by assuming that the models are not known in advance, but we
examine algorithms that do learn them along the way. These
algorithms are especially important in applications in which
computation is considered to be cheap and real-world experience
costly.
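As a minimal illustration of the model-learning approach, the sketch below estimates a tabular model from experience by counting transitions and averaging rewards, then plans on the learned model with value iteration (the certainty-equivalence approach). The tiny two-state MDP and all numbers are made up purely for illustration; a practical agent would interleave model updates, planning, and exploration rather than separate them as cleanly as this.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP used only for illustration.
n_states, n_actions, gamma = 2, 2, 0.9

# Count-based model estimates, updated from each (s, a, r, s') experience.
trans_counts = np.zeros((n_states, n_actions, n_states))
reward_sums = np.zeros((n_states, n_actions))

def record(s, a, r, s2):
    """Incorporate one piece of real-world experience into the counts."""
    trans_counts[s, a, s2] += 1
    reward_sums[s, a] += r

def estimated_model():
    """Maximum-likelihood estimates T_hat(s,a,s') and R_hat(s,a)."""
    visits = trans_counts.sum(axis=2)
    T = trans_counts / np.maximum(visits, 1)[:, :, None]
    T[visits == 0] = 1.0 / n_states   # unvisited pairs: assume uniform
    R = reward_sums / np.maximum(visits, 1)
    return T, R

def plan(T, R, iters=200):
    """Value iteration on the learned model; returns a greedy policy."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * (T @ V)      # Q(s,a) = R_hat + gamma * E[V(s')]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# Made-up experience: action 1 in state 0 reliably reaches state 1 and
# pays reward 1; all other transitions pay 0.
for _ in range(20):
    record(0, 1, 1.0, 1)
    record(0, 0, 0.0, 0)
    record(1, 0, 0.0, 1)
    record(1, 1, 0.0, 0)

T, R = estimated_model()
policy = plan(T, R)
```

Because planning happens entirely inside the learned model, each real experience can be reused arbitrarily often, which is exactly the trade of cheap computation for costly experience described above.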