Note that I haven't mentioned whether we are using a binary alphabet or a large alphabet, or any other practical issues, because they
don't affect the algorithm in a theoretical way.

While I'm at it, let me take the chance to mark up the PPM pseudocode to show where "modern" PPM differs from "classical" PPM
(by "modern" I mean 2002/PPMii and by "classical" I mean 1995/"before PPMZ") :

also make non-contiguous contexts like
skip contexts : AxBx
contexts containing only a few top bits from each byte
contexts involving a word dictionary
contexts involving current position in the stream
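
As a rough illustration of the first, second and last items in that list, here is a minimal sketch of how such contexts might be formed from the byte history. The function names, the zero padding for short histories, and the default values of `order`, `bits` and `period` are all assumptions made for the example, not something taken from PPMii or any particular coder.

```python
def skip_context(history: bytes) -> tuple:
    """AxBx: keep the bytes 4 and 2 positions back, ignore the ones 3 and 1 back."""
    padded = b"\x00" * 4 + history  # assumed padding so short histories still work
    return (padded[-4], padded[-2])


def top_bits_context(history: bytes, order: int = 3, bits: int = 3) -> int:
    """Pack only the top `bits` bits of each of the last `order` bytes (order >= 1)."""
    padded = b"\x00" * order + history
    ctx = 0
    for b in padded[-order:]:
        ctx = (ctx << bits) | (b >> (8 - bits))
    return ctx


def position_context(pos: int, period: int = 4) -> int:
    """Position in the stream, reduced modulo an assumed record size `period`."""
    return pos % period
```

With these definitions the histories "AxBx" and "AyBz" land in the same skip context, and any run of lowercase ASCII letters lands in the same top-bits context, since 'a' through 'z' all share the top three bits 011. Each of these values would then be used as a key into the statistics, in the same way a normal contiguous context is.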

do "partial exclusion" like PPMii, do full update down to coded context
and then reduced update to parents to percolate out information a bit
do "inheritance" like PPMii - novel contexts updated from parents
do "fuzzy updates" - don't just update your context but also neighbors
which are judged to be similar in some way
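
The first of those update notes can be sketched roughly as below: full increments from the longest context down to the one that actually coded the symbol, then smaller increments in the shorter parent contexts. This is only an illustration under stated assumptions. The increment sizes `FULL_INC` and `REDUCED_INC`, the dict-of-dicts layout, and the choice to touch every parent down to order 0 are made up for the example and are not PPMii's actual constants or data structures. Inheritance and fuzzy updates are not shown.

```python
from collections import defaultdict

FULL_INC = 4     # assumed increment for the coded context and longer ones
REDUCED_INC = 1  # assumed smaller increment for the shorter parent contexts


def make_counts():
    """counts[context_bytes][symbol] -> integer count."""
    return defaultdict(lambda: defaultdict(int))


def update(counts, history: bytes, sym: int, coded_order: int, max_order: int) -> None:
    """Update statistics after `sym` was coded in the context of order `coded_order`."""
    n = len(history)
    for order in range(min(max_order, n), -1, -1):
        ctx = history[n - order:]  # the last `order` bytes; b"" when order == 0
        # Full update down to the coded context, reduced update below it.
        inc = FULL_INC if order >= coded_order else REDUCED_INC
        counts[ctx][sym] += inc
```

For example, after `update(counts, b"abc", ord("d"), coded_order=2, max_order=3)` the contexts `b"abc"` and `b"bc"` have received the full increment for 'd', while `b"c"` and the order-0 context `b""` have received only the reduced one. Classical update exclusion would correspond to `REDUCED_INC = 0`, and a full update of every order to `REDUCED_INC = FULL_INC`.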