A few technical details

The model uses n-grams, with trigrams as the default. Trigrams produce overly faithful output on sparse data, so the
output mode includes a "backoff" component (on by default; disable it with --no-back-off) which tries to mix things
up a little: whenever the current n-gram has only a single choice for its prediction, it tries an (n-1)-gram instead,
and so on until it either falls back to bigrams or gets to make a non-trivial choice.
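Here is a minimal Python sketch of that backoff loop. The model layout (a dict mapping context tuples to
successor-count dicts) and the function name are my assumptions for illustration, not the tool's actual internals:

    import random

    def pick_next(tables, context):
        # Try the longest context first, then progressively shorter ones,
        # stopping at a single-token context (the bigram table).
        for k in range(len(context), 0, -1):
            choices = tables.get(tuple(context[-k:]), {})
            # Take this level if the choice is non-trivial, or if we have
            # already fallen all the way back to bigrams.
            if len(choices) > 1 or (k == 1 and choices):
                tokens = list(choices)
                weights = [choices[t] for t in tokens]
                return random.choices(tokens, weights=weights)[0]
        return None  # context never seen at any order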

To support this backoff system the model stores quite a lot of data: if you're working with 4-grams it will
store every bigram, trigram, and 4-gram in your corpus. The storage format is extremely dumb, so don't be surprised if
the file size goes up pretty rapidly. You should expect something like twice the size of your corpus for bigrams,
five times corpus size for trigrams (3 for the trigrams plus 2 for the bigrams that support backoff), nine times
corpus size for 4-grams (4 + 3 + 2), and so on. (These are upper limits; the actual size will depend on how much
repetition there is in your corpus.) The good news is that on Twitter data even trigrams seem to be overkill, so very
likely this will never become an issue for you.
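To see where those multipliers come from, here is a sketch (again with illustrative names, not the tool's actual
code) that counts every k-gram from bigrams up to order n and computes the worst-case size multiplier:

    from collections import defaultdict

    def build_tables(tokens, n):
        # One combined table: every k-gram for k = 2..n, keyed by its
        # (k-1)-token context, with a count for each continuation.
        tables = defaultdict(lambda: defaultdict(int))
        for k in range(2, n + 1):
            for i in range(len(tokens) - k + 1):
                gram = tokens[i:i + k]
                tables[tuple(gram[:-1])][gram[-1]] += 1
        return tables

    def size_multiplier(n):
        # Worst case: every token is written out once per stored order,
        # so the file is roughly (2 + 3 + ... + n) times the corpus.
        return sum(range(2, n + 1))  # n=2 -> 2x, n=3 -> 5x, n=4 -> 9x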