The world contains information over multiple timescales. For example, we must combine sequences of syllables to perceive a word, and sequences of words to comprehend a sentence. How does the brain process information over multiple timescales? Previous studies have demonstrated that higher-order brain regions are sensitive to temporal context on longer scales, and that they also express stable activity states over the duration of an event. We set out to model these neural phenomena using a hierarchical temporal auto-encoder (HTA). When augmented with a mechanism for contextual reset, the HTA successfully reproduces both phenomena. The HTA also generates a prediction about context construction: low-level regions establish a new context rapidly, while higher-order regions establish new context representations more gradually. We confirmed this prediction empirically by applying inter-subject pattern correlation to fMRI responses to sentences heard in different temporal contexts. Overall, we propose that a hierarchy of temporal auto-encoders is a feasible model of temporal information processing in the cortical hierarchy.