Temporal ventriloquism can be seen in the auditory capture of visual Ternus apparent motion, where the percept of apparent motion (“element motion” vs. “group motion”) is modulated by two sounds presented close in time to the visual events. Here we presented a sequence of tones in association with the Ternus display and examined the temporal sequential effect in audiovisual ventriloquism. In Experiment 1, the Ternus display was synchronized with the first, middle, or last two tones of a sequence of 6 rhythmically presented beeps. The threshold for perceiving “group motion” decreased the later the display occurred in the auditory sequence. In Experiment 2, the tone sequences contained either 4 or 6 beeps. The sequential effect was observed in both sequences, but the effect size was determined mainly by the relative temporal position of the visual display. Experiment 3 employed 6-beep tone sequences with either a regular or an irregular tempo. The sequential effect was evident in both sequences, but was larger for the irregular one. Experiment 4 used the same design as Experiment 3 but added one extra beep at the beginning of the sequence and one to three extra beeps at its end. The sequential effect was replicated in the irregular sequence but diminished in the rhythmic sequence. These results can be accounted for by a time-averaging model, according to which the observer builds up a time-based expectancy from the averaged time intervals between beeps; it is relatively easy to separate the two frames of the Ternus display when the interval between them matches this expectancy. The more beeps are heard, the stronger the expectancy and the lower the threshold for “group motion”. Moreover, because time averaging is delayed when only a few beeps of an irregular sequence have been heard, the threshold is higher at earlier positions, magnifying the temporal sequential effect.
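
The averaging step of such a model can be illustrated with a minimal sketch. It is an illustration only, not the authors' implementation: the function name `running_mean_intervals` and the beep onset times (in milliseconds) are hypothetical, and the sketch computes nothing more than the running mean of the inter-beep intervals heard so far.

```python
from itertools import accumulate


def running_mean_intervals(onsets_ms):
    """Return the mean inter-beep interval available after each successive beep.

    Hypothetical helper: onsets_ms is a list of beep onset times in ms.
    """
    # Intervals between consecutive beeps.
    intervals = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    # Cumulative sum divided by the number of intervals heard so far.
    return [total / n for n, total in enumerate(accumulate(intervals), start=1)]


# Assumed example sequences with the same overall duration.
regular = running_mean_intervals([0, 100, 200, 300])
irregular = running_mean_intervals([0, 60, 200, 300])
```

In the regular sequence the averaged interval is the same after every beep, whereas in the irregular sequence the early estimate differs from the later one and only settles once more beeps have been heard, which is one way to picture why averaging would be delayed for irregular tempos.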