Abstract: Motivation: Attentional allocation is often studied by isolating a small subset of bottom-up or top-down influences in highly controlled environments. Other studies provide phenomenological descriptions of overt attention in highly complex environments (e.g., Land & McLeod 2000, Nat. Neuro.). The present study seeks a middle ground by investigating the temporal interplay between bottom-up and top-down influences on attentional allocation in dynamic natural scenes. Methods: A set of heterogeneous video clips was cut into clippets (M=2s), which were scrambled and reassembled into MTV-style clips. Subjects were instructed to ``follow the main actors and actions.'' Eye positions were recorded using an eye-tracker, yielding a total of 1.35 million samples, which were segmented into saccades, blinks, and fixation/smooth pursuit periods. Saccade target selection was compared to predictions made by a computational model of saliency-based attention capture (Itti & Koch 2000, Vis. Res.). Results were compared to those of a twin experiment that employed the same methodology but used the unscrambled clips as stimuli. Results: The ratio of average to maximum saliency calculated by the model was 0.47 at human saccade targets, compared with 0.25 at random saccade targets. The null hypothesis that human and random samples were drawn from the same distribution was rejected by sign and Wilcoxon tests (p<10E-10 for both). Identical results were obtained for both the scrambled and unscrambled stimuli. Conclusions: Eliminating visual context effects beyond the first 2s by temporally scrambling dynamic natural scenes did not increase the relative weight of bottom-up influences on attentional allocation. A possible interpretation of our results is that the relative weight of top-down influences on attentional allocation in natural viewing conditions does not change with presentation time (at least beyond the first 2s).
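The comparison reported under Results can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that a saliency map is a 2D grid of non-negative values, that the reported ratio is the saliency at a saccade target divided by the maximum of the map, and that human and random samples are paired one to one, as a sign test requires. The function names `saliency_ratio` and `sign_test_p` are hypothetical.

```python
import math


def saliency_ratio(saliency_map, target):
    """Saliency at the target location divided by the map's maximum.

    Assumption: saliency_map is a list of rows of non-negative floats and
    target is a (row, col) index pair. Hypothetical helper, not the model's API.
    """
    row, col = target
    peak = max(max(r) for r in saliency_map)
    return saliency_map[row][col] / peak if peak > 0 else 0.0


def sign_test_p(human, rand):
    """Two-sided exact sign test on paired samples; ties are dropped."""
    diffs = [h - r for h, r in zip(human, rand) if h != r]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)  # pairs where the human value is larger
    # Under the null, k follows Binomial(n, 0.5); double the smaller tail.
    tail = sum(math.comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy data for illustration only; these are not the study's values.
    toy_map = [[0.0, 2.0], [1.0, 4.0]]
    human_ratios = [saliency_ratio(toy_map, t) for t in [(1, 1), (0, 1), (1, 1)]]
    random_ratios = [saliency_ratio(toy_map, t) for t in [(0, 0), (1, 0), (0, 1)]]
    print(sign_test_p(human_ratios, random_ratios))
```

A Wilcoxon signed-rank test would be applied to the same paired differences, additionally using their rank magnitudes; it is omitted here to keep the sketch dependency-free.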