Will specifying the target region of a scene immediately before a search task improve search efficiency? To answer this question, we had subjects search aerial images for a UFO target, which appeared hovering over one of five scene regions: water, fields, foliage, roads, or buildings. Aerial images were used to sever learned spatial relationships between a scene and its regions (e.g., a road could appear at any position and orientation in a scene). Before the onset of the search scene, subjects were either told the scene region where the target could be found (specified condition) or were asked to search for the target in the absence of region information (unspecified condition). The absolute locations of targets and target regions within scenes were unpredictable. Search times were shorter, and fewer eye movements were needed to acquire targets, when the target region was specified. Subjects also tended to fixate the cued region sooner and distributed their fixations disproportionately in this region. A weaker (but above-chance) preference to fixate in the target region extended to the unspecified condition, which we attributed to appearance-based target guidance after ruling out guidance by low-level feature contrast. Importantly, the search differences observed between the specified and unspecified conditions cannot be explained by either bottom-up saliency-based models or top-down models that use target appearance to guide search. Nor can Bayesian approaches that rely on learned spatial associations between a scene and its regions explain this cuing effect, as these spatial relationships varied unpredictably from trial to trial. Rather, we interpret these differences as evidence for the use of highly flexible referential scene constraints to confine search to the cued scene region, similar to the constraints commonly used in spoken discourse. Such constraints require the modification of existing theories to include segmentation processes that can rapidly bias search to cued regions.