Conditional control in visual selection

Abstract

Attention and eye movements provide a window into the selective processing of visual information. Evidence suggests that selection is influenced by various factors and is not always under the strategic control of the observer. The aims of this tutorial review are to give a brief introduction to eye movements and attention and to outline the conditions that help determine control. The ability to establish control appears to depend on the complexity of the display as well as on the point in time at which selection occurs. Stimulus-driven selection is more probable in simple displays than in complex natural scenes, but it critically depends on the timing of the response: Salience determines selection only when responses are triggered quickly after display presentation and plays no role in longer-latency responses. The time course of selection is also important for the relationship between attention and eye movements. Specifically, attention and eye movements appear to act independently when oculomotor selection is quick, whereas attentional processes are able to influence oculomotor control when saccades are triggered later in time. This relationship may also be modulated by whether the eye movement is controlled voluntarily or involuntarily. To conclude, we present evidence that visual control is limited in flexibility and that the mechanisms of selection are constrained by context and time. The outcome of visual selection changes with the situational context, and knowing the constraints of control is necessary for understanding when and how visual selection is truly controlled by the observer.

Keywords

Eye movements · Visual attention

Visual selection is necessary to deal with the enormous amount of information that is presented to our visual system. An important question that has been the topic of many studies concerns the extent to which this visual selection is driven automatically by the properties in the visual field or is voluntarily guided by the intentions and strategies of an observer (e.g., Folk & Remington, 1998; Folk, Remington, & Johnston, 1992; Godijn & Theeuwes, 2002; Yantis, 2000; Yantis & Jonides, 1990). Although salient events tend to automatically capture attention and eye movements, this is not always the case. Whereas some have argued that stimulus-driven processes dominate visual selection (e.g., Nothdurft, 2002; Theeuwes, 1992, 1994; Theeuwes, Atchley, & Kramer, 2000), others have argued that goal-directed processes dominate it instead (e.g., Bacon & Egeth, 1994; Folk & Remington, 1998; Folk et al., 1992). More recently, some consensus has been reached on the idea that both stimulus- and goal-driven factors ultimately interact to determine selection (e.g., Connor, Egeth, & Yantis, 2004; Corbetta & Shulman, 2002; Sawaki & Luck, 2010; van Zoest, Donk, & Theeuwes, 2004). This relationship, however, is complicated by evidence showing that many other factors influence the interaction between automatic and voluntary selection. For example, the prioritization of visual information may be influenced by prior experience, learning, statistical regularities, motivation, and reward history (e.g., B. A. Anderson, Laurent, & Yantis, 2011b; Chun & Jiang, 2003; Cosman & Vecera, 2014; Hickey, Chelazzi, & Theeuwes, 2010; Kadel, Feldmann-Wüstefeld, & Schubö, 2017; Leber, Kawahara, & Gabari, 2009; Paoletti, Weaver, Braun, & van Zoest, 2015). 
Because these effects may be unrelated to the goals of the observer or to the physical salience of items in the visual field, these factors are difficult to place within a strict stimulus- versus goal-driven dichotomy, which has led some to argue that the taxonomy of bottom-up versus top-down control is inadequate (Awh, Belopolsky, & Theeuwes, 2012).

While there is no doubt that the relationship between the different mechanisms of control is complicated and probably affected by a large variety of factors, additional constraints limit whether stimulus- or goal-driven selection can occur in the first place. In most theories of selection, however, these constraints are underspecified. For example, it is typically assumed that bottom-up and top-down control are equally available to the system at any moment in time and are integrated in a common priority map to determine the final locus of selection (e.g., Awh et al., 2012; Wolfe, 1994). Yet stimulus- and goal-driven processes are not continuously available to bias selection. The establishment of these mechanisms depends on time as well as on visual context, and the combination of these constraints shapes the development of control (e.g., Donk & van Zoest, 2008; van Zoest & Donk, 2008; van Zoest et al., 2004). Moreover, the relationship between attention and eye movements may further complicate the balance. Although covert and overt control of selection are often treated in the same vein, what is true for attention is not necessarily true for oculomotor control. Constraints in the relation between attention and eye movements help define when these mechanisms are and are not associated (e.g., Belopolsky & Theeuwes, 2012; Smith & Schenk, 2012). The aims of this review are to outline these constraints on the control of selection and to discuss how these limitations may delineate the link between attention and eye movements. Let us start with a brief introduction to how eye movements are typically used to measure selection performance.

About selection via eye movements

Eyes are critically important in determining visual information processing. What you see depends first and foremost on the position of your head and eyes in space. These parameters determine the scope of the visual field and therefore the content of the first stage of information processing (e.g., Henderson, 1993; Kowler, 2011; Liversedge & Findlay, 2000). In contrast to this relatively rudimentary and overt means of preselecting visual information, a secondary, more subtle mechanism of selection is covert, that is, it operates without explicit movements of the head or eyes; this mechanism is typically referred to as attention. Because the potential benefits of visual attention for information processing are ultimately constrained by the position of the head and eyes, covert attention may be considered a supplementary means to sample and select a subset of information from the environment (Findlay & Gilchrist, 2003). At the same time, because visual attention is thought to be closely coupled to eye movements (see the Attention and Eye Movements section for a discussion; e.g., Corbetta et al., 1998; Rizzolatti, Riggio, Dascola, & Umiltà, 1987), eye movements are thought to provide a measure of the spatial location of visual attention (e.g., Liversedge & Findlay, 2000; Zelinsky, Rao, Hayhoe, & Ballard, 1997; Zelinsky & Sheinberg, 1997; however, see Smith & Schenk, 2012; Võ, Aizenman, & Wolfe, 2016; Weaver, van Zoest, & Hickey, 2017).

Saccadic eye movements (also referred to as saccades or oculomotor responses) are rapid changes in eye position that occur three or four times each second. People make eye movements to bring the information of interest into high-resolution foveal vision. The fovea, the central area of the retina, is especially important because it has a denser concentration of photoreceptors than the periphery of the retina and therefore provides maximal acuity. Because photoreceptor density decreases away from the center, visual acuity decreases as eccentricity increases (e.g., Hirsch & Curcio, 1989). Thus, the saccadic system rotates the eyes such that critical information can be brought to the fovea. During the more or less stationary periods between saccades (on average around 225 ms), information is acquired for further processing (Viviani, 1989). Eye trackers make it possible to record the course of eye movements, to study how eye movements explore the visual world, and thus to investigate the fundamental mechanisms of visual selection. Several commercial eye trackers are on the market. They differ primarily in their temporal and/or spatial resolution and in the mobility they provide, that is, the degree to which eye movements can be measured in active, real-world situations that are not limited to the desktop computer of a lab environment. It is also possible to build an eye tracker from a webcam (e.g., Mantiuk, Kowalik, Nowosielski, & Bazyluk, 2012).

A number of standard dependent variables are often reported in eye movement research; these include saccade latency, landing position, and fixation duration. Saccade latency is usually defined as the time between the presentation of a stimulus and the onset of the eye movement. It is typically expressed in milliseconds and usually varies between 150 and 400 ms. Whether saccade latencies are short or long depends on a number of factors. For example, small eye movements (those covering a short distance from one fixation to the next) often have shorter latencies than large eye movements (those covering larger distances; see, e.g., W. Becker, 1989; Zambarbieri, Beltrami, & Versino, 1995). Eye movements to conspicuous locations tend to have shorter latencies than eye movements to nonconspicuous locations, and incorrect eye movements to an irrelevant but salient unique stimulus tend to have shorter latencies than correct eye movements to an intended target (see, e.g., van Zoest et al., 2004). Landing position refers to the spatial location selected by the eye, and it indicates the accuracy of saccadic selection. Specifically, the eyes can land closer to or farther away from the intended saccadic target. Initial saccades typically undershoot the intended location, that is, they land short of the target. This happens in reading, where the eyes typically land slightly to the left of the center of a word (McConkie, Kerr, Reddix, & Zola, 1988), but the tendency also exists for eye movements to objects (Henderson, 1993; Zelinsky et al., 1997). Secondary, corrective saccades then typically compensate for these initial landing errors, so that the eyes eventually land at the center of the word or object.
Sometimes the landing position of the eyes does not correspond to the intended saccadic location at all; for example, the eyes may be “captured” by an irrelevant salient stimulus, and this unintentionally selected stimulus may be processed before the eyes are (correctly) redirected to the location of the target (e.g., Theeuwes, Kramer, Hahn, & Irwin, 1998). This may happen, for example, with the sudden onset of a light, but also when a static, irrelevant, salient feature stands out from its surroundings (see also the section Eye Movement Control Depends on Saccadic Latency). Fixation duration is the time the eyes remain stationary at a location before moving on to the next location. Fixations can be very short, some less than 100 ms; such short fixations occur when there is little uncertainty about the landing location of the next eye movement, suggesting that the subsequent eye movement was programmed in advance (e.g., Godijn & Theeuwes, 2003; McPeek & Keller, 2001). Fixations can also be longer, which might be the case if the target location of the saccade is unpredictable and the eye movement has to be programmed from “scratch.” Fixation durations also tend to be longer when information is difficult to discriminate and incongruent with the context (e.g., Henderson, Weeks, & Hollingworth, 1999). Fixation durations are also influenced by the observer’s task; for example, fixations are longer during a scene memorization task than during a visual search task for a specific target (e.g., Henderson et al., 1999; Mills, Hollingworth, Van der Stigchel, Hoffman, & Dodd, 2011).
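The dependent variables described above can be illustrated with a minimal sketch that derives saccade latency, landing position, and landing error from raw gaze samples. It assumes gaze positions in degrees of visual angle, sampled at a fixed rate with sample 0 taken at display onset, and it uses a simple velocity-threshold rule. The 30 deg/s threshold and the function name are illustrative assumptions, not a standard taken from the studies cited here; published analyses often add acceleration or minimum-duration criteria.

```python
import math


def first_saccade_metrics(xs, ys, hz, target, threshold=30.0):
    """Latency (ms), landing position, and landing error (deg) of the first saccade.

    xs, ys: gaze positions in degrees of visual angle, one sample every 1/hz s,
    with sample 0 at display onset (an assumption of this sketch).
    A saccade is a run of samples whose point-to-point speed exceeds
    `threshold` deg/s (a hypothetical, simplified velocity criterion).
    Returns None if no sample exceeds the threshold.
    """
    # Speed between sample i and sample i + 1, in deg/s.
    speeds = [math.hypot(xs[i + 1] - xs[i], ys[i + 1] - ys[i]) * hz
              for i in range(len(xs) - 1)]
    onset = next((i for i, v in enumerate(speeds) if v > threshold), None)
    if onset is None:
        return None
    # First sample after onset at which the eye has slowed below threshold.
    offset = next((i for i in range(onset, len(speeds)) if speeds[i] <= threshold),
                  len(xs) - 1)
    latency_ms = onset * 1000.0 / hz
    landing = (xs[offset], ys[offset])
    error = math.dist(landing, target)  # distance between landing point and target
    return latency_ms, landing, error
```

For example, a trace sampled at 1000 Hz that stays at fixation for 200 samples and then moves 10 deg toward a target located at 10.5 deg yields a latency of 199 ms and a landing error of 0.5 deg, that is, a small undershoot of the kind described above.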

In addition, it is important to point out that the saccadic eye movement itself takes time. Saccades have a certain duration, that is, the time they are in flight from the saccadic start position to the saccadic end position. Saccade duration is correlated with the amplitude of the eye movement, with larger eye movements tending to take longer (Findlay & Gilchrist, 2003). Saccadic eye movements are often said to be ballistic, that is, their destination is predetermined at the outset. This implies that the oculomotor system cannot respond to changes in the position of the target during the course of the eye movement (Findlay & Gilchrist, 2003). However, although this may be true for the initial direction of the eye movement, evidence suggests that the later part of the eye movement can be influenced by feedback, though to a limited degree (Quaia, Lefèvre, & Optican, 1999; Sparks, 2002). Evidence for such feedback can be found in saccade trajectories, that is, in the precise path of the eye movement from its start to its end point. Eye movements rarely take the shortest route from point A to point B; instead, their trajectories show a degree of curvature that depends on environmental and situational factors (e.g., Doyle & Walker, 2001; Erkelens & Sloot, 1995; Sheliga, Riggio, Craighero, & Rizzolatti, 1995; Van der Stigchel, Meeter, & Theeuwes, 2006). For example, initial saccades may deviate toward or away from a salient distractor location but curve back toward the target in the final stages of the eye movement. It has been hypothesized that this redirection back to the target results from feedback processes that control the saccade trajectory “online,” enabling small corrections to be made (Walker & McSorley, 2008).
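Saccade curvature can be quantified in several ways. The sketch below uses one simple option: the peak perpendicular deviation of the sampled path from the straight line between its start and end points, signed so that positive values mean deviation toward the distractor's side of that line and negative values mean deviation away from it. The choice of measure and the function names are assumptions for illustration; the sketch also assumes a saccade with nonzero amplitude and a distractor that does not lie exactly on the start-end line.

```python
import math


def signed_offset(point, start, end):
    """Signed perpendicular distance of `point` from the line start -> end.

    Positive on the left of the direction of travel, negative on the right.
    Assumes start != end (a saccade with nonzero amplitude).
    """
    (sx, sy), (ex, ey), (px, py) = start, end, point
    length = math.hypot(ex - sx, ey - sy)
    return ((ex - sx) * (py - sy) - (ey - sy) * (px - sx)) / length


def peak_deviation_toward(trajectory, distractor):
    """Peak deviation of a sampled saccade path from its straight start-end line.

    trajectory: list of (x, y) samples from saccade onset to saccade offset.
    Positive result: the path bowed toward the distractor's side of the line.
    Negative result: the path bowed away from it.
    """
    start, end = trajectory[0], trajectory[-1]
    # +1 if the distractor lies left of the start-end line, -1 if it lies right.
    side = math.copysign(1.0, signed_offset(distractor, start, end))
    # Sample with the largest absolute offset, keeping its sign.
    peak = max((signed_offset(p, start, end) for p in trajectory), key=abs)
    return side * peak
```

A path from (0, 0) to (10, 0) that bulges to y = 1 scores +1 when the distractor sits at (5, 5) and -1 when it sits at (5, -5), so the same trajectory is classified as "toward" or "away" depending only on where the distractor was.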

Eye movement control

One of the greatest benefits of the study of eye movements is that it provides a direct measure of overt spatial selection. If a salient distractor interferes with the correct selection of a target, this is immediately visible in the eye movement data as an incorrect saccade to the distractor (e.g., Theeuwes et al., 1998; Theeuwes, Kramer, Hahn, Irwin, & Zelinsky, 1999). In contrast, in studies of covert attention, the relative contributions of stimulus- and goal-driven processes typically have to be inferred. Specifically, a salient irrelevant distractor may draw attention away from the target. This additional shift of attention will likely delay manual responses to the target relative to a condition in which no salient distractor is presented and the target can be selected without interruption (e.g., Theeuwes, 1992). Thus, in studies of covert attention, slowed manual responses to the target are typically used to infer interference from an irrelevant distractor.
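The contrast between the inferred and the direct measure can be made concrete with a small hypothetical example. The covert measure is the difference in mean manual response time between distractor-present and distractor-absent trials; the overt measure is simply the proportion of first saccades that went to the distractor. The function names, labels, and trial values are invented for illustration.

```python
from statistics import mean


def manual_capture_cost(rt_present_ms, rt_absent_ms):
    """Indirect (covert) measure: slowing of mean manual RT when a distractor is present."""
    return mean(rt_present_ms) - mean(rt_absent_ms)


def oculomotor_capture_rate(first_saccade_labels):
    """Direct (overt) measure: share of trials whose first saccade went to the distractor."""
    hits = sum(label == "distractor" for label in first_saccade_labels)
    return hits / len(first_saccade_labels)


# Hypothetical data for a handful of trials.
rt_present = [520, 545, 560, 575]   # manual RTs (ms), distractor present
rt_absent = [495, 500, 510, 515]    # manual RTs (ms), distractor absent
labels = ["distractor", "target", "target", "distractor"]  # first-saccade destinations

cost = manual_capture_cost(rt_present, rt_absent)   # 45 ms of slowing
rate = oculomotor_capture_rate(labels)              # half of the first saccades
```

The RT cost only indicates that something slowed responses on average, whereas each saccade label is a trial-by-trial observation of where selection actually went.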

Eye movement control depends on saccadic latency

Looking at how stimulus- and goal-driven control come about in eye movements, striking similarities across paradigms suggest distinct temporal signatures for stimulus-driven and goal-directed selection. Specifically, saccades driven by physical salience tend to be much faster than those guided by current goals. For example, in the oculomotor capture paradigm (see Fig. 1A), participants are required to make an eye movement to a prespecified target that is unique in color (Godijn & Theeuwes, 2002; Theeuwes et al., 1998). In the particular experiment shown in Fig. 1A, participants are presented with six gray circles arranged on the circumference of an imaginary circle. After a specified amount of time, all but one of the circles change color to red, and participants are instructed to make an eye movement to the gray circle, the only element that has not changed color. On half of the trials, concurrently with the color change of the distractors, an additional red circle is added to the display. This onset distractor is never relevant to the task. Nevertheless, the eyes are drawn toward the irrelevant onset distractor on one third of all trials. Overall, the appearance of the onset distractor significantly interferes with the planning and execution of a goal-directed eye movement to the color singleton (Godijn & Theeuwes, 2002). Even though participants were instructed to make an eye movement to the color singleton target, the eyes involuntarily and incorrectly end up at the location of the new object (“oculomotor capture”). Theeuwes et al. (1998) concluded that visual selection is initially determined by the stimulus properties in the visual field (see also Theeuwes, 1992). Only after the irrelevant event has been inhibited are the eyes able to move correctly to the target.
Critically, when oculomotor capture happens in the context of visual search, eye movements that are incorrectly directed to the irrelevant salient distractor have systematically much shorter latencies than eye movements that are correctly directed toward the target (Godijn & Theeuwes, 2002; Theeuwes et al., 1999; see also Hunt, von Mühlenen, & Kingstone, 2007).
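The latency comparison just described can be sketched as follows: each first saccade is labeled by the display element nearest its landing point, and mean latency is then computed per label. The 2 deg acceptance radius, the coordinates, and the function names are hypothetical choices for this sketch, not parameters taken from the cited studies.

```python
import math
from statistics import mean


def classify_landing(landing, target, distractor, radius=2.0):
    """Label a landing point 'target', 'distractor', or 'other' (coordinates in deg)."""
    d_target = math.dist(landing, target)
    d_distractor = math.dist(landing, distractor)
    if min(d_target, d_distractor) > radius:
        return "other"  # landed near neither element
    return "target" if d_target <= d_distractor else "distractor"


def mean_latency_by_label(trials, target, distractor, radius=2.0):
    """trials: iterable of (landing_xy, latency_ms). Returns {label: mean latency in ms}."""
    groups = {}
    for landing, latency in trials:
        label = classify_landing(landing, target, distractor, radius)
        groups.setdefault(label, []).append(latency)
    return {label: mean(lats) for label, lats in groups.items()}
```

With a target at (8, 0) and a distractor at (0, 8), a set of trials in which the distractor-directed saccades had latencies of 180 and 200 ms and the target-directed saccades had latencies of 260, 270, and 280 ms returns mean latencies of 190 ms and 270 ms, respectively, the kind of pattern reported for oculomotor capture.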

Fig. 1 (A) Example of the additional-singleton paradigm in which the additional distractor is presented as an abrupt onset. The saccade in this example is “captured” by the salient distractor. (B) Additional singleton with a static distractor. The distractor and target here are equally salient. In this example, the saccade is first incorrectly drawn to the irrelevant distractor before the eyes proceed correctly to the target. (C) Illustration of the antisaccade task. (D) Illustration of saccade deviations toward and away from the distractor.

A comparable difference between the latencies of stimulus- and goal-directed saccades is found in the antisaccade task. In this task, observers are required to make an eye movement in the direction opposite to the location of the visual target (Hallett, 1978; Hallett & Adams, 1980; see Fig. 1C). The results typically show that although observers are able to make antisaccades, they often make prosaccades, that is, incorrect saccades in the direction of the visual target. Saccadic latencies are shorter for prosaccades than for antisaccades (Everling, Dorris, & Munoz, 1998; Massen, 2004; Mokler & Fischer, 1999; Munoz & Everling, 2004; Olk & Kingstone, 2003). Thus, stimulus-driven saccades have significantly shorter latencies than goal-driven saccades (Walker, Walker, Husain, & Kennard, 2000).
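Scoring antisaccade trials comes down to checking the direction of each saccade relative to the stimulus. The minimal sketch below assumes that saccade and stimulus vectors are given relative to the fixation point and classifies a response as a prosaccade when it has a positive component toward the stimulus (a positive dot product). This criterion and the function names are illustrative assumptions; published studies may use stricter angular windows.

```python
from statistics import mean


def classify_response(saccade_vec, stimulus_vec):
    """'prosaccade' if the saccade points toward the stimulus side, 'antisaccade' if away."""
    dot = saccade_vec[0] * stimulus_vec[0] + saccade_vec[1] * stimulus_vec[1]
    if dot == 0:
        return "ambiguous"  # exactly orthogonal: neither toward nor away
    return "prosaccade" if dot > 0 else "antisaccade"


def summarize_antisaccade_block(trials):
    """trials: iterable of (saccade_vec, stimulus_vec, latency_ms).

    Returns the direction-error rate (proportion of prosaccades among all trials)
    and the mean latency of each response type that occurred.
    """
    labeled = [(classify_response(s, v), lat) for s, v, lat in trials]
    error_rate = sum(kind == "prosaccade" for kind, _ in labeled) / len(labeled)
    latencies = {}
    for kind, lat in labeled:
        latencies.setdefault(kind, []).append(lat)
    return error_rate, {kind: mean(lats) for kind, lats in latencies.items()}
```

In a hypothetical block where two of four responses went toward a stimulus at (8, 0) with latencies of 170 and 190 ms and two went away with latencies of 280 and 300 ms, the error rate is 0.5 and the mean latencies are 180 ms for prosaccades and 290 ms for antisaccades.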

The characteristics of saccade trajectories similarly depend on saccadic latency (see Fig. 1D). Deviation toward the distractor is generally found when saccade latencies are short (less than 200 ms), whereas deviation away from the distractor location is observed when saccadic latencies are longer (e.g., Laidlaw & Kingstone, 2010; McSorley, Haggard, & Walker, 2005, 2006; Mulckhuyse, Van der Stigchel, & Theeuwes, 2009; Walker & McSorley, 2008). Saccade trajectory deviations are often explained in terms of competitive interactions between saccadic movement vectors in a spatiotopic activation map (e.g., McPeek, Han, & Keller, 2003; Tipper, Howard, & Houghton, 2000). Accordingly, early deviations toward the distractor are typically found when top-down inhibition has not yet been established at the time the eye movement is programmed. Later in time, when inhibition is able to suppress the irrelevant activation at the distractor location, the saccade may be directed away from the inhibited location (McSorley et al., 2006; Van der Stigchel, 2010; Walker, McSorley, & Haggard, 2006).
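The latency dependence of trajectory deviations can be summarized by splitting trials at a latency criterion. The sketch below uses the 200 ms boundary mentioned above and assumes that each trial already has a signed deviation score in which positive values mean deviation toward the distractor. The trial values and the function name are invented, and real analyses often use latency quantiles rather than a single fixed cutoff.

```python
from statistics import mean


def deviation_by_latency(trials, cutoff_ms=200.0):
    """trials: iterable of (latency_ms, signed_deviation).

    Returns (mean deviation for latencies below the cutoff,
             mean deviation for latencies at or above it),
    with None for an empty group. Positive deviation = toward the distractor,
    negative deviation = away from it.
    """
    trials = list(trials)  # allow generators, since we iterate twice
    short = [dev for lat, dev in trials if lat < cutoff_ms]
    long_ = [dev for lat, dev in trials if lat >= cutoff_ms]
    return (mean(short) if short else None,
            mean(long_) if long_ else None)
```

With hypothetical trials such as (170 ms, +0.4), (185 ms, +0.2), (240 ms, -0.3), and (260 ms, -0.1), the short-latency mean is positive (toward) and the long-latency mean is negative (away), mirroring the pattern described above.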

The studies described here all point to a critical role of the time course of selection in oculomotor performance. In our work, we used a modified oculomotor capture task (e.g., Donk & van Zoest, 2008; van Zoest & Donk, 2008; van Zoest et al., 2004), in which the stimuli were simple lines at various orientations. The salience of an additional distractor was manipulated by varying the orientation of that distractor (e.g., van Zoest & Donk, 2006; van Zoest et al., 2004) or the color of the elements (van Zoest & Donk, 2005, 2008; see Fig. 1B). Note that salience in these studies is defined by the uniqueness of static features; this may be considered different from the salience of abrupt onsets or moving stimuli, which arguably have much greater power to summon attention (e.g., Jonides & Yantis, 1988; Yantis & Jonides, 1990). However, whereas the salience of abrupt or motion onsets seems time-dependent by definition (see also van Zoest, Heimler, & Pavani, 2017), static salience turns out to show a similar time-dependent influence. In the studies of van Zoest and colleagues, the target could be more salient than, equally salient as, or less salient than the distractor in terms of orientation or color. Critically, the proportion of correct saccades to the target was analyzed as a function of saccade latency, to see how the relative contributions of stimulus salience and goal-directed control developed over time. When participants were asked to make an eye movement to the target, the results showed that this was not necessarily an easy task. Similar to the results of Theeuwes and colleagues (e.g., Godijn & Theeuwes, 2002; Theeuwes et al., 1998), the eyes often mistakenly ended up at an irrelevant distractor location.
However, by using displays with varying levels of stimulus salience, these studies showed that the relative salience of the elements affected only eye movements with short saccade latencies. Specifically, initial eye movements that were elicited shortly after display onset landed on the most salient element. This, in turn, benefited search when the target happened to be the most salient element but hurt performance when the irrelevant distractor happened to be the most salient element. In contrast, when the first eye movement occurred later in time, no effect of salience was observed: These later eye movements were directed correctly to the target, independent of how stimulus salience was distributed across the visual field. Later eye movements thus appeared to be increasingly goal-driven (see also van Zoest & Donk, 2008). This pattern has been replicated numerous times and is robust across different populations, such as action video-game players and deaf observers (Heimler, Pavani, Donk, & van Zoest, 2014; Heimler et al., 2015).
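Analyzing performance as a function of saccade latency usually means sorting trials by latency and computing accuracy within successive latency bins. The sketch below uses equal-count bins; the bin count, the data, and the function name are illustrative assumptions and not necessarily the exact binning used by van Zoest and colleagues.

```python
from statistics import mean


def latency_binned_accuracy(trials, n_bins=4):
    """trials: iterable of (latency_ms, correct) where correct is a bool.

    Sorts trials by latency, splits them into n_bins bins of (nearly) equal size,
    and returns a list of (mean_latency_ms, proportion_correct), fastest bin first.
    """
    ordered = sorted(trials, key=lambda t: t[0])
    n = len(ordered)
    result = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty bins when there are fewer trials than bins
            result.append((mean(lat for lat, _ in chunk),
                           sum(ok for _, ok in chunk) / len(chunk)))
    return result
```

For a hypothetical condition in which the distractor is more salient than the target, eight trials split into two bins might give a fast bin (mean 175 ms) with 25% correct and a slow bin (mean 285 ms) with 100% correct. Running the same function separately for each salience condition and plotting the bins shows whether salience effects are confined to the fastest saccades.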

Moreover, the results showed that the initial stimulus-driven activity did not contribute to the ability to voluntarily select a location in the visual field later in time (Donk & van Zoest, 2008). In a task in which observers were instructed to saccade to the most salient location in a search display, only fast saccades were accurate at targeting the most salient element. Although stimulus salience was task-relevant, when participants had ample time available and selection could have been completely volitional, observers were unable to use the salience information to guide their eye movements. If information about target salience had persisted in the visual system, correct selection of the salient location should have been possible across the saccadic latency distribution. The fact that performance falters with the passage of time suggests that the representation of salience degrades over time, even when it is task-relevant. This further suggests that the salience representation changes regardless of whether voluntary processes are online to guide selection (Donk & van Zoest, 2008). These findings suggest that saccades are either voluntary or involuntary: Early saccades are involuntarily driven by salience, whereas later saccades are voluntary and more in line with the intentions of the observer. Although stimulus salience may overlap with the target (that is, the target one is looking for may be very salient), this does not necessarily benefit voluntary target selection (e.g., Donk & van Zoest, 2008; van Zoest & Donk, 2005).

It is important to point out that when we speak of time and discuss early and late selection, we typically refer to the variability in the latency of the first eye movement. Some eye movements are initiated quickly (e.g., within 250 ms of display presentation), whereas others take more time (e.g., longer than 250 ms). However, variations in selection control may also be observed at larger time scales. For example, Over, Hooge, Vlaskamp, and Erkelens (2007) had observers search for a target (i.e., a military vehicle) in natural scene pictures and measured eye movements across a much larger time scale, from 0 to a maximum of 30 s in each trial. Over the course of a trial, saccade amplitude gradually decreased and fixation duration gradually increased, showing that eye movements become increasingly fine-grained as a trial proceeds (Over et al., 2007). The idea is that a first, fast analysis of visual information occurs at a coarse spatial scale, whereas a later, slower analysis occurs at finer spatial scales (see also van Zoest & Hunt, 2011). In terms of saccade metrics, fixation durations may be short and saccade amplitudes large at the beginning of each search trial, whereas later on, the metrics may adapt to the apparent difficulty of the stimulus at hand, such that fixation duration increases and saccade amplitude decreases (see also Scinto, Pillalamarri, & Karsh, 1986). Consequently, “coarse” refers to eye movement parameter settings that are optimal for detecting salient targets, and “fine” refers to settings that are optimal for detecting inconspicuous, nonsalient targets (Over et al., 2007; see also Hochstein & Ahissar, 2002).
Yet it is important to note that the coarse-to-fine time course observed in long-duration trials is likely related to changes in strategic control settings rather than to a shift from bottom-up to top-down control (Donk & van Zoest, 2008; van Zoest, Hunt, & Kingstone, 2010). Indeed, top-down control settings may vary across even larger time scales, on the order of minutes or hours rather than seconds or milliseconds. For instance, selection control can differ between early and later blocks of an experiment as a consequence of training, learning, or fatigue (e.g., B. A. Anderson, Laurent, & Yantis, 2011a; Leber & Egeth, 2006; Leber et al., 2009). Again, these results reflect changes in top-down settings rather than a change in the relative contributions of bottom-up and top-down processes to selection.
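One way to quantify the coarse-to-fine pattern within a long trial is to fit a least-squares line to saccade amplitude and to fixation duration as a function of time in the trial: a negative amplitude slope together with a positive duration slope is the signature described above. This sketch is an illustration under that assumption, with invented values, and is not the analysis reported by Over et al. (2007).

```python
from statistics import mean


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical single search trial: fixation onset times (s), the amplitude (deg)
# of the saccade that followed each fixation, and each fixation's duration (ms).
onsets_s = [0.5, 1.5, 3.0, 6.0, 10.0]
amplitudes_deg = [8.0, 7.0, 5.0, 4.0, 2.0]
durations_ms = [180, 200, 230, 260, 300]

amplitude_trend = ols_slope(onsets_s, amplitudes_deg)  # deg per second of trial time
duration_trend = ols_slope(onsets_s, durations_ms)     # ms per second of trial time
coarse_to_fine = amplitude_trend < 0 and duration_trend > 0
```

In practice such slopes would be estimated across many trials and observers; a single trial is used here only to keep the example small.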

The finding that the timing of the initial eye movement in particular has a large influence on whether selection is ultimately driven by salience or by goals suggests that stimulus salience and goal-directed influences act in different time frames. This is not consistent with models of visual selection that assume bottom-up and top-down biases to be simultaneously integrated in a common priority map (e.g., Awh et al., 2012; Connor et al., 2004; Wolfe, 1994). Whereas goal-driven processes seem to affect selection only well after the presentation of a visual scene (van Zoest & Donk, 2006), stimulus salience seems to be only transiently represented in the human brain (Donk & van Zoest, 2008). In sum, when stimulus-driven selection happens, it tends to occur before goal-driven selection. In fact, we are not aware of any evidence in the literature of automatic stimulus-driven selection occurring late in time, following goal-driven selection. This may be a general principle: Automatic processes occur before voluntary, intentional processes.

Nevertheless, although evidence from eye movements suggests that saccadic control is constrained by time such that automatic processes precede voluntary processes, it is important to emphasize that this does not necessarily imply that goal-driven processes or other higher-level mechanisms are unavailable early during processing (Becker, Lewis, & Axtens, 2016; Hollingworth, Matsukura, & Luck, 2013a, 2013b; Silvis & Van der Stigchel, 2014; Weaver, Paoletti, & van Zoest, 2014). For example, if one is looking for a target of a specific, salient color that is very different from the surrounding nontargets, top-down control can be available early (Becker et al., 2016; van Zoest & Donk, 2008; Weaver et al., 2014). However, when the target requires more scrutiny and more careful template matching, goal-directed control takes time. Indeed, the evidence suggests that goal-driven control is available earlier when the target and distractor are dissimilar than when they are similar (van Zoest & Donk, 2008). In other words, although timing is critically important for control, the availability of control also depends on the visual context, including the uniqueness of the target and the similarity between targets and distractors.

Eye movement control depends on visual context

Most early work on oculomotor control was conducted using simple displays containing a countable number of simple elements, such as one or more geometric shapes, line orientations, or Gabor patches (e.g., Findlay, 1997; Hallett & Adams, 1980; Theeuwes et al., 1998; Zelinsky, 1996). A critical question, however, is whether the principles derived from research with simple displays also apply to more realistic visual situations, such as viewing real-world pictures. Whereas studies using simple displays have shown a critical role for physical salience in visual selection, evidence for bottom-up selection in more complex natural scenes has been much rarer. Indeed, the earliest studies using complex real-world displays showed that oculomotor selection behavior is strongly dependent on the task of the observer: People tend to fixate those locations in a picture that are of interest (Buswell, 1935; Yarbus, 1967; see also Castelhano, Mack, & Henderson, 2009; DeAngelus & Pelz, 2009). This was particularly evident from the finding that presenting the same picture with different instructions leads to different viewing behavior; that is, an observer tends to select different regions of a picture when the instructions change. For instance, Yarbus had an observer view a painting (The Unexpected Visitor by I. E. Repin) under different instruction conditions. When the observer was asked to estimate the ages of the people depicted, the people’s faces were primarily fixated, whereas when the observer was asked to remember the positions of the people and objects in the painting, fixations were much more distributed across the picture. These results led to the idea that oculomotor selection in complex images is primarily determined by instructions and the goals of the observer.

Notwithstanding these early results, many subsequent studies focused strongly on outlining the contribution of stimulus-driven factors to the control of eye movements in real-world images. This focus was partly driven by findings obtained with simple displays suggesting a strong bottom-up component in eye movement control. That is, various studies using simple displays had provided evidence that simple features, such as a green element among many red elements, pop out and can thus be processed preattentively, in parallel across the visual field (e.g., Treisman, 1988; Treisman & Gelade, 1980). In line with these and other findings, Koch and Ullman (1985) proposed that the outputs of separate feature maps feed into a salience map, which provides a featureless, two-dimensional representation of the conspicuity of locations across the visual field. The salience map was proposed to offer an efficient way to control visual selection in a bottom-up manner, in the sense that its output can guide attention to the most salient location in the visual field. That is, when multiple objects compete in the visual field, the object generating the highest activity in the map is selected first. Afterward, the selected location is suppressed by inhibition of return (e.g., Itti & Koch, 2001; Klein, 2000; Posner & Cohen, 1984), resulting in the selection of the next most salient location, and so forth. Koch and Ullman’s proposal of the salience map and its subsequent elaboration into a computational model (Itti & Koch, 2000, 2001; Itti, Koch, & Niebur, 1998) generated many studies aimed at testing the salience-map model with real-world images.
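The selection cycle described above (select the most active location, then suppress it so that the next most salient location wins) can be sketched in a few lines. This is a deliberately simplified, discrete version: the salience map is assumed to be precomputed (for example, by combining feature maps) and stored as a 2-D grid, winner-take-all is a plain maximum, and inhibition of return permanently suppresses a disk of hypothetical radius around each selected location. It is not the Itti and Koch implementation, which models these steps with more elaborate network dynamics.

```python
def scanpath_from_salience(salience, n_selections, ior_radius=1.0):
    """Simplified winner-take-all selection with inhibition of return (IOR).

    salience: 2-D list of numbers (rows x cols), assumed precomputed.
    Returns the (row, col) locations in the order in which they are selected.
    """
    sal = [list(row) for row in salience]  # work on a copy; the caller's map is untouched
    rows, cols = len(sal), len(sal[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    path = []
    for _ in range(n_selections):
        # Winner-take-all: the most active location wins (first in row-major order on ties).
        winner = max(cells, key=lambda rc: sal[rc[0]][rc[1]])
        if sal[winner[0]][winner[1]] == float("-inf"):
            break  # every location has already been inhibited
        path.append(winner)
        # Inhibition of return: suppress the winner and its neighborhood.
        for r, c in cells:
            if (r - winner[0]) ** 2 + (c - winner[1]) ** 2 <= ior_radius ** 2:
                sal[r][c] = float("-inf")
    return path
```

On a 5 x 5 map with peaks of 9 at (1, 1), 7 at (4, 4), and 5 at (3, 3), the sketch visits (1, 1), then (4, 4), then (3, 3): each selection removes its winner from competition, so the next most salient location is chosen.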

One illustrative study that explicitly investigated the extent to which salience affects overt visual selection was conducted by Parkhurst, Law, and Niebur (2002). In this study, participants were presented with four types of images (home interiors, natural landscapes, city scenes, and computer-generated fractals) and asked to free-view each image for 5 s. Eye movements were recorded, and the mean salience at fixated positions was compared with the mean salience expected by chance. Salience at fixated positions was generally higher than expected by chance. Moreover, the correlation between salience and fixation was largest at the beginning of a trial but remained significant throughout. Parkhurst et al. concluded that even though oculomotor selection might eventually be affected by the goals of the observer, eye movements are foremost under the control of salience (see also Foulsham & Underwood, 2008; Masciocchi, Mihalas, Parkhurst, & Niebur, 2009; Parkhurst & Niebur, 2003, 2004; Peters, Iyer, Itti, & Koch, 2005), which is in line with the predictions of a salience-map model.
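To make the comparison with chance concrete, the sketch below computes one simple version of it under an assumed uniform-chance baseline: mean salience at the fixated locations minus the mean salience over all map locations. Parkhurst et al.'s exact procedure may differ in its details; the map, the fixations, and the function name are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def chance_adjusted_salience(salience: Grid, fixations: List[Tuple[int, int]]) -> float:
    """Mean salience at fixated (row, col) locations minus the mean over all locations.

    The map-wide mean is what a viewer fixating uniformly at random would obtain
    on average, so a positive value means 'higher than expected by chance'.
    """
    at_fixation = sum(salience[r][c] for r, c in fixations) / len(fixations)
    all_values = [v for row in salience for v in row]
    chance = sum(all_values) / len(all_values)
    return at_fixation - chance


# Hypothetical 3 x 3 salience map and three fixations (illustration only).
salience_map = [[0, 1, 0],
                [2, 9, 1],
                [0, 1, 0]]
fixations = [(1, 1), (1, 0), (1, 1)]
print(round(chance_adjusted_salience(salience_map, fixations), 3))  # 5.111
```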

Even though the observed correlation between salience and overt visual selection behavior has been confirmed in multiple studies (e.g., Baddeley & Tatler, 2006; Foulsham & Underwood, 2008; Masciocchi et al., 2009; Parkhurst & Niebur, 2003, 2004; Peters et al., 2005; Reinagel & Zador, 1999; Tatler, Baddeley, & Gilchrist, 2005), what causes fixations to be predominantly directed toward salient locations remains unsettled. A correlation between salience and visual selection behavior does not necessarily imply that salience determines that behavior. Indeed, salience has been shown to covary with several factors, including location in a display (e.g., Tatler, 2007) and the presence of objects (Einhäuser, Spain, & Perona, 2008; Nuthmann & Henderson, 2010). For instance, in pictures of natural scenes, the center tends to contain more salient features than the periphery. Any intrinsic bias of an observer to fixate the center rather than the periphery (i.e., the central fixation bias; see Tatler, 2007) may therefore result in higher salience values at fixation than predicted by chance. Similarly, objects may on average be more salient than backgrounds. When observers have a top-down bias to select objects rather than the background, salience values at fixated locations will on average be higher than expected by chance, irrespective of whether salience drives selection. Indeed, Einhäuser et al. explicitly addressed the question of whether the often-reported correlation between salience and selection behavior truly results from salience-driven selection or merely reflects observers' inclination to select (top-down) interesting objects that generally happen to be more salient than the background (see also Carmi & Itti, 2006; Nuthmann & Henderson, 2010).
In fact, eye movement behavior might be driven primarily by the observer's interest in selecting meaningful information or objects top-down rather than by “dumb” salience-driven processes. In the study by Einhäuser et al., observers viewed images of natural scenes and were then asked to name the objects they had seen. The recall frequency of objects was a better predictor of fixations than salience as computed by the salience-map model. Accordingly, the authors suggested that salience might only be a correlate of oculomotor selection behavior, with the true driving force behind the correlation being observers' bias to select interesting objects. This conclusion is in line with a study by Nuthmann and Henderson, who directly compared the distributions of fixations within real objects and within so-called salient proto-objects, generated using Walther and Koch's (2006) extension of the salience-map model. Although viewers strongly preferred to look at or near the center of real objects, this tendency was much less pronounced for salient proto-objects. Accordingly, the authors suggested that eye movements are primarily object-based and thus controlled by the goals of the observer rather than by the distribution of salience across a picture. These results are in line with the cognitive relevance hypothesis (Henderson, Brockmole, Castelhano, & Mack, 2007; Henderson, Malcolm, & Schandl, 2009), which states that the cognitive representation of visual input that drives selection is ordered not by salience, as assumed by salience-map models, but by cognitive relevance. That is, potential saccadic targets are ranked in terms of their cognitive relevance.
Visual selection then proceeds in the order of this ranking rather than in the order of salience.
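The central-fixation-bias confound discussed above can be made concrete with a small simulation. The sketch assumes a hypothetical observer who ignores salience and always fixates the image center, viewing center-weighted salience maps. Against a uniform-chance baseline this observer appears salience-driven; against a shuffled baseline, which scores each image's salience at the locations fixated in the other images, the shared center bias cancels out. The shuffled baseline is shown as a commonly used alternative control, not as the analysis of any study cited here; all maps, fixations, and names are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]
Point = Tuple[int, int]  # (row, col)


def mean_at(salience: Grid, points: List[Point]) -> float:
    """Average salience at a list of (row, col) locations."""
    return sum(salience[r][c] for r, c in points) / len(points)


def uniform_baseline_effect(salience: Grid, fixations: List[Point]) -> float:
    """Salience at fixation minus the map-wide mean (uniform-chance baseline)."""
    values = [v for row in salience for v in row]
    return mean_at(salience, fixations) - sum(values) / len(values)


def shuffled_baseline_effect(maps: List[Grid], fixations: List[List[Point]]) -> float:
    """For each image, salience at its own fixations minus salience (on the same
    image) at locations fixated in the other images, averaged over images.
    A spatial bias shared by all images enters both terms and cancels out."""
    effects = []
    for i, sal in enumerate(maps):
        other_points = [p for j, pts in enumerate(fixations) if j != i for p in pts]
        effects.append(mean_at(sal, fixations[i]) - mean_at(sal, other_points))
    return sum(effects) / len(effects)


# Hypothetical center-weighted maps, viewed by an observer who ignores
# salience and simply looks at the center of every image.
map_a = [[0, 1, 0], [1, 5, 1], [0, 1, 0]]
map_b = [[1, 0, 1], [0, 4, 0], [1, 0, 1]]
center_fixations = [[(1, 1), (1, 1)], [(1, 1)]]

print(uniform_baseline_effect(map_a, center_fixations[0]))         # 4.0
print(shuffled_baseline_effect([map_a, map_b], center_fixations))  # 0.0
```

The uniform baseline credits this observer with a large salience effect, whereas the shuffled baseline correctly reports none.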

A large number of studies have since reported that oculomotor selection behavior is affected by a variety of top-down factors, such as domain-specific knowledge (Underwood, Foulsham, & Humphrey, 2009), chess expertise (Reingold, Charness, Pomplun, & Stampe, 2001), object–scene inconsistencies (Loftus & Mackworth, 1978; Underwood & Foulsham, 2006; but see Võ & Henderson, 2009), and global contextual information (Torralba, Oliva, Castelhano, & Henderson, 2006). However, the fact that selection behavior is affected by top-down influences does not exclude the possibility that selection is additionally driven by salience.

Several studies have explicitly pitted bottom-up influences against top-down factors to address directly whether oculomotor selection behavior in real-world images is ultimately driven by salience or by the goals of the observer (e.g., Einhäuser et al., 2008; Henderson et al., 2007; Henderson et al., 2009; Malcolm & Henderson, 2010; ’t Hart, Schmidt, Roth, & Einhäuser, 2013; Underwood, Foulsham, van Loon, Humphreys, & Bloyce, 2006). For instance, Henderson et al. (2009) had participants search for nonsalient targets presented in semantically appropriate locations in real-world scenes. The majority of eye movements were directed toward potential target locations rather than toward salient locations in the scenes. This pattern was obtained throughout the trial, even when only the first eye movement in the scene was considered. The authors concluded that cognitive relevance rather than salience is crucial in determining visual selection behavior. Similarly, Einhäuser et al. measured the eye movements of observers while they viewed pictures of outdoor scenes. Among other manipulations, Einhäuser et al. modulated the contrast in the pictures to create contrast gradients, producing a difference in salience between the two sides of each picture. In Experiment 1, observers either free-viewed the images or searched for a target (a bull’s eye). The target was either equally likely to appear on either side of each picture or always appeared on the low-salience side. During free viewing, observers were strongly biased by the contrast gradient: Eye movements tended to be directed toward the high-contrast side of the picture. However, when observers searched for the target, this bias disappeared entirely, and it was even reversed when the target appeared only on the low-salience side.
The authors concluded that even though salience may determine oculomotor selection behavior when observers are free to explore a scene, task demands such as searching for a target can immediately override salience-driven selection.

It is clear that salience-map models cannot fully explain the results reported above. What remains puzzling, however, is that oculomotor control in simple displays is strongly affected by salience (Donk & van Zoest, 2008; van Zoest & Donk, 2005). Thus, even though salience may play only a subordinate role in viewing real-world images, its role in simple displays is clearly different. Two differences between these lines of research may explain this discrepancy. First, studies that predominantly find evidence for stimulus-driven control have tended to use relatively homogeneous search displays (e.g., Godijn & Theeuwes, 2002; Hunt et al., 2007; Ludwig & Gilchrist, 2002; van Zoest & Donk, 2005). These displays contained at least one distinct salience signal (Itti & Koch, 2000). In contrast, studies emphasizing the importance of goal-driven control have tended to use heterogeneous displays, in which no single salience signal stood out (e.g., Foulsham & Underwood, 2008; Henderson et al., 2007; Nuthmann & Henderson, 2010). Display homogeneity may thus be an important factor in determining the relative contributions of stimulus-driven and goal-driven control.

Second, the timing of the response may play a critical role in whether evidence is found for one or the other type of control (see also Hunt et al., 2007; van Zoest et al., 2010). Evidence for stimulus-driven control is primarily observed when eye movements are triggered quickly following display presentation (e.g., Donk & van Zoest, 2008; Hunt et al., 2007). Responses slower than 250 ms, as well as subsequent eye movements, are not affected by salience, at least not when the stimulus display remains unaltered (Siebold, van Zoest, & Donk, 2011; van Zoest & Donk, 2005). Thus, the effects of stimulus salience seem to be very limited in time (Donk & van Zoest, 2008). In studies of gaze control that examine scan patterns lasting a few seconds and involving multiple eye movements, the selection window may extend well beyond the critical time range in which the prioritization of processing is susceptible to stimulus salience. The implication is that experimental paradigms that elicit fast responses are much more likely to reveal effects of stimulus-driven control than paradigms that lead to slow responses. Different studies may thus tap into and measure different control processes, depending on when in time selection occurs. Indeed, N. C. Anderson, Ort, Kruijne, Meeter, and Donk (2015) recently presented observers with images of natural scenes that were manipulated in a similar way as in Einhäuser et al. (2008). That is, the distribution of salience across each image was varied by decreasing or increasing the contrast in a gradient across the image. Observers performed either a free-viewing task or a visual search task. Unlike previous studies on oculomotor control in real-world images, this study explicitly aimed to examine the role of time in visual selection behavior. To this end, oculomotor selection behavior was analyzed in relation to the latency of eye movements (i.e., fixation durations).
The results showed that short-latency first saccades were more likely to be directed toward the high-salience side of an image than long-latency and subsequent saccades. This was the case in both the free-viewing and the visual search task. The authors concluded that salience does influence oculomotor behavior in natural scenes, but that its effects are limited in time, similar to what has been found with simple displays. This study illustrates that, in complex displays as well, the control of visual selection depends on the timing of the response.
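The latency-based logic of this analysis can be sketched as sorting first saccades by latency, splitting them into bins, and computing per bin the proportion of saccades that landed on the high-salience side. This is a simplified illustration under our own assumptions (equal-count bins, a binary landing-side code, hypothetical data and function name), not the analysis reported by N. C. Anderson et al. (2015).

```python
from typing import List, Tuple

# (latency in ms, did the saccade land on the high-salience side?)
Saccade = Tuple[float, bool]


def proportion_by_latency_bin(saccades: List[Saccade], n_bins: int) -> List[Tuple[float, float]]:
    """Sort saccades by latency, split them into n_bins bins of (near-)equal size,
    and return (mean latency, proportion toward the high-salience side) per bin."""
    if not 1 <= n_bins <= len(saccades):
        raise ValueError("n_bins must be between 1 and the number of saccades")
    ordered = sorted(saccades, key=lambda s: s[0])
    n = len(ordered)
    result = []
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        mean_latency = sum(lat for lat, _ in chunk) / len(chunk)
        proportion_high = sum(1 for _, high in chunk if high) / len(chunk)
        result.append((mean_latency, proportion_high))
    return result


# Hypothetical first saccades: (latency in ms, landed on high-salience side).
first_saccades = [(150, True), (170, True), (190, True), (210, False),
                  (300, False), (320, True), (350, False), (400, False)]
print(proportion_by_latency_bin(first_saccades, 2))  # [(180.0, 0.75), (342.5, 0.25)]
```

In this made-up data set, the fast bin shows a clear bias toward the high-salience side (0.75) while the slow bin does not, which is the qualitative pattern a time-limited salience effect predicts.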

A different situation arises when displays change during viewing. A change is often accompanied by a transient signal that is itself highly salient and has been shown to be powerful enough to attract the gaze even well beyond the first eye movement (Brockmole & Henderson, 2005, 2008; Franconeri, Hollingworth, & Simons, 2005; Jonides & Yantis, 1988; Matsukura, Brockmole, Boot, & Henderson, 2011; Matsukura, Brockmole, & Henderson, 2009). However, several recent studies suggest that even a change that is not accompanied by a transient signal (i.e., a change that occurs during saccadic suppression) may attract the gaze (Siebold & Donk, 2014; Silvis & Donk, 2014). For instance, Silvis and Donk presented observers with simple displays consisting of two orientation singletons and multiple background lines. The observers' task was to make an eye movement toward the target singleton, which was specified by its orientation. Prior to this eye movement, they were instructed to make a task-irrelevant vertical eye movement. During this vertical eye movement, the luminance of one of the singletons was increased or decreased such that it became the most salient element in the display. After the change, the singleton that had become most salient captured the gaze, even though the change was not accompanied by a transient signal and the affected eye movement was the second one made after display onset. These results suggest that salience may well affect oculomotor selection beyond the first eye movement, but only when a change renders one of the display items the most salient item. As a recent study shows (N. C. Anderson & Donk, 2017), this is true not only for simple displays but also for natural scenes: A real-world object that is changed during an eye movement to become the most salient item in the display will subsequently attract the gaze.

Attention and eye movements

Given that both visual attention and eye movements are selective with respect to processing information from the environment, many researchers have asked how and to what extent the two are functionally related and whether the same principles underlie selection in both. The selective coupling between attention and eye movements is evident from a wide range of studies (e.g., Godijn & Pratt, 2002; Hoffman & Subramaniam, 1995; Irwin & Gordon, 1998; Kowler, Anderson, Dosher, & Blaser, 1995; Shepherd, Findlay, & Hockey, 1986; Van der Stigchel & Theeuwes, 2005a); however, whether the link is mandatory and equally binding for stimulus-driven and goal-directed eye movements remains debated.

One influential theory regarding the relationship between eye movements and attention is the “premotor theory of attention” (Rizzolatti et al., 1987; Sheliga, Craighero, Riggio, & Rizzolatti, 1997; Sheliga, Riggio, & Rizzolatti, 1994). According to this theory, the mechanisms involved in programming saccades and in shifting spatial attention are basically the same: There is only one mechanism for active interaction with the environment, which directs both attention and action toward a goal. From this viewpoint, visual attention follows motor programming and is a by-product of the action of the oculomotor system, which explains the improved performance for stimuli presented at the location toward which a motor response is prepared. In terms of the premotor theory, a shift of attention without a concomitant eye movement is conceived of as a canceled motor program. For example, when a central arrow instructs participants to covertly direct attention to a particular location, the oculomotor system is assumed to prepare an eye movement toward this location that is subsequently not executed. When covert attention is cued to an invalid location, reaction time to the target increases because of the additional time needed to cancel the oculomotor program to the cued location and to prepare another one to the uncued target location. The fact that attention can be oriented without an eye movement is therefore not problematic for the premotor theory, because the theory holds that attention requires motor preparation, not motor execution.

Evidence for the idea that attentional selection is based on activity in the same system as oculomotor selection comes from studies by Sheliga and colleagues, who showed that directing covert attention influences the trajectory of a predetermined eye movement to a location different from the attended location (e.g., Sheliga et al., 1994). In these studies, participants made a vertical saccade to a target below or above the fixation point. The direction of the saccade was determined by a cue, which was presented at one of four locations different from the possible saccade target locations. Participants therefore had to attend this cued location in order to know which saccade had to be executed. Results showed that the trajectory of the subsequent vertical saccade deviated away from the attended cue location, indicating that a shift of spatial attention was accompanied by activation within the oculomotor system (see also Van der Stigchel & Theeuwes, 2005b, 2007).

Most early work on the relationship between attention and eye movements examined goal-directed selection. That is, attention or eye movements were cued with a central cue or some other instruction indicating the probable location of a target stimulus (Deubel & Schneider, 1996; Hoffman & Subramaniam, 1995; Kowler et al., 1995). One way to understand the relation between attention and eye movements is to study the locus of attention during oculomotor preparation. If both mechanisms are part of a common integrated system, two strong predictions follow: First, during saccade preparation, attention should be directed solely to the target location of the saccade, and it should not be possible to attend to any other part of the visual world during this preparation. Second, it should not be possible to prepare a saccade to any location other than the attended location.

The most heavily used paradigm to investigate these assumptions has been a dual-task paradigm in which an attentional identification task and a saccadic task are combined (e.g., Deubel & Schneider, 1996; Kowler et al., 1995). In this paradigm, the primary task for participants is to execute an eye movement to a peripheral saccade goal as indicated by a symbolic cue (e.g., an arrow presented in the center of the visual field). The secondary task is generally a nonspeeded manual response task toward a probe stimulus (e.g., report the identification of a letter) meant to probe the allocation of attention. This probe stimulus is presented shortly before the execution of a saccade, either at the saccade target location or somewhere else in the visual field. If attentional allocation and oculomotor preparation are tightly connected and the activation of the oculomotor system is accompanied by a shift of attention, attention should precede the eye movement. If this reasoning is correct, identification or detection of the probe stimulus should be facilitated when it is presented at the saccade destination as compared to when it is presented at a different location.
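Scoring such a dual-task experiment comes down to comparing probe identification accuracy when the probe coincides with the saccade goal with accuracy when it appears elsewhere. The sketch below assumes that locations are coded as integer positions (for example, on a circular array) and uses hypothetical trial data; the function and variable names are illustrative.

```python
from typing import Dict, List, Tuple

# (saccade target position, probe position, probe identified correctly?)
Trial = Tuple[int, int, bool]


def accuracy_by_congruence(trials: List[Trial]) -> Dict[str, float]:
    """Probe identification accuracy, split by whether the probe appeared
    at the saccade goal or at some other location."""
    groups: Dict[str, List[bool]] = {"probe_at_saccade_goal": [], "probe_elsewhere": []}
    for saccade_pos, probe_pos, correct in trials:
        key = "probe_at_saccade_goal" if probe_pos == saccade_pos else "probe_elsewhere"
        groups[key].append(correct)
    # Proportion correct per group; empty groups are left out.
    return {name: sum(hits) / len(hits) for name, hits in groups.items() if hits}


# Hypothetical trials on an 8-position circular array (positions 0-7).
dual_task_trials = [
    (0, 0, True), (0, 0, True), (0, 0, True), (0, 0, False), (2, 2, True),
    (0, 1, True), (0, 1, False), (2, 3, False),
]
print(accuracy_by_congruence(dual_task_trials))  # at goal: 4 of 5 correct; elsewhere: 1 of 3
```

With a letter-identification probe, each proportion would then be compared with the chance rate given the number of response alternatives.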

Deubel and Schneider (1996) adopted such a dual-task paradigm and measured participants' ability to identify a letter that was presented briefly just before the execution of a saccade. Visual discrimination was best when the letter and the saccade target coincided; when the letter was presented at any other location, performance was at chance level. Crucially, a second experiment showed that this pattern held even when participants had prior knowledge of the letter's location: When the letter was always presented at the same location throughout the experiment, participants could not direct attention to it if this location differed from the saccade target location, even when the two locations were close together. These results strongly argue for a selective coupling between attention and eye movements when eye movements are voluntarily cued to a specific location in space. Additional studies confirmed and extended these findings by revealing that the ability to shift attention to a location other than the saccade goal decreases as saccade onset approaches (Deubel, 2008), and by showing that once an eye movement is fully programmed and ready to be executed, attention cannot be allocated away from the saccade goal (Deubel & Schneider, 2003).

Kowler et al. (1995) similarly showed that accurate saccades require shifts of visual attention to the target. In their experiments, participants were centrally precued to a saccade target at one of eight locations on a circular array, while also being instructed to report the identity of a postcued perceptual target. When the perceptual target could appear randomly at one of the eight locations, accurate perceptual performance was possible only when the perceptual target happened to coincide with the saccade target location. When the location of the perceptual target was fixed, perceptual accuracy was comparable to that in the random condition but came at a cost in saccade latency: When participants prepared a saccade to one location while simultaneously trying to identify the perceptual target at a different location, they were much slower to make the saccade. This trade-off between the perceptual and saccadic tasks was further investigated in another experiment in which, to limit variability from idiosyncratic strategies and individual differences, participants were instructed to prioritize the saccade task, to prioritize the perceptual task, or to find an intermediate balance between the two. Irrespective of whether the saccade target location was fixed or random, a trade-off between the perceptual and saccadic tasks was found. This suggests that even when participants knew the upcoming saccade location, this knowledge did not specifically benefit saccade programming, and perceptual processing still depended on task priority. Moreover, relative to the intermediate condition, the trade-off between saccadic and perceptual performance was much larger when observers were instructed to prioritize the perceptual task than when they were instructed to prioritize the saccade task.
That is, when comparing the intermediate priority condition to the saccade task priority condition, substantial improvement in the perceptual task was achieved with little or no costs to saccades; whereas when comparing the intermediate priority condition to the perceptual task priority condition, improvement in the perceptual task was achieved only with significant saccadic delays.

In the dual-task experiments described above, the relation between attention and the oculomotor system was examined for voluntary saccades: Participants were instructed to execute a saccade as indicated by a central symbolic cue. Peterson, Kramer, and Irwin (2004) showed that the same coupling between attention and eye movements also holds for involuntary saccades. In their experiments, participants had to execute a voluntary saccade to a designated goal, but on a subset of trials an abrupt onset was presented at a location different from the voluntary saccade goal. As discussed earlier, participants frequently execute an involuntary saccade toward such an onset (Theeuwes et al., 1998; Theeuwes et al., 1999). In Peterson et al.'s experiments, target letters were presented prior to saccade execution at various possible locations, including the voluntary saccade goal and the location of the onset. Discrimination performance was high both at the voluntary saccade goal and at the location of the irrelevant onset. These findings therefore suggest that the selective coupling between attention and eye movements holds for both voluntary and involuntary saccades: Attention travels along with whatever saccade is made, whether voluntary or involuntary.

However, more recent evidence suggests that the coupling between attention and eye movements may hold only for active oculomotor programs and that attention is not part of the resolution of these programs into the executed saccade. Specifically, Van der Stigchel and de Vries (2015) used the global effect to investigate the relation between presaccadic shifts of attention and saccade landing position when two possible saccade goals are present. The global effect occurs when eye movements land in between the target and a nearby salient distractor rather than on either of them (Van der Stigchel & Nijboer, 2011). Performance in a discrimination task presented shortly before the execution of the saccade was high at the locations of the target and the distractor, but not at the location between them (even for saccades showing the global effect). This suggests that attention is coupled to the active oculomotor programs directed to the target and the distractor, but not necessarily to the resolution of these programs into the actual landing position of the executed saccade (Van der Stigchel & de Vries, 2015; see also Born, Mottet, & Kerzel, 2014).

These findings are in line with recent debate over the idea that covert motor preparation is both necessary and sufficient for spatial attention (Smith & Schenk, 2012). For instance, Hunt and Kingstone (2003a, 2003b) showed that directing covert attention to a spatial location did not necessarily result in the preparation of an eye movement to that location, and that preparing an eye movement to a particular location did not necessarily result in a shift of attention to that location (see also Belopolsky & Theeuwes, 2012). Similarly, Juan, Shorter-Jacobi, and Schall (2004) showed that what neurons in the frontal eye fields select during the allocation of covert spatial attention differs from what they select during the subsequent preparation of a saccade. Furthermore, a recent study has shown that maintaining covert attention at a location can even be accompanied by suppression of the oculomotor program (Belopolsky & Theeuwes, 2009). In other words, these studies suggest that attention and eye movements are less tightly coupled than assumed by the premotor theory.

One way to probe directly whether an attentional shift can occur without the concomitant preparation of an eye movement is to study situations in which the capacity to make a saccade is restricted. For example, a stimulus display can be presented not centrally in front of an observer but off to one side, such that the observer's eyes must be fully rotated toward one of the temporal sides to view the display, which makes it impossible to move the eyes further toward that temporal side (e.g., Craighero, Nascimben, & Fadiga, 2004). With the eyes constrained in this way, participants are then tested in a standard Posner cuing task, in which attentional shifts toward either the temporal or the nasal hemifield are probed as usual. Because visual acuity is not hampered by this rotation of the eyes, any differences in attentional allocation between the two hemifields can be attributed to the (in)ability to move the eyes toward the temporal hemifield. Craighero and colleagues found that the attentional benefits of the cue were indeed reduced in the restricted hemifield, suggesting a strong dependence of attention on oculomotor processes (Craighero et al., 2004; see also Craighero, Carta, & Fadiga, 2001). However, Smith and colleagues (Smith et al., 2004, 2012) reported a difference between voluntary and involuntary attention when saccadic programming was constrained in a similar setting. They showed that eye abduction did not produce a deficit in voluntary attention: Cuing effects were similar across the abducted and nonabducted hemifields, indicating that the ability to move the eyes does not influence voluntary covert attention. In contrast, an imbalance was revealed for involuntary attention.
A nonpredictive peripheral cue did not capture attention when it was presented in the abducted hemifield, whereas an attentional shift was observed when the cue was presented in the nasal hemifield. On the basis of these results, Smith and Schenk argued that voluntary attention is independent of the oculomotor system, a conclusion that contradicts the premotor theory (Smith et al., 2004; Smith et al., 2012; Smith & Schenk, 2012).
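The cuing effect in such a task is the difference in mean reaction time between invalidly and validly cued trials, computed separately for each hemifield. A minimal sketch follows; the trial coding, function name, and reaction times are hypothetical and were chosen only to mirror the qualitative pattern reported for a nonpredictive peripheral cue with the eyes abducted.

```python
from typing import Dict, List, Tuple

# (hemifield, cue validity, reaction time in ms)
Trial = Tuple[str, str, float]


def cueing_effects(trials: List[Trial]) -> Dict[str, float]:
    """Per hemifield: mean RT on invalid trials minus mean RT on valid trials.
    Positive values indicate that the cue attracted attention."""
    effects: Dict[str, float] = {}
    for field in sorted({h for h, _, _ in trials}):
        valid = [rt for h, v, rt in trials if h == field and v == "valid"]
        invalid = [rt for h, v, rt in trials if h == field and v == "invalid"]
        effects[field] = sum(invalid) / len(invalid) - sum(valid) / len(valid)
    return effects


# Hypothetical RTs with the eyes abducted toward the temporal side.
abducted_trials = [
    ("nasal", "valid", 300), ("nasal", "valid", 310),
    ("nasal", "invalid", 340), ("nasal", "invalid", 350),
    ("temporal", "valid", 330), ("temporal", "valid", 320),
    ("temporal", "invalid", 325), ("temporal", "invalid", 335),
]
print(cueing_effects(abducted_trials))  # {'nasal': 40.0, 'temporal': 5.0}
```

Comparing the two values, rather than looking at either alone, is what reveals the hemifield imbalance; the same computation applied to a voluntary (symbolic) cue would be expected to yield similar values in both hemifields under the account of Smith and colleagues.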

Converging evidence against a strong coupling between attention and eye movements has also come from studies of spatial attention in patients with oculomotor deficits. The premotor theory predicts that any deficit in oculomotor control should result in problems with spatial attention. Nonetheless, in a patient with complete paralysis of both eyes (Smith et al., 2004; Smith & Schenk, 2012), voluntary covert shifts of attention were still intact; a deficit was observed only for involuntary attention, in that peripheral cues no longer captured attention. These results were further supported by findings in patients with Duane’s retraction syndrome, a chronic condition that reduces the motility of one of the eyes. Here too, voluntary attention did not depend on saccade preparation, whereas involuntary attention did (Gabay, Henik, & Gradstein, 2010).

Thus, although the evidence suggests a strong relationship between attention and eye movements, the association between saccade planning and attention may depend on whether the saccade plan is controlled in a voluntary or an involuntary manner, with the coupling being stronger for the latter than for the former. However, much of the evidence against an obligatory coupling between attention and eye movements comes from “abnormal” situations, such as an extreme rotation of the eyes in the orbit or patients with oculomotor deficits. Even though these situations are unusual, they do warrant caution against an overly strong interpretation of the premotor theory, which rests primarily on results obtained under normal circumstances. For instance, the scan pattern for a particular scene is not necessarily a one-to-one correlate of the locus of visual attention. In other words, elements that are not explicitly fixated may still be attended and processed to some extent. Thus, whereas attention may not always be constrained by eye movements, eye movements may be more consistently constrained by attention.

However, recent evidence from concurrent eye-movement and electroencephalogram (EEG) recordings seems to further complicate the relationship between attention and eye movements (Weaver, van Zoest, & Hickey, 2017). Weaver et al. (2017) investigated the relationship between distractor suppression in visual search and oculomotor control, focusing specifically on the relationship between neural signatures of covert attention and overt measures of oculomotor capture. They reasoned that if attention and eye movements are related, neural signatures of covert attention should help predict the accuracy and quality of subsequent eye movements. In particular, they examined the relationship between saccade deviation and a lateralized event-related potential (ERP) component called the distractor positivity (Pd; Hickey et al., 2009), which is elicited in visual cortex contralateral to ignored stimuli and is thought to reflect direct action on distractor representations. In the experiment, observers were required to make a simple saccade to a visual target presented on the vertical meridian while a distractor was presented slightly to the left or right of the straight path to the target. Previous research has shown that an irrelevant visual stimulus presented close to the path between the current fixation and the saccadic target can cause a consistent deviation of saccades (e.g., Doyle & Walker, 2001; Godijn & Theeuwes, 2004; McSorley et al., 2006; Van der Stigchel et al., 2006; Van der Stigchel & Theeuwes, 2005b, 2006).
Saccadic deviations away from an irrelevant distractor are thought to reflect successful suppression of the distractor location, with the magnitude of the deviation thought to reflect the strength of the oculomotor inhibition applied to that location (Godijn & Theeuwes, 2004; Meeter, Van der Stigchel, & Theeuwes, 2010; Trappenberg, Dorris, Munoz, & Klein, 2001). Replicating previous work, Weaver et al. (2017) found that saccadic deviation away from the distractor increased as a function of saccade latency: There was little evidence of deviation among short-latency saccades, whereas reliable deviation was found for long-latency saccades. Somewhat surprisingly, however, a Pd component was found for both short- and long-latency saccades, regardless of the overall saccade deviation. In fact, the Pd components were identical regardless of saccade latency, suggesting a dissociation between attention and subsequent eye movement performance. Critically, however, for short-latency saccades the Pd component mostly occurred after the eye movement was initiated, whereas for long-latency saccades the Pd component overlapped with the saccadic interval. Accordingly, a significant correlation between Pd amplitude and saccade deviation was found only when saccade latency was long: The larger the Pd, the more saccades deviated away from the distractor. The authors concluded that saccade timing is not contingent on the deployment of attention, but that there is a temporal dependency between attention and eye movements: Attention can affect oculomotor behavior only when the attentional mechanism can act before the saccade is triggered.
Thus, whereas attention and eye movements appear to act independently when oculomotor selection is quick, attentional processes are able to more directly influence oculomotor control when saccades are triggered later in time.

These recent results of Weaver et al. (2017) appear directly opposite to the findings of Smith and Schenk (2012), who showed that involuntary attention is more tightly coupled to the oculomotor system than is voluntary attention, which may act independently of the oculomotor system. This discrepancy clearly calls for more research. The relationship between attention and eye movements is complicated, and the combined methodology of EEG and eye-movement recording may help resolve the many questions and uncertainties that require further investigation. One benefit of concurrently measuring EEG and eye movements is that this methodology offers a trial-by-trial measure of selection outcome that can be linked directly to underlying neural mechanisms. When the neural correlates of attentional deployment can be observed, there is much less need to infer the location of spatial attention from manual reaction times. Still, one limitation of this approach is that the relationship between two independent measures (the neural signatures of covert attentional deployment and overt saccadic selection) is based on an observed correlation. Causal directionality in the relationship between attention and eye movements is very difficult, if not impossible, to establish on the basis of a cognitive behaviorist approach. In terms of theories and models, this consequently makes the premotor theory very difficult to either confirm or disconfirm. Although the premotor theory is one of the few theories that is explicit about directionality, casting the motor system as the primary causal agent and attention as secondary, the behavioral findings are not unequivocal, because attention and eye movements tend to co-occur in time (see also Smith & Schenk, 2012).
Moreover, the directionality in this relationship may further depend on the situational context—specifically, whether observers’ primary task is a manual or an eye movement task (e.g., Hunt & Kingstone, 2003b).

Conclusions and further thought

One may question to what extent the discussed literature and findings are paradigm-specific and whether specific laboratory findings generalize to the everyday environment. There are many differences between the paradigms used in the laboratory and real-life situations, and although laboratory research is assumed to expose fundamental principles of human behavior that generalize to the everyday environment, this assumption is not always valid (Kingstone, Smilek, & Eastwood, 2008; Kingstone, Smilek, Ristic, Friesen, & Eastwood, 2003). One important assumption of laboratory research in the field of human cognition is that human cognition is subserved by processes that are invariant and regular across situations. It is now clear, however, that cognitive processes change with situational context (Kingstone et al., 2008). This perspective is in line with the message that we would like the reader to take home: Control in visual selection is situational, as it depends on the visual context. Although visual selection has traditionally been posed as a simple dichotomy between stimulus-driven and goal-driven influences, this view is incomplete (see also Awh et al., 2012). Moreover, these processes are not continuously available to influence selection, but tend to operate in different time windows (e.g., Donk & van Zoest, 2008; van Zoest & Donk, 2005).

Although real-world-based research is necessary to validate lab-based studies, one crucial point is of course that real-world experiments often suffer from a lack of control. That is, the stimuli are typically given, not manipulated. Lab experiments are artificial by virtue of the necessity to impose control. Even though this jeopardizes external validity (for example, top-down goals may exert a much stronger influence in the real world, in the presence of real meaning), it is unlikely that the visual system as studied changes between the two contexts. It will be critically important to establish what changes occur in the transition from lab- to real-world-based science, to further characterize how the situational context affects behavior.

In this review article, we have tried to demonstrate how control is constrained by context and time. Notably, these constraints may further shape how other variables, such as prior experience, learning, statistical regularities, and reward history, affect selection (e.g., B. A. Anderson et al., 2011b; Chun & Jiang, 2003; Cosman & Vecera, 2014; Hickey et al., 2010). Moreover, the ways in which attention and eye movements are associated and affect control may depend on whether selection is more automatic and stimulus-driven or more voluntary and goal-directed. Awareness of the constraints of control is necessary to understand when and how visual selection is truly under the control of the observer.

Author note

This work was supported by the Autonomous Province of Trento, Italy (“Grandi Progetti 2012” of the project “Characterizing and Improving Brain Mechanisms of Attention–ATTEND”). S.V.d.S. was supported by VIDI Grant 452-13-008 from the Netherlands Organization for Scientific Research. The authors declare no competing financial interests.