In scientific thought we adopt the simplest theory which will explain all the facts under consideration and enable us to predict new facts of the same kind. The catch in this criterion lies in the word ‘simplest’. It is really an aesthetic canon such as we find implicit in our criticisms of poetry or painting.

The intuition behind much of my work on visual search is that the visual system has an interest in noticing “unusual” items, where “unusual” can mean “unlikely to have been drawn from the same statistical process as the stuff in the surrounding regions,” or it might also mean the more general, “I wasn't expecting to see that.” Based upon this intuition, I have proposed the Statistical Saliency model for visual search, and shown that this model predicts a wide range of search results.

The Statistical Saliency Model: First, represent the search display in an appropriate feature space (this can be the tricky part for complex targets and distractors). Then compute the saliency by, in essence, testing for outliers. For a 1-D feature such as length or contrast energy, this amounts to computing a z-score:

Saliency, Δ = (T - μ_D) / σ_D

Here, T is the target feature, μ_D is the local mean of the distractors, and σ_D is the standard deviation of the distractors.

Figure 1

As shown in Figure 1, this amounts to counting the number of standard deviations between the target feature value and the mean of the distractors. (This mean is local, so neighboring items have more effect than more distant items.) The larger the saliency, the easier the predicted search. Essentially this is like representing the distractors with the best-fit normal distribution (shown in dark blue in Figure 1), and asking how likely the target is to have come from that distribution.
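The 1-D computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's implementation: the function name `saliency_1d` is hypothetical, the mean is taken over all distractors rather than weighted locally, and the population standard deviation is used because the text does not say which estimator is intended.

```python
import statistics


def saliency_1d(target, distractors):
    """Number of distractor standard deviations between the target
    feature value and the distractor mean: (T - mu_D) / sigma_D.

    Assumptions (not from the source): a global, unweighted mean and
    the population standard deviation.
    """
    mu = statistics.fmean(distractors)
    sigma = statistics.pstdev(distractors)
    if sigma == 0:
        # Identical distractors make the z-score undefined.
        raise ValueError("distractor standard deviation is zero")
    return (target - mu) / sigma


# Hypothetical line lengths in arbitrary units.
distractor_lengths = [9, 11, 9, 11]   # mean 10, standard deviation 1
print(saliency_1d(14, distractor_lengths))    # target 4 standard deviations out
print(saliency_1d(10.5, distractor_lengths))  # target half a standard deviation out
```

With these made-up values the first target sits four standard deviations from the distractor mean and the second only half a standard deviation, so the model would predict a much easier search for the first.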

For higher-dimensional features such as velocity (vx, vy), color, etc., the model computes the Mahalanobis distance,

Δ = sqrt[(T - μ_D)' Σ_D^(-1) (T - μ_D)],

between the target, T, and the mean of the distractors. Here Σ_D is the covariance matrix of the distractors, T and μ_D are now vectors, and the prime denotes the transpose. As shown in Figure 2, this again amounts to counting the number of “standard deviations” between the target and the mean of the distractors, but in the multi-dimensional case this involves counting the number of covariance ellipsoids.

Figure 2
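For a 2-D feature such as velocity, the Mahalanobis distance can be written out by hand, since a 2x2 covariance matrix has a closed-form inverse. The sketch below is an illustration only: the function name `mahalanobis_saliency_2d` and the sample velocities are hypothetical, the mean is global rather than local, and the population covariance is an assumption.

```python
import math


def mahalanobis_saliency_2d(target, distractors):
    """Mahalanobis distance between a 2-D target feature and the mean
    of the 2-D distractor features.

    Assumptions (not from the source): global mean, population covariance.
    """
    n = len(distractors)
    mx = sum(x for x, _ in distractors) / n
    my = sum(y for _, y in distractors) / n

    # Entries of the 2x2 distractor covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in distractors) / n
    syy = sum((y - my) ** 2 for _, y in distractors) / n
    sxy = sum((x - mx) * (y - my) for x, y in distractors) / n

    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("distractor covariance matrix is singular")

    dx, dy = target[0] - mx, target[1] - my
    # d' inv(Sigma) d, with inv(Sigma) = [[syy, -sxy], [-sxy, sxx]] / det.
    quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(quad)


# Hypothetical distractor velocities (vx, vy): little spread in vx, more in vy.
velocities = [(1, 0), (-1, 0), (0, 2), (0, -2)]
print(mahalanobis_saliency_2d((2, 0), velocities))  # offset along the low-variance axis
print(mahalanobis_saliency_2d((0, 2), velocities))  # same offset along the high-variance axis
```

Both targets lie at the same Euclidean distance from the distractor mean, but the one displaced along the direction in which the distractors vary least receives the larger saliency. This is the sense in which the measure counts covariance ellipsoids rather than raw distance.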

The Statistical Saliency model can also be thought of as formalizing the rule of thumb suggested by Duncan & Humphreys (1989): search is easier when target-distractor similarity decreases, or when distractor-distractor similarity increases.
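The link to this rule of thumb can be checked numerically with the 1-D formula. The following sketch uses made-up feature values and a hypothetical helper named `saliency`, and it assumes a global mean and the population standard deviation. It shows the saliency falling when the target moves closer to the distractors, and again when the distractors become more variable.

```python
import statistics


def saliency(target, distractors):
    """1-D saliency (T - mu_D) / sigma_D, with a global mean and the
    population standard deviation (both assumptions)."""
    mu = statistics.fmean(distractors)
    sigma = statistics.pstdev(distractors)
    return (target - mu) / sigma


# Hypothetical feature values in arbitrary units; both sets have mean 10.
homogeneous = [9, 11, 9, 11]      # distractors similar to one another
heterogeneous = [7, 13, 7, 13]    # distractors dissimilar to one another

dissimilar_target = saliency(16, homogeneous)     # target far from the distractors
similar_target = saliency(12, homogeneous)        # target closer to the distractors
varied_distractors = saliency(16, heterogeneous)  # same target, more varied distractors

# Raising target-distractor similarity lowers saliency.
print(dissimilar_target > similar_target)
# Lowering distractor-distractor similarity lowers saliency.
print(dissimilar_target > varied_distractors)
```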

More recent work in our lab, in conjunction with Zhenlan Jin and Alvin Raj, has implemented this saliency model so that it can operate on arbitrary images as input. This includes extracting motion saliency from video, and work demonstrating that saliency predicts where people look in video recorded through the windshield of a car, as well as the time they take to detect a pedestrian about to cross the road.

My human vision work in general is motivated by the desire for simple predictive models of visual phenomena, which can then be easily applied to such things as image coding, and design of user interfaces and information visualizations. The Statistical Saliency model fits into this framework.

One difficulty in creating a simple model of visual search is that many early search results demonstrated search asymmetries. If the visual system requires a number of asymmetric mechanisms, this makes things difficult for modelers: one must uncover all of the mechanisms to have a reasonable model of visual search, and models will tend to need a new component for each asymmetric mechanism. If this is the way the visual system is, then we need to accommodate it, of course. But I have suggested that a number of visual search experiments that gave asymmetric results were actually asymmetrically designed, and thus no asymmetric mechanisms are necessary to explain the results.