Abstract

We have previously reported (Troscianko et al, 1995 Perception 24 Supplement, 18) a neural network capable of labelling objects in natural scenes. The system first segments a scene and then describes each segment in terms of a set of features. A neural net is then trained to label the segments on the basis of this feature set. The question we now address is how important each of these features is to overall performance, in both human and machine vision. We carried out an experiment in which human subjects were trained on the same labelling task as the neural net. Individual segments of scenes were presented on a screen. Some corresponded to a whole object (eg a car) and others to an incomplete region (eg part of the sky). The subject was asked to label each segment as one of eleven possible types of object (sky, vegetation, vehicle ...). Feedback was given and the learning curve was monitored. Once the learning curve had levelled off, each subject's performance was tested with both intact and degraded stimuli. Degraded stimuli conveyed only part of the information in the segment. Some presented just the outer boundary, the average colour, or the average luminance. Others randomised the size, position, and texture of the segment. The results suggest that these degradations produce significant changes in performance (F(9,7) = 4.4, p = 0.0005). A posteriori analysis indicates that certain attributes (particularly texture, boundary-only presentation, and colour averaging) have an especially strong influence on performance. A similar pattern of results was obtained when the network was trained on similarly degraded data. The results imply (1) that a neural net can provide a useful model of human object-labelling processes, and (2) that some features are more important than others in mediating such performance.
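Three of the degradations described above (boundary only, average colour, average luminance) can be illustrated with a short sketch. This is not the authors' procedure. It assumes a hypothetical representation in which a segment is a dictionary mapping (x, y) pixel coordinates to an RGB colour. It also assumes that luminance is computed with Rec. 601 luma weights and that "boundary" means pixels with at least one 4-neighbour outside the segment. The function names are illustrative only.

```python
from typing import Dict, Tuple

Pixel = Tuple[int, int]           # (x, y) image coordinates
RGB = Tuple[float, float, float]  # red, green, blue values
Segment = Dict[Pixel, RGB]        # hypothetical representation: pixel -> colour


def mean_colour(segment: Segment) -> RGB:
    """Mean red, green and blue values over all pixels of the segment."""
    n = len(segment)
    r = sum(c[0] for c in segment.values()) / n
    g = sum(c[1] for c in segment.values()) / n
    b = sum(c[2] for c in segment.values()) / n
    return (r, g, b)


def average_colour(segment: Segment) -> Segment:
    """Keep the segment's shape but give every pixel the mean colour."""
    mean = mean_colour(segment)
    return {p: mean for p in segment}


def average_luminance(segment: Segment) -> Segment:
    """Keep the shape but make every pixel a grey at the mean luminance.

    Assumption: luminance uses Rec. 601 luma weights. Because the weighting
    is linear, the luma of the mean colour equals the mean of the pixel lumas.
    """
    r, g, b = mean_colour(segment)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return {p: (y, y, y) for p in segment}


def boundary_only(segment: Segment) -> Segment:
    """Keep only pixels that have at least one 4-neighbour outside the segment.

    Approximation: for segments with holes this also keeps the hole edges,
    not just the outer boundary.
    """
    edge: Segment = {}
    for (x, y), colour in segment.items():
        neighbours = ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))
        if any(n not in segment for n in neighbours):
            edge[(x, y)] = colour
    return edge


# Usage: a 3x3 square segment whose red channel increases left to right.
square = {(x, y): (float(10 * x), 0.0, 0.0) for x in range(3) for y in range(3)}
outline = boundary_only(square)     # 8 edge pixels; the centre (1, 1) is dropped
flat = average_colour(square)       # every pixel becomes (10.0, 0.0, 0.0)
grey = average_luminance(square)    # every pixel becomes a grey of about 2.99
```

Each function returns a new segment in the same representation. The same degraded input could therefore be shown to a human observer or passed to a feature extractor for the network.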