Learning to Segment Three-Dimensional Moving Objects
Richard Zemel, CMU
Interpreting scenes containing several independently moving objects
and observer motion is a difficult computational problem. The flow
fields that arise from these complicated scenes are {\em compound}, in
that they have multiple separate causes. An operation that
facilitates scene interpretation is to parse these compound flow fields
by segmenting them, i.e., grouping together the flow elements that
arise from a single object. I will describe a model based on the hypothesis that
sub-patterns in these compound flow fields correspond to object
components undergoing coherent motion, and that these sub-patterns are
statistical regularities which can be extracted from a set of compound
flow fields. While standard unsupervised learning procedures fail to
find this underlying structure in noisy and complex flow fields, I
will present a new unsupervised learning technique derived from a
general information-theoretic learning framework that succeeds in
discovering this structure. The model is trained on flow fields
derived from sequences of ray-traced images that simulate realistic
motion situations, combining observer motion, eye movements, and
independent 3-D object motion; after training, the representations in
the model effectively segment novel compound flow fields into the
component objects. The response properties of units in the network
also resemble those of neurons in an area of visual cortex known to
be involved in motion processing, suggesting that these regularities
may play a role in biological visual systems.
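As a concrete illustration of the compound structure described above, a flow field can be sketched as the superposition of component flows: a uniform flow from observer translation, a rotational flow from an eye movement, and a local flow from an independently moving object. The grid size, motion parameters, and object patch below are illustrative assumptions, not the actual training stimuli used in the model:

```python
import numpy as np

# Hypothetical 16x16 grid of image coordinates (not the real stimuli).
n = 16
ys, xs = np.mgrid[0:n, 0:n].astype(float)
cx, cy = (n - 1) / 2.0, (n - 1) / 2.0

# Cause 1: observer translation -> uniform flow across the field.
obs_u = np.full((n, n), 0.5)
obs_v = np.full((n, n), 0.0)

# Cause 2: eye rotation about the line of sight -> rotational flow.
omega = 0.05
rot_u = -omega * (ys - cy)
rot_v = omega * (xs - cx)

# Cause 3: an independently moving object covering a square patch.
obj_mask = (xs >= 4) & (xs < 9) & (ys >= 4) & (ys < 9)
obj_u = np.where(obj_mask, -0.8, 0.0)
obj_v = np.where(obj_mask, 0.3, 0.0)

# The compound flow field is the sum of all causes; the segmentation
# problem is to recover obj_mask given only (u, v).
u = obs_u + rot_u + obj_u
v = obs_v + rot_v + obj_v
```

The point of the sketch is that the measured flow at each pixel mixes several causes additively, so no single pixel reveals the object boundary; only the statistical regularity of the coherent sub-pattern over many such fields does.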