The summer vision project is an attempt to use our summer workers effectively in the construction of a significant part of a visual system. The particular task was chosen partly because it can be segmented into sub-problems which will allow individuals to work independently and yet participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition.”

Papert’s Summer Vision Project (1966)

The difficulty of computational vision could not be overstated:

On 5/3/2011 11:24 PM, Stephen Grossberg wrote:

The following articles are now available at http://cns.bu.edu/~steve:

On the road to invariant recognition: How cortical area V2 transforms absolute into relative disparity during 3D vision

Grossberg, S., Srinivasan, K., and Yazdanbakhsh, A.

On the road to invariant recognition: Explaining tradeoff and morph properties of cells in inferotemporal cortex using multiple-scale task-sensitive

3. Conditional probabilities on daughter nodes, given the state of the parent node

4. Bayes’ theorem for inference

5. EM algorithm (Expectation-Maximization) for learning the parameters of the model
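To make points 3 and 4 concrete, here is a minimal sketch in Python, with made-up numbers: a two-level tree whose daughter nodes carry conditional probability tables given a binary parent, with Bayes’ theorem used to infer the parent state from the observed daughters.

    # Minimal sketch (hypothetical numbers): inference in a two-level tree.
    # A binary parent ("cause") emits two binary daughters, each with its
    # own conditional probability table; Bayes' theorem recovers the parent.

    prior = {0: 0.7, 1: 0.3}                       # P(parent)
    p_on = {                                       # P(daughter = 1 | parent)
        "d1": {0: 0.1, 1: 0.8},
        "d2": {0: 0.2, 1: 0.9},
    }

    def posterior(d1, d2):
        """P(parent | d1, d2): daughters are conditionally independent
        given the parent (the Markov property of the tree)."""
        def score(parent):
            s = prior[parent]
            for name, obs in (("d1", d1), ("d2", d2)):
                p1 = p_on[name][parent]
                s *= p1 if obs else 1 - p1
            return s
        joint = {p: score(p) for p in (0, 1)}
        z = sum(joint.values())                    # the evidence P(d1, d2)
        return {p: joint[p] / z for p in (0, 1)}

    print(posterior(1, 1))   # -> roughly {0: 0.06, 1: 0.94}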

Example of a generative model, from the work of Stu Geman’s group…

Test set: 385 images, mostly from Logan Airport

Courtesy of Visics Corporation

Architecture (compositional hierarchy, top to bottom):

• license plates
• license numbers (3 digits + 3 letters, 4 digits + 2 letters)
• plate boundaries, strings (2 letters, 3 digits, 3 letters, 4 digits)
• characters, plate sides
• parts of characters, parts of plate sides
• generic letter, generic number, L-junctions of sides

Image interpretation

[Figure: original images and instantiated sub-trees]

Performance

• 385 images
• Six plates read with mistakes (>98% of plates read correctly)
• Approx. 99.5% of characters read correctly
• Zero false positives

Efficient computation: depth-first search

[Figure: test image; top objects; number of visits to each pixel (left: linear scale, right: log scale)]

Computation and learning are much harder in generative models than in discriminative models.

In a tree (or “forest”) architecture, dynamic programming algorithms can be used.
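As a sketch of why trees admit dynamic programming, here is a generic max-product pass in Python; the tree, states, and scores are invented for illustration and are not the license-plate model.

    # Hedged sketch: max-product dynamic programming on a tree.
    # unary[v][s] scores node v in state s; pair[(u, v)][s][t] scores a
    # parent in state s with a child in state t. On a tree, one
    # leaf-to-root pass finds the globally optimal score (linear in
    # nodes x states^2); a cyclic graph would not decompose this way.

    children = {"root": ["a", "b"], "a": [], "b": []}
    unary = {"root": [0.5, 0.5], "a": [0.9, 0.1], "b": [0.2, 0.8]}
    pair = {("root", "a"): [[0.9, 0.1], [0.5, 0.5]],
            ("root", "b"): [[0.3, 0.7], [0.6, 0.4]]}

    def max_marginal(v):
        """scores[s] = best achievable score of v's subtree, v in state s."""
        scores = list(unary[v])
        for c in children[v]:
            c_scores = max_marginal(c)
            for s in range(len(scores)):
                scores[s] *= max(pair[(v, c)][s][t] * c_scores[t]
                                 for t in range(len(c_scores)))
        return scores

    print("optimal score:", max(max_marginal("root")))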

The general learning (“parameter estimation”) method:

1. Use your model to infer the hidden causes (the E-step)

2. Update your model parameters accordingly (the M-step)

3. Iterate

Expectation-Maximization (EM)

(see book for connection to Hebbian plasticity and the wake-sleep algorithm)

EM algorithm for learning a mixture of Gaussians: see Chapter 10 of Dayan and Abbott.

Caution: there, observables are “inputs” and causes are “outputs”.
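A minimal one-dimensional sketch of that algorithm, in illustrative Python (not Dayan and Abbott’s notation): the E-step computes responsibilities, the posterior over the hidden cause obtained via Bayes’ theorem, and the M-step re-estimates the mixture parameters from those responsibilities.

    # Hedged sketch: EM for a 1-D mixture of Gaussians (illustrative only).
    import numpy as np

    def em_gmm(x, k, iters=50, rng=np.random.default_rng(0)):
        n = len(x)
        pi = np.full(k, 1.0 / k)                 # mixing weights P(cause)
        mu = rng.choice(x, k, replace=False)     # initial means
        var = np.full(k, np.var(x))              # initial variances
        for _ in range(iters):
            # E-step: responsibilities r[i, j] = P(cause j | x_i)
            logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                    - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
            r = np.exp(logp - logp.max(axis=1, keepdims=True))
            r /= r.sum(axis=1, keepdims=True)
            # M-step: responsibility-weighted maximum-likelihood updates
            nk = r.sum(axis=0)
            pi = nk / n
            mu = (r * x[:, None]).sum(axis=0) / nk
            var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
        return pi, mu, var

    x = np.concatenate([np.random.default_rng(1).normal(-2, 1.0, 200),
                        np.random.default_rng(2).normal(3, 0.5, 200)])
    print(em_gmm(x, k=2))   # means should land near -2 and 3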

Elementary, non-probabilistic, version: k-means clustering
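For comparison, a sketch of that elementary version: k-means replaces the soft E-step responsibilities with hard nearest-mean assignments, and the M-step with plain averages.

    # Hedged sketch: k-means as the hard-assignment counterpart of EM.
    import numpy as np

    def kmeans(x, k, iters=20, rng=np.random.default_rng(0)):
        mu = rng.choice(x, k, replace=False)     # initial means
        for _ in range(iters):
            # hard "E-step": each point joins its nearest mean
            assign = np.abs(x[:, None] - mu).argmin(axis=1)
            # "M-step": each mean becomes the average of its points
            # (keep the old mean if a cluster happens to be empty)
            mu = np.array([x[assign == j].mean() if np.any(assign == j)
                           else mu[j] for j in range(k)])
        return mu

    x = np.concatenate([np.random.default_rng(1).normal(-2, 1.0, 200),
                        np.random.default_rng(2).normal(3, 0.5, 200)])
    print(kmeans(x, k=2))   # means should land near -2 and 3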

The Markov dilemma:

On the one hand, the Markov property of Bayesian nets and of probabilistic context-free grammars provides an appealing framework for computation and learning. On the other hand, the expressive power of Markovian models is limited to the context-free class, whereas, as illustrated in the artificial CAPTCHA tasks but as is also abundantly clear from everyday examples of scene interpretation or language parsing, the computations performed by our brains are unmistakably context- and content-dependent.

Incorporating, in a principled way, context dependency and vertical computing into current vision models is thus, we believe, one of the main challenges facing any attempt to reduce the “ROC gap” between computer vision (CV) and natural vision (NV).