Gestalt perception and its formalization

The term "Gestalt" denotes the
perceived structure of visual input. (The analogous notion in language
processing would be: the surface structure of utterances
and texts.) We focus on static images. Their perception is determined
by: (1) similarity with previously encountered images; (2) interpreting
an image as a representation of a 3D scene, thereby accounting for
perspective, lighting effects, etc.; (3) a preference for simplicity
and regularity.

We now narrow our focus once more. We concentrate on abstract patterns
where recognition and 3D-phenomena play no role. This is the domain
of "Gestalt perception" in the more narrow sense of that
term. (Note: It has been surmised that this exercise is intrinsically
useless, because all perception is completely determined by the
cognitive faculty of recognition. This "radical empiricism" has been tested and refuted. Cf. several papers in Ellis (1938).)

Max Wertheimer's Gestalt Principles

Wertheimer's thesis: Gestalt perception is determined by several interacting principles, such as:

Emanuel Leeuwenberg: Structural Information Theory (SIT)

Domain: Sequences of symbols. (E.g.: "XPAPX", "LHOOQ", "OFHSKHKSHFKSHFKSH"
 where the letters stand for atomic elements that have no perceivable
relation to each other.)

The process of perception translates an input sequence into an expression
of a "coding language". The coding language has the same
primitive symbols as the input language, but it has other operations
besides mere concatenation. These operations are: Repetition, Symmetry,
and Alternation. E.g.:

Reference

The discrete line: "turtle graphics"

Leeuwenberg's coding theory will
not be the ultimate account of any domain of human perception.
There are no perceptual modalities where one may realistically
assume an input consisting of discrete, unrelated symbols. The
visual domain presents an additional problem: it is intrinsically
two-dimensional rather than linear. Leeuwenberg tries to sidestep
the latter problem by focussing on black & white drawings
consisting of straight line segments, and representing such drawings
as discrete symbol sequences, by means of "turtle graphics".

In "Turtle Graphics",
a line consisting of straight segments is described from the perspective
of a turtle who draws it: the length of segment 1, the angle to
turn after segment 1, the length of segment 2, the angle to turn
after segment 2, and so on. Combined with the commands "pen
up" and "pen down", this allows the description
of all possible drawings consisting of straght line segments.

(A turtle graphics description specifies a sequence of points.
Note that such a sequence of points can also be used to define
a quadratic or cubic spline curve.)

Structural Information Theory for discrete line drawings

Using a turtle-graphics notation,
the input-code of a square with edge a is: pi/2
a pi/2 a pi/2 a pi/2 a
The shortest SIT-coding of this pattern is: rep4(pi/2
a)

A square with edge c
with on its corners squares with edge a: rep4(pi/2
c rep4(pi/2 a))

This pattern is also perceived when the square with edge c is not drawn explicitly. To account for such examples, the operations "lift pen" and "lower pen" are added to the
coding language.

The input code of a zigzag line:
b a c a b a c a b a c a b a c
a b a c a b a c a

Its shortest coding in the SIT-language: altright(rep6(b
c), a)

A zigzag line is perceived as "symmetric" ifc equals b. The SIT-code does not account
for this observation, because it ignores the relation between b andc: the angles are treated as atomic symbols.

Jos de Bruin's online SIT-Interpreter
applet allows you to play with SIT-codes for discrete lines. [Note
that this applet uses the SIT-notation defined in Chapter 2 of Dastani
(1998), which is different than the one used in the examples above.
Use the button "Auto" to see random examples of codes in that
notation.]

Mapping arbitrary input-codes onto their shortest
equivalent SIT-notation is an NP-complete problem. (The computational
complexity is caused by the alternation-operation.) There are two
implemented algrithms that try to meet this challenge.
Peter van der Helm's program PISA implements a deterministic
algorithm which often (but not always) yields the optimal code. It
was written in C and runs under Unix. Robert Voorn (VU) implemented
a genetic algorithm which often (but not always) yields the optimal
code. It was written in C++ and runs under Unix. Both programs can
be obtained through Mehdi Dastani (mehdi at cs dot uu dot nl).

Dastani (1998) discusses some fundamental
limitations of Structural Information Theory, and tries to overcome
some of them. The most important limitations are:

The assumption that line drawings can
be "linearized" is not warranted.