Maria Petrou (1953-2012)
(Added 30 Jan 2015)

This document is closely related to a subset of Maria Petrou's ideas presented
in her cartoons and discussions of an ironing robot, linked below.

I first met and talked with her at the BMVA conference in Manchester in 2001,
and thereafter met her intermittently at workshops and conferences. We never
worked together, but I found our discussions and some of her online papers,
including the semi-serious presentation of her robot ironing tutorial and her
discussion of her great aunt's ideas on ironing, very stimulating, and closely
related to my own investigations regarding the need to get machines to
understand 'kinds of stuff', illustrated below, and in this presentation:
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/#babystuff

In 2011, during the proposal phase, Maria invited me to be a member of the
advisory board of the CloPeMa project (see below), which she led,
and I accepted, but I assume I was found unacceptable by the EU. The last
exchange we had was in June 2011, when she thanked me for accepting, and for
informing her of the Laundry project, below. She must have become ill soon
after, as I heard no more from her before she died.

I was recently delighted to learn that the EU robot project that she had
inspired and coordinated had made impressive progress, as reported on the
project web site:

Some of the questions arising from the "Laundry" project (discussed below) are
very relevant to the CloPeMa project.

Background

There are many projects aiming to give machines competences involving perceiving and
acting on objects in the environment, or exploration of an environment to develop some
sort of map of the terrain, or part of a building.

Insofar as these projects aim to contribute to our scientific understanding, as opposed
to being wholly justified by their practical usefulness (like the note-counting
machines in automatic cash dispensers, and many robots tailored to accurate and reliable
performance of a very specific task on a factory production line), there is a requirement
for the designs of the machines to have some well-defined kind of generality, so that
the researchers can explain in a principled way what the machines can and cannot do and
why, and preferably also show how this achievement contributes towards broader and deeper
longer term goals.

There are different ways of characterising the required generality. A common way is to
assemble a large and varied collection of test cases from some corpus, e.g. pictures or
sentences on the internet, or a collection of behaviours generated by a sizeable sample of
naive subjects in a laboratory experiment.

I have always found those ways of specifying the scope of a theory unsatisfactory: there
should be a more principled way than merely collecting examples. It should be possible to
explain what those examples have in common and why it is of interest to find a general way
of handling them, and why other things should not be included in the scope of the theory
or model, nor used for testing it. For example, it is fairly easy to give good reasons for
not expecting Newton's theory of gravitational attraction and his mechanics to provide a
satisfactory explanation of the pattern of motion of a leaf falling from a tree (though
this may not have been easy before Newton's time: a good theory may teach us to
characterise its domain of applicability).

I feel that a very high proportion of research being done in AI and Robotics fails to
meet this criterion -- even if the research is interesting and potentially valuable
for other reasons (including being a step on the way to producing a theory or model
that does meet the criterion).

That leaves the problem of deciding how to select collections of cases that have the right
sort of generality. There are many examples in my writings on child or animal development,
and on challenges for AI, e.g. my proposal of the polyflap domain as a potentially useful
robotic challenge:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/polyflaps

That domain is generative in the sense that there is a (fairly) precisely specified way
of producing more and more complex and varied examples that could be used as test cases.

In this document I'll attempt to characterise a domain that is generative in the
sense that its examples can be decomposed into features that can be combined in
systematically varied ways. I have not yet tried to produce a precise formal
specification of that generality, but I hope the examples will suffice for now,
making use of the powerful human ability to observe the structure common to a
collection of cases, currently lacking in computers. Later we need to characterise
the domain, and criteria for success, or at least progress, more precisely.

So far, my characterisation of the domain, below, is far from complete. I'll
investigate the possibility of addressing that later. For now I want to indicate
how a domain of processes can be generated by systematically varying the
geometric configurations, materials used, and operations or forces applied to
different parts of objects.

One kind of generality that is missing from the examples below is the recursive use of
abilities to rearrange physical matter in order to achieve a new state in which
possibilities and constraints are altered so as to allow (or help, or prevent) certain
additional rearrangements. See this discussion of varieties of deliberation for more on
the requirements for such competences:
http://www.cs.bham.ac.uk/research/projects/cogaff/misc/fully-deliberative.html

How to Develop Scenarios for a Grasping Robot (and others)

A general principle for designing scenarios so as to avoid dead ends is that every
particular kind of process in the scenario should be a special case of a well-defined class
of processes. Finding out what that class should be is a non-trivial research problem.
(It is probably connected with what goes on during infant and toddler learning:
discovering good ways to generalise beyond examples already learnt -- by developing
a generative theory, where possible.)

Some early work in vision attempted to meet this criterion by considering types of image
that could be generated by a grammar (e.g. a web grammar) and then specifying an algorithm
or collection of algorithms able to cope with all instances of the grammar, e.g. by
producing a 3-D interpretation.

Instead of grammars, some researchers systematically studied classes of picture element and
ways of combining them to form larger pictures, and derived general modes of
interpretation of such pictures (e.g. the Huffman-Clowes line-labelling algorithm for
interpreting 2-D pictures of trihedral polyhedra, later expanded by Waltz to include a
wider range of scenes and pictures). More recent work aimed at extending that generality is

So even if a practical project has narrowly specified goals, if it is to contribute
to scientific understanding it should have the sort of generality described here,
even if not all of the generality is required for practical goals. Not all practical
projects need have scientific goals. Many don't.

However, if a project is to produce results that are robust and extendable, then it
is important for the tests and designs chosen in the scenarios to include cases that
are not required for the specific practical goals. For example, some situations can
arise that are undesirable, but the fact that they are not desired does not mean that
they should not be understood and dealt with if they arise. This is a way to avoid
premature over-specialisation, which can easily hold up a field like AI (viewed as
science rather than engineering), including robotics.

This principle can be applied to:

kinds of material,
kinds of relationship,
kinds of causal influence,
kinds of shape,
kinds of action,
kinds of learning,
kinds of reasoning,
etc.,

addressed in the project. I have previously referred to this as the need for models not
just to scale up (e.g. cope with larger data-sets) but also to scale out (i.e.
cope with more varied types of challenge, and in combination with different parts of a
whole architecture, when required).

[NOTE:
I think this requirement to "scale out" is related to what John McCarthy called
"Elaboration tolerance", though he presented that as a criterion for adequacy of a
formalism rather than a mechanism. I recently found that some computing researchers use
the same labels for a different distinction, sometimes also contrasting "scaling
vertically" with "scaling horizontally". I suspect there is some loose connection with the
contrast I am making.]

Intelligent robots need not only to do things, but also to know what they are doing.
Any type of action or process or state of affairs that an agent needs to be able to
produce should also be something the agent can perceive, think about, reason about, etc.,
even when the process or state of affairs is not part of or a product of one of its own
current or recent actions. The sort of "offline" reasoning that is
applied to actions of others, or to observed physical processes, can also be applied
prospectively or retrospectively to one's own actions (e.g. why did Y happen when I did
X?).

I think new-born human infants lack that kind of intelligence. Offline intelligence
seems to develop through extensions to the architecture and to the forms of representation
and types of mechanism required. The ability is never fully developed even in adult
humans: they can go on learning indefinitely as they acquire new domains of expertise.

An open question is whether such offline intelligence exists in non-human animals: the
ability of individuals to deal successfully with novel problems, or to produce novel
solutions to old problems, without engaging in trial and error, may be evidence (Betty the
hook-making New Caledonian crow studied in Oxford in 2001-2004 seems to be a clear
example). Note that the question whether animals can use offline intelligence in using
matter to manipulate matter is deeper and more precise than asking whether they can use
tools, or make tools.

The ontology needed for perception, planning, reasoning, action-control
Actions involving manipulation include not only processes involving changing spatial
relationships within and between objects, but also causal interactions of various
kinds. Causation is not perceived in the same way as shape, position, velocity,
shape-change, colour, etc. Humans, some animals, and future intelligent robots need
both Humean (associative) and Kantian (structure-based) conceptions of causation, as
discussed here (with Jackie Chappell):
http://www.cs.bham.ac.uk/research/projects/cogaff/talks/wonac

So projects aimed at producing robots with (adult) human-like intelligence will have
to specify what it is for a robot to understand, and be able to reason about,
different sorts of causation. (That's very hard. Even good philosophers find it very
difficult.)

That's not an exhaustive list, merely illustrative.

Here are some example test cases for a robot that is to be able to manipulate
non-rigid materials. Each case can be varied either by changing the material, or by
changing the initial situation, or by changing the final state, or by varying the
process of going from initial to final state.

For each action type that the robot can perform it should also be able to
perceive that action, done by itself, done by others, perceived from different
viewpoints. Examples follow:

Agent sees a square of some material on a table with a small portion sticking out
over the edge -- so that it can be grasped and moved by the robot, or someone else.

Variations: the material can be cloth (handkerchief), towelling, tissue paper,
cardboard, writing paper, clingfilm or other plastic, tinfoil, a slice of bread,
pastry, dough, flattened plasticine, ... (Some of these may be very difficult, and
best postponed. At what ages can young children deal with them?)

Variations: the shape can be rectangular, with different ratios of long to short
side; it can be triangular, some other polygonal shape, or a curved shape.

Variations: the orientation of the shape with respect to the edge of the table can
vary (so that for the same shape the bit sticking out can have different appearances
and grasping requirements, and the same action after grasping can have different
consequences).

Variations:
the motion after grasping (with a firm grasp that allows no slippage between the
fingers) can be horizontal and unidirectional for a short distance, or it can
continue indefinitely. The motion of the grasped edge can also oscillate at
various speeds.

The motion can be vertical (lifting the grasped edge), varying amounts, at
varying speeds, with the orientation of the grasped bit either kept horizontal or
varied e.g. so as to avoid a sharp bend beyond the grasp area. It can be
unidirectional (just lifting) or lifting and lowering.

The motion can be pulling: either pulling horizontally away from the edge of the
table or pulling downwards below the edge of the table, and various directions of
pull in between.

The motion can be pushing: pushing the grasped edge along the surface of the
table orthogonally to the edge of the table, and further varied by pushing in
different directions.

The motion can be folding: lifting the grasped edge and moving it over the table
then down onto another part of the object. Variations include trajectory height,
the orientation of the plane of the trajectory relative to the edge of the table,
where the trajectory ends, and how the orientation of grasp varies during the
motion.

The folding motion may be followed by pressing down on parts of the material
along the fold and in other places.

Other variations can involve holding down portions of the material while the
grasped portion is moved.

(It would be good to have photographs or videos illustrating all the above
variations.)

After a learning process many different tests are possible, with different materials,
different shapes, different kinds of motion.
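To make the generative character of this domain concrete, here is a minimal Python sketch that enumerates test scenarios as combinations of the variation dimensions listed above. The dimension names and value lists are a hypothetical, abbreviated selection from the text, not a proposed specification; a fuller version would need further dimensions (e.g. holding down portions of the material, speed of motion) and many more values.

```python
from dataclasses import dataclass, replace
from itertools import product

# Hypothetical, abbreviated selections from the variations listed in the text.
MATERIALS = ["handkerchief", "towelling", "tissue paper", "cardboard", "tinfoil", "pastry"]
SHAPES = ["square", "long rectangle", "triangle", "disc"]
ORIENTATIONS = ["side parallel to table edge", "corner over table edge"]
MOTIONS = ["slide horizontally", "lift", "pull away", "pull down", "push", "fold"]
# Each action should also be perceivable when performed by another agent.
PERFORMERS = ["self", "other agent"]


@dataclass(frozen=True)
class Scenario:
    material: str
    shape: str
    orientation: str
    motion: str
    performer: str


def generate_scenarios():
    """Yield one Scenario for every combination of the variation dimensions."""
    for combo in product(MATERIALS, SHAPES, ORIENTATIONS, MOTIONS, PERFORMERS):
        yield Scenario(*combo)


def vary(scenario, **changes):
    """Return a test case that differs from `scenario` only in the named dimensions."""
    return replace(scenario, **changes)


if __name__ == "__main__":
    scenarios = list(generate_scenarios())
    print(len(scenarios))  # 6 * 4 * 2 * 6 * 2 = 576
    print(vary(scenarios[0], material="tinfoil"))
```

Varying one dimension at a time, as `vary` does, gives minimal contrasts between test cases; one possible test of generality is whether a robot's predictions change appropriately across such contrasts, rather than merely how many cases it handles.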

Can the agent (at least roughly) predict what changes will occur if a pair of fingers
(one above the other) grasps the overhanging portion and lifts it straight up until
there's no more contact with the table, without altering the orientation of the
grasping point?

Can the robot predict what will happen if instead of moving up, the fingers move
horizontally, parallel to the edge of the table for a metre or more? What sorts of
obstacles could obstruct, or modify the motion?

Can the robot predict what will happen if the fingers gripping the corner rotate
until that corner is pointing upwards, and then they move to where the opposite
corner is?
Two cases:
(a) horizontal motion
(b) motion in an arc, going up then down.

Added 4 Feb 2015: Online and offline intelligence
One of the important requirements is the ability not merely to act and produce
desired changes (online intelligence) but also to think about and reason about
what is or is not possible, and why.

What forms should the predictions take? I cannot predict precise changes, but I
can talk about how relationships will change during the predicted motion. I can
make the predictions at various levels of abstraction, with different kinds of
certainty. E.g. if the object moved is made of cloth and the grasped edge is
lifted a distance that is more than the maximum diameter of the cloth then the
cloth will eventually no longer be in contact with the table. I don't need to
know what the maximum diameter is for that prediction to hold.

I can point to a height that I know will be sufficient to raise the cloth so that it
is no longer in contact.

I can make predictions about how the shape will change during the motions, using notions
like folding, angle, curvature, increasing or decreasing curvature, flattening, etc.
without being able to specify numerical values for those processes or their results.

Some of the changes involve topological relations (e.g. loss of contact) and in that sense
can be described precisely. Other changes can be given definite, though not precise, upper
or lower bounds. E.g. I know that during vertical movement of the corner of the cloth, the
cloth will lose contact with the table before the grasping point has reached this height
(indicated by pointing), even though I don't know the exact height at which it will lose
contact. I can also say that there will still be contact when the grasped point has
reached this height (pointing at a lower height).
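The kind of definite-but-imprecise prediction described above can be sketched as a three-valued predictor. This minimal Python sketch assumes the cloth is flat, convex in outline, inextensible, and initially lying entirely on the table, with the grasped point lifted vertically. Under those assumptions no point of the cloth can end up farther from the grasp point than its in-plane distance, so once the grasp height exceeds the largest such distance (never more than the shape's diameter), nothing can still touch the table. The lower bound, below which contact is known to persist, is not derived here: `known_contact_height` is a hypothetical parameter standing for a height the agent has learned or pointed to.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def loss_of_contact_bound(vertices: Sequence[Point], grasp: Point) -> float:
    """Largest in-plane distance from the grasp point to the (convex) cloth.

    For a convex polygon the farthest point from any given point is a vertex,
    so checking the vertices suffices. The result never exceeds the diameter.
    """
    return max(math.dist(grasp, v) for v in vertices)


def predict_contact(height: float, vertices: Sequence[Point], grasp: Point,
                    known_contact_height: Optional[float] = None) -> str:
    """Return 'no contact', 'contact', or 'unknown' for a given grasp height."""
    if height > loss_of_contact_bound(vertices, grasp):
        # Guaranteed by inextensibility, whatever shape the hanging cloth takes.
        return "no contact"
    if height <= 0.0 or (known_contact_height is not None and height <= known_contact_height):
        # Trivially true at height 0; otherwise relies on the assumed lower bound.
        return "contact"
    # Between the bounds: no precise prediction is claimed.
    return "unknown"


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    print(predict_contact(1.5, square, grasp=(0.0, 0.0)))  # no contact (bound is sqrt(2))
    print(predict_contact(1.0, square, grasp=(0.0, 0.0)))  # unknown
    print(predict_contact(0.3, square, grasp=(0.0, 0.0), known_contact_height=0.5))  # contact
```

The "unknown" region is the point of the sketch: the predictor commits to definite answers only where the bounds justify them, and makes no claim about the exact height at which contact is lost.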

[Added 1 Jan 2013]
There's an entertaining, but deep, video by Vi Hart illustrating some of the facts about
folding and production of angles that could first be discovered empirically (using online
intelligence) then later understood mathematically (using offline intelligence):
http://vihart.com/blog/angle-a-trons/

These requirements merely scratch the surface of what is required in the
specification for a human-like robot.

There are lots of deep and difficult implications regarding

the ontologies required
the forms of representation
the forms of reasoning
the implementation mechanisms
the architectural decomposition of functions
the processes of development
the processes of learning
-- empirically, by finding out what happens when
-- non-empirically by reasoning about what must be the case,
which is presumably what first led to the development of Euclidean
geometry

Simulations of type (a) can use variants of "game-engine" technology.
They can be very useful for online control of actions using
feed-forward mechanisms, e.g. to predict required adjustments to the
current trajectory, etc.

Simulations of type (b) have quite different functionality and can be
used in answering questions about what would happen if, what might have
caused something to happen, what options would be available if some
action were performed, etc. Type (b) simulations require something very
different from the precise modelling done in game-engines. For example,
the sort of reasoning you do when working out how to get an armchair
through a door that's too narrow for it to be pushed through upright
involves representing types of sub-process and types of intermediate
situations, rather than the precise details required for controlling
motion when the action is actually being performed. Some examples of
perception of possibilities for processes to occur, and of perception of
constraints on such processes (the roots of ancient mathematical discoveries?),
are discussed in:

You can work out combinations of types of translations and rotations
without having the kind of representational precision required to generate a
video of the process.
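As a toy illustration of this kind of reasoning, here is a minimal Python sketch that plans the armchair example by breadth-first search over qualitative states (a type of orientation plus a position relative to the doorway) rather than over precise poses. The states, the actions, and the rule that the chair fits through the doorway only when on its side are hypothetical simplifications; the sketch shows only that a plan can be found by reasoning about types of intermediate situation, with no metrical precision at all.

```python
from collections import deque

ORIENTATIONS = ("upright", "tilted back", "on its side")
POSITIONS = ("outside", "in doorway", "inside")  # ordered along the push direction
FITS_THROUGH_DOOR = {"on its side"}  # hypothetical qualitative constraint


def successors(state):
    """Yield (action, next_state) pairs allowed by the qualitative rules."""
    orientation, position = state
    if position != "in doorway":  # no room to rotate while in the doorway
        for o in ORIENTATIONS:
            if o != orientation:
                yield f"rotate to {o}", (o, position)
    i = POSITIONS.index(position)
    if i + 1 < len(POSITIONS) and orientation in FITS_THROUGH_DOOR:
        yield "push forward", (orientation, POSITIONS[i + 1])


def plan(start, goal):
    """Breadth-first search for a shortest sequence of qualitative actions."""
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:
            actions = []
            while came_from[state] is not None:
                action, prev = came_from[state]
                actions.append(action)
                state = prev
            return list(reversed(actions))
        for action, nxt in successors(state):
            if nxt not in came_from:
                came_from[nxt] = (action, state)
                frontier.append(nxt)
    return None  # goal unreachable under these rules


if __name__ == "__main__":
    print(plan(("upright", "outside"), ("upright", "inside")))
    # ['rotate to on its side', 'push forward', 'push forward', 'rotate to upright']
```

A controller of the game-engine kind would still be needed to execute each step; the qualitative plan only says which types of sub-process to chain, and in what order.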

This document is about the types of representation of structure and
process required for competences of type (b). But it is only a small
start. (I have made other starts in related directions in the other
documents referred to.)