Monday, November 26, 2012

So, I am not interested in robotics
simply to create neat mechanical platforms that do interesting things
(although that's a lot of fun). I have other motives for doing this.
One of the primary questions I want to answer is this: can I, as a
layman, create a robotics platform, and then write code for it, that
in any way models cognition, or in some way models living processes
that interact with their environment in intelligent ways?

It's a big question. A really big
question.

If you take away the “as a layman”
phrase, some very bright people have been tackling this very problem.
It starts small, with such things as Braitenberg's vehicles, a thought
experiment on minimalistic sensorimotor control systems and artificial
intelligence, and ends up with things like Honda's ASIMO, the DARPA
challenges, and the work Boston Dynamics is doing.

In a way, mankind has been on this
search for a very long time, as evidenced by such things as the golem
legends.

My latest attempt is structured around
emergence. Here's the basic idea:

1. Let's define a vehicle which has a
limited number of moves it can do. Such as, it can move forward,
back, and turn in any direction as long as the turn it makes is some
multiple of 45 degrees. That means 0 degrees (no turn), 45, 90, 135, or 180, in
either direction. We'll also let it turn 22.5 degrees (half of 45),
so it can make small corrections.
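As a rough sketch, that move set can be written down directly; the names below (TURNS, MOVES) are mine, not from the actual program:

```python
# Hypothetical encoding of the move set above. Turns are in degrees:
# no turn, or any multiple of 45 up to 180 in either direction, plus
# the small 22.5-degree correction.
TURNS = [0.0]
for magnitude in (22.5, 45.0, 90.0, 135.0, 180.0):
    TURNS.extend([magnitude, -magnitude])

MOVES = ["forward", "back"]
```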

2. For this vehicle, let's restrict the
inputs. The robot will likely have noisy, widely varying input
available on its sensors, but we'll do a bit of sensor fusion to calm
down what we see and put it into buckets. For instance, let's say we
have a sonar or IR sensor which has a range of 3 to 100 inches, but
is sensitive in the millimeter range. Instead of returning things
like 78.4589998 inches, let's put it into buckets such as “very
near”, “near”, “middle range”, “far”. We'll do sensor
fusion and return those strings instead of the raw numbers. Same for
all the sensors. A compass sensor may, instead of returning the raw
theta value for its pose, be distilled down into compass rose points.
“45”, “NE”, or “Northeast” would all be fine here, as
long as it's in buckets and not things like “47.35”. This will
allow us to reason in more general terms about the robot's current
state. We could expand this to any number of sensors and sensor types
– and is similar to how the human brain generalizes things in, say,
the visual cortex, where things proceed from more chaotic and raw
values (retina, optic nerve, V1), to more abstract, more general
values (dorsal and ventral streams on through to the prefrontal
cortex/executive function areas). This would roughly model things
that happen before V1, and we're not doing anything even remotely
close to the PFC here. We're also being pretty simplistic – the
process does not keep state of any sort, does not deal with such
problems as visual flow, does not mimic working memory, etc. It just
throws sensor data into buckets.
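A minimal sketch of this kind of bucketing, with made-up thresholds (the post doesn't give exact cutoffs):

```python
def bucket_distance(inches):
    """Fuse a raw range reading (3-100 in) into a coarse bucket.
    The cutoff values here are hypothetical."""
    if inches < 10:
        return "very near"
    elif inches < 30:
        return "near"
    elif inches < 60:
        return "middle range"
    return "far"

COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

def bucket_heading(theta_degrees):
    """Distill a raw heading into the nearest compass rose point."""
    index = int(((theta_degrees % 360) + 22.5) // 45) % 8
    return COMPASS_POINTS[index]
```

So a reading like 78.4589998 inches becomes "far", and a raw heading of 47.35 degrees becomes "NE", as in the examples above.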

3. From here, let's create a mechanism
that can define rules. Rules will consist of a set of triggers (the
buckets from point #2), and a set of actions (the actions from point
#1). The general idea is this: we will continually get sensor data
from our environment, sorted into the buckets described above. When
the distilled sensor data matches the triggers for the rule, a set of
actions will be executed. In reality, getting a complete match on the
sensor values we're looking for is probably not going to happen very
often, so we'll define a minimum threshold – if, say, 75% of the
triggers match the current sensor data, that's good enough -- run the
actions. The actions will be executed sequentially. The list of
actions can be arbitrarily long – but, I have found by
experimenting and tweaking, for the program I wrote, two actions are
ideal. The first consists of a turn (or no action), and the second
consists of a movement (or no action). For instance, an action list
may be something like {“turn left 45 degrees”, “go forward”
}. The actions defined are simplistic (turns, forward or backward, or
“don't do anything”), but could easily be more complex actions
(“use the A* algorithm to find a path out of the pickle you've
gotten yourself into”).
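Here's a sketch of such a rule, using the 75% threshold from above; the sensor names and structure are hypothetical, not taken from the real program:

```python
from dataclasses import dataclass

MATCH_THRESHOLD = 0.75  # "if 75% of the triggers match, that's good enough"

@dataclass
class Rule:
    triggers: dict   # e.g. {"front_sonar": "far", "compass": "NE"}
    actions: list    # e.g. ["turn left 45 degrees", "go forward"]

    def matches(self, sensors):
        """True if enough triggers agree with the bucketed sensor data."""
        hits = sum(1 for name, value in self.triggers.items()
                   if sensors.get(name) == value)
        return hits / len(self.triggers) >= MATCH_THRESHOLD
```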

4. The next thing we'll need to do is
to determine whether executing a rule actually worked, and for that
we'll need a goal. After executing the rule, we will determine how
close we are to the goal state, and we will also determine whether we
are closer or farther away from the goal state than before we
executed the rule. To do this we will measure the current state (for
some arbitrary goal), execute the actions, and then, after we have
executed the actions, again measure the robot's perceived state
against the goal state we define, and see if things got better, got
worse, or stayed the same. This will determine what's called the
fitment of our rule. In the
long run, rules with a better fitment will survive (and spawn new
rules), while rules that have a poor fitment will be discarded. For
my experiment, I decided to define the goal state as having all
distance sensors registering equal distances, but with the one
pointing directly forward reporting “far”. This has the effect
that, if you are in a circle looking out (all sensors are reporting
the same distance except the front one, which doesn't see anything),
you are in the goal state. As it turns out, the goal state is
impossible to reach – which is good! That tension will make the
robot come up with interesting results, and hopefully ones we have
not anticipated. If it does, that would be an
example of emergence. (A very cool book on related subjects: The Computational Beauty of Nature by Gary William Flake.)

5. Next, we'll need a way to create new rules. There are several ways we
can go about this, and it brings up the question of whether our robot
has a priori knowledge of its environment, or not. Some animals do –
for instance, newly born calves know how to stand within minutes of
being born, insects automatically know how to fly, and so forth. On
the other hand, you were not born with the ability to speak, or walk,
or many other things – but you figured it out. The question here
is, do we build in some base rules and watch them grow and/or be
discarded, or do we start with a tabula rasa, and then allow things to
build? The first is faster but doesn't tell us as much; the second
is likely far slower, and will either tell us nothing or
produce very interesting results. In the end, I decided to go for the
second approach – starting with no rules and letting
things grow by themselves.

But
how do we create new rules?

At
first, I thought, let's assume infinite processing power and storage.
Given that, and removing it as a concern, I can generate all
possibilities of the input states, since they are finite (if we
constrain the length of the list of triggers), and the actions we can
do are also finite – after all, we are dealing with eight compass
rose points we can turn to in the first slot,
and then two actions, forward and backward, in the second slot. While
large, the list is finite,
so, let's flip things on their head a little, and start with the set
of all possible rules. This approach would be similar to the approach
given for linguistics by Noam Chomsky in Syntactic Structures, where he starts with
the set of all possible words in a language, and then defines
recursive rule sets which can generate all possible sentences in a
language. His notation is also similar in form to Backus-Naur Form
(BNF), and both notations can describe recursive grammars that
generate infinite sets of sentences. What we are
doing here is closer to a finite state machine, in that the set is
indeed finite, since the length of the lists (triggers and actions)
are constrained, and since the rules are not recursive (there are no
subrules within rules), and the rules are essentially a directed
acyclic graph (DAG) restricted to a depth of one. A
good idea, but it produced a problem. Let's say I allow for just five
triggers (a good idea, it turns out, since it models how the human
brain focuses its attention on a limited amount of stimuli at a
time), and we'll keep with just the two actions, turn and then go
forward and back. We'll also say the triggers must be unique – that
is, if you say you're looking for “left front sensor = very near”,
there's no sense in listing it twice, so, each trigger slot holds a
unique value. If we have just five sensors (for instance, front,
back, left and right sonar, and a compass), and each can hold about
eight values (compass rose points for the compass, variants of near,
far, wherever you are for the sonar), that means that with our five
slots, we have the following:

possible trigger values == 8 * 7 * 6 * 5 * 4 == 6720

possible actions == 8 * 2 == 16

all possible permutations == 6720 * 16 == 107,520

Ok, not bad. A computer can deal with that many rules. Still, if we say
that each rule takes one second to test, 107,520 seconds is about 30
hours. I can deal with that, but, let's say we add one more sensor...

possible trigger values == 8 * 7 * 6 * 5 * 4 * 3 == 20160

possible actions == 8 * 2 == 16

all possible permutations == 20160 * 16 == 322,560

Hours to test all permutations == 89.6
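The counting above is easy to reproduce; the function name is mine:

```python
import math

def rule_space(values, slots, turns=8, moves=2):
    """Ordered selections of unique trigger values (values P slots),
    times the number of possible turn/move action pairs."""
    return math.perm(values, slots) * turns * moves

five_sensors = rule_space(8, 5)   # 6720 * 16 == 107,520
six_sensors = rule_space(8, 6)    # 20160 * 16 == 322,560
```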

Simulated skid-steer robot, showing simulated sonar sensors (the green lines), the compass (small magenta line and circle), and the path it has traveled (the curved line)

But,
this isn't realistic. If we go past where we are now, adding a couple
of sensors, or adding a new type of movement, the number of
permutations skyrockets, and before long, you're dealing with
centuries worth of processor time. I don't have access to a
supercomputer (well, I do, actually, but that's a different story :-) ), so, I am actually limited in the amount of horsepower I can throw
at it. Our original premise becomes unworkable, even though it's a
great thought experiment. Also, I have my doubts that this is the way
nature works. Instead of throwing all possible permutations at a
problem, I have a hunch that nature takes shortcuts, and figures out
the permutations that have a chance of working. In fact, ant colony
optimization (ACO) and particle swarm optimization (PSO) are good examples of just that.

So,
instead of trying all possible permutations, what I did is take some
tried and true methods, but with a twist.

To
generate the first rules, again, without a priori knowledge, I used a
Monte Carlo method for the actions, and empirical sensor readings for
the triggers. For instance, given a set of previously unencountered
sensor readings, I would randomly generate a response, first randomly
generating a turn (or deciding to do nothing), and then randomly
deciding on a movement (forward or back). From there, we measure the
fitment and cull bad rules. From the survivors, I used a combination
of genetic algorithms, mutation of rules (such as, randomly delete a
trigger entry, add a new one, randomly change a resulting action),
and hill climbing. These are all fairly common ways to solve this
problem, but it occurred to me: this is how nature evolves
organisms, but it's likely not how they learn – or, at the least,
it's not the entire picture. Evolution is glacial, and learning is
not. That spawned a couple of things. Firstly, this layman needs
to learn more about how organisms learn; and secondly, how can I
alter the program so that learning is not so bumbling and glacial,
which is what I got at first by using GAs and hill climbing? My
response to the second was to do what I called “hill climb in
place” – I made each of the possible actions have an “undo”.
For instance, turning left 45 degrees is undone by turning right 45
degrees. Forward is undone by backing up. And so on. Once that is in
place, it allows the robot to do trial and error. Given a completely
randomly generated set of actions, it can gradually move from the
action it has, measure how well it is in relation to the goal, and in
relation to its initial state, undo, and try again – gradually hill
climbing, in place,
to reach local optima, if any. Trial and error! Organisms do that,
and it's not excruciatingly slow, and it works! One other tweak
needed to be in place, though – actions need a cost. For instance,
turning 180 degrees should cost more than turning 22.5. Once that is in
place, it makes sense to try things out in ascending order according
to cost, until you get to a state where your measurement against the
desired goal went down, instead of up. From there, simply undo what
you did last, and there you are – the best thing you could have
done for the state you are in is the thing you did right before the
thing you did that screwed everything up :-)
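A sketch of the “hill climb in place” idea, under hypothetical names: every action has an undo and a cost, and the robot tries actions cheapest-first, backing out the first one that makes its measurement worse.

```python
# Made-up action names, undo pairs, and costs for illustration.
UNDO = {"turn left 45": "turn right 45", "turn right 45": "turn left 45",
        "turn left 90": "turn right 90", "turn right 90": "turn left 90"}
COST = {"turn left 45": 1, "turn right 45": 1,
        "turn left 90": 2, "turn right 90": 2}

def hill_climb_in_place(robot, actions, measure):
    """Try actions in ascending cost order: keep each one that improves
    the score, and undo the first one that makes things worse."""
    best = measure(robot)
    for action in sorted(actions, key=COST.get):
        robot.do(action)
        score = measure(robot)
        if score < best:
            robot.do(UNDO[action])   # trial and error: back out the bad move
            break
        best = score
    return best
```

With a toy robot whose only state is its heading, and a measure of closeness to a 45-degree goal heading, this keeps the turn that reduces the error and undoes the turn that increases it.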

So,
that's the outline of the program. A limited set of triggers and
actions, which we use to define rules, and then we use Monte Carlo, genetic
algorithms, random mutation, hill climbing, and a tweaked version of hill climbing
which allows us to do trial and error, in place, and figure out the
best action to take.

One other note: groups of neurons fatigue. After firing
repeatedly, their signal strength degrades. This serves
multiple purposes in the brain, and I imitate it here. In the
program, it means that, even if a rule is a good fit for a situation,
if it fires too many times in a row, it has to sit down and be quiet
for a while, and allow other rules to have a chance.
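A sketch of this fatigue bookkeeping; the exact limits are made up, since the post gives no numbers:

```python
class FatiguingRule:
    """Mimic neuron-like fatigue: after firing MAX_STREAK times in a
    row, a rule must rest for REST turns before it may fire again."""
    MAX_STREAK = 5   # hypothetical limit on consecutive firings
    REST = 5         # hypothetical number of turns it must sit out

    def __init__(self):
        self.streak = 0
        self.rest_left = 0

    def can_fire(self):
        return self.rest_left == 0

    def fired(self):
        """Record a firing; fatigue the rule if it fired too often in a row."""
        self.streak += 1
        if self.streak >= self.MAX_STREAK:
            self.rest_left = self.REST
            self.streak = 0

    def tick(self):
        """Call once for each turn the rule did not fire."""
        self.streak = 0
        if self.rest_left:
            self.rest_left -= 1
```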

So
does it work?

Yep.
Sure does.

I
created a simulation of a robot in a maze, and what
I have observed is this:

Early
on, I see completely stochastic movement, and a lot of bad decisions,
such
as running into walls, turning in a direction that is bad, and so on.
If
I let it run for a while, say an hour or so, I start to see good
decisions. I also see elegant motion. For instance, instead of
jerkily moving away from a wall, I will instead see the robot moving
in arcs and curves, smoothly going around obstacles or following a
wall. Keep in mind, I never once told it how to follow a wall or how
to move in an arc. This is all emergent behavior, derived from
simpler building blocks. The robot came up with these solutions to
its problems by itself, and those solutions mimic nature in form by
the process of emergence. Since the rules fatigue, I have also seen
the simulation fall into patterns of rules, or repeating sets of
rules. For instance, instead of sitting in one spot and running a
rule over and over (in which case, it will quickly fatigue), the
robot would instead move in patterns that look like a five pointed
star, or a small circle, where it executes one rule, and then
another, and so on, eventually coming back to a spot where the first
one again can fire. In this way, it takes much, much longer for a
rule to fatigue, since it gets a rest of five turns in between
firings. Again, this behavior is emergent – there is nothing
in the programming which allows for patterns of rules, and, in fact,
the program is generally stateless. Yet repeating patterns and
interlocking sets of rules arise.

Robot simulation executing organic-like, curved paths in search of its goal state

So
this is learning, in miniature, in a way. Both for me and for the
robot. It points out how much can be made from small, simple pieces,
given a mechanism to allow for emergent behavior. It shows the tendency of life to move away from entropy and towards structure.