Vision in Biology
So why vision in biology? What does biology have to do with robots? Well,
biomimetics is the study of biology to aid in the design of new technology -
such as robots.
The purpose of this tutorial is to help you understand how biology approaches
the vision problem. As we progress through parts 2, 3, and 4, you will start to draw
parallels between how a robot can see the world and how you and I see the world.
I will assume you have a basic understanding of biology, so I will try to build
upon that knowledge with a bottom-up approach, and hopefully not bore
you with what you already know.

The Eye
The eye is stage one of the human vision system. Here is a diagram of the human eye:

Light first enters through the pupil. The iris adjusts the size of the pupil to control
the amount of light entering the eye - an auto-brightness adjuster. No matter how much
light hits the eye, it tries to adjust itself to always gather a set amount. Note that if
the light is still too bright, you will feel naturally compelled to cover your eyes with your hands.
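
The iris's behavior maps naturally onto code. Here is a minimal sketch of an auto-brightness (auto-gain) step, assuming pixel values normalized to the range 0..1; the function name and target level are my own choices, not any standard API:

```python
TARGET = 0.5  # desired average brightness (0 = black, 1 = white)

def adjust_gain(pixels, target=TARGET):
    """Return pixels rescaled so their average lands near the target level."""
    mean = sum(pixels) / len(pixels)
    if mean == 0:
        return pixels[:]          # pitch black: nothing to amplify
    gain = target / mean          # iris 'opens' (gain > 1) or 'closes' (gain < 1)
    return [min(1.0, p * gain) for p in pixels]
```

A camera's auto-exposure does essentially the same thing, except it changes the exposure time of the next frame rather than rescaling the current one.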

Light then passes through the lens, which is stretched and compressed by muscles to focus
the image - similar to auto-focus on a digital camera. Notice how the lens
flips the image upside-down?

Having two eyes creates stereo vision, because the eyes do not look along parallel lines.
For example, look at your finger, then place your finger on your nose - see how you automatically
become cross-eyed? The angle of your eyes relative to each other generates ranging information,
which is then sent to your brain. Note: this is not the only method the eyes use
to generate range data.
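
This vergence trick is easy to put into numbers. A toy sketch, assuming an idealized geometry where the fixated point sits exactly on the centerline between the eyes (the baseline and angle values below are illustrative, not physiological measurements):

```python
import math

def vergence_range(baseline, inward_angle_rad):
    """Distance to a fixated point centered between the eyes.

    Each eye rotates inward by inward_angle_rad from straight ahead,
    so tan(angle) = (baseline / 2) / distance. The more cross-eyed
    you must turn, the closer the point is.
    """
    return (baseline / 2) / math.tan(inward_angle_rad)
```

With a 6 cm baseline, fixating a point 30 cm away requires each eye to rotate inward by atan(0.03 / 0.30), and the formula recovers the 30 cm back out.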

Cones and Rods
The light then comes into contact with special neurons in the eye (cones for color and
rods for brightness) that convert light energy to chemical energy.
This process is complicated, but the end result is neurons that fire in special
patterns that are sent to the brain by way of the optic nerve.
Cones and rods are the biological versions of pixels. But unlike a camera,
where every pixel is identical, the human eye treats its pixels very unequally.

What the above chart shows is the number of rods and cones in the eye versus their location in the eye.
At the very center (fovea = 0) you will notice a huge number of cones, and zero rods. Further
out from the center, the number of cones sharply decreases, with a gradual increase in rods.
What does this mean? It means only the center of your eye is capable of processing color in detail -
in your peripheral vision, the information going to your brain comes almost entirely from rods!
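
You can simulate this cone/rod layout in software. A toy sketch, assuming the image is a list of rows of (r, g, b) tuples: color survives only inside a 'foveal' radius, while everything outside is reduced to brightness:

```python
def foveate(image, radius):
    """Keep color near the center pixel; gray out the periphery.

    Mimics the chart above: cones (color) crowd the fovea,
    rods (brightness) dominate the edges.
    """
    h, w = len(image), len(image[0])
    cy, cx = h // 2, w // 2
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                new_row.append((r, g, b))       # 'cones': full color
            else:
                gray = (r + g + b) // 3         # 'rods': brightness only
                new_row.append((gray, gray, gray))
        out.append(new_row)
    return out
```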

Note the section labeled optic disk. This is where the optic nerve
attaches to your eye, leaving no space left for light receptors. It is also
called your blind spot.

Compound Eyes
Compound eyes work in the same way the human eye above works. But instead of rods and cones
serving as the pixels, each individual facet of the compound eye acts as a pixel. Contrary to popular folklore,
the insect doesn't actually see hundreds of images. Instead it sees hundreds of pixels, combined into one image.

A robot example of a compound eye would be taking a hundred
photoresistors and combining them into a matrix
to form a single greyscale image.
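
Assuming the hundred readings have already been collected into a flat list (the actual sensor-reading call depends entirely on your hardware), assembling them into an image is a one-liner:

```python
def assemble_image(readings, width):
    """Fold a flat list of photoresistor readings into image rows.

    Each reading plays the role of one facet of the compound eye:
    a single greyscale pixel in the combined image.
    """
    return [readings[i:i + width] for i in range(0, len(readings), width)]
```

For a 10x10 photoresistor matrix, you would pass 100 readings and width=10.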

What advantage does a compound eye have over a human eye? If you poke out a human eye,
the ability to see (total pixels gathered) drops by 50%. If you poke out a single facet of an insect's
compound eye, it will still have 99% of its visual capability. It can also simply regrow an eye.

Optic Nerve 'Image Processing'
Most people don't realize how jumbled the information from the human eye really is.
The image is inverted by the lens, the rods and cones are not equally distributed,
and neither eye sees exactly the same image!

This is where the optic nerve comes into play. By physically reorganizing neurons,
it can reassemble the image into something more useful.

Notice how the criss-crossing reorganizes the information from the eyes
- what is seen on the left is processed in the right brain,
and what is seen on the right is processed in the left brain. The problem
of two eyes seeing two different images is partially solved. Also interesting to note, there
are significantly fewer neurons in the optic nerve than there are cones and rods
in the eye. The theory is that 'pixels' in close proximity in the eye
get summed and averaged together.
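
That summing-and-averaging guess is exactly what image pooling does in software. A minimal sketch, assuming a greyscale image with even width and height:

```python
def pool2x2(image):
    """Average each 2x2 block of pixels into a single value.

    Mimics the optic-nerve theory: fewer output 'fibers' than input
    receptors, so neighboring pixels are summed and averaged before
    the signal leaves the eye.
    """
    out = []
    for y in range(0, len(image), 2):
        row = []
        for x in range(0, len(image[0]), 2):
            block_sum = (image[y][x] + image[y][x + 1] +
                         image[y + 1][x] + image[y + 1][x + 1])
            row.append(block_sum / 4)
        out.append(row)
    return out
```

A 100x100 image comes out as 50x50 - a quarter of the data, with most of the picture preserved.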

What happens after this is still unknown to science, but significant progress has been made.

Brain Processing
This is where your brain 'magically' assembles the image into something comprehensible.
Although the details are fuzzy, it has been determined that different parts of your brain
process different parts of the image. One part may process color, another detect
motion, yet another determine shape. This should give you clues on how to program such a
system: everything can be treated as separate subsystems/algorithms.
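
To make the subsystem idea concrete, here is a toy sketch where two deliberately trivial 'subsystems' - one for brightness, one for motion - each process the same frame independently, and their outputs are merged at the end (the analyzers and the result format are invented for illustration):

```python
def mean_brightness(frame):
    """Subsystem 1: overall brightness of a greyscale frame."""
    return sum(sum(row) for row in frame) / (len(frame) * len(frame[0]))

def detect_motion(prev_frame, frame, threshold=10):
    """Subsystem 2: True if any pixel changed by more than threshold."""
    return any(abs(a - b) > threshold
               for prev_row, row in zip(prev_frame, frame)
               for a, b in zip(prev_row, row))

def analyze(prev_frame, frame):
    """Each subsystem runs on its own; the dict is the merged 'percept'."""
    return {
        "brightness": mean_brightness(frame),
        "motion": detect_motion(prev_frame, frame),
    }
```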

And yet more Brain Processing . . .
All of the basic visual information is gathered, and then processed again at yet a higher level.
This is where
the brain asks, 'What is it that I really see?' Again, science has not entirely solved this
problem (yet), but we have really good theories on what probably happens. Supposedly the brain keeps a large
database of reference information - such as what a mac-n-cheese dinner looks like. The brain
'observes' something, then goes through the reference library to draw conclusions about what it observed.

How could this happen? Well, the brain knows the color should be orange, it knows it should
have a shiny texture, and that the shape should be tube-like. Somehow the brain makes this connection,
and tells you 'this is mac-n-cheese, yo.' Your other senses work in a similar manner.
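
This reference-library theory can be sketched as a crude feature matcher. Everything below - the features, the library entries, the scoring - is invented for illustration:

```python
# A tiny 'reference library': each known object is stored as a
# (color, texture, shape) feature tuple.
LIBRARY = {
    "mac-n-cheese": ("orange", "shiny", "tube"),
    "banana":       ("yellow", "matte", "curved"),
    "tennis ball":  ("yellow", "fuzzy", "sphere"),
}

def recognize(color, texture, shape):
    """Return the library entry sharing the most features with the input."""
    observed = (color, texture, shape)
    def score(item):
        return sum(a == b for a, b in zip(LIBRARY[item], observed))
    return max(LIBRARY, key=score)
```

Feed it orange + shiny + tube-like and it concludes mac-n-cheese, even if one of the three features is wrong - a (very) rough analog of the brain's partial-match pattern recognition.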

More specifically, the theory is about pattern recognition . . . it's sorta like me showing you an ink blot,
then asking you 'what do you see?' Your brain will try to figure it out, despite the fact
it doesn't actually represent anything. It's a subconscious effort.

Your brain also uses its understanding of the physical world (how things connect together in 3D space)
to understand what it sees. Don't believe me? Then tell me how many legs this elephant has.

I highly recommend doing a
Google search on optical illusions.
Illusions are cases where the image processing rules of the brain 'break,' and they are often used by scientists
to figure out how we understand what we see.

Stereo Image Processing
What has baffled scientists for the longest time, and only recently solved (in my opinion),
is what allows us to see a 2D image and yet picture it in 3D. Look at a painting of a scene, and you can
immediately determine a fairly accurate measurement and distance away of every object in the picture.
Scientists at CMU
have recently solved how a computer can accomplish this. Basically a computer keeps
a huge index of about a 1000 or so images, each with range data assigned (trained) to it. Then by probability
analysis, it can make connections with future images that need to be processed.
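
Here is a very rough sketch of that index-and-match idea, assuming each database entry pairs a stored image with its known range data; the real system uses far richer features and probabilistic models than this naive pixel-difference comparison:

```python
def nearest_depth(image, database):
    """Assign a new image the depth data of its most similar stored example.

    database: list of (stored_image, depth_data) pairs, where images are
    lists of rows of greyscale values.
    """
    def difference(a, b):
        return sum(abs(x - y)
                   for row_a, row_b in zip(a, b)
                   for x, y in zip(row_a, row_b))
    _, best_depth = min(database, key=lambda pair: difference(image, pair[0]))
    return best_depth
```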

Here are examples of figuring out 3D from 2D.

Lines that are parallel in 3D (and not parallel to the image plane) converge in 2D. This is a
picture of a train track. Notice how the parallel rails converge to a single point? This is one method
the brain uses to estimate range data.
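
Finding that convergence point is simple geometry. A sketch, assuming the two rails have already been extracted from the image as (slope, intercept) line equations:

```python
def vanishing_point(line1, line2):
    """Intersect y = m1*x + b1 and y = m2*x + b2.

    Two rails that are parallel in 3D appear as converging 2D lines;
    their intersection is the vanishing point.
    """
    m1, b1 = line1
    m2, b2 = line2
    if m1 == m2:
        return None        # parallel on the image plane too: no meeting point
    x = (b2 - b1) / (m1 - m2)
    return (x, m1 * x + b1)
```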

The brain uses the relation of objects located on the 2D ground to determine 3D scenes. Here
is a picture of a forest. By looking at where the trees are located on the ground, you can
quickly figure out how far away the trees are from each other. Which tree is closest
to the photographer? Why? How would you program that as an algorithm?

If I removed the ground reference, what then would you rely on to figure out how far
each tree is from the others? The next method would probably be size comparison. You would assume
trees that are closer appear larger.

But this wouldn't work if you had a giant tree far away and a tiny tree close up - as both
would appear the same size! So the brain has yet more methods, such as comparisons of
details (the size of leaves, for example), shading and shadows, etc. The below
image is just a circle, but it appears as a sphere because of shading. An algorithm that can process
shading can convert 2D images to 3D.
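
That circle-to-sphere effect can be captured with a toy shape-from-shading model. Assuming an idealized sphere lit straight-on (Lambertian shading, with the light along the viewing axis), the brightness at each pixel directly encodes the surface height, so the 2D shading alone reveals the 3D shape:

```python
import math

def sphere_shading(x, y, r):
    """Brightness of a head-on-lit sphere of radius r at image point (x, y).

    Lambertian model: brightness = surface normal dotted with the light
    direction, which for light along +z is just the normal's z-component.
    """
    d2 = x * x + y * y
    if d2 >= r * r:
        return 0.0                        # outside the circle: background
    return math.sqrt(1.0 - d2 / (r * r))

def height_from_shading(brightness, r):
    """Invert the model: recover surface height from observed brightness."""
    return r * brightness
```

The brightest pixel (the center) is the closest point of the sphere; brightness falls off toward the silhouette exactly as the surface curves away.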