Eye to Eye
Cameras and Vision

Human vision and photographic imaging have some significant similarities.
The eye and the camera each consist of a lens, an aperture, an image plane, and light sensors.
A camera usually also has a mechanical shutter, while our vision is sampled continuously through the optic nerve.
But there are some important differences.
The primary objective of this article is simply to explore the similarities, highlight the differences, and correlate the terminology of each to the other.
We will start with three key physical image elements: resolution, contrast, and color.

Resolution

Resolution is the ability to recognize minute details as separate image elements.
In the eye, this is limited by the density of rods and cones on the retina and a small central area called the fovea.
With film, resolution is limited by the chemical grains on the film substrate.
With digital sensors it is defined by the size of the photo receptor sites or sensors.
Within the retina these receptors vary greatly in size and density.
The density is much higher in the fovea, the center of the macula and the center of our central vision, than it is in other areas of the retina that make up our peripheral vision.
With film, these sensors are somewhat randomly distributed but with a fairly uniform density.
With digital sensors they are uniformly distributed in a fixed pattern.

The human eye contains about 120 million rods and 6 million cones, channeled into 1.5 million optic nerve endings.
These basic numbers lead to some interesting but misleading pixel counts for the eye.
Rods are very sensitive to luminosity but turn off in bright light.
Cones are sensitive to color, but only in bright light.
There can be as many as 1000 rods or as few as a single cone connected to a single nerve ending (ganglion cell).
In some areas, a single rod or cone may connect to as many as five ganglion cells.
Multiple rods with different light sensitivities form a single image point.
And the color attributes of a single cone contribute to multiple image points.
In other words, the relationship between photo sites and image points is not one to one.

Our peripheral vision has a high refresh rate so it responds quickly to motion, but our central vision has higher latency so it is more acute and less responsive to motion.
Since the fovea (central vision) does not have rods, it is not sensitive to dim lights.
Astronomers know this, so to observe a dim star they use averted vision, looking slightly to the side of the object.
Our central vision represents only 2% of the retina, but 50% of the visual cortex region of our brains where images are formed.
It is also important to note that there are significant differences within individuals based on age, gender, disease, and other factors.
Therefore, rod and cone photo-sensor counts do not correlate directly to resolution.
But we will look at some numbers later.

Contrast

Contrast is the measurable difference between tones in an image that allows a viewer to differentiate them.
Taken together, resolution and contrast determine the sharpness of an image.
The full range between maximum black and white is typically called the dynamic range.
The difference between two adjacent tones is usually called local contrast.
Measuring these can be tricky because some sources quote a contrast ratio, some decibels (dB), and some exposure values (EV).

The eye has an amazing ability to adapt to high or low levels of light.
But even the eye cannot adapt to total darkness and direct sunlight at the same time.
Whatever numbers are quoted, one thing is clear: the dynamic range of our night vision can be as much as 600 times that of our daylight vision.
The following table illustrates this (approximately).

                  Contrast Ratio   EV    dB
Night vision      10,000,000:1     27    160
Daylight vision   15,000:1         13    80

By the same token, each film type, each paper type, and each digital sensor has a unique characteristic dynamic range.
By manipulating ISO sensitivity we can adapt film or a digital sensor to different levels of light just as with our night vision and daylight vision.
We measure these light levels as exposure values (EV) or stops.
In nature they vary roughly from -6 EV (black night) to +22 EV (direct sunlight).
Reflective surfaces such as paper are typically constrained to a range of 8 EV.
Film ranges are typically 8-11 EV.
Digital sensor ranges are typically 8-12 EV.
For the record, a 12 EV difference equates to about 72 dB or a contrast ratio of 4000:1.
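The relationships between EV, contrast ratio, and decibels quoted above are simple arithmetic. A minimal Python sketch (an illustration, not part of the original article):

```python
import math

def ev_to_ratio(ev: float) -> float:
    """Each EV (stop) doubles the light, so an EV range of n spans a 2**n contrast ratio."""
    return 2.0 ** ev

def ratio_to_db(ratio: float) -> float:
    """Contrast ratios are conventionally quoted as 20 * log10(ratio) decibels."""
    return 20.0 * math.log10(ratio)

# A 12 EV range: 2**12 = 4096:1, or about 72 dB.
print(f"{ev_to_ratio(12):.0f}:1, {ratio_to_db(ev_to_ratio(12)):.1f} dB")
```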

Color

Color is a psychological response to a physical stimulus provided by three different types of cones (RGB).
We can measure the physics (spectral response) of various colors but our perception and the names we assign to these colors will always remain subjective.
Our visual systems can adapt to the spectral colors in the light source as well as the spectral response of the object’s color.
This is possible because the cones in our central vision area are covered by a yellow pigment in the macula.
Elsewhere in the retina (peripheral vision) they are not.
This additional information allows our brains to adapt our perception of colors to various light sources.
In digital or photographic imaging we call this white balance.

It is important to note that in human vision these adaptations to color and to light intensity are not instantaneous.
Dark to bright can take a minute, bright to dark can take 10 to 30 minutes.
Likewise, in digital imaging these adaptations require computationally intensive processing.

Perception

Another difference is that the image plane on our retina is semi-spherical, not flat as it is with film or digital photography.
The small area covered by the macula determines the angle of view of our central vision.
Within this, an even smaller area, the fovea, determines how we focus.
The rest of the retinal area determines the much larger angle of view of our peripheral vision.
And we use two eyes to form a single image.
In general our peripheral vision is less sharp and less color sensitive.
But our brains are very sensitive to distance and motion in this area.
We select camera lenses by comparing their angle of view to that of our central vision.
We call them wide angle, normal, or telephoto based on the angle of view in the captured image.
We sometimes stitch or combine multiple digital images together to accomplish impressively wide angle views called panoramas.
In a similar fashion, the brain uses information from each eye to expand our peripheral vision, add depth, and fill in the blind spot at the optic nerve of the other.

Precisely measuring the normal angle of view is difficult.
It is usually based on the horizontal or diagonal angle of view.
For normal central vision, this will be approximately 50 degrees, but our peripheral vision expands this to 120 degrees.
The focal area (under the fovea) is less than 2 degrees.
For a camera, the angle of view is a function of both the lens focal length and the size of the image plane.
So the proper lens will vary with the image format.
A simple test is to look at the image in the camera’s viewfinder and again with the unaided eye.
The normal lens will produce a similar perspective or field of view.
Just for reference, the focal length of the lens and cornea is about 22 mm.
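The angle of view follows from the focal length and the image-plane dimension with basic trigonometry. A sketch (the 43.3 mm diagonal is an assumption based on the 36 x 24 mm full-frame format):

```python
import math

def angle_of_view(focal_length_mm: float, image_dim_mm: float) -> float:
    """Angle of view (degrees) subtended by an image-plane dimension at a given focal length."""
    return math.degrees(2.0 * math.atan(image_dim_mm / (2.0 * focal_length_mm)))

# A 50 mm "normal" lens on a full-frame sensor (diagonal roughly 43.3 mm):
print(round(angle_of_view(50.0, 43.3)))  # about 47 degrees, close to our central vision
```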

The retina’s spherical image plane also means that our vision is less prone to the spatial distortions that can occur with some camera lens systems, particularly wide angle lenses.
But the visual cortex has to learn to recognize shapes and details in the first several months of infancy.
And, we are not completely immune to optical distortions.
Simple evidence of this is found in the design of Roman columns, whose subtle bulge (entasis) corrects for an optical illusion of concavity.

Some Measurements

Just for fun, here are some more miscellaneous factoids: Rods vary in size from 1 to 5.5 µm in diameter and cones can vary from 1 to 10 µm in diameter.
Typical camera photosites vary from 5 to 12 µm in diameter.
The retina’s diameter is about 25 mm.
The macula area is only about 7 mm wide. At the middle of this the central fovea is a minuscule 1.5 mm.
Thus maximum resolution and sharpness are concentrated in a very small area of our central vision.
At the fovea’s center all of the photoreceptors are cones, there are no rods.
The area of maximum rod density is a ring about 10 mm across surrounding the macula.
These densities are shown in the illustration below.
But as mentioned already, they do not correlate to image forming pixels.

This figure would imply a peak density of 160,000 per mm² (about 10,000 PPI).
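Converting an areal density to a linear PPI figure is just a square root and a unit change. A minimal sketch:

```python
import math

def density_to_ppi(sites_per_mm2: float) -> float:
    """Convert an areal density (sites per square mm) to a linear pixels-per-inch figure."""
    sites_per_mm = math.sqrt(sites_per_mm2)  # linear density along one axis
    return sites_per_mm * 25.4               # 25.4 mm per inch

print(round(density_to_ppi(160_000)))  # 10160, i.e. roughly 10,000 PPI
```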
Another way to look at this would be to compare image forming neural sensors to some typical DSLR cameras and a printed image, as shown below.

                  Rods         Cones      Image Sensors  Density /mm²  Density PPI
Retina            120,000,000  6,000,000  1,500,000      1,061         827
Fovea/Macula      20,000       115,000    115,000        11,937        2,775
Fovea/Center      0            25,000     25,000         14,147        3,021
Nikon D40 (DX)                            6,016,000      16,272        3,240
Nikon D300 (DX)                           12,169,344     32,498        4,579
Nikon D3 (FF)                             12,081,312     14,042        3,010
Print 8x10                                7,200,000      140           300
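The camera density figures follow from the pixel count divided by the physical sensor area. A sketch (the 23.7 x 15.6 mm DX dimensions are approximate, not from the original article):

```python
import math

def sensor_density(pixels: int, width_mm: float, height_mm: float) -> tuple[float, float]:
    """Return (pixels per square mm, linear PPI) for a sensor of the given size."""
    per_mm2 = pixels / (width_mm * height_mm)
    ppi = math.sqrt(per_mm2) * 25.4  # 25.4 mm per inch
    return per_mm2, ppi

# Nikon D40, 6.016 MP on a DX sensor of roughly 23.7 x 15.6 mm:
per_mm2, ppi = sensor_density(6_016_000, 23.7, 15.6)
print(round(per_mm2), round(ppi))  # roughly 16,272 per mm^2 and 3,240 PPI
```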

Of course, unlike our vision, we also want resolution to be uniform in all areas of a photographic image.
That is, unless we want to emphasize depth or subject separation for artistic effect.
If we want larger prints we can capture with larger image formats, just as with film.
Or we can use digital tools to stitch multiple images together in a single print.
So pure resolution comparisons can be misleading: not only are the technologies different, the objectives are different as well.

The Rest

In the world of digital imaging we can attempt to simulate our visual adaptation with special tools and techniques such as High Dynamic Range (HDR).
Basically these expand the contrast in both the shadows and the highlights while compressing contrast in the mid tones.
The result is that we can reproduce details in both the shadows and highlights that would be difficult to see with our unaided eyes in a single view.
In other words, the image is more aesthetically pleasing than technically accurate.
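Real HDR tools use sophisticated local operators, but the core idea of squeezing a wide luminance range into displayable tones can be sketched with a classic global curve (Reinhard's operator, shown purely as an illustration, not the article's method):

```python
def tonemap(luminance: float) -> float:
    """Reinhard-style global operator: maps [0, inf) into [0, 1), compressing highlights."""
    return luminance / (1.0 + luminance)

# A scene spanning many stops still lands in display range:
for stops in (0, 4, 8, 12):
    print(stops, round(tonemap(2.0 ** stops / 16.0), 3))
```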

Advanced spectral imaging for digital images is currently only being used in highly specialized scientific equipment and applications.
But it may hold promise for future photographic applications.
Obviously, depth and motion are also more advanced photographic imaging topics.
Some examples are three dimensional systems such as stereo viewfinders and lenticular prints.
And motion can be incorporated through video systems or animation graphics.

This is not an in-depth coverage of either photographic imaging or human vision, but I hope that it helps to illustrate both the similarities and the differences.
It is misleading to compare the 1.5 million nerve endings in a typical human eye to a 1.5 MP digital camera.
It is just as misleading to compare the 126 million rods and cones to any digital camera metrics.
Both are fascinating examples of technological excellence.

I hope you also gained some new insight from this article.
If you have any comments, or suggestions, I would welcome your input.
Please send me an email.