Foveated imaging refers to the creation and display of static or video imagery where
the resolution varies across the image. The highest resolution region is called the
foveation region. Typically, the viewer has dynamic control of the location of the
foveation region. It is possible to have more than one foveation region.

The primary value of foveated imaging is in image
compression: high resolution information is only transmitted in the
regions of the image that are selected as important by the viewer.
Foveated imaging exploits the fact that the resolution of the human visual
system declines away from the direction of gaze.  Because humans cannot
perceive fine detail away from the direction of gaze, it is only necessary to
transmit fine detail in that direction.

Foveated imaging can also be used to highlight regions
of an image. This can be useful because the viewer's gaze tends to
be drawn to high resolution regions of the image.

Foveated imaging can also be used in vision research. For example, it can be used
in combination with eye tracking to precisely control (in real time) the spatial
information available across the retina of the eye. This makes foveated imaging a
potentially valuable tool for analyzing the contributions of different retinal regions to
task performance.

It is a bit technical, but here is the basic idea of how the CPS foveated imaging
software works. There is a foveation encoder and a foveation decoder.

Foveation Encoder: First the image is encoded into a low-pass pyramid.
That is, the original image is low-pass filtered (slightly blurred) and then down-sampled
to create a second image with half the resolution of the original in each direction.
This half-resolution image is then low-pass filtered and down-sampled to create a
third image with one quarter the resolution of the original in each direction. This
process is repeated until a low-pass pyramid of typically 5 or 6 images is obtained.
Second, regions are selected from each image in the low-pass pyramid to create a foveation
pyramid. Specifically, from the original image a region is selected for the
foveation region; from the half-resolution image a region is picked for the first ring
around the foveation region; from the quarter-resolution image a region is picked for the
second ring around the foveation region; and so on.  Of course, this foveation pyramid is
itself just a collection of small images. The foveation pyramid is the output of the
encoder. The user has a great deal of control over how the resolution changes across
the image, and the user can select to receive the output either as separate images or
packed into a single larger image. Typically the total number of pixels in the foveation
pyramid is far less than in the original image.
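
The two encoder steps can be sketched in a few lines of Python. This is only an
illustration of the idea, not the CPS implementation: the function names are
hypothetical, a 2 x 2 block average stands in for the low-pass filter, and each level
keeps a full window around the gaze point rather than a true ring.

```python
def downsample(img):
    """Halve the resolution in each direction by averaging every 2x2 block
    (a crude low-pass filter) and keeping one sample per block."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]


def lowpass_pyramid(img, levels):
    """Level 0 is the original image; each later level has half the resolution."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def foveation_pyramid(pyramid, cx, cy, half):
    """Crop a window of at most (2*half) x (2*half) samples from every level,
    centred on the gaze point (cx, cy) given in original-image pixels.
    Because coarser levels have fewer samples, the same window size covers
    twice the image extent at each step down the pyramid."""
    crops = []
    for k, level in enumerate(pyramid):
        h, w = len(level), len(level[0])
        gx, gy = cx >> k, cy >> k  # gaze point in this level's coordinates
        x0, x1 = max(0, gx - half), min(w, gx + half)
        y0, y1 = max(0, gy - half), min(h, gy + half)
        crops.append([row[x0:x1] for row in level[y0:y1]])
    return crops


if __name__ == "__main__":
    image = [[(x + y) % 256 for x in range(64)] for y in range(64)]
    pyr = lowpass_pyramid(image, 4)
    fov = foveation_pyramid(pyr, 32, 32, 8)
    print(sum(len(c) * len(c[0]) for c in fov), "samples instead of", 64 * 64)
```

For this 64 x 64 test image with four levels and a 16 x 16 window, the sketch keeps
832 samples in place of 4096.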

Foveation Decoder: The foveation pyramid is decoded into a displayable image.
Specifically, the low-pass images are up-sampled, interpolated and blended to
create a smoothly foveated displayable image.
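
A minimal two-level decoder might look like the following sketch. It assumes one
half-resolution image of the whole scene plus one full-resolution crop around the gaze
point; the function names, the linear interpolation, and the linear blending ramp are
all illustrative choices, not details of the CPS decoder.

```python
def upsample2_row(row):
    """Double the length of a scanline using linear interpolation."""
    out = []
    for i, v in enumerate(row):
        nxt = row[i + 1] if i + 1 < len(row) else v  # repeat the last sample
        out.append(v)
        out.append((v + nxt) / 2.0)
    return out


def upsample2_image(img):
    """Double the size of an image in both directions (rows, then columns)."""
    rows = [upsample2_row(r) for r in img]
    cols = [upsample2_row(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def decode_two_levels(fine, x0, y0, coarse, ramp):
    """Up-sample `coarse` to display size, then blend in the high-resolution
    crop `fine`, whose top-left corner sits at (x0, y0) in the output.
    The weight of `fine` rises linearly from the crop border inward and
    reaches 1 at `ramp` pixels from the border, so no hard edge is visible."""
    out = upsample2_image(coarse)
    fh, fw = len(fine), len(fine[0])
    for y in range(fh):
        for x in range(fw):
            edge = min(x, y, fw - 1 - x, fh - 1 - y)  # distance to crop border
            w = min(1.0, (edge + 1) / float(ramp + 1))
            out[y0 + y][x0 + x] = w * fine[y][x] + (1.0 - w) * out[y0 + y][x0 + x]
    return out
```

A full decoder would repeat the same up-sample and blend step from the coarsest level
to the finest.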

The speed of encoding and decoding depends upon image size and the degree of
foveation: the smaller the image and the greater the degree of foveation, the faster the
encoding and decoding.  For 352 x 288 24-bit color images the encoding runs at
approximately 190 frames/sec and decoding at approximately 52 frames/sec on a
400 MHz PC with YUV to RGB hardware conversion.  For 704 x 576 24-bit color images,
the encoding runs at approximately 46 frames/sec and decoding at approximately 25
frames/sec.
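
To compare the two image sizes it helps to convert the quoted frame rates into input
pixels processed per second. The helper below is a hypothetical convenience function;
the sizes and frame rates are the ones quoted above.

```python
def pixel_rate(width, height, frames_per_sec):
    """Input pixels processed per second at the given frame rate."""
    return width * height * frames_per_sec


timings = [
    ("352 x 288 encode", 352, 288, 190),
    ("352 x 288 decode", 352, 288, 52),
    ("704 x 576 encode", 704, 576, 46),
    ("704 x 576 decode", 704, 576, 25),
]

if __name__ == "__main__":
    for label, w, h, fps in timings:
        print("%s: %.1f million pixels/sec" % (label, pixel_rate(w, h, fps) / 1e6))
```

Read this way, the encoder handles roughly 19 million input pixels per second at both
sizes, while the decoder goes from about 5 million to about 10 million.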

There is not a simple answer to this question. Foveated imaging is a form of
lossy compression. Therefore, the amount of compression can be arbitrarily high
depending upon how much loss of visual quality is acceptable. The faster the fall
off in resolution from the direction of gaze the greater the compression.
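
The link between fall-off and compression can be made concrete with a toy pixel count.
The sketch assumes the gaze is at the image center, a full-resolution square of
half-width r, and rings that double in width while halving in resolution at each level;
this layout and the function name are assumptions for illustration only.

```python
def foveated_pixel_count(width, height, r, levels):
    """Samples stored when level k holds the ring between centred squares of
    half-width r * 2**(k-1) and r * 2**k at 1/4**k of the original pixel
    density.  Level 0 is the full-resolution square of half-width r, and the
    last level extends to the image border."""
    def clipped_area(half):
        return min(width, 2 * half) * min(height, 2 * half)

    total = 0.0
    inner = 0
    for k in range(levels):
        last = (k == levels - 1)
        outer = width * height if last else clipped_area(r * 2 ** k)
        total += (outer - inner) / float(4 ** k)
        inner = outer
    return total


if __name__ == "__main__":
    for r in (64, 32):
        kept = foveated_pixel_count(512, 512, r, 5)
        print("r = %d: compression factor %.1f" % (r, 512 * 512 / kept))
```

For a 512 x 512 image, halving r from 64 to 32 pixels raises the factor in this toy
layout from 6.4 to about 19.7. These figures belong to the sketch, not to any measured
result.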

Foveated imaging does not compete with other forms of compression but layers on top
of them. Our foveated imaging software is completely compatible with virtually all
other forms of image compression. Recall that the output of the foveation encoder is
a small collection of images (or a single combined image), where the total number of
pixels is a small fraction of those in the original image.  These output images can be
passed on to any other image encoder/decoder.  For MPEG/H.263 (and our own custom
video compression software) we have demonstrated that foveation often increases the
compression by a multiplicative factor. For example, if the foveated imaging
produces a compression factor of 3 and MPEG produces a compression factor of 100 then
putting the two together produces a compression factor of approximately 300.

In most situations foveated imaging provides a substantial multiplicative increase
in compression. However, as might be expected, foveated imaging is most effective
when it eliminates data bits that would not be eliminated by the other image compression
procedure. If the only fine detail in the original image is located in the foveation
region, then obviously foveation will not provide any additional compression (nor any loss
of image quality).  Similarly, if the only motion in a video sequence occurs in the
foveation region then foveation will add little compression when combined with a
compression procedure which uses motion compensation (except for the first frame and other
intra-coded frames, where motion compensation is not applied).

Foveation increases the speed of subsequent image processing. This is a
simple consequence of the fact that the number of pixels in the foveation pyramid is
generally a small fraction of that in the input image. In fact, because the
foveation pyramid computes so quickly, it is generally faster to apply foveation encoding
plus another image processing procedure than to apply the other image processing procedure
alone. For example, applying foveation encoding followed by software MPEG encoding
is often several times faster than applying MPEG alone. We have developed our own
real-time software video coder (similar to MPEG) which runs at very usable frame rates,
for moderate size images and a moderate degree of foveation.

Perceptually lossless compression is possible with foveated imaging. In
general, the greater the resolution of the original image, the greater the compression
factor that can be obtained with foveated imaging, while maintaining a perceptually
lossless image. A 1024 x 768 image can typically be compressed by a factor of 3-5
without visible loss, when the foveation region is centered on the direction of gaze.

In general, the only way to guarantee perceptually lossless compression would be to
dynamically measure the gaze direction of the eyes and shift the foveation region
accordingly. Our foveation software is compatible with, and has been tested with,
several different commercial eye tracking systems.

Currently most commercial eye tracking systems are rather expensive. However,
this is not because of an inherent cost in the labor or materials, but because of the
currently small market. A larger market would drive the price down quickly.
Commercial eye trackers have been getting easier to use.  There are several desktop
and helmet-mounted devices that are reliable and essentially invisible to the viewer.

For either very fast or dedicated communication links the time delay is quite small
(a half frame on average) which is quite acceptable to the viewer. However, for some
communication links, such as satellite links or long-distance internet links, the time
delay could produce noticeable effects. Therefore, foveated imaging with eye
tracking is most practical for applications with dedicated communication links or with
entirely local communications, such as flight simulators and virtual reality systems.
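
A back-of-the-envelope model of this delay is an average wait of half a frame plus the
round-trip time of the link. The function name and the additive model are assumptions
made for illustration; any link round-trip value passed in is hypothetical.

```python
def gaze_update_delay_ms(frame_rate_hz, round_trip_ms=0.0):
    """Average time between an eye movement and the arrival of a frame
    foveated at the new gaze point: half a frame period of waiting on
    average, plus the round trip needed to send the gaze coordinates out
    and get the re-foveated frame back."""
    half_frame_ms = 0.5 * 1000.0 / frame_rate_hz
    return half_frame_ms + round_trip_ms


if __name__ == "__main__":
    print(gaze_update_delay_ms(25))         # local link, negligible round trip
    print(gaze_update_delay_ms(25, 500.0))  # hypothetical slow long-distance link
```

With a negligible round trip the delay is set by the frame rate alone; on a slow link
the round-trip term dominates.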

Some of the most promising applications of foveated imaging do not require eye
tracking. Any pointing device, such as a mouse or a touch pad, can be used to
control the foveation. The list of useful applications for mouse-controlled
foveation includes video teleconferencing, video surveillance, telenavigation,
telemedicine, and image database retrieval.  Whenever there is a limited bandwidth
for communication, mouse-controlled foveated imaging provides a simple method for the
viewer to direct high resolution to regions of interest.
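
On the viewer's side, pointer control amounts to converting the pointer position in the
display window into image coordinates and sending those to the encoder. A minimal
sketch follows; the function name and parameters are hypothetical.

```python
def pointer_to_foveation_center(px, py, view_w, view_h, img_w, img_h):
    """Map a pointer position in a view_w x view_h display window to a pixel
    in the img_w x img_h source image, clamped to the image bounds."""
    x = int(px * img_w / float(view_w))
    y = int(py * img_h / float(view_h))
    x = min(max(x, 0), img_w - 1)
    y = min(max(y, 0), img_h - 1)
    return (x, y)
```

The clamp keeps the foveation center inside the image when the pointer is dragged
outside the window.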

Consider being confronted with the choice between two nearly equivalent video
communications systems. Suppose both systems send video with equal resolution, at
the same frame rate, in a non-foveation mode. However, suppose the second system
gives the viewer the option of switching to a foveation mode where spatial resolution can
be dynamically increased in regions of interest without affecting frame rate, or
alternatively, where the frame rate can be increased while maintaining the resolution in
the regions of interest. Clearly there are many situations where this second system
would be very valuable. Which system would you want if the cost was nearly
identical?

In general, if the viewer is not looking at a foveation region, the reduced
resolution due to the foveation will be visible. However, keep in mind that the gain
to the viewer is increased resolution in the regions of interest (or increased frame
rate).  Often the increased resolution (or frame rate) is much more important than
the loss of resolution away from the regions of interest. Our experience is that
mouse-controlled foveation is a smooth and natural operation for the viewer.
Furthermore, time delays in transmitting the mouse coordinates to the sender are not a
problem here, because the eye movements are uncoupled from the position of the foveation
region.

This has been done and certainly is simpler, but there are two problems:

First, without a full view, the viewer does not know where to put the window. The
effect is much like trying to find a bird with a pair of binoculars. In other words,
the user cannot easily find the regions of interest. Perceptual experiments have
shown that viewers do not function efficiently with this type of display. Such displays
are completely unusable for tasks such as remote navigation of a vehicle. It might
be possible to first find a region of interest and then switch to a windowed image, but
then other events that might be of interest are not available in the viewer's
peripheral vision (e.g., another person or object entering the camera's field of
view).

Second, simple windowing is inefficient; it does not match the resolution of the
display to the resolution of the human visual system. Matching the display to the
encoding properties of the eye is the most efficient way to allocate image data bits.

One can ask exactly the same question about any form of image compression, yet no
one questions the general need for better image compression techniques. The reason
is simply that users always want bigger, higher quality images and higher frame rates, and
bandwidth always costs something. Thus, it will always be a benefit to have better
compression.  As the available bandwidth increases, user demand for higher resolution
and/or higher frame rates will increase. In fact, foveated imaging becomes more and more
useful (bigger compression ratios) as the image size and resolution increase. Right
now, foveated imaging would be particularly useful for teleconferencing, surveillance or
telenavigation, using point-to-point POTS, ISDN, or wireless, as well as basic internet
communications. It can easily increase compression by a factor of 3-5 for image
sizes in the range of 320 x 240 to 640 x 480.

Our conclusion is that foveated imaging would be useful far into the future.

Foveated imaging can be very inexpensive both to use and to integrate into existing
or coming technology. Our foveated imaging software requires no special hardware,
and runs very quickly on standard PCs using the Windows 95/NT OS.  The encoder takes
as input an image or a video stream and outputs an image or video stream. This
output can easily be directed to any other image coding or processing software or
hardware.  For example, the output images can be directed to a hardware MPEG coder.
After transmission, the MPEG stream is decoded and sent to the foveation decoder for
display. Because the foveation coder/decoder is all software and is compatible with other
forms of video compression, the cost-benefit ratio is quite favorable.

There are several possibilities. First, you might want to provide the user
with the option of increasing frame rate without sacrificing resolution in the regions of
interest. For example, if you are running at 10 frames per second at a given
bandwidth, foveation might increase your framerate to 30 frames per second at the same
bandwidth. Many users prefer foveated video sequences with higher framerates over
non-foveated video sequences with lower framerates, and a video application could easily
allow the user to switch between foveated and non-foveated modes at the user's discretion.
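
The frame rate arithmetic is straightforward: at a fixed bandwidth, the frame rate
scales with the compression gained. In the sketch below the function name and the
bandwidth and frame-size figures are hypothetical, chosen only to reproduce the 10 and
30 frames per second of the example above.

```python
def achievable_frame_rate(bandwidth_bits_per_s, bits_per_frame, foveation_factor=1.0):
    """Frames per second that fit in the link when foveation shrinks each
    frame by `foveation_factor` (1.0 means no foveation)."""
    return bandwidth_bits_per_s * foveation_factor / float(bits_per_frame)


if __name__ == "__main__":
    print(achievable_frame_rate(240000, 24000))       # non-foveated mode
    print(achievable_frame_rate(240000, 24000, 3.0))  # foveated, factor of 3
```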

Second, you might want to increase the performance and options of the current product
by allowing higher resolution images to be viewed. For example, it might be
desirable to step up to a higher resolution camera. When run in the non-foveation
mode the camera images could be transmitted at the resolution of the original camera
(i.e., the system would behave exactly as before). However, in the foveation mode,
the viewer could have access to the high resolution video information available from the
high resolution camera (without sacrificing frame rate).

Finally, you might want to enable some usage of your product at lower bandwidth.
This might occur if there is a temporary need to switch to a backup communication link, or
if a potential user could not afford the appropriate bandwidth link.

A number of methods have been explored in the past. An early method of
foveation involved increasing the size of the pixels away from the direction of gaze.  A
related method is to subsample the image away from the direction of gaze. Both of
these methods suffer from a serious problem, namely aliasing, which produces shimmering
and illusory motion in the low resolution regions of the foveated image.  Variable
pixel size also produces visible blocking at the edges of the larger pixels.  To
eliminate aliasing and blocking effects it is necessary to low-pass filter before sampling
and to appropriately interpolate the samples when reconstructing the foveated image for
display. Our method of foveation does just this.
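
The aliasing problem is easy to reproduce on a single scanline of fine stripes. The
sketch uses a simple block average as the low-pass filter, which is an illustrative
choice and not necessarily the filter used in our software.

```python
def subsample(signal, step):
    """Keep every `step`-th sample with no filtering."""
    return signal[::step]


def lowpass_then_subsample(signal, step):
    """Average each block of `step` samples (a crude low-pass filter) and
    keep one value per block."""
    return [sum(signal[i:i + step]) / float(step)
            for i in range(0, len(signal) - step + 1, step)]


if __name__ == "__main__":
    stripes = [0, 255] * 8               # one-pixel black and white stripes
    shifted = stripes[1:] + stripes[:1]  # same stripes moved by one pixel
    print(subsample(stripes, 2), subsample(shifted, 2))
    print(lowpass_then_subsample(stripes, 2), lowpass_then_subsample(shifted, 2))
```

Plain subsampling turns the stripes into solid black, and a one-pixel shift flips the
result to solid white, which is the shimmering described above. With filtering first,
both versions give the same mid-gray.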

Many methods of foveation do not incorporate the actual fall off in resolution of the
human visual system (as measured in perception experiments). As mentioned earlier,
matching the display to the encoding properties of the eye is the most efficient way to
allocate image data bits. Matching the foveation pyramid to the fall off in
resolution of the human visual system is one of the options in our foveation software.
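
One simple way to express such a match is to assume that relative resolution falls as
e2 / (e + e2), where e is the angle from the direction of gaze in degrees and e2 is the
angle at which resolution has dropped to half. Both this formula and the default value
of e2 below are assumptions made for illustration, not parameters taken from our
software; the function name is hypothetical.

```python
import math


def pyramid_level(eccentricity_deg, e2_deg=2.3, max_level=5):
    """Coarsest pyramid level whose resolution (1 / 2**level of the original)
    is still at least the relative resolution e2 / (e + e2) assumed to be
    visible at the given angle from the direction of gaze."""
    ratio = (eccentricity_deg + e2_deg) / float(e2_deg)  # 1 / relative resolution
    level = int(math.floor(math.log2(ratio)))
    return min(level, max_level)
```

Under this assumption the level index grows with the logarithm of the distance from
the gaze point, so each ring of the foveation pyramid is about twice as wide as the
one inside it.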

It is possible to match the fall off in resolution of the human visual system with
slightly greater precision. Rather than compute a foveation pyramid, a different low
pass filter can be applied at each distance from the gaze point, and the sampling can be
matched to the low-pass filter at each distance. However, this "continuous
foveation" method (which has been considered in the past) provides minimal
improvement in the foveated image quality.  More importantly, it is computationally
complex and intensive, and hence is not practical for real-time software, and would be
difficult to implement in real-time hardware. The foveation pyramid method is very
simple and very fast, allowing a software implementation (for small to large image sizes)
and a simple hardware implementation (for very large image sizes). Further, a
software implementation is much more portable, compatible, and upgradeable than a hardware
implementation. In other words, using the foveation pyramid method, foveated imaging
can be incorporated into a new application very quickly for very little expense.
Overall the foveation pyramid method is the best method available.