Frequency Domain

For simplicity, assume that the image I being considered is formed by projection
from a scene S (which might be two- or three-dimensional).
The frequency domain is a space in which each value at an image position
F represents the amount by which the intensity values in image I vary over a specific
distance related to F. In the frequency domain, changes in image position correspond
to changes in spatial frequency, that is, the rate at which the image intensity
values change in the spatial domain image I.
For example, suppose that there is the value 20 at the point that represents the
frequency 0.1 (or 1 period every 10 pixels). This means that in the corresponding
spatial domain image I the intensity values vary from dark to light and back to
dark over a distance of 10 pixels, and that the contrast between the lightest and
darkest is 40 gray levels (2 times 20).
The spatial frequency domain is interesting because: 1) it may make periodic
relationships in the spatial domain explicit, and 2) some image processing operators
are more efficient, or indeed only practical, when applied in the frequency domain.
In most cases, the
Fourier Transform is used to convert images from the spatial domain into
the frequency domain and vice versa.
A related term used in this context is spatial frequency, which refers
to the (inverse of the) periodicity with which the image intensity values change.
Image features with high spatial frequency (such as edges) are those that change
greatly in intensity over short image distances.
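The numerical example above can be checked directly. Below is a minimal sketch, assuming numpy is available (variable names are illustrative): a row of pixels whose intensity oscillates with a period of 10 pixels and an amplitude of 20 gray levels yields a Fourier spectrum whose peak sits at spatial frequency 0.1.

```python
import numpy as np

# A 1-D "image" row varying sinusoidally with a period of 10 pixels
# (spatial frequency 0.1); amplitude 20 gives a contrast of 40 gray levels.
n = 100
x = np.arange(n)
row = 128 + 20 * np.sin(2 * np.pi * 0.1 * x)

# The discrete Fourier transform makes the periodicity explicit.
spectrum = np.abs(np.fft.rfft(row)) / n
freqs = np.fft.rfftfreq(n)                 # in cycles per pixel

peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC (mean) term
print(peak)  # -> 0.1, i.e. one period every 10 pixels
```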

Color Quantization

Color quantization is applied when the amount of color information in an image
is to be reduced. The most common case is when a
24-bit color image is transformed into an
8-bit color image.
Two decisions have to be made:

which colors of the larger color set remain in the new image, and

how the discarded colors are mapped to the remaining ones.

The simplest way to transform a 24-bit color image into 8 bits is to assign 3 bits
to red and green and 2 bits to blue (blue has only 2 bits, because of the eye's
lower sensitivity to this color). This enables us to display 8 different shades
of red and green and 4 of blue. However, this method can yield poor results.
For example, an image might contain various shades of blue that all cluster
around a single value, so that only one of the four available shades of blue
is actually used in the 8-bit image while the other three go unused.
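The fixed 3-3-2 bit assignment just described can be sketched as follows, assuming numpy (the function name is hypothetical). The example also reproduces the weakness noted above: several nearby shades of blue collapse onto a single 8-bit code.

```python
import numpy as np

def quantize_332(rgb):
    """Map an (..., 3) uint8 RGB array to 8-bit 3-3-2 codes:
    top 3 bits of red, top 3 bits of green, top 2 bits of blue."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6)

# Four nearby shades of blue all receive the same 8-bit value.
blues = np.array([[0, 0, 130], [0, 0, 140], [0, 0, 150], [0, 0, 160]],
                 dtype=np.uint8)
print(np.unique(quantize_332(blues)))  # a single code for all four shades
```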
Alternatively, since 8-bit color images are displayed using a
colormap, we can assign any arbitrary color to each of the 256 8-bit values
and we can define a separate colormap for each image. This enables us to perform
a color quantization adjusted to the data contained in the image. One common approach
is the popularity algorithm, which creates a
histogram of all colors and retains the 256 most frequent ones. Another
approach, known as the median-cut algorithm, yields even better results
but also needs more computation time. This technique fits a box around all
the colors used in the image in the
RGB colorspace and splits the box at the median value of its longest side;
the split is applied recursively to the resulting boxes until, after 255 splits,
256 boxes have been produced. All colors in one box are mapped to the
centroid of that box.
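As a rough illustration of the popularity algorithm described above, here is a sketch assuming numpy; the helper name is made up, and the brute-force nearest-color search is only practical for small images.

```python
import numpy as np

def popularity_quantize(img, n_colors=256):
    """Keep the n_colors most frequent colors as the colormap and map
    every other color to its nearest retained color (Euclidean in RGB)."""
    pixels = img.reshape(-1, 3)
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    colormap = colors[np.argsort(counts)[::-1][:n_colors]]
    # Distance from every pixel to every colormap entry (brute force).
    d = np.linalg.norm(pixels[:, None, :].astype(float) -
                       colormap[None, :, :].astype(float), axis=2)
    indices = np.argmin(d, axis=1)
    return colormap[indices].reshape(img.shape), colormap

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
out, cmap = popularity_quantize(img, n_colors=16)
```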
All the above techniques restrict the number of displayed colors to 256. A technique
for achieving additional colors is to apply a variation of the half-toning used
for gray scale
images, thus increasing the color resolution at the cost of spatial resolution.
The 256 values of the colormap are divided into four sections containing 64 different
values each of red, green, blue and white. As can be seen in Figure 1, a 2×2 pixel
area is grouped together to represent one composite color; each of the four pixels
displays either one of the
primary colors or white. Since each of the four pixels can independently take
one of 64 values, the number of possible colors is increased from 256 to
64^4 = 2^24, i.e. roughly 16.7 million.

Figure 1 A 2×2 pixel area displaying one composite color.

Convolution

Convolution is a simple mathematical operation which is fundamental to many common
image processing operators. Convolution provides a way of `multiplying together'
two arrays of numbers, generally of different sizes, but of the same dimensionality,
to produce a third array of numbers of the same dimensionality. This can be used
in image processing to implement operators whose output pixel values are simple
linear combinations of certain input pixel values.
In an image processing context, one of the input arrays is normally just a graylevel
image. The second array is usually much smaller, and is also two-dimensional (although
it may be just a single pixel thick), and is known as the
kernel. Figure 1 shows an example image and kernel that we will use to illustrate
convolution.

Figure 1 An example small image (left) and kernel (right) to illustrate
convolution. The labels within each grid square are used to identify each square.

The convolution is performed by sliding the kernel over the image, generally starting
at the top left corner, so as to move the kernel through all the positions where
the kernel fits entirely within the boundaries of the image. (Note that implementations
differ in what they do at the edges of images, as explained below.) Each kernel
position corresponds to a single output pixel, the value of which is calculated
by multiplying together the kernel value and the underlying image pixel value for
each of the cells in the kernel, and then adding all these numbers together.
So, in our example, the value of the bottom right pixel in the output image
is obtained by placing the kernel at the bottom right position, multiplying
each kernel value by the image pixel it overlays, and summing the products.

If the image has M rows and N columns, and the kernel has m
rows and n columns, then the size of the output image will have M
- m + 1 rows, and N - n + 1 columns.
Mathematically we can write the convolution as:

O(i,j) = sum_{k=1..m} sum_{l=1..n} I(i+k-1, j+l-1) K(k,l)

where i runs from 1 to M - m + 1 and j runs
from 1 to N - n + 1.
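A direct, unoptimized numpy sketch of this sum (the function name is illustrative; as in the definition above, the kernel is applied without flipping):

```python
import numpy as np

def convolve_valid(image, kernel):
    """Slide the kernel over all positions where it fits entirely inside
    the image; each output pixel is the sum of kernel-weighted pixels."""
    M, N = image.shape
    m, n = kernel.shape
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(M - m + 1):
        for j in range(N - n + 1):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

img = np.arange(16.).reshape(4, 4)
k = np.ones((2, 2))
out = convolve_valid(img, k)   # output is (4-2+1) x (4-2+1) = 3 x 3
```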
Note that many implementations of convolution produce a larger output image than
this because they relax the constraint that the kernel can only be moved to positions
where it fits entirely within the image. Instead, these implementations typically
slide the kernel to all positions where just the top left corner of the kernel is
within the image. Therefore the kernel `overlaps' the image on the bottom and right
edges. One advantage of this approach is that the output image is the same size
as the input image. Unfortunately, in order to calculate the output pixel values
for the bottom and right edges of the image, it is necessary to invent
input pixel values for places where the kernel extends off the end of the image.
Typically pixel values of zero are chosen for regions outside the true image, but
this can often distort the output image at these places. Therefore in general if
you are using a convolution implementation that does this, it is better to clip
the image to remove these spurious regions. Removing n - 1 pixels from
the right hand side and m - 1 pixels from the bottom will fix things.
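The padded variant and the clip that repairs it can be sketched as follows (again a numpy sketch with hypothetical names; the regions outside the true image are filled with zeros):

```python
import numpy as np

def convolve_same_topleft(image, kernel):
    """Slide the kernel to every position where its top left corner is
    inside the image; pad with zeros where the kernel overlaps the
    bottom and right edges, so the output matches the input size."""
    m, n = kernel.shape
    padded = np.pad(image, ((0, m - 1), (0, n - 1)))  # zeros by default
    M, N = image.shape
    out = np.zeros((M, N))
    for i in range(M):
        for j in range(N):
            out[i, j] = np.sum(padded[i:i + m, j:j + n] * kernel)
    return out

img = np.arange(16.).reshape(4, 4)
k = np.ones((2, 2))
out = convolve_same_topleft(img, k)
# Removing the last n-1 columns and m-1 rows discards the distorted border.
valid = out[:img.shape[0] - k.shape[0] + 1, :img.shape[1] - k.shape[1] + 1]
```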
Convolution can be used to implement many different operators, particularly spatial
filters and feature detectors. Examples include
Gaussian smoothing and the
Sobel edge detector.

Multi-spectral Images

A multi-spectral image is a collection of several monochrome images of the same
scene, each of them taken with a different sensor. Each image is referred to as
a band. A well known multi-spectral (or multi-band) image is an
RGB color image, consisting of a red, a green and a blue image, each of
them taken with a sensor sensitive to a different wavelength. In image processing,
multi-spectral images are most commonly used for Remote Sensing applications. Satellites
usually take several images from frequency bands in the visual and non-visual range.
Landsat 5, for example, produces 7-band images, with the wavelengths of the
bands lying between 450 nm and 12.5 µm.
All the standard single-band image processing operators can also be applied to multi-spectral
images by processing each band separately. For example, a multi-spectral image can
be edge detected
by finding the edges in each band and then
ORing the three edge images together. However, we would obtain more reliable
edges if we decided whether a pixel belongs to an edge based on its properties
in all three bands, not in one band alone.
To fully exploit the additional information which is contained in the multiple bands,
we should consider the images as one multi-spectral image rather than as a set of
monochrome graylevel images. For an image with k bands, we can then describe
the brightness of each pixel as a point in a k-dimensional space represented
by a vector of length k.
Special techniques exist to process multi-spectral images. For example, to
classify a pixel as belonging to one particular region, its intensities
in the different bands are said to form a feature vector describing its
location in the k-dimensional feature space. The simplest way to define
a class is to choose an upper and a lower
threshold for each band, thus producing a k-dimensional `hyper-cube'
in the feature space. Only if the feature vector of a pixel points to a location
within this cube, is the pixel classified as belonging to this class. A more sophisticated
classification
method is described in the corresponding worksheet.
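The hyper-cube (box) classifier just described can be sketched as follows, assuming numpy; the function name and the thresholds are illustrative:

```python
import numpy as np

def classify_box(image, lower, upper):
    """image: (rows, cols, k) multi-spectral image; lower and upper are
    length-k threshold vectors defining a box in the feature space.
    A pixel belongs to the class only if every band lies inside the box."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    inside = (image >= lower) & (image <= upper)  # per-band threshold tests
    return inside.all(axis=-1)                    # True iff all k bands pass

# A two-pixel, three-band example: only the first pixel falls in the box.
img = np.array([[[100, 50, 30], [200, 50, 30]]])
mask = classify_box(img, lower=[90, 40, 20], upper=[150, 60, 40])
```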
The disadvantage of multi-spectral images is that, since we have to process additional
data, the required computation time and memory increase significantly. However,
as hardware becomes faster and memory becomes cheaper, multi-spectral images can
be expected to become more important in many fields of computer vision.