Chapter 2. RECORDING TECHNIQUES AND ANIMATION HARDWARE

Animation can be generated either in real-time or in single-frame
mode. Real-time means that the images are generated at a rate fast
enough to produce the perception of persistence of motion. For
general purposes, this rate is usually taken to be 24 frames per
second (a new image every 1/24th of a second); the actual rate
required depends on the types of images being viewed and on the
specific viewing conditions. Unfortunately, it is not uncommon to
hear people speak of real-time when referring to rates as low as
five frames a second.

If the imagery cannot be produced at a rate fast enough to provide
real-time animation, then it can be generated a single frame at a
time, with each frame recorded on some medium so that it can later
be played back at animation rates (i.e., rates fast enough to
produce persistence of motion). Whether real-time animation is
possible depends on the image quality desired, the computational
complexity of the motion, and the power of the hardware being used
to calculate the motion and render the images. Model-based motion
control algorithms (e.g., computational fluid dynamics) can require
processing that is too intensive to be done in real-time. Sometimes
it is possible to pre-compute the motion and then render in
real-time.

The reader should note a distinction here between the two processes
taking place: motion control and rendering. In one possible
scenario, the motion control might be a sophisticated simulation of
physical processes that brings a supercomputer to its knees, but at
the end of the calculation a series of transformations for the
objects over time is produced, which the display hardware can read
in and render in real-time. At the other extreme, simple motion
control, such as linear interpolation, may be used in conjunction
with a software ray tracer for the rendering. In this case, the
motion calculation may run at real-time rates but the renderer
cannot produce images in real-time.

Another distinction to be made is between playback rate and update
rate. The playback rate is the rate at which frames are displayed
on the display device. The update rate is the rate at which the
motion is computed. For example, as is common with some cartoons,
the update rate may be as low as 8 frames a second even though the
playback rate is 30 frames a second. Conversely, with interlaced
scan (explained below), the update rate may be 60 times a second
while the playback rate is 30 frames a second.
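
To make the distinction concrete, the following sketch (in Python,
using the rates from the cartoon example above) maps each
played-back frame to the motion update it would display:

    PLAYBACK_RATE = 30   # frames displayed per second
    UPDATE_RATE = 8      # new motion poses computed per second

    def update_index(playback_frame):
        # Time at which this frame is displayed, then the latest
        # motion update computed at or before that time.
        t = playback_frame / PLAYBACK_RATE
        return int(t * UPDATE_RATE)

    for f in range(6):
        print("playback frame", f, "-> motion update", update_index(f))

Each motion update is held on the screen for three or four playback
frames, which is the frame-holding effect seen in limited animation.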

The first computer animation used film to record the single frames.
Single-frame film technology has been around for a long time; it
was used in the late 1800s to produce some of the first film
animation. The requirement is a film recorder capable of precisely
positioning a single frame of film behind the shutter and holding
it steady. If the positioning is not precise enough, the image will
jitter or float when played back. For each frame, the image is
displayed and the shutter is opened and then closed, during which
time the film is exposed to the image. The film can then be
advanced and the next image displayed in preparation for the next
frame's exposure. The opening and closing of the shutter can be
done manually, although this is a very labor-intensive operation;
it is facilitated greatly if there is a computer interface to the
camera. This technology had been developed for traditional
single-frame animation (stop-motion animation). One advantage of
film technology is the resolution of the medium itself: the
emulsion that coats the film celluloid can capture very
high-frequency image components.

The early film medium was 16mm film. The problem with this
technology was its standard playback rate of 16 frames per second,
which often is not fast enough to avoid flicker. The 35mm standard,
which employs a 24 frames per second playback rate, results in a
much more stable image and has been used for most of the computer
animation captured on film. For very high quality animation, a 70mm
standard is used, also with a playback rate of 24 fps. 70mm usually
requires that images be calculated at a resolution of at least 2000
by 2000, which drives up the cost of computer-generated animation.

An easy way to transfer an image onto film is to position a camera
in front of the screen, plot an image on the computer screen, open
the camera shutter, close the camera shutter, advance the film to
the next frame, and repeat the process. Drawbacks of this approach
include difficulties in eliminating extraneous light, the curvature
of the screen, color shifting, and the mechanical difficulty of
maintaining a stable device. Another thing to keep in mind is that
the image on the computer screen is not static: it is continuously
being drawn, decaying, and being redrawn (refreshed), even if the
image content does not change. Therefore it is important that the
camera shutter either be held open for several refreshes of the
computer screen, so that a solid image is recorded on the film, or
be precisely coordinated with the screen refresh, so that exactly
one complete refresh is captured on film. Usually the former
approach is taken because of its relative simplicity.
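
The record loop just described is simple enough to summarize in
code. The sketch below assumes hypothetical display and camera
interfaces (show, open_shutter, close_shutter and advance_frame are
stand-ins for whatever the recorder's computer interface actually
provides) and uses the hold-open-for-several-refreshes approach:

    import time

    REFRESHES_PER_EXPOSURE = 4    # expose across several refreshes
    REFRESH_TIME = 1.0 / 60.0     # assumed 60 Hz screen refresh

    def record_sequence(frames, display, camera):
        # Single-frame recording: display, expose, advance, repeat.
        for image in frames:
            display.show(image)        # draw the frame on the screen
            camera.open_shutter()
            time.sleep(REFRESHES_PER_EXPOSURE * REFRESH_TIME)
            camera.close_shutter()
            camera.advance_frame()     # move the film to the next frame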

Some of the early computer animation used random vector displays to
render the frames. A color filter was placed in front of the camera
lens and one color component of the image was scanned out while the
camera shutter was open. The process was repeated for each of the
color components (usually red, green, and blue).

Currently there are many film recording products made specifically
for computer animation. Film recorders have a special
high-resolution, flat screen onto which the image is drawn so as to
minimize distortion of the image. On some, mounting brackets allow
either SLR cameras or single-frame motion picture cameras to be
attached to the unit.

Film plotters use special electronics that 'draw' an image directly
onto the film without an intermediate screen. Typically, these
special-purpose recorders and plotters are designed as very
high-resolution devices (e.g., 4000 by 4000).

Some of the main drawbacks to using film technology are 1)
the medium (film) can't be reused and 2) there is a delay between
the time the recording is done and the time it can be viewed
(developing).

The advent of video technology, driven by a mass consumer market,
has brought it into the price range of just about everybody. This
has resulted in affordable video single-frame recorders and
controllers.

Video technology is based on a raster scan display refreshing
format. Raster scan refers to the pattern used to scan out the
image: top-to-bottom a line at a time, left-to-right along each
line. A line is called a scanline. The image is drawn by an
electron beam that strikes a phosphor-coated screen, which emits
photons in the form of light. The intensity of the electron beam is
controlled by the image being scanned out, whether that image is
stored in the digital memory of a computer or generated by the
similar raster scanning of a video camera. After a scan of an
individual scanline, the electron beam is turned off and is
repositioned at the beginning of the next scanline. The time it
takes to do this is called the horizontal retrace interval, and the
signal which notifies the electronics of this is called horizontal
blanking or horizontal sync. When the beam gets to the bottom of
the image, it is turned off and is returned to the top left of the
screen. The time it takes to do this is called the vertical retrace
interval, and the signal which notifies the electronics of this is
called vertical blanking or vertical sync. A complete scan of all
the scanlines of an image is called a frame. In some video formats,
all of the scanlines are scanned in one pass (progressive scan). In
other video formats, every odd-numbered scanline is scanned on one
pass and every even-numbered scanline on the next pass (interlaced
scan). In interlaced scan, each pass is called a field (two fields
per frame).
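
A small sketch makes the scanline ordering concrete; it lists which
scanlines are drawn on each pass for an artificially short
eight-line image:

    def progressive_order(num_lines):
        # Progressive scan: every scanline in a single pass.
        return list(range(1, num_lines + 1))

    def interlaced_fields(num_lines):
        # Interlaced scan: odd-numbered scanlines in the first field,
        # even-numbered scanlines in the second (two fields per frame).
        return (list(range(1, num_lines + 1, 2)),
                list(range(2, num_lines + 1, 2)))

    print(progressive_order(8))    # [1, 2, 3, 4, 5, 6, 7, 8]
    print(interlaced_fields(8))    # ([1, 3, 5, 7], [2, 4, 6, 8])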

The NTSC Standard

The National Television Systems Committee (NTSC) established
525-line, 60.00 Hz field rate, 2:1 interlaced monochrome television
in the United States in 1941. In 1953, 525-line, 59.94 Hz field
rate, 2:1 interlaced, composite color television signals were
established. Broadcast video must conform to this specific
standard. The standard sets specific times for the horizontal
scanline, the frame, the amplitude and duration of the vertical
sync pulse, etc. Home video units typically generate much sloppier
signals and would not qualify for broadcast. There are encoders
that can strip old sync signals off a video signal and re-encode it
so that it does conform to broadcast quality standards. Specific
pieces of video equipment will be mentioned later in this section.

There are 525 total scanline-times per frame-time in NTSC format,
and 29.97 frames are transmitted per second, with a 2:1 interlace
of the scanlines in alternate fields. Of the 525 total raster
lines, 480 contain picture information; the remainder comprise
vertical scanning overhead. The aspect ratio of a 525-line
television picture is 4:3, so equal vertical and horizontal
resolution are obtained at a horizontal resolution of 480 times
4/3, or 640 pixels per scanline. PAL and SECAM are the other two
standards in use around the world. They differ from NTSC in
specifics, such as the number of scanlines per frame and the
refresh rate, but both are interlaced raster formats. One of the
reasons that television technology uses interlaced scanning is
that, when a camera is providing the image, the motion is updated
every field, producing smoother motion.
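
The horizontal resolution figure quoted above follows directly from
the active line count and the aspect ratio:

    ACTIVE_LINES = 480         # scanlines carrying picture information
    ASPECT_W, ASPECT_H = 4, 3  # NTSC aspect ratio

    pixels_per_line = ACTIVE_LINES * ASPECT_W // ASPECT_H
    print(pixels_per_line)     # 640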

Black and White Signal

A black and white video signal is basically a single line that
carries the sync information and the intensity signal superimposed
on one another. The vertical and horizontal sync pulses are
negative with respect to a reference level, with vertical sync
being a much longer pulse than horizontal sync. On either side of
the sync pulses are reference levels called the front porch and
back porch. Between horizontal sync pulses, which mark the period
between scanlines, is the active scanline interval. During the
active scanline interval, the intensity of the signal controls the
intensity of the electron beam of the monitor as it scans out the
image (see Figure X).

Color Monitors and Gamma Correction

A color monitor has three electron guns, each of which can be
focused on one of three phosphor coatings on the screen. These
phosphors are almost always some shade of red, green and blue.
One way to drive a color monitor is to have four lines going into
it: red, green, blue, and sync. Sometimes green and sync are
superimposed onto one line in which case it resembles a black and
white TV signal. In this case a monitor would have three lines
going into it: red, green/sync, and blue.

It is often the case that doubling the input value on one of the
lines does not result in a doubling of the light emitted from the
screen. Gamma correction is a modulation of the input signal used
to compensate for the non-linear response of the display screen. In
graphics systems this is often done with a look-up table that
converts each input value to a new value such that a linear
response is produced at the screen output.
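
The following is a minimal sketch of such a look-up table, assuming
the screen's response follows a power law with an exponent (gamma)
of about 2.2; the actual exponent is a property of the particular
monitor:

    GAMMA = 2.2   # assumed display exponent; varies by monitor

    # The screen responds roughly as output = input ** GAMMA, so
    # driving it with input ** (1 / GAMMA) yields an overall linear
    # response.
    lut = [round(255 * (v / 255.0) ** (1.0 / GAMMA)) for v in range(256)]

    def gamma_correct(value):
        # Map a linear 0-255 intensity to the value sent to the display.
        return lut[value]

    print(gamma_correct(128))   # about 186: mid-gray is driven harder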

Incorporating Color into the B&W Signal

When color came on the scene in broadcast television, the engineers
were faced with incorporating the color information in such a way
that black & white TVs could still display a color signal and
color TVs could still display black & white signals. The
solution was to encode color into a high-frequency component
superimposed on the intensity signal of the black and white video.
A reference signal for the color component, called the color burst,
was added to the back porch of each horizontal blanking interval.
The color was encoded as an amplitude and phase shift with respect
to this reference signal.

A signal that has separate lines for the color signals is referred
to as a component signal. A signal such as the color TV signal,
with all of the information superimposed on one line, is referred
to as a composite signal.

Because of the limited room for color information in the composite
signal, the TV engineers optimized the color information for the
particular hue they considered most important: Caucasian skin tone.
Because of that, the RGB information had to be converted into a
different color space: YIQ. Y is luminance and is essentially the
intensity information found in the black and white signal. It is
computed as:

Y = 0.299*R + 0.587*G + 0.114*B

The YIQ television signal is similar to the YUV color space in that
the Y's (luminance) are the same. U and V are color difference
signals: scaled versions of B-Y (by 0.5/0.886) and R-Y (by
0.5/0.701), respectively. The I and Q chrominance signals used in
television pick up the remaining two degrees of freedom of the UV
space. I and Q are the signals used to modulate the amplitude and
phase shift of the 3.58 MHz color frequency reference signal. The
phase of this chroma signal, C, conveys a quantity related to hue,
and its amplitude conveys a quantity related to color saturation.
In fact, I and Q stand for "in phase" and "quadrature"
respectively. The NTSC system mixes Y and C together and conveys
the result on one piece of wire. The result of this addition
operation is not theoretically reversible; the process of
separating luminance and color often confuses one for the other
(e.g., the color patterns seen on TV shots of people wearing black
and white seersucker suits).
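
The full RGB-to-YIQ conversion can be written out as a short
sketch; the Y weights are those given above, and the I and Q
weights are the commonly quoted NTSC values:

    def rgb_to_yiq(r, g, b):
        # I and Q modulate the 3.58 MHz subcarrier in quadrature
        # (amplitude conveys saturation, phase conveys hue).
        y = 0.299 * r + 0.587 * g + 0.114 * b
        i = 0.596 * r - 0.274 * g - 0.322 * b
        q = 0.211 * r - 0.523 * g + 0.312 * b
        return y, i, q

    print(rgb_to_yiq(1.0, 0.0, 0.0))   # pure red: strong chrominance
    print(rgb_to_yiq(0.5, 0.5, 0.5))   # gray: I and Q essentially zero

Note that any gray input (R = G = B) produces I = Q = 0, which is
why a black and white signal passes through the color system
unchanged.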

Video Tape Formats

The size of the tape, the speed of the tape, and the encoding
format all contribute to the quality that can be supported by a
particular video format. Common tape sizes are 1/2",
3/4", 1" and 2". Up until about 15 years ago,
before 1" made its debut, 1/2" was strictly consumer
grade, 3/4" was industrial strength, and 2" was
professional broadcast quality.

Current common 1/2" video formats are VHS, Beta, S-VHS, and
ED-Beta. VHS and Beta are the two consumer-grade video formats.
They differ primarily in the speed of the tape, which determines
how much information can be recorded for a single frame; the more
tape used per frame, the more information can be stored and,
therefore, the better the image and/or sound that can be recorded
and played back. S-VHS is a format in which the Y and C signals are
kept separate on playback, thus avoiding the problems created when
the signals are superimposed. All video equipment actually records
the signals this way, but S-VHS allows the Y signal (luminance) to
be recorded at a higher than normal resolution. The color
information is recorded to the same fidelity as it is on VHS. In
addition, the sound is encoded differently from regular VHS, also
resulting in greater fidelity. The advantages of S-VHS are
especially pronounced when played back on an S-VHS compatible
television.

Digital Video Formats

In addition, there are two digital formats, D1 and D2. They
were both originally 8-bit formats, but have recently been
expanded to 10-bit. Current recording devices are still 8-bit,
but supporting equipment like frame synchronizers, switchers,
etc. handles 10 bit formats.

D-1 came first and is a component format. It uses YUV coding,
so-called 4:2:2, which means that the U and V components are
horizontally sub-sampled 2-to-1. Luminance is sampled at 13.5 MHz,
720 samples per picture width. The aggregate data rate is roughly
27 MB/s (megabytes per second). D-1 was standardized back when the
industry thought it would make the composite-analog to
component-digital transition in one fell swoop, but that didn't
happen. The cost was somewhere above $100K.

Ampex saw a niche and came up with the less expensive D-2 composite
NTSC digital format (i.e., digitized NTSC). The composite signal is
sampled at four times the color sub-carrier, about 14.318 MHz, at
one byte per sample (an aggregate data rate, of course, of 14.318
MB/s). It has all the impairments of NTSC, but with the reliability
and performance of digital. It uses the same cassette as D-1.
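
The data rates quoted for both formats follow directly from the
sampling frequencies at one byte per sample:

    F_SC = 3.579545e6   # NTSC color sub-carrier frequency in Hz

    # D-1 (component, 4:2:2): luminance sampled at 13.5 MHz, U and V
    # each at half that rate, one byte per sample.
    d1_rate = 13.5e6 + 2 * 6.75e6
    print(d1_rate / 1e6)   # 27.0 MB/s

    # D-2 (composite): sampled at four times the color sub-carrier.
    d2_rate = 4 * F_SC
    print(d2_rate / 1e6)   # about 14.318 MB/s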

Animation Recording System

The main problem in producing animation is, of course, recording
the frames in sequence so that they can be played back as an
animation sequence. There are various alternatives for recording
video in sequence. One is to record it onto video tape directly.
This requires a video recorder capable of recording a single frame
at a time, and it requires a video controller that can control the
recorder based on signals it gets from a computer. Another medium
on which single frames can be recorded is the digital disk. These
must be capable of real-time conversion and playback of the video
signal.

Special-purpose graphics hardware can produce real-time or
near-real-time computer animation. It comes in forms ranging from
flight simulators to graphics workstations to personal computers
with built-in graphics processors. While the term real-time is
loosely defined in manufacturers' claims, true real-time
performance would produce on the order of thirty frames of
animation a second. Notice the difference between refresh rate,
which is the number of times the image on the display gets
refreshed (typically thirty or sixty times a second on most
displays), and animation rate, which refers to the number of
different images that can be produced and displayed per second.
Saturday morning cartoons have degenerated into the range of six to
eight frames of animation per second (while the TVs they are shown
on still operate at a refresh rate of thirty frames a second).

Simulators are special-purpose computer graphics systems designed
solely to produce displays of shaded imagery in response to
human-manipulated controls in mock-up cockpits. Usually most of the
database is static, containing only a few moving objects such as
planes, boats, tanks, and cars. A human operator manipulates the
controls, which the simulator samples and processes to update the
status of the vehicle being simulated. This results in a new
viewpoint which must be used to produce new images on the screens
of the cockpit.

Graphics workstations have built-in display processors which can
typically handle tens of thousands of polygons in real-time.
However, the definition of a 'display polygon' varies from spec to
spec and can drastically affect performance statistics. Silicon
Graphics, HP, DEC and SUN are among the manufacturers that build
special graphics facilities into otherwise general-purpose
workstations to enhance display performance. Application programs,
if they are fast enough, can then produce polygon definitions at
real-time rates and pass the polygons on to the rendering engines
inside these workstations.

Some personal computers, most notably the Amiga, also have
special-purpose graphics hardware built in that operates in the
same way. However, at this level the support is almost exclusively
for two-dimensional graphics.