NWAPW Year 2

Technical Topics

Part 4: Hardware Considerations

Image Requirements

There are two kinds of patterns we might be trying to identify: those with
features larger than twice the pixel size of the image (which can therefore
be resolved), and those with smaller features (or slightly larger, but not
aligned with the pixel grid), which will be detectable only as a lumpy blur.
The resolvable patterns can be reduced uniformly to the latter category by
applying a Gaussian blur. So basically, all patterns (except for pedestrians
too close to do anything about) can be treated as solid near-gray colors.
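
As a rough illustration of that blur step, here is a minimal sketch in Java. It is not taken from the project code: the class name, the 3x3 kernel weights, and the edge handling are all assumptions chosen to keep the example short.

```java
// Minimal sketch: 3x3 Gaussian-like blur on a grayscale image (values 0..255).
// The kernel weights and the edge clamping are illustrative assumptions.
final class BlurSketch {
    private static final int[][] KERNEL = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} }; // weights sum to 16

    static int[][] blur(int[][] gray) {
        int rows = gray.length, cols = gray[0].length;
        int[][] out = new int[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                int sum = 0;
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        // Clamp coordinates so edge pixels reuse their nearest neighbor.
                        int rr = Math.min(rows - 1, Math.max(0, r + dr));
                        int cc = Math.min(cols - 1, Math.max(0, c + dc));
                        sum += KERNEL[dr + 1][dc + 1] * gray[rr][cc];
                    }
                }
                out[r][c] = (sum + 8) / 16; // divide by the kernel sum, rounding to nearest
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A one-pixel checkerboard is a pattern too fine to resolve.
        int[][] checker = new int[4][4];
        for (int r = 0; r < 4; r++)
            for (int c = 0; c < 4; c++)
                checker[r][c] = ((r + c) % 2 == 0) ? 0 : 255;
        System.out.println("interior pixel after blur: " + blur(checker)[1][1]);
    }
}
```

With this kernel the interior of the checkerboard comes out as a uniform mid-gray, which is the "solid near-gray" behavior described above.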

Much more difficult is recognizing unsaturated colors. Everything in
most scenes -- except for a few red sports cars or lemon yellow chick cars
and the occasional fire engine or school bus -- why do you think they chose
those colors? -- everything is off-gray or off-white, or dirty black (or
shiny black, reflecting the scene around it, with the result that it looks
the same as dirty black) which comes off as dark gray inside the camera.

So if we restrict our first attempt to saturated colors, a blob of color
4-5 pixels square is sufficient to distinguish it from digital noise and
small things like flowers and birds.
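
One way to sketch that test in Java is shown below. The HSV-style saturation formula is standard, but the 0.5 threshold, the packed 0xRRGGBB pixel layout, and all the names are assumptions for illustration only. The sketch also does not check that the pixels in the blob share the same hue, which a real detector would probably want.

```java
// Minimal sketch: is there a 5x5 block of saturated color at a given position?
// Threshold, pixel packing, and names are assumptions, not project code.
final class SaturationSketch {
    static final double MIN_SATURATION = 0.5; // assumed cutoff between "off-gray" and "saturated"
    static final int BLOB = 5;                // blob side in pixels, from the 4-5 pixel estimate

    // HSV-style saturation of a packed 0xRRGGBB pixel, in the range 0..1.
    static double saturation(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        int max = Math.max(r, Math.max(g, b));
        int min = Math.min(r, Math.min(g, b));
        return max == 0 ? 0.0 : (max - min) / (double) max;
    }

    // True if every pixel in the BLOB x BLOB square at (top, left) is saturated.
    static boolean isSaturatedBlob(int[][] image, int top, int left) {
        if (top < 0 || left < 0
                || top + BLOB > image.length || left + BLOB > image[0].length) {
            return false; // square does not fit inside the image
        }
        for (int r = top; r < top + BLOB; r++)
            for (int c = left; c < left + BLOB; c++)
                if (saturation(image[r][c]) < MIN_SATURATION) return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println("pure red:  " + saturation(0xFF0000));
        System.out.println("mid gray:  " + saturation(0x808080));
    }
}
```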

Assuming a fixed-focus lens (so pedestrian size in the image can give
a reliable distance estimate, if we so choose) of normal field of view
-- perhaps the equivalent of a 50mm lens on a 35mm camera -- it is not hard
to calculate that a six-foot (2m) pedestrian at 50 meters would be 2mm
on the film of that 35mm camera, and correspondingly 0.7mm on a 1/3" (8mm)
standard C-mount video camera sensor chip. His shirt is a little less than
half that, in round numbers 0.3mm or 300 microns. If the 8mm-diagonal chip
resolves 320x240 color pixels (so the pixel size is about 20 microns square),
that shirt image on the sensor chip is about 15 pixels square at 150 feet
(50m).

Why 50 meters? According to the Oregon State Driver's instruction booklet,
a car going 20mph (about 10m/s, the standard downtown speed limit in Oregon)
needs 65 feet (20m) to stop, so detecting him at 50m gives a 2x margin
of safety. That's 15 pixels square; at 150m he is only 5 pixels square,
which we guessed is the minimum for detection, another 3x margin of safety.
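
The sizing arithmetic above can be checked with a few lines of Java. This is only a sketch: it takes the 0.3mm shirt image at 50m and the 8mm, 320-pixel-wide sensor from the text as given, and it assumes a 4:3 sensor (so the width is 4/5 of the diagonal) and an image size inversely proportional to distance. The names are made up for the example.

```java
// Minimal sketch: shirt size in pixels as a function of distance.
// Constants come from the text; the 4:3 aspect ratio is an assumption.
final class SizingSketch {
    static final double SENSOR_DIAGONAL_MM = 8.0;        // "1/3 inch (8mm)" chip from the text
    static final int PIXELS_WIDE = 320;                  // color pixels across the sensor
    static final double SHIRT_ON_SENSOR_MM_AT_50M = 0.3; // shirt image size from the text

    // Pixel pitch in mm: a 4:3 sensor is a 3-4-5 triangle, so width = 4/5 of the diagonal.
    static double pixelPitchMm() {
        return SENSOR_DIAGONAL_MM * 4.0 / 5.0 / PIXELS_WIDE;
    }

    // Shirt width in pixels; image size scales inversely with distance.
    static double shirtPixels(double distanceMeters) {
        double sizeMm = SHIRT_ON_SENSOR_MM_AT_50M * 50.0 / distanceMeters;
        return sizeMm / pixelPitchMm();
    }

    public static void main(String[] args) {
        System.out.printf("pixel pitch: %.0f microns%n", pixelPitchMm() * 1000.0);
        System.out.printf("shirt at 50m:  %.1f pixels%n", shirtPixels(50.0));
        System.out.printf("shirt at 150m: %.1f pixels%n", shirtPixels(150.0));
    }
}
```

Under these assumptions the pitch works out to 20 microns and the shirt to about 15 pixels at 50m and 5 pixels at 150m, matching the figures in the text.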

Frame Rate

The nominal pedestrian walking speed is 3mph, or about 1.5m/s. If your
camera produces (and your software processes) one frame every second, the
pedestrian has walked about five feet (1.5m) between frames, so (other than
very corpulent pedestrians: are there any? Such people get tired too quickly
to do much walking) the guy is already more than his own width away from
where he was, which makes tracking a single pedestrian very difficult. At
100fps he has moved only 1.5cm, which is less than the resolution of the
camera. 10fps (15cm = 6" per frame) still offers plenty of overlap between
frames.

A car driving the nominal Oregon downtown speed limit (10m/s), at a
frame rate of 10fps, moves one meter every frame, which gives you 20 frames
to get the car stopped in the distance (20m) we are told we need to do so.

The numbers are credible and consistent.
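
The per-frame figures in this section are simple divisions, sketched here in Java so you can try other speeds and frame rates. The class and method names are invented for the example; the speeds and distances are the ones quoted above.

```java
// Minimal sketch: how far something moves per frame, and how many frames
// elapse while covering a given distance at constant speed.
final class FrameMotionSketch {
    // Distance covered between consecutive frames, in meters.
    static double metersPerFrame(double speedMetersPerSec, double framesPerSec) {
        return speedMetersPerSec / framesPerSec;
    }

    // Number of frames captured while traveling distanceMeters at constant speed.
    static double framesWithin(double distanceMeters, double speedMetersPerSec, double framesPerSec) {
        return distanceMeters / metersPerFrame(speedMetersPerSec, framesPerSec);
    }

    public static void main(String[] args) {
        double walk = 1.5; // pedestrian speed in m/s, from the text
        double car = 10.0; // car speed in m/s, from the text
        for (double fps : new double[] {1, 10, 100}) {
            System.out.printf("%5.0f fps: pedestrian moves %.3f m per frame%n",
                    fps, metersPerFrame(walk, fps));
        }
        System.out.printf("car at 10 fps: %.1f m per frame, %.0f frames in 20 m%n",
                metersPerFrame(car, 10), framesWithin(20, car, 10));
    }
}
```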

The Camera

Commercial low-cost digital surveillance cameras typically come in resolutions
of 320x240 and 640x480 color pixels -- be careful: the vendors inflate
the numbers by telling you the monochrome sensor density, but you must
divide by 4 (= half each way) to get the true color pixel density. There
are some factors related to the electronics and the physics of the sensors
for how fast you can get images off the sensors, but they all promise 15fps,
or 30fps under careful management (according to one vendor, USB2 cannot
transfer the data fast enough to do 640x480 = 300K color pixels at 30fps,
but USB3 can).
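
To get a feel for the data rates involved, here is a small Java sketch. It assumes raw Bayer8 output with one byte per sensor element and four sensor elements per color pixel, as described later in this section; the names are made up. The only outside number is the 480 Mbit/s nominal signaling rate of USB 2.0 high speed, and real sustained throughput is noticeably lower than that because of protocol overhead.

```java
// Minimal sketch: raw data rate for a Bayer8 camera, compared with the
// nominal USB 2.0 rate. Assumes 1 byte per sensor element and 4 sensor
// elements per color pixel.
final class DataRateSketch {
    static final long USB2_NOMINAL_BYTES_PER_SEC = 480_000_000L / 8; // 480 Mbit/s signaling rate

    static long rawBytesPerSecond(int colorWidth, int colorHeight, int fps) {
        long sensorBytesPerFrame = 4L * colorWidth * colorHeight; // four sensor bytes per color pixel
        return sensorBytesPerFrame * fps;
    }

    public static void main(String[] args) {
        long small = rawBytesPerSecond(320, 240, 15);
        long large = rawBytesPerSecond(640, 480, 30);
        System.out.println("320x240 @ 15fps: " + small + " bytes/s");
        System.out.println("640x480 @ 30fps: " + large + " bytes/s");
        System.out.println("USB2 nominal:    " + USB2_NOMINAL_BYTES_PER_SEC + " bytes/s");
    }
}
```

At 640x480 and 30fps the raw stream is already more than half the nominal USB 2.0 rate, so the vendor's warning is at least plausible once overhead is taken into account.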

The PointGrey (now a division of FLIR) Firefly (320x240) and Chameleon
(640x480) cameras both work with their FlyCapture2 API and driver software
on Windows10 (and also on Linux). The API is defined for C/C++/C# but not
Java. I wrote a Java wrapper class to encapsulate the API calls necessary
to start the camera and capture frames at 15fps (or 30fps, if you can handle
the data rate). This has been tested and works reasonably well at 15fps
on a 2.4GHz Win10 computer with ample time for processing 320x240 images
in Java. You can download the wrapper class, DLL, and test code as a
zip file here. If you (meaning your browser) know the secret password,
it's also available on GitHub.

Both cameras have encoding firmware for a variety of popular image
compression formats, but my wrapper class delivers the data in the native
raw (unprocessed) Bayer8 encoding, where each color pixel must be extracted
from four (non-contiguous) sensor data bytes. There is example code included
with the wrapper class code. You can also look at their API information
directly, to better understand the wrapper class.
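
For orientation only, here is a minimal Java sketch of pulling one color pixel out of a raw Bayer8 buffer. It is not the wrapper's code. It assumes an RGGB tile order (red and green on the even sensor row, green and blue on the odd row) and a row-major buffer twice as wide as the color image; the actual tile order depends on the camera, so check the example code that comes with the wrapper. All names are hypothetical.

```java
// Minimal sketch: one color pixel from four Bayer8 sensor bytes.
// ASSUMPTION: RGGB tile order; the real camera may use a different order.
final class BayerSketch {
    // raw: sensor bytes in row-major order; sensorWidth = 2 * color image width.
    // (x, y): position of the wanted color pixel. Returns packed 0xRRGGBB.
    static int colorPixel(byte[] raw, int sensorWidth, int x, int y) {
        int top = (2 * y) * sensorWidth + 2 * x; // first byte of the 2x2 tile
        int bottom = top + sensorWidth;          // same column, next sensor row
        int r = raw[top] & 0xFF;                 // mask because Java bytes are signed
        int g1 = raw[top + 1] & 0xFF;
        int g2 = raw[bottom] & 0xFF;
        int b = raw[bottom + 1] & 0xFF;
        int g = (g1 + g2) / 2;                   // average the two green samples
        return (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        // A single 2x2 tile: R=200, G=100 / G=50, B=10.
        byte[] raw = { (byte) 200, (byte) 100, (byte) 50, (byte) 10 };
        System.out.printf("0x%06X%n", colorPixel(raw, 2, 0, 0));
    }
}
```

The two bytes on the top row of a tile and the two on the bottom row are a full sensor row apart in memory, which is why the four bytes are non-contiguous.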

Test Video

There is a brief segment of a pedestrian walking past my house included
with the zip download; the whole take is available here, but it's kind of
boring. I adapted a segment from the Tesla video, hand-painted the shirt
of one of the pedestrians a bright blue and converted it to my By8 format,
which you can download here. We really need some better test clips, with
actors dressed in bright colors, walking across the street in front of a
moving car with the camera attached to the windshield. Hopefully we can
get some of you (or your families) involved.

Some additional (longer) clips, all in my By8 format (see downloads
above), of a single person in a bright shirt walking across the street: