Friday, December 30, 2005

Popular Science has a fascinating article about Graham Flint's conversion of a World War II era spyplane camera into the highest resolution landscape camera ever - the digital equivalent of 4 Gigapixels. Yes, "Giga", as in 1,000 times the resolution of your everyday handheld digital camera.

Flint is, as the article describes, "a hobbyist photographer with an extensive knowledge of optics, physics, astronomy, and military aerial-reconnaissance cameras." With a degree in physics, an impressive career in the aerospace industry and military, and a list of hobbies that makes one wonder if he ever sleeps, the Gigapxl Project is his 'retirement project'.

The conversion of a camera system from taking pictures from high-altitude airplanes to taking them from a tripod was not just a matter of running down to the local photography shop and picking up a tripod and some lenses. Flint had to make his own compound lens, tripod, and other parts. The film itself is 9x18 inches (23x46 cm), so he has to have it processed at a lab in Ohio and then digitally scan, correct, and print it himself. However, the results are more than worth it - the images it produces are so incredibly detailed that they can be blown up to zoom in from a city skyline shot all the way into someone's window to see the poster hanging on the wall. Of course, the guys from Google Earth are interested, along with many others. Not bad for a manually focused, analog (i.e. film-based) camera. His accomplishment is a very dramatic retort to the 'CCDs are now as good as film' assertions made by the digital camera crowd.

The interview for the article occurred at the end of a photographic shoot of the USA that Flint and his wife, Aves, undertook. The article has an impressive shot that he took of the Grand Canyon as well as images of the camera itself. I highly recommend reading it if you have any interest at all in photography.

Tuesday, December 27, 2005

Physorg reports on work by the McGovern Institute for Brain Research at MIT that supports a previously held suspicion that the human brain has a separate area that specializes only in recognizing faces. They also found that another area, immediately adjacent to the facial recognition area, performs the same task for bodies (and not faces). The researchers targeted this particular region of the brain for study because of observations that people with physical damage to it lost the ability to recognize faces. The researchers used a next-generation imager with increased resolution to distinguish between the two distinct areas involved. The work was published in The Journal of Neuroscience (23 November 2005).

The results of this research seem to be in agreement with the parallel tracks that machine-recognition of people in surveillance videos is taking - namely, recognition of the face and recognition of the gait. Currently, face recognition works pretty well if the entire face is clearly visible (i.e. sufficient resolution, unobscured, full face towards the camera, and well lit), but falls apart if it isn't. In uncontrolled situations, you can guess what usually happens (Murphy's Law applies here). This has led to efforts to develop new image processing algorithms that recognize a body or its actions, such as gait.

Saturday, December 24, 2005

The New Scientist reports that researchers from Aarhus University in Denmark have managed to reconstruct the sounds inside of the star Alpha Centauri B and compare them to Sol, our sun. You may wonder how stars can have sounds when, as everyone knows, 'in space no one can hear you scream.'

The reason is that the sounds never make it from inside a star out into the vacuum of space. The gases churning around inside a star cause low-frequency vibrations (sounds) which travel through the star and bounce off its outer surface. The bouncing causes the star to pulse slightly, producing tiny variations in its brightness. Astronomers here on Earth can measure these variations using telescopes and then use them to reconstruct the sounds that caused them. Besides being interesting audio recordings, the sounds can also be analyzed to reveal details about the internal structure of the star.
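As a rough illustration of the 'sonification' idea (this is not the researchers' method, and the oscillation frequencies below are made up for the example, not the published values), one can scale a star's millihertz oscillation modes up into the audible range and sum sinusoids at the scaled frequencies:

```python
import math

def sonify(mode_freqs_mhz, speedup=1e5, sample_rate=8000, seconds=1.0):
    """Toy sketch: turn stellar oscillation frequencies (in millihertz)
    into an audible waveform by speeding them up and summing sinusoids.
    A 3 mHz mode sped up 100,000x becomes a 300 Hz tone."""
    n = int(sample_rate * seconds)
    samples = []
    for i in range(n):
        t = i / sample_rate
        s = sum(math.sin(2 * math.pi * (f * 1e-3 * speedup) * t)
                for f in mode_freqs_mhz)
        samples.append(s / len(mode_freqs_mhz))  # keep within [-1, 1]
    return samples

# Hypothetical p-mode frequencies in millihertz (illustrative only)
wave_data = sonify([2.8, 3.0, 3.2])
```

The resulting sample list could then be written out as a WAV file. The real analysis runs in the other direction, of course: the measured brightness variations are decomposed into their component frequencies to probe the star's interior.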

You can listen to some music of the spheres yourself by downloading the researchers' WAV files from Sol (our star, the Sun) and Alpha Centauri B (one of our nearest neighbors). The research was originally published in Astrophysical Journal (vol 635, p 1281).

Friday, December 23, 2005

ScienceNOW has a brief write-up about a study released in Biology Letters (19 Dec 2005) on how teams of dolphins share sonar:

when members of the rough-toothed species (Steno bredanensis) travel in tight formations, only one emits sounds to scan for prey and distant objects, while the others stay silent, listening in on the echoes that bounce back. The authors say this behavior may help the dolphins save energy as they trek.

The tag line of their write-up is "Stealing Sonar". The word 'stealing' doesn't seem appropriate in this context, but I guess it is attention-getting. I tried to track down the article (without success) to see if the authors proposed other explanations for the behavior. It seems to me that the dolphins could also be trying to minimize the chances of alerting prey OR confusing themselves with multiple reflections (sort of a sonar version of the cocktail party effect, where too many people talking at once makes listening difficult). However, this is not my area of expertise, so I may be completely off base.

Thursday, December 22, 2005

Most digital-age or Internet-savvy people have probably been exposed to some form of perceptual audio compression by now, perhaps even unknowingly. These audio compression schemes, of which MP3 is one currently popular example, shrink audio files by impressive amounts. MP3, along with similar offerings from Apple (AAC), Microsoft (WMA), Sony (ATRAC), and others, are based on the subtleties of human psychoacoustics (i.e. audio perception). In simple terms, these perceptual compression schemes throw away anything that the average human couldn't be expected to hear clearly. Unfortunately, at every place the compression algorithm 'cuts out' some sound it wants to discard, a bit of noise (from the resulting discontinuity) is left behind. Fortunately, these bits of noise can then be hidden using other psychoacoustic sleight-of-hand tricks. What is left at the end is an audio file that is not only very much smaller than the original but also sounds remarkably similar.
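As a much-simplified illustration of the 'throw away what you can't hear' step (real codecs compute per-band masking curves from a psychoacoustic model; the flat 30 dB floor used here is an arbitrary stand-in for that model):

```python
def mask_bands(band_levels_db, threshold_db=30.0):
    """Toy perceptual step: discard any frequency band more than
    threshold_db below the loudest band, on the (crude) theory that
    the loud band masks it. Discarded bands are marked with None."""
    peak = max(band_levels_db)
    return [lvl if peak - lvl <= threshold_db else None
            for lvl in band_levels_db]

kept = mask_bands([-6.0, -50.0, -12.0, -70.0])
# the bands at -50 and -70 dB fall below the -36 dB floor and are dropped
```

A real encoder then spends its bits only on the surviving bands, which is where the dramatic file-size savings come from.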

The technical aspects aside, perceptual compression is a boon for users of portable media players - the file sizes are small (letting one carry lots of music on a small player) and the loss of acoustic detail and warmth isn't as obvious where one typically uses portable media players (e.g. on the train or while jogging). Of course, those who like their music - and still have their hearing left - have found out that run-of-the-mill MP3 music files don't sound so good when played over their hi-fi systems in the quieter environs of home. But I digress... [For more on better MP3 codecs for audiophiles, see this Wired article.]

To get back on track, what is perfectly fine for compressing a professionally recorded, mixed, and mastered music album is not necessarily fine for storing evidentiary audio that may have to be forensically filtered. Why not? The short answer is that an audio recording that needs filtering needs it precisely because the speech is not loud and clear enough, there is masking noise, or both. Perceptual compression techniques work by throwing away the sounds that the average human doesn’t hear clearly anyway, so what gets removed? You guessed it – the speech. So, when the audio examiner removes the noise, there isn't much left there to be "revealed". At that point, MP3 and other such schemes change from boon into bane.

[Note: Expect additional posts on perceptual compression in the near future due to their prevalence.]

Monday, December 19, 2005

A few days ago, I came across a newspaper article that referred to the decoding preferences of the left and right ears in humans. As usual, I lost the link and had to go hunting for it again. Although I didn't find the exact same link, I did find a different write-up that was likely about the same study. This write-up states that both adults and newborns have an inborn preference for decoding speech with the left side of the brain and tones with the right. Because of the way that the brain is 'cross-wired', this means that we are slightly better at hearing speech with the right ear and tones & music with the left.

This seems to agree with my anecdotal observations of people leaning in with their right ears to listen to other people talking in loud noise environments. The left ear is then turned away (toward the directional or diffuse noise and in the acoustic shadow of the head from the direction of the desired talker) thus giving the best noise 'reference' possible for the brain to 'filter' with.

All professional forensic audio filters (processors) default to the left channel for the primary signal (i.e. the noisy speech input AND the filtered speech output) and to the right channel for the noise reference (i.e. music, resonance, etc.). On the basis of this report, it seems to make sense to listen to marginal recordings the opposite way around from now on (i.e. swap the left and right channels). I'll experiment myself by trying it both ways. If there are any obvious differences, I'll report back.
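For what it's worth, the channel swap itself is trivial; a minimal sketch, assuming the audio is available as interleaved (left, right) sample pairs:

```python
def swap_channels(stereo_frames):
    """Swap left and right in a list of (L, R) sample pairs, so that
    the filter's 'primary' channel carries what the right ear heard."""
    return [(r, l) for (l, r) in stereo_frames]

swapped = swap_channels([(1, 2), (3, 4)])
# -> [(2, 1), (4, 3)]
```

Any audio editor can do the same thing, of course; the point is only that trying the experiment costs nothing.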

(Note: Edited after the initial posting to correct some errors I caught.)

Live Science has a piece about a study published recently in The Journal of Neuroscience that describes the results of high resolution imaging of the retinas of living people. The scientists used adaptive optics to compensate for having to image through the (imperfect) lenses of the eyes. Comparing the images revealed large variations in the relative numbers of green and red cones. The fact that people's eyes are significantly different on the micro-level isn't too surprising a result, when you think about it, given the fact that human bodies are self-assembling. However, the idea that there is a mechanism that allows humans as a group to recognize colors the same way even though our eyes are significantly different is very interesting.

[note: scientific debate is encouraged but flaming is not. I have pasted enough material around the specific sentences on eyes and ears to give context. Any opinions expressed are those of the writer being quoted. If anyone out there has references to studies that support or refute the positions stated in the quote, please post them. Keith]

Fortunately, Sax, a family physician and child psychologist, subscribes to none of the usual cant. Indeed, I thought I was a connoisseur of sex differences until I read Why Gender Matters, where I learned in the first chapter, for instance, that girls on average hear better than boys, especially higher-pitched sounds, such as the typical schoolteacher's voice, which is one little-known reason girls on average pay more attention in class.

Males and females also tend to have different kinds of eyeballs, with boys better at tracking movement and girls better at distinguishing subtle shades of colors. Presumably, these separate skills evolved when men were hunters trying to spear fleeing game and women were gatherers searching out the ripest fruit. So, today, boys want to catch fly balls and girls want to discuss whether to buy the azure or periwinkle skirt. Cognitive differences are profound and pervasive. Don't force boys to explain their feelings in great detail, Sax advises. Their brains aren't wired to make that as enjoyable a pastime as it is for girls.

Sunday, December 18, 2005

There are numerous differences between audio and video - that hardly requires saying. I would like to draw your attention to a subset of those differences and then on to a particularly significant one. The subset is how humans perceive audio and video 'data'.

When the typical person looks at an image carefully, he can extract all the information there is in an image straight away - the objects as well as their relative positions, relative sizes, colors (if color information is present), and motion (in some circumstances). If there is noise in the image, looking longer reveals little or no additional information. Likewise, when a typical person listens to audio, he can extract much of the same information (except color and similar visual-only things).

However, if the audio is noisy, additional listening can reveal more information. Humans can listen through noise, including interfering speech, by focusing their attention and adaptively 'filtering' out the noise. This is a very significant difference and it makes itself known daily to analysts, transcribers, forensic examiners, detectives, reporters, and other professionals who deal with audio and video.

Saturday, December 17, 2005

Re-reading my post about how Hollywood (mis)represents audio/video forensics led me to the subject of this posting - just how do the Hollywood types get such amazing improvements when they dramatize forensic restoration and enhancement? Well, quite frankly, they cheat! First they start with a broadcast quality image. Then they may, but don't necessarily, make it look more 'authentic' by dropping out the color or decreasing the resolution. Next, they dirty it up in a way that lets them easily remove the dirt (noise) with the tools they have available. So, when it comes time for the actor to perform the forensic filtering, presto-chango, abracadabra, and the case is solved! What a dramatic difference it makes to take away a known, simulated noise from a broadcast quality image...

Similar to my earlier post on computer forensics, cell phones (a.k.a. "mobile phones" outside of the USA) now often contain images and/or audio recordings. On recent phones, the images may be still shots or even video. The audio may be part of a video clip or be an independent voice dictation. These pieces of information may prove critical to solving or prosecuting a case, whether it be a bombing, mugging, extortion, or some other type of crime, or investigating an accident.

Yesterday I ran across a product announcement for a cell phone data extractor that currently works in North America. The product is called CellDEK and it reportedly allows extracting the data on a cell phone, its SIM card, and its Flash Memory card. It also reportedly works with Blackberry(tm) devices (manufactured by Research in Motion). I have not tested this myself and I am therefore not endorsing or critiquing it. I have no direct or indirect ties to this company and am only reporting it as something that caught my interest.

As is often the case, the guys wearing the white hats (law enforcement, national security, and safety board types) have to stay cognizant of lots of emerging technologies. Not all of them make it to the 'big time' like Apple's iPod(tm), but all it takes is one important case to have the white hatters scrambling to come up with the capability and procedures to recover, authenticate, restore, and enhance the data from a new type of device.

Even though I must admit that I watch very little television these days, I do keep my eye out for news reports concerning representations of media forensics (audio, video, computer) on such programs as CSI, The Wire, and, in its day, The X-Files. I also take in the occasional cinema blockbuster.

From these various glimpses into mass-market entertainment, two things typically catch my attention: the misrepresentation of electronic media forensics and (blatant) vendor product placements. My focus in this post is the former. I must say that I particularly enjoy the scenes where the characters take a single (still) image from a surveillance camera and zoom in on a subject's face. Of course, the subject is so far from the camera that his face takes up maybe five or six pixels horizontally and not many more vertically. On some of the most amusing examples, the illumination on the face is poor on top of it. I'll ignore, for the moment, the fact that in real-life the video they are working with is almost certainly heavily compressed.

Then, a technological miracle occurs and through the application of digital filtering the face is a near perfect match to the suspect and the actors race for the door (or flip open their mobile phone to alert their partner, take your pick). It makes for eye catching drama, but that is not the way it happens in real life. Zooming in on a few pixels in a still image just leads to really big pixels - not a clear image - no matter what filters you apply. In my experience, minor improvements in a still image's edge contrast are possible, but not major ones. Major improvements imply recreating information that is not in the recorded image.
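To see why, consider what 'zooming' on a digital still actually is; a toy nearest-neighbour sketch (the pixel values are arbitrary):

```python
def zoom(pixels, factor):
    """Nearest-neighbour 'zoom': each pixel simply becomes a
    factor x factor block of the same value. The output is larger,
    but no filtering can add detail that was never recorded."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

big = zoom([[10, 20], [30, 40]], 2)
```

Fancier interpolation (bilinear, bicubic) smooths the blocks into gradients, but it is still just a weighted blend of the same few recorded values - which is exactly the 'really big pixels' problem dressed up nicely.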

Perhaps I should add the caveat "using today's technology", but at the time of this writing I can't conceive of how we can ever see major improvements in restoring & enhancing low resolution still images - particularly to the level of what Hollywood (mis)represents. I hope that I'm proved wrong, of course.

(Writing this post makes me think I need to address the use of compression when storing surveillance camera video. I will in the near future.)

Thursday, December 15, 2005

Although this post is not directly related to audio/video, computer forensics sometimes does involve audio and/or video because those types of media files are often found on computer storage. The content of the files may then need authentication, subject identification, and/or restoration & enhancement. Slashdot has the following thread today on a computer forensics question. Although a lot of the discussion is predictable if you do this type of thing on a regular basis, it is still an interesting technical discussion and there are some good points mixed in too.

Wednesday, December 14, 2005

The International Phonetic Association is amending its alphabet to add a sound used in several African languages. The sound is identified as a labiodental flap and its symbol will look like a 'v' with a hook. You can read more about it in the following NY Times article.

This is the initial posting to what I hope will be an interesting and informative blog on all things related to sound and light. I expect that my own contributions will primarily involve the capture, recording, playback, and restoration & enhancement (a.k.a. filtering) of audio, video, and still images. As I am involved in audio and video forensics as part of my profession, these topics are also sure to come up (in a non-commercial way).