For those of you interested in learning more about embedded vision, I recommend the website of the Embedded Vision Alliance, www.embedded-vision.com, which contains extensive free educational materials.

For those who want to do some easy and fun hands-on experiments with embedded vision, try the BDTI OpenCV Executable Demo Package (for Windows), available at www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/downloads/pages/introduction-computer-vision-using-op

And for those who want to start developing their own vision algorithms and applications using OpenCV, the BDTI Quick-Start OpenCV Kit (which runs under the VMware player on Windows, Mac, or Linux) makes it easy to get started: www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/downloads/pages/OpenCVVMWareImage

My understanding is that professional-level three-chip cameras are great when quality requirements are high, but may be too expensive for most embedded imaging applications. They are still worth evaluating, though, to determine whether the resulting quality difference justifies the cost.

Then there are special cameras mounted on constantly moving platforms (satellites or aircraft) that use a 'pushbroom' line of sensors instead of a matrix on a chip, to continuously image a ribbon-shaped path over a planet's surface. (I studied these for my Space Systems Engineering master's degree.)

Thanks, Alaskaman66. I thought the algorithm might exist. I also read with interest about a relatively new kind of DoD radar mounted on a helicopter that supposedly could see human bodies (combatants) through brush. The acronym was "FOREST", something like that. Very expensive, and very limited availability, of course. Image processing of such a radar signal would be interesting. Then there is the topic of "sensor fusion".

Studio color cameras get around the entire bit-averaging overhead by using a set of dichroic filters that separate the incident light into red, green, and blue images. This triple image is then applied to three (B&W) sensor chips. Wouldn't this be a better solution for a machine vision camera? And don't forget that the color separation could probably be something other than RGB, or just two colors, or more than three. How would one determine from the application what the best pre-image conditioning should be?

My current project is to interface a Digilent Atlys Xilinx FPGA board with an Aptina image sensor that Digilent also supplies. First I would like to put frame data into a DDR3 frame buffer, then display from the frame buffer to an HDMI monitor. My previous attempt with the Spartan-3A DSP 1800 starter board and an Aptina/Micron image sensor headboard failed because the sensor did not want to talk back (no ACK signal coming from the I2C interface when analyzed with a logic analyzer). I assume the problem could be the wiring and the voltage requirements of the headboard; I should have better luck with the Vmod sensor by Digilent.

Wahaufler: Removing fixed images from scenes... This sounds like the methods used for moving-target detection in look-down/shoot-down radar, so I think the signal processing algorithms already exist. A simple version is used in automatic asteroid tracking, where non-moving point sources in three consecutive images are tossed out, and the only thing the computer identifies is whatever has moved.
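In case it helps anyone experimenting with this, here's a minimal sketch of that three-frame "toss out whatever doesn't move" idea in OpenCV/Python; the filenames and threshold value are purely illustrative:

```python
# Minimal three-frame differencing sketch: a pixel is flagged as "moving"
# only if it changed between both consecutive frame pairs, which suppresses
# one-frame noise hits (hot pixels, cosmic rays, etc.).
import cv2

f1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)  # filenames are illustrative
f2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE)
f3 = cv2.imread("frame3.png", cv2.IMREAD_GRAYSCALE)

d12 = cv2.absdiff(f2, f1)   # change between frames 1 and 2
d23 = cv2.absdiff(f3, f2)   # change between frames 2 and 3

moving = cv2.bitwise_and(
    cv2.threshold(d12, 20, 255, cv2.THRESH_BINARY)[1],
    cv2.threshold(d23, 20, 255, cv2.THRESH_BINARY)[1],
)
cv2.imwrite("moving_mask.png", moving)
```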

What are some good references (online or books) that explain ISP blocks such as Bayer to RGB converter and/or RGB to YUV converter, enough to implement an algorithm myself? A great book that I found on Amazon Kindle: Design for Embedded Image Processing on FPGAs
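For anyone who just wants to see the RGB-to-YUV step concretely before diving into a book, here's a minimal sketch using common BT.601-style coefficients; the exact coefficients and offsets vary between standards, so treat these values as illustrative:

```python
# Minimal RGB -> YCbCr (often loosely called "YUV") conversion sketch,
# assuming 8-bit RGB input and BT.601-style coefficients.
import numpy as np

def rgb_to_ycbcr(rgb):
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.169 * r - 0.331 * g + 0.500 * b + 128.0   # blue-difference chroma
    cr =  0.500 * r - 0.419 * g - 0.081 * b + 128.0   # red-difference chroma
    return np.stack([y, cb, cr], axis=-1)
```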

In the Star Trek (TOS) pilot "The Cage", when Vina plays the Green Orion Slave Girl, the dailies kept coming back with her "not green", no matter how green they made her makeup. They couldn't figure out what was going wrong until they talked to the film processing lab. It turns out the lab thought something was going wrong in the photography or lighting so they were color-correcting Vina back to something approaching normal flesh tones. Whoops! ;-)

wahaufler: You bring up a good point: For many embedded vision applications, "automatic anything" is often the wrong choice. We often have full control over lighting, exposure, focus, etc. and anything the camera attempts to do to "automate" these is probably wrong.


@Michael, what's a good kernel size for the ISP pipeline? Does the size stay the same for every ISP block? For example 3x3, 5x5, 8x8?

That's a complicated question, and it depends on the algorithm, the sensor resolution, and the optics. For example, a good Bayer demosaic needs at least 7x7, a sharpening kernel probably 3x3 or 5x5, and a spatial NR kernel could be much bigger (our biggest is 128x64!). It depends on the characteristic scale of the feature being processed. Obviously, with high-megapixel sensors, a given feature requires bigger kernels.
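To make the kernel idea concrete, here's a minimal sharpening-block sketch with an illustrative 3x3 kernel; the coefficients are just an example, not taken from any particular ISP:

```python
# Apply an illustrative 3x3 sharpening kernel of the kind an ISP sharpening
# block might use (center-weighted, surrounded by negative taps).
import cv2
import numpy as np

sharpen_3x3 = np.array([[ 0, -1,  0],
                        [-1,  5, -1],
                        [ 0, -1,  0]], dtype=np.float32)

img = cv2.imread("input.png")              # path is illustrative
sharp = cv2.filter2D(img, -1, sharpen_3x3) # convolve, keep input depth
cv2.imwrite("sharpened.png", sharp)
```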

I'm going to open a can of worms here. As one interested in the problem of capturing images of elusive creatures -- that's right, sasquatches -- my mind goes to the problem of focusing on a furry, and thus already somewhat fuzzy, body behind branches, which may fool a camera's auto-focus away from the object of interest in the background. I suppose avoiding auto-focus in this context is the lesson here, but is there any other kind of special processing that might be useful, say, to avoid losing detail on the fur during noise reduction? I played with the idea of stitching images across time or space to somehow remove the foreground branches and leaves. This would require a special multi-camera setup, I suppose. Not likely in this context.
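One simple way to try the "stitch across time" idea is a per-pixel temporal median over several aligned frames; it assumes the branches move a little (wind, slight camera shift) while the subject stays visible at each pixel in most frames. A rough sketch, with illustrative filenames:

```python
# Per-pixel temporal median over several registered frames: any occluder that
# covers a given pixel in only a minority of frames gets voted out.
# Assumes the frames are already aligned to each other.
import cv2
import numpy as np

frames = [cv2.imread(f"shot_{i}.png") for i in range(5)]   # filenames illustrative
stack = np.stack(frames, axis=0).astype(np.float32)
median = np.median(stack, axis=0).astype(np.uint8)
cv2.imwrite("occluders_removed.png", median)
```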

@Michael Tusch: What kind of DRC (dynamic range compression) algorithms are usually used in an ISP? moulay

It depends on the camera sophistication. The general term is "tone mapping", and this can mean fixed gamma, content-adaptive gamma, histogram modification, or very complex algorithms that try to mimic the human visual system, such as Retinex and our own one, which we call Iridix.
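For reference, the two simplest flavors mentioned there (fixed gamma and histogram modification) fit in a few lines; the gamma value here is illustrative, and the more complex adaptive methods like Retinex or Iridix are far beyond a snippet like this:

```python
# Two simple tone-mapping flavors: a fixed gamma curve applied via a lookup
# table, and global histogram equalization as a content-adaptive alternative.
import cv2
import numpy as np

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # path is illustrative

gamma = 2.2   # illustrative value
lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
gamma_mapped = cv2.LUT(img, lut)       # fixed gamma

equalized = cv2.equalizeHist(img)      # histogram modification
```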

What about correction for optical, mechanical, and pixel vignetting effects? In analog days this kind of shading error was corrected with scaled parabola signals added to the baseband video at horizontal and vertical rates. Gradient shading errors were corrected with scaled sawtooth signals added to the baseband video at horizontal and vertical rates. Feature extraction such as that required for OCR might benefit from shading correction. OCR binarization algorithms attempt to cope with adjacent pixel levels that differ from the desired feature brightness, dealing with both noise and background brightness changes (perhaps due to background artwork). I haven't seen anything about shading correction outside of telecine/live studio video camera environments.
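The digital counterpart of that parabolic correction is typically a per-pixel gain surface multiplied into the image, ideally calibrated from a flat-field capture. A rough sketch with an assumed quadratic radial falloff and an illustrative strength value:

```python
# Multiply the image by a smooth radial gain surface to brighten the corners;
# a real pipeline would calibrate this surface from a flat-field reference.
import numpy as np

def correct_vignetting(img, strength=0.3):   # strength value is illustrative
    h, w = img.shape[:2]
    y, x = np.mgrid[0:h, 0:w].astype(np.float32)
    # Normalized distance from the assumed optical center
    r = np.sqrt(((x - w / 2) / (w / 2)) ** 2 + ((y - h / 2) / (h / 2)) ** 2)
    gain = 1.0 + strength * r ** 2            # quadratic falloff model
    out = img.astype(np.float32) * (gain[..., None] if img.ndim == 3 else gain)
    return np.clip(out, 0, 255).astype(np.uint8)
```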

What, if anything, is done, either in the camera itself or in digital post-processing, to correct shading errors? Or, indeed, is shading correction even desirable in computer vision processing?

There are some characteristic colors, like skin tones, grass, etc., which can be used as cues. But actually the most useful thing is to see what the AWB algorithm is doing as it searches for the correct colors. This info is handy for vision algorithms.
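As a point of comparison, the simplest AWB idea, "gray world", is just a gain per channel; real ISP AWB is far more elaborate (it uses cues like the skin and grass colors mentioned above), so treat this as a toy sketch:

```python
# Gray-world white balance: scale each channel so the channel means match the
# overall mean. Shows the basic gain-per-channel mechanism an AWB block applies.
import numpy as np

def gray_world_awb(rgb):
    img = rgb.astype(np.float32)
    means = img.reshape(-1, 3).mean(axis=0)   # per-channel means
    gains = means.mean() / means              # one gain per channel
    return np.clip(img * gains, 0, 255).astype(np.uint8)
```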

@Atlant, although of course you might want to cluster multiple physical pixels together to construct each produced pixel, for example to improve a camera's low-light performance, or to use digital zoom techniques in lieu of an optical zoom. This is what Nokia's PureView technology does, for example: www-dot-embedded-vision-dot-com/news/2012/03/05/nokias-808-pureview-technical-article-and-several-videos-you
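For anyone curious what "clustering physical pixels" looks like in practice, here's a toy 2x2 binning sketch; real oversampling schemes like PureView are considerably more sophisticated:

```python
# 2x2 pixel binning: average each 2x2 block of physical pixels into one output
# pixel, trading resolution for roughly a 2x SNR improvement in low light.
import numpy as np

def bin_2x2(img):
    h, w = img.shape[:2]
    h, w = h - h % 2, w - w % 2                              # crop to even size
    blocks = img[:h, :w].reshape(h // 2, 2, w // 2, 2, -1).astype(np.float32)
    return blocks.mean(axis=(1, 3)).astype(np.uint8)         # average each block
```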

dipert: If your demosaicing algorithm produces, for each pixel, R, (G1 + G2)/2, B, then once you're dealing with the image at the RGB level you *CAN'T* get the individual values for the G1 and G2 pixels; they were lost when you averaged the two green pixels.
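A tiny worked example of that point, using made-up sample values for one RGGB cell:

```python
# One 2x2 RGGB Bayer cell reconstructed the simple way described above:
# R and B are taken directly, the two greens are averaged. Once only the
# average is stored, the individual G1 and G2 values are unrecoverable.
g1, g2 = 120, 130          # raw green samples (illustrative values)
r, b = 200, 60             # raw red and blue samples

rgb_pixel = (r, (g1 + g2) // 2, b)
print(rgb_pixel)           # (200, 125, 60): 125 could have come from many (g1, g2) pairs
```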

For the next webinars: The JW Player audio plugin seems to have a problem, in that it stops streaming, and has to be manually started. Any suggestions on whether this is a JW Player issue or a browser issue?

@Atlant, I don't understand your point. The "native" color (portion of the visible spectrum) for each pixel is the color of the filter above it. It's not common (though I suppose there may be reasons for it in some cases) for further interpolation of that particular color for that particular pixel to occur. So if you throw away the remainder of the data for each pixel, which by definition has been interpolated, you can reconstruct the original RAW data.

dipert_bdti: That's only true if your Bayer filter somehow allocated only one raw pixel to each produced pixel. But if, for example, your produced pixels include a red pixel, a blue pixel, and the average of the two diagonal green pixels, you *CAN'T* recover the data from the two individual green pixels; it was lost forever.

@Alaskaman66, see here for more info on Bayer pattern sensors, as well as the Foveon alternative: www-dot-embedded-vision-dot-com/platinum-members/bdti/embedded-vision-training/documents/pages/selecting-and-designing-image-sensor-

@Alaskaman66, with a conventional Bayer pattern sensor, each pixel natively captures only one of the three primary colors (red, blue, or, most commonly, green). Interpolation algorithms, using nearby pixels' captured data, calculate an approximation of the "missing" data at each pixel, thereby transforming RAW into a lossless BMP or TIFF equivalent (assuming that no other image processing takes place, such as dynamic range expansion or compression, etc.). I suppose that if you knew the original Bayer pattern, you could throw away the interpolated "missing" data and resurrect the original RAW...

@caa028: "Why 'Image Signal Processors'? Is there really any 'signal' processing involved? The image is digitized before any processing is done..." The ISP applies algorithms to sequences of pixels, very similar to how more familiar digital signal processing algorithms (like audio filtering) work.

@caa028: Yes, ISP = Image Signal Processor. From Aptina's web site: "Digital image signal processors (ISPs) and SOCs use algorithms, or well-defined step-by-step instructions, to adjust the raw data an image sensor collects so that the processed image or video is more visually pleasing than the original. Put another way, SOCs and ISPs make the image look more like what the mind's eye sees, eliminating image blemishes, compensating for poor lighting conditions, or even correcting for a shaky hand or for bad focus." (See www-dot-aptina-dot-com/products/image_processors_soc/ and change each "-dot-" to ".".)

Alaskaman: You can decompress losslessly, but other steps in the image pipeline have already thrown away data. E.g., even demosaicing (may) lose a certain amount of data irrecoverably because it (may) average neighboring pixels.

Another heart-rate-from-video application for iOS-based hardware is Cardiio (www-dot-cardiio-dot-com). And there are any number of conceptually similar apps (such as Azumio's Instant Heart Rate, see www-dot-embedded-vision-dot-com/news/2011/07/28/azumio-successfully-takes-pulse-investors-and-phone-users-alike) which leverage a smartphone's camera and flash for close-quarters analysis.
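The basic idea behind these apps is to track the brightness of a skin region over time and look for the dominant frequency in the plausible heart-rate band. A rough sketch, assuming you've already extracted a per-frame mean green value series (`green_means`) and know the frame rate (`fps`):

```python
# Estimate pulse from a per-frame mean brightness series: remove the DC level,
# take the spectrum, and pick the strongest peak in the 0.7-3 Hz band
# (roughly 42 to 180 beats per minute).
import numpy as np

def estimate_bpm(green_means, fps):
    signal = np.asarray(green_means, dtype=np.float64)
    signal -= signal.mean()                          # remove DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    band = (freqs > 0.7) & (freqs < 3.0)
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0                               # Hz -> beats per minute
```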

Hi Everyone. In case anyone missed it yesterday, I'll take this opportunity to respond to a question from Monday's session. Several people asked for references on the example applications I described on slide 7 in Monday's presentation. Please note that there are multiple commercially available products in each of these categories. I'm including just one example of each here. (Please note that I have mangled the URLs since valid URLs are apparently rejected by the chat software. Replace the "-dot-" with a simple "." in each instance and you're good to go.)
