Gary Bradski's Last Tip! Contributing to OpenCV's Future, 10 of 10

Computer vision is everywhere--in security systems, manufacturing inspection systems, medical image analysis, unmanned aerial vehicles, and more. Indeed, getting machines to see is a challenging and often entertaining goal. Dr. Gary Rost Bradski and Adrian Kaehler, the creators of OpenCV, have put their knowledge into a new book for O'Reilly. With Learning OpenCV: Computer Vision with the OpenCV Library, developers and hobbyists can learn how to build simple or sophisticated vision applications.

Over the last week Gary--a consulting professor at Stanford, senior scientist at Willow Garage, a robotics research institute/incubator, and vision team leader for Stanley, the Stanford robot that won the DARPA Grand Challenge autonomous race across the desert--shared his Top Ten Tips and Tricks for getting the most out of OpenCV. And here's his last one! Thank you Gary.

Tip #10: Contributing to OpenCV's Future

Chapter 14 in Learning OpenCV discusses the future of OpenCV. Some of that future is coming from:

- the authors,
- the group of five full-time developers in Russia that my company, Willow Garage, is supporting. You can also see our group meeting notes any time at: http://pr.willowgarage.com/wiki/OpenCV.
- the core group of already active OpenCV developers.
- And some of it can come from you.

Look for OpenCV to increase its presence at the major computer vision conferences and at NIPS in the future. Contributing code will be easier: we will set up a user area where people can put unmaintained code or applications, and the really good, general new functionality will be moved into OpenCV. If you have a new algorithm or application, for now you can package it up with some documentation and a simple example program of its use, and email it, or your idea, to OpenCVcode@gmail.com.

OpenCV is growing at a critical time for the development of robotics and the Web. Look for rapid advances.

Our book, Learning OpenCV: Computer Vision with the OpenCV Library, will get you booted up fast. It is:
(1) A tutorial covering a wide range of computer vision
(2) A user's guide to OpenCV
(3) A great source of code examples for using OpenCV and vision
(4) A detailed guide to OpenCV's functions and structures

We look forward to hearing from you.

Tip #9: Prospects for Projects

OK, you're teaching or taking a computer vision course. You grab Learning OpenCV. It's (IAHO, In the Authors' Humble Opinion) helpful right off the bat because it gives you the gist of some of the hard algorithms, making it much easier to understand and remember those algorithms when you have to learn them from original papers. But now it's course project time. There are 6 to 8 weeks remaining. What projects can you leverage using Learning OpenCV? Here are a few examples:

- Use the Random Forest algorithm (see the tip on Leo Breiman below) to invent and study new features

- Using cellphone cameras plus GPS to do image stitching
Use GPS location and feature-point detectors such as Harris corners, along with, say, histogram or contour descriptors of those feature points, to stitch together images you take with a cell phone (for example, the iPhone) at a given geographic location. (This is a project that could be sold as a business.) To get ideas about stitching, go to Richard Szeliski's site http://research.microsoft.com/~szeliski/
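As a rough sketch of the matching step (illustrative Python only; the descriptors, the sum-of-absolute-differences matching rule, and the median-shift estimate are stand-in choices, not OpenCV API):

```python
# Toy sketch: match feature points between two overlapping images by
# comparing small descriptors, then estimate the translation that would
# stitch them. Descriptors here are just lists of numbers.

def match_features(desc_a, desc_b):
    """For each descriptor in desc_a, find the index of the closest
    descriptor in desc_b by sum of absolute differences."""
    matches = []
    for i, da in enumerate(desc_a):
        best_j = min(range(len(desc_b)),
                     key=lambda j: sum(abs(x - y) for x, y in zip(da, desc_b[j])))
        matches.append((i, best_j))
    return matches

def estimate_translation(pts_a, pts_b, matches):
    """Estimate the image-to-image shift as the median displacement of
    matched points (the median resists a few bad matches)."""
    dxs = sorted(pts_b[j][0] - pts_a[i][0] for i, j in matches)
    dys = sorted(pts_b[j][1] - pts_a[i][1] for i, j in matches)
    mid = len(matches) // 2
    return dxs[mid], dys[mid]
```

With good GPS priors you only need to match images taken near the same location, which keeps the matching step cheap.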

- Stereo correspondence evaluation
Take the stereo correspondence code in OpenCV and evaluate it on the datasets at http://vision.middlebury.edu/stereo/
Then make improvements using post-processing and re-evaluate it.
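A minimal sketch of the usual Middlebury-style score (plain Python, illustrative only): a disparity estimate counts as "bad" when it differs from ground truth by more than a threshold, commonly one pixel.

```python
# Toy evaluation metric: percent of bad disparity pixels.
# Images are plain nested lists of per-pixel disparity values.

def percent_bad_pixels(estimated, ground_truth, threshold=1.0):
    bad = total = 0
    for est_row, gt_row in zip(estimated, ground_truth):
        for est, gt in zip(est_row, gt_row):
            total += 1
            if abs(est - gt) > threshold:
                bad += 1
    return 100.0 * bad / total
```

Run it before and after your post-processing step to see whether the cleanup actually helped.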

- Optical flow evaluation
Start with the pyramidal Lucas-Kanade optical flow code in OpenCV.
Evaluate it on the data at: http://vision.middlebury.edu/flow/ Improve the results by better filtering.
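The Middlebury flow benchmark's main score can be sketched simply (illustrative Python): average endpoint error is the mean Euclidean distance between estimated and ground-truth flow vectors.

```python
# Toy metric: average endpoint error between two flow fields,
# each given as a list of (u, v) vectors, one per pixel.
import math

def average_endpoint_error(flow_est, flow_gt):
    errors = [math.hypot(ue - ug, ve - vg)
              for (ue, ve), (ug, vg) in zip(flow_est, flow_gt)]
    return sum(errors) / len(errors)
```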

- Video content summarizing
The goal is to grab scenes from videos and put them together into a collage in such a way as to intuitively summarize the video in one image. You can, for example, use color and lighting histograms to determine scene changes and scene returns. Scenes with longer time periods can be shown as larger sub-images in the summarizing collage, and quick scenes can be shown as smaller images in the final collage, etc.
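One way to sketch the histogram idea (plain illustrative Python; the bin count and threshold are arbitrary choices, not fixed rules): declare a scene cut whenever a frame's color histogram differs from the previous frame's by more than a threshold.

```python
# Toy scene-cut detector. Frames are flat lists of 8-bit pixel values.

def histogram(frame, bins=8, max_val=256):
    hist = [0] * bins
    for v in frame:
        hist[v * bins // max_val] += 1
    n = len(frame)
    return [h / n for h in hist]  # normalize so frame size doesn't matter

def scene_cuts(frames, threshold=0.5):
    """Return the frame indices where a new scene starts."""
    cuts = []
    prev = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev is not None:
            diff = sum(abs(a - b) for a, b in zip(h, prev))
            if diff > threshold:
                cuts.append(i)
        prev = h
    return cuts
```

Once you have cut points, scene durations are just the gaps between cuts, which gives you the sub-image sizes for the collage.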

- Cheap flying vehicle navigation work
Take sequential aerial imagery from MS maps and attempt to "fly" the same course in Google Map imagery.

- Visual Radio control
Using flashing LEDs mounted on a radio-controlled airplane, helicopter, or car, set up several stereo cameras (calibrated using OpenCV) and automatically track and control the vehicle.

- Life logging
Get a Flip video camera, or just a Canon camera with a video mode, and wear it, recording video over several days' time. Use histograms, optical flow, and machine learning to summarize and index this video so that you can retrieve things from it later.

I could go on and on. Such projects can make for fun exercises, great class projects, research papers, and even businesses.

Tip #8: Perceiving is deceiving, illusions and the nature of vision

Optical illusions are cool and sometimes just stunning. Change blindness is one of the most stunning. If it is done right, you can watch a video of a street in San Francisco and not notice that the SUV in front of you changes to a sports car, or simply disappears. In change blindness, something fundamental changes in a scene and you simply fail to notice it. Try out some examples at http://www.psych.ubc.ca/~rensink/flicker/download/

Close your eyes until these files get going; otherwise startup delays may ruin the effect. In these scenes, something, often large, changes and you just don't notice it. The trick in most of these videos is that a gray frame is inserted between each frame of the video. The gray frame causes your motion detection system to go nuts, so your attention system just goes "huh?", and when the next frame of video comes on you don't notice the change. Many richer examples of this are on YouTube.

What's going on? Basically, your brain just isn't big enough to process the whole world, so it cheats. You focus on one thing in a scene and keep just a vague sketch of the rest. When motion triggers your attention system that something may have changed, you turn your focus to what changed. You don't keep the world in your head; you mostly look up the information when you need it, similar to how Google is used now on the web. People can use that cheap computational trick of your brain to fool you, such as when you are watching a skilled magician.

There are many kinds of illusions, each based on violating an assumption the visual system uses to deal with the normal 3D world. The most common kind exploits exactly this: we have strong built-in models for dealing with the 3D world, and these are relatively easy to fool using 2D images.

What does this have to do with computer vision? It tells us for one, that for object recognition and understanding, we're going to have to look harder at using 3D biases and models in order to achieve robust performance.

Tip #4: Show me the money
Computer vision is fun and challenging no doubt--but how do you turn that string of numbers you get from the imaging arrays into identification of objects and actions? How do you turn "seeing" into "perception?" Fun, but is there any money in it? That is, can you get a job in computer vision?

Well, much to my surprise, the answer was "yes" when I got out of school and has only gotten better since. It turns out that machine perception is really valuable ... and it's really hard. So, what is computer vision used for commercially?

Many categories of companies work in this area, and any list is far from comprehensive. In my own area of Silicon Valley, there are two video search startups that I can think of immediately (that is, without having to Google, since I (Gary) have done some work with them): Zeitera, which makes a content protection product (it finds copies of videos posted on the internet), and VideoSurf, which is just releasing the beta version of its search engine for videos.

There are many more companies using computer vision. For instance, the company I'm in, Willow Garage, is mounting two stereo cameras on its robot's head, two pan-and-tilt cameras on the robot's shoulders, two more cameras, one in each elbow of the robot, and two laser range finders. Similarly, computer vision is one of the areas of expertise at Applied Minds, where my coauthor works. Google is making extensive use of computer vision to stitch satellite maps together in Google Maps, stitch street scenes together in Street View, scan the 3D surface of books for their book scanning project, and find faces and do image processing in Picasa, etc. I know of at least two cell phone startups that are making use of computer vision using the cell phone camera.

Vision is rapidly growing--few people are aware that virtually every newly manufactured or extracted product has cameras involved in monitoring its production. In any case, the vision described in Learning OpenCV is not just a hobby these days.

Tip #5: Leo Breiman allows us to tell what data is important
I'm sorry to report that Leo Breiman died of cancer some years back. He was a really great guy and one of the key inventors of Decision Trees (Classification and Regression Trees), along with Jerome Friedman--who, together with Trevor Hastie and Rob Tibshirani, wrote one of the best books on statistical learning, "The Elements of Statistical Learning" (http://www-stat.stanford.edu/~tibs/ElemStatLearn/). All great statisticians. Anyhow, Leo Breiman went on to invent the great Random Forests algorithm. You can see it described on his still-active website: http://www.stat.berkeley.edu/users/breiman/. Well, OpenCV honors Leo by also implementing his decision trees and random trees algorithms (we had to change the names because the names are owned by Salford Systems, a data analysis company, http://salford-systems.com/). Decision Trees and Random Trees are described in the machine learning section of Learning OpenCV, Chapter 13.

Both of these algorithms offer one very important and often overlooked piece of functionality: variable importance. That is, you can throw datasets that have way too many features at either Random Trees or Decision Trees in OpenCV, run variable importance, and it will rank how useful each feature was for predicting the class or regression level. You can then get rid of the unimportant features, and your algorithm gets smaller, faster, and cleaner. You can also use this to learn something about the structure of the problem. We implemented this after being inspired by "Looking Inside the Black Box" on Leo Breiman's website http://www.stat.berkeley.edu/users/breiman/

See page 465 and "variable importance" in the index (p. 555) for a discussion of how to use this in the book.
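To see the idea behind variable importance, here is a pure-Python sketch of the permutation trick from "Looking Inside the Black Box" (this illustrates the concept only; it is not the OpenCV implementation): scramble one feature column at a time and watch how much the classifier's accuracy drops. Features whose scrambling hurts most are the important ones.

```python
# Toy permutation-importance sketch. `classify` is any function that
# maps a feature row to a predicted label.
import random

def accuracy(classify, rows, labels):
    hits = sum(1 for row, y in zip(rows, labels) if classify(row) == y)
    return hits / len(rows)

def variable_importance(classify, rows, labels, seed=0):
    rng = random.Random(seed)
    base = accuracy(classify, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        # Shuffle just this column, leaving the others intact.
        shuffled_col = [row[col] for row in rows]
        rng.shuffle(shuffled_col)
        permuted = [row[:col] + [v] + row[col + 1:]
                    for row, v in zip(rows, shuffled_col)]
        # Importance = how much accuracy drops when the column is scrambled.
        importances.append(base - accuracy(classify, permuted, labels))
    return importances
```

A feature the classifier never consults gets an importance of zero, which is exactly the signal you use to prune it.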

Tip #6: Some ordinary apps, we will donate
As a present to go with the book, in the next month or so the authors will be donating some code:

Adrian will be donating a version of "multi-win" that displays multiple windows in one window, allows some basic data graphing, etc. It's absolutely great for debugging your vision apps; Stanley would not have "seen" without this utility to debug what the lasers were telling the camera and what the camera was doing with that information.

Gary will be donating a data collection system based on background segmentation and watershed segmentation. Basically, you learn a background and throw in an object; the system segments it and stores the object mask, the object bounding box, and the object view into a labeled directory. It also works on ROS, the Robot Operating System, where we use it to store the 3D point cloud of an object as well.

Look for these ... when we can get ourselves to release this. Sheesh.

Tip #7: OpenCV and the Robot Operating System (ROS)
Did I mention that I (Gary) work for Willow Garage? The robot research institute/incubator that is seeking to become the Linux of Robotics? The goal is to make the robotics platform (the "horizontal" of robotics) free and open, and then develop verticals (construction robots, household robots, robot political leaders, service robots, robot overlords, etc.) for money, fame, and prizes. OK, we don't do any military stuff ... because, well, people are already pretty good at being bad and don't need our help.

Anyhow, Willow Garage is supporting OpenCV development, but it is also piling a ton of effort into a Robot Operating System (ROS)--(for purists, it isn't really an "OS"; it's a bit transport layer so you can work across cores and across clusters, slinging sensor and control bits around, keeping everything timed, safe, shiny, and robotical). ROS is graph based: nodes produce, publish, read, and process data, working over distributed sensors, computers, and multi-cores. ROS comes with serious goodies: physics and graphics simulators; control and planning software done by the best minds; mapping, localization, and navigation; grasping and manipulation. Oh, and OpenCV too, for perception. The bit transport stuff stays under "ros"; the goodies go under "ros-pkg."
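The graph idea can be caricatured in a few lines (a toy Python sketch only, not the ROS API): nodes publish messages on named topics, and any node subscribed to a topic receives them, without the publisher knowing who is listening.

```python
# Toy publish/subscribe graph in the spirit of ROS's messaging model.

class Graph:
    def __init__(self):
        self.subscribers = {}  # topic name -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        # Deliver the message to every callback registered on this topic.
        for callback in self.subscribers.get(topic, []):
            callback(message)
```

A camera node would publish on an "image" topic while a perception node subscribes to it; neither needs to know the other exists, which is what lets the graph span machines.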

Willow Garage will be shipping what I humbly believe will be the world's best research robot, which we call "Personal Robot 2" (PR2, see it at http://pr.willowgarage.com/hardware.html ), a vast improvement on PR1, which was created for Andrew Ng's ongoing STAIR project at Stanford http://stair.stanford.edu/. But ROS is not tied to any one robot; it is general and already runs on three very different kinds of robots.

ROS is still in alpha development, but you can get it now and become a developer or robotic God (NOTE: just the opposite of humans, in the far future the robots who are scientists will insist that robots were intelligently designed, while the religious robots will insist that robots evolved. This is your chance to be one of those intelligent designer/Gods. Sign up now!). The ROS development page, with tutorials and download and install instructions, is at http://pr.willowgarage.com/wiki/ROS

OpenCV will be at the core of the perception systems for ROS. Stay tuned.

Tip #8: OpenCV behind the scenes.
A new official release of OpenCV, 1.1, is coming out in October 2008. Some people have noted that the last official release of OpenCV was in November of 2006. OK, we're sorry, but the authors and main architects took some time out to join startups, found startups, join other companies, have some children, stuff like that. We were busy! Development never stopped--the CVS repository on SourceForge has been continuously updated; it's just that the higher standard of an official release was more work than any of us had time to do. But do not fear! Thanks to the generosity of Willow Garage (www.willowgarage.com), where I work, we will be supporting five OpenCV developers full time, and so regular releases are coming starting in October! More than that, look for many new features in OpenCV in the coming months (note, these are not yet in but are under active development now):

Laser-camera calibration. Hey, both authors worked on the robot Stanley that won the $2M DARPA Grand Challenge, a 7-hour race across the desert (look for us under the "Team" tag in the "Software" group, "computer vision" sub-group at http://cs.stanford.edu/group/roadrunner//old/index.html ), so we're both interested in using depth sensing with cameras.

Space-time stereo: Get very precise stereo reconstruction using random patterns of projected light over time. Great new work by Federico Tombari coming soon. For getting really good close-range depth, this technique is great.

3D, 3D, 3D. Now that we have great stereo, we need to make use of it. Look for complicated mathy things like bundle adjustment and 3D-to-3D point mappings.

Visual SLAM. SLAM stands for Simultaneous Localization and Mapping. It's how robots make maps by wandering around and then make use of those maps to navigate without hitting things. See examples of such maps at http://pr.willowgarage.com/wiki/Maps. These were made with lasers. Well, renowned stereo expert Kurt Konolige is going to do the same, but in full 3D from stereo. Look for this several months from now.

Rapidly learn huge data sets -- it's best if we can parallelize learning and recognition to take advantage of all those new multi-core chips.

The learned classifier has to remain computationally efficient. If we go from 10 to 1,000 objects, we don't want to take 100 times as long to recognize objects, for example.

We would like to incrementally add new objects. That is, we don't want to have to relearn the whole database for each new object we need to learn (many techniques suffer from this problem).

We would like to incrementally add new features. If we come up with a new feature, we want to add it without having to train everything from scratch. OK, this is hard and involves real research -- we're pursuing it with Stanford at the moment. The other powerhouse in this area is at MIT with Bill Freeman and Antonio Torralba.

OK, so what are the authors doing now besides trying to overcome carpal tunnel syndrome from writing the book?

Gary Bradski is spending time jointly between Stanford and Willow Garage (www.willowgarage.com), a robotics institute/incubator that is seeking to become the Linux of robotics and is funded to accomplish this task, after which we'll spin off companies doing vertical applications like food service, etc. It's an absolutely great environment, trust me, I'm reeeeally choosy. Yes, we're hiring computer vision programmers--but really good programmers who would blow away a Google programming interview. We probably interview harder, but don't worry, we know you might still be great even if you "fail"--one of our best employees is someone we unanimously rejected, and only after being berated by his adviser did we give him a chance; now we worship at his very coding feet. So, we're looking for fit and fitness, and we know we're often wrong, but hiring is harder than interviewing ... we apologize in advance; give us a try.

Adrian Kaehler is an uber guy at Applied Minds (www.appliedminds.com/). Remember the guy who created the Connection Machine, then did great stuff for Disney Imagineers, and is also building a 10,000 year clock? That's Danny Hillis, one of the founders. A lot of their stuff is secret, but if you want to work on proteomics, multi-touch displays, robotics, satellites, movie animation, building architecture, and billion-pixel cameras all in the same job, this is your place. They are hiring too, and we often fight over people and sometimes send them back and forth (you have to be a US citizen for Applied Minds).

For the CVS version of OpenCV, detailed installation instructions for OpenCV on Windows, Mac OS, and Linux are in Chapter 1 of the book, starting on page 8. Also see the OpenCV wiki at http://opencvlibrary.sourceforge.net . (Please note: The OpenCV wiki is down due to a SourceForge changeover.)

Tip #9: Google's Vision for OpenCV
Google makes extensive use of OpenCV internally for its Street View, map stitching, and other image processing needs. Google has recently contributed some of its code back to OpenCV in the form of Daniel Filip's C++ image wrapper class, cvwimage. Look for cvwimage.h in the .../cxcore/include directory. This code came too late to be documented in the book, but we will document it on the website after we ourselves get more familiar with it. Google is interested in speed and minimizing bugs, so this class wraps IplImage and concentrates on these things:

1. Images are explicitly owned to avoid memory leak problems.
2. The class provides fast access to subregions of an image, especially lines. The only member functions provided are ones that are very fast. To call most OpenCV functionality, you expose the pointer to the image and call the OpenCV functions as usual. That means there is little learning curve with this class.
3. You can derive window pointers to subregions of the original image.

Typically at Google, they allocate a huge image space and then put all the images they are processing into sub-regions of this huge window (see the Region of Interest, "ROI", discussion starting at the bottom of page 43 in the book). This huge window becomes their processing buffer; sub-regions are allocated within it and passed to OpenCV or Google functions for processing. Memory management isn't much of a problem, because only one routine "owns" the huge window, so it's easy to manage when it is allocated or deallocated.
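The pattern can be sketched like this (a toy Python analogue, not Google's actual cvwimage class): one object owns the flat buffer, and "windows" are just offset arithmetic into it, so sub-regions share storage and there is exactly one place where memory is allocated and freed.

```python
# Toy version of the one-big-buffer / many-sub-windows pattern.

class BigBuffer:
    """Sole owner of the pixel memory."""
    def __init__(self, width, height):
        self.width = width
        self.data = bytearray(width * height)

    def window(self, x, y, w, h):
        return Window(self, x, y, w, h)

class Window:
    """A view into the big buffer: no memory of its own, just offsets."""
    def __init__(self, buf, x, y, w, h):
        self.buf, self.x, self.y, self.w, self.h = buf, x, y, w, h

    def set(self, col, row, value):
        self.buf.data[(self.y + row) * self.buf.width + self.x + col] = value

    def get(self, col, row):
        return self.buf.data[(self.y + row) * self.buf.width + self.x + col]
```

Writing through a window writes straight into the shared buffer, which is why ownership stays unambiguous: windows can be handed around freely and never need freeing.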

Tip #10: Now that OpenCV has a full stereo depth perception pipeline built in (see Ch 12 in the book), what can you do with it?

Well, it makes state-of-the-art stereo cheap. You can now put together different kinds of cameras--as long as they are time synchronized, or as long as you can deal with "close enough" to time synchronized (that is, looking at far distances)--pairing anything from cheap web cameras on up to expensive high-resolution or high-dynamic-range cameras, and get out images plus depth maps. Wave a black and white chessboard pattern at your cameras and produce state-of-the-art stereo depth perception. There are dozens of applications for this.
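For a sense of what those depth maps encode: with a rectified, calibrated pair, depth follows directly from disparity via Z = f * B / d, where f is the focal length in pixels, B the baseline in meters, and d the disparity in pixels. A minimal sketch (the numbers in the test are illustrative, not calibration output):

```python
# Toy disparity-to-depth conversion for a rectified stereo pair.

def depth_from_disparity(focal_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px
```

The formula also shows why wide baselines and high resolution help at long range: depth error grows as disparity shrinks.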

One area is security. Forget single cameras; go to stereo pairs. You can now get superior background segmentation results that are invariant to lighting. You do this by relating each pixel in one camera to its pair in the other via the depth or disparity map; this becomes your two-camera background scene model. The model captures the relative red, green, blue ratio between each pixel in the two views. When someone or something moves into the scene, the model, based on the true depth, becomes distorted, and you can not only identify a change in the scene but know where that change is in 3D. See for example: