Monday, May 21, 2007

One of the main issue of trying to use compressed sensing in the radiation world other than light is the ability to focus/shepperd radiation to the right detector as is done by the DMD element in the one-pixel camera from Rice. A different way to use compressed sensing is to have a very well known material next to the detector as in the in-situ detector of Lawrence Carin, Dehong Liu, and Ya Xue at Duke University. According to this Technology Review article, the good folks at MIT are looking are refracted radiation with a mask similar to the one shown on the side. One cannot but think that a similar set-up can be used to do compressed imaging with X-rays using this type of coded-aperture imaging in the same as what is done for hyperspectral imaging in David Brady's group.

Looking back at some examples, when Andrew Ng [1] tries to build depth maps from monocular vision this is what he says when trying to fit test data from natural scene to statistical models:

We now present a second model that uses Laplacians instead of Gaussians to model the posterior distribution of the depths. Our motivation for doing so is three-fold. First, a histogram of the relative depths (di - dj ) empirically appears Laplacian, which strongly suggests that it is better modeled as one. Second, the Laplacian distribution has heavier tails, and is therefore more robust to outliers in the image features and error in the trainingset depthmaps (collected with a laser scanner; see Section 5.1). Third, the Gaussian model was generally unable to give depthmaps with sharp edges; in contrast, Laplacians tend to model sharp transitions/outliers better.

Obtaining a sparse (i.e. the smallest model) and a robust solution (to outliers) is the reason why he uses L1. Using L1 is also why Lawrence Carin, Shihao Ji and Ya Xue [2] use Laplace distributions in order to obtain a sparse solution in their Bayesian compressive sensing approach. The chapter in Inder Jeet Taneja's book is a nice reference to differential entropy and associated probability distributions that maximize it (in other words, given a certain type of information/assumptions on the problem what is the type of distribution that allows one to build a model with that knowledge and not more). In imaging, decomposing a scene with a dictionary of overcomplete bases using the compressed sensing approach has shown L0 to be well approximated by L1. So in effect, when one solves approximation problems with L1, many times, one is solving for L0, i.e. for the sparser solution out of the whole potentially overcomplete dictionary.

The "phase diagram" shown on the side marks the boundary between areas where L1 is like L0 and when L1 cannot converge to an L0 solution. [p is the number of variables. n is the number of observations and k is measure of sparsity of the underlying model] The x-axis is the level of underdeterminedness (close to 0 means very few observations compared to the number of variables) , and the y-axis is the level of sparsity of the underlying model (close to 0 means a very dense signal, close to 1 means a very sparse signal)

Make that six codes enabling the reconstruction of a signal using an overcomplete dictionnary. Philippe Jost at EPFL is making available the TREE BASED PURSUIT 2D toolbox. According to his abstract:

Tree-based pursuit efficiently trades off complexity and approximation performance for overcomplete signal expansions. Finding the sparsest representation of a signal using a redundant dictionary is, in general, a NP-Hard problem. Even sub-optimal algorithms such as Matching Pursuit remain highly complex. We propose a structuring strategy that can be applied to any redundant set of functions, and which basically groups similar atoms together. A measure of similarity based on coherence allows for representing a highly redundant sub-dictionary of atoms by a unique element, called molecule. When the clustering is applied recursively on atoms and then on molecules,it naturally leads to the creation of a tree structure. We then present a new pursuit algorithm that uses the structure created by clustering as a decision tree. This tree-based algorithm offers important complexity reduction with respect to Matching Pursuit, as it prunes important parts of the dictionary when traversing the tree. Recent results on incoherent dictionaries are extended to molecules, while the true highly redundant nature of the dictionary stays hidden by the tree structure.

The open presentation attracted a large crowd. During the presentation, I was intrigued by the statement by Andrew that Schlumberger was positioning itself to be a provider of service for Carbon burying technology. But when you think about it, it makes sense as they have devised many services and technologies that are needed for this type of undertaking.

The room was full of people who looked like they wanted to be hired and so it was difficult to have any of them ask questions at the very end of the talk. Pissing off the CEO of the company you want to join, is a very compelling argument to not talk or ask question, or so they believe.... So I ended up having to do the dirty deed, but I was in fact really interested in several answers.

I have mentioned Schlumberger in this blog a while back, it was because of their ability to get signals from 3000 meters underground by using pulsed mud telemetry in the process generally known as Logging While Drilling. The main point was that, in order to save about 200 to 300K$ per day, they had to gather data at the drilling post in real-time so that they could steer the drilling bit (yes, drilling bits can go horizontal). Some people at Sandia have devised a Disposable Fiber Optic Telemetry System but it does not seem to have gain any traction in that industry. Pulsed mud bit rate is equivalent to an astonishing 30 bits per second transmission rate last time I checked. My question to Andrew was: have you guys done better in the past few years ? and the answer looked like a big maybe. He mentioned a new technology that uses some type of radio transmitter between each of the drilling rods but it did not seem to be a system that was yet currently used in the field. The mud communication system is an amazing piece of inventivness and the communication aspect of it is one of the most interesting problem to work on. Because of the very harsh constraints on the system (pressure, temperature,...) I am barely surprised that there isn't a better solution but I also think they should think outside the box on this one. My take would probably include using compressed sensing so that the amount of power generated in the measuring bit can be decreased tremendously. Heat generation (by the computers/electronics of the measuring bit) is non-trivial as there is little in the way of cooling when producing heat in these depths (the soil surrounding the bit is already warmer than the inside). Because of the high temperature environment, one also has to develop some better electronics to deal with these high temperature environment (see Sandia's presentation on electronics development and the need for new technology (SOI))

I then asked a question about the Canadian tar pits and the use of technology such as heat pipe to transfer energy from geothermal wells all the way up to the tar pits in order to warm them up so that they become liquid (i.e. less viscous and therefore more enconomical to retrieve from the ground). The answer looked like there is already have a program called "HTPT" that looks at that. HT may mean high temperature but I am sure what PT stands for.

And then I asked the "forward looking" question: if you wanted to differentiate yourself from your competitors in the next two or three years, where would you put your money in ? The answer was interesting because I was not expecting it. The way I interpreted what he said was: Data fusion, how do you combine the large amount of data produced in the field to have a clearer picture of your oil field (not just in three dimensions but also including time). When I went to talk to each of the engineers present at the different booth after the presentation, it did not seem that they had a view of what that entailed. One of the reasons mentioned was that most customers were not willing to put money into this type of analysis and so the company did not have a specific research team dedicated to that. The company itself is known to be dealing with very large amount of data and making sense of them for their customers. Yet summarizing that knowledge seems to be a difficult undertaking that most customers are only willing to do in-house. I am sure that an enterprising person with views on this issue could help them out. There is no reason to believe that developments in dimensionality reduction in the past few years should not be considered for those gigantic datasets.

Data fusion is also some kind of buzzword, so it may be productive to define what that means. In the measuring bit, there are different kinds of instruments, including neutron generators, radiation detectors, NMR and electromagnetic. Some of the current work seems to have been able to correlate seismic and flow measurements in order to provide a better assessment of the borehole condition. Therefore, a data fusion scheme would be aimed at correlating all the measurements from several types of sensors in order to provide additional information about either the location of the measuring bit and the time dependent geological conditions around that bit.

In order to do that, one has to compare measurements with computations. One of current generic concern is the ability to do inversion with Monte-Carlo codes such as MCNP (This is a very difficult problem because the solving of this inverse problem requires several many runs of forward computation by MCNP) or faster but coarser deterministic methods. You have many different parameters that you change (sensitivity studies) in order to figure out the distribution of parameters for the situation of interest.Since MCNP or deterministic codes have many different parameters and are running in a finite time, one needs to have tools that provide a way of "interpolating" between parameters family you have not explored computationally. In the end, this problem is not unlike the problem faced in nuclear engineering when one runs a complex thermal hydraulics code: The Experimental Probabilistic Hypersurface tries to help in that respect.

Friday, May 11, 2007

HCI stands for Human-Computer Interaction and is the computer science field devoted to enable a better connection between computers and humans. David Knossow at INRIA just released his PhD thesis to the TEL server. It is in French but his thesis summary reads:

Markerless Human Motion Capture with Multiple Cameras My Ph.D manuscript deals with the problem of markerless human motion capture. We propose an approach that relies on the use of multiple cameras and that avoids most of the constraints on the environment and the use of markers to perform the motion capture, as it is generally the case for industrial systems. The absence of markers makes harder the problem of extracting relevant information from images but also to correlate this information between. Moreover, interpreting this extracted information in terms of joint parameters motion is not an easy task. We propose an approach that relies on occluding contours of the human body. We studied the link between motion parameters and the apparent motion of the edges in images. Minimizing the error between the extracted edges and the projection of the 3D model onto the images allows to estimate the motion parameters of the actor. Among the opened issues, we show that using video based motion capture allows to provide additional hints such as contacts between body parts or between the actor and its environment. This information is particularly relevant for improving character animation.

While the initial interest is in capturing human motion at low cost (as opposed to current systems which cost up to $400,000), I believe this is the beginning of technology development that is central to the study and detection of autism in infants (3-6 months old). The current state of the affairs with regards to Autism detection is that one waits until speech is shown to be very late (about 2 year old) to begin diagnosing the disease even though it has been shown that the brain growth has been abnormal from 0 to 2 as shown by Eric Courchesne. With the advent of the digital world, home movies are beginning to be records of the state of knowledge on the condition of people. Some studies have shown at the same time that home movies could be used to figure out very early that something is not right (search Pub Med with the keywords: Autism movies). For instance, in

Results indicated that 1-year-olds with autism spectrum disorder can be distinguished from 1-year-olds with typical development and those with mental retardation. The infants with autism spectrum disorder looked at others and oriented to their names less frequently than infants with mental retardation. The infants with autism spectrum disorder and those with mental retardation used gestures and looked to objects held by others less frequently and engaged in repetitive motor actions more frequently than typically developing infants.

[If the links to the toolbox does not work, you may want to look for another location in the comments section of this entry ]Laurens van der Maaten just released a large matlab toolbox featuring 25 dimensionality reduction techniques[Updated: Dec 1, 2008]. And then you can use PRTools to do your own classification. Be careful what you wish for :-)

I have had requests from several parties to give an overview on the type of obstacle avoidance scheme that might be most promising. Right now, we (Pegasus) are still evaluating some of these, so this entry should not be construed as part of our algorithm unveiling entries but rather a general overview we did a while back. It is important to realize that the main contribution of this entry is really about defining a hardware + software solution to the localization of cues that will be later learned/categorized. One large component of an obstacle avoidance solution is the machine learning/statistical device used to identify rapidly these cues as problematic for the autonomous vehicle or not. This is not unlike the human cortex (see reference section [1] [2] [3]).

In the case of the Urban Challenge, we are facing not only stopped obstacles but moving one as well. The moving obstacle have behaviors from which one needs to learn from as well. In other words, in the case of vision, a lot of work boils down to producing some amount of cues/features (a small number) from a very large set of data (pixels from an image). In some areas of computer science this is called dimensionality reduction.

Stereo-imaging:

The fly algorithm, a robust stereo algorithm using real time a genetic algorithm (yes, there is such thing as real time genetic algorithm!) and has been tried on cars. and specifically to has been used to avoid people and other objects. The initial thesis with the algorithm is in french. Improvement over the thesis have been focused on the car driving experience.

There are also numerous commercial solutions as listed by the folks at V2_lab's where they discuss each of them. I found this entry pretty revealing about the state of the affairs with regards to stereovision, you have to look at the comment section

For most stereo matching algorithms the Firewire cameras produce higher quality uncompressed images that do not wreak havoc on sensitive feature detectors. I use the Unibrain Fire-I board camera http://www.unibrain.com/index.html with the CMU 1394 Digital Camera API (Windows), which gives you very complete control and works with just about any Firewire camera, because they all use the same standard interface. http://www.cs.cmu.edu/~iwan/1394/ . When I read the technical reports for the 2005 DARPA Grand Challenge almost every report showed pictures of vehicles equiped with stereo pairs of cameras, but at the race just about all of them had been removed, presumably because of basic issues such as camera synchronization.

Monocular evaluation of the distance field. There are two approaches that caught our attention:

(With regards to Monocular information, we should not forget the excellent Mono-SLAM : This website has a Matlab implementation of the SLAM using monocular vision. There is a timely thesis on the subject here where it looks like using two cameras implementing both the monoslam algorithm.)

A Random Lens Imager, it is a hardware implementation of a totally new concept in data processing known as compressed sensing (don't ask anybody around you about it because it is too new). It needs only one camera but much of the work goes into the calibration.

Thursday, May 10, 2007

Sparsify is a set of Matlab m-files implementing a range of different algorithms to calculate sparse signal approximations. Currently sparsify contains two main sets of algorithms, greedy methods (collected under the name of GreedLab) and hard thresholding algorithms (collected in HardLab)

Monday, May 07, 2007

As a small team, we had to make choices that are reflected in our choice of hardware and algorithm. One of the most time intensive aspect of devising a robot is the development of an accurate motion model and then an adequate sensor/perceptual model [1].

We could not spend an entire team and the time to devise a motor model as well as a sensor model as what other teams have done. Some of it stems from the fact that we are not interested in getting M.S. theses written for the moment as it is the case here. Furthermore, our vehicle is far from being protected from the elements. And we considered that it would become difficult to rely on a model that could become less and less accurate as the car was getting older. Another aspect of our decision making came from the actuators we used. In the case of the steering wheel, there are moments when the steering motor can slip (especially in large deviations). This is owed to two factors: the laptop is busy doing other tasks can sometime send only the steering task with a delay or there is too much torque too be applied too fast. This results in a command control that would be difficult to model. Finally, we were interested in developing a robot that would learn it's own behavior while being supervised with very little labeled data. In other words, we want the learning to be based on data coming from an actual driving behavior. Not much data understanding should come from any subsequent data processing.

In the latter case, we decided against using the SDE approach but use Compressed Sensing instead for the dimensionality reduction aspect of the problem for building the sensor model. Some aspect of the reduction include some of the ideas in [7].

[3] [pdf][ps.gz]Subjective mapping, Michael Bowling, Dana Wilkinson, and Ali Ghodsi.In New Scientific and Technical Advances in Research (NECTAR) of the Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI) , pages 1569--1572, 2006.

Friday, May 04, 2007

In a previous entry, I mentioned that we would be unveiling our algorithm. Before we do this, I will talk about the different sensors we have and use.

GPS: Since the race is mostly a mapping between the vehicles knowledge of its surrounding and a set of maps given by DARPA with GPS coordinates ( see Route Network Definition File (RNDF) and Mission Data File (MDF) ), we need to have a GPS. We do not have a DGPS, just a normal GPS chip with which we are interfacing with a an RS-232 interface at a rate of 1 Hz. To be specific this is a GARMIN GPS 18 PC, OEM SYSTEM, WAAS.

IMU: We also have a Microstrain MEMS IMU that provides acceleration, turning rate and heading (3DM-GX1). It provides information at 100 Hz.

Vision system: We use a series of webcams and a set of Unibrain firewire cameras. Right now our frame rate os about 15 Hz.

All these sensors interface with the Python program through an RS-232 channel. In terms of sophistication, it just does not get any better for us. One of the underlying reason is the realization that gathering data is one thing, using them efficiently is a totally different one. In particular, there are instances where reading and processing information from the IMU is not interesting.

With regards to the actuators, we currently have two large stepper motors connecting to the python program through the serial port. The first stepper motor rotates the steering wheel and the second one activates either the brake or the acceleration pedal.

One of the ways to do supervised learning, is to run the python program from a laptop that connects to both the sensors and the stepper motors. One can then run the car through the keyboard of the laptop. It works well as long as Skype is not running on the laptop at the time ( yes, we tried :-), it's a little bit like talking on your cell while driving....

In my next entry, I will discuss the modification to the webcams and firewire cameras so that they provide meaningful information. In particular, I will talk about the algorithm for the stereo system as well as the hardware and software implementation of the compressed sensing element of our algorithm (a random lens imager using a webcam). Both are in competition and we believe that the stereo system will not need to be used eventually.

Tuesday, May 01, 2007

The Technical Evaluation criteria lists the following items as part of what an autonomous vehicle involved in DARPA Urban Challenge race should be able to handle and how to handle it during the race and in some of the pre-qualifying stages:

A. Basic Navigation

A. 1Preparation for run

A.2 Mission start

A.3. Checkpoints

A.4. Stay in lane

A.5. Speed limits

A.6. Excess delay

A.7. Collisions

A.8. Stop line

A.9. Vehicle Separation

A.10. Leaving lane to pass

A.11. Returning to lane after pass

A.12. U-turn

B. Basic Traffic

B.1. Basic Navigation

B.2. Intersection precedence

B.3. Minimum following distance

B.4. Queueing

C. Advanced Navigation

C.1. Basic Traffic

C.2. Obstacle field

C.3. Parking lot

C.4. Dynamic re-planning

C.5. Road following

C.6. GPS outage

D. Advanced Traffic

D.1. Advanced Navigation

D.2. Merge

D.3. Vehicle separation during merge

D.4. Left turn

D.5. Vehicle separation during left turn

D.6. Zones

D.7. Emergency braking

D.8. Defensive driving

D.9. Traffic jam

The current rules are here. The site visit is scheduled to occur in San Antonio. The RNDF file describes the road to the vehicle. The MDF is the file that describes what is expected from the vehicle for its mission. There is no mention in the MDF of potential obstacles, traffic or GPS outage (as generally seen in cities with large buildings like Paris or New-York, a phenomenon called the canyon effect). The expectation is for the autonomous vehicle to follow the rules to get out these unexpected problems and accomplish its mission within the alloted time. DARPA decides on May 10th which team to invite at the Site visits.