Just how much high performance computing do we need for image processing?

Heidelberg, 22 June 2001. Dr. Hans Burkhardt, Professor of Computer Science at the University of Freiburg, addressed the topic of high performance systems for serial image processing and database searching at SC 2001, the annual Supercomputing Conference, held in Germany. The 3D data sets acquired from tomography scans and visual screenings in biology in particular pose an enormous challenge for real-time visualisation and analysis. The speaker highlighted two areas, image database queries and visual navigation in natural environments, and introduced some remarkable examples to the audience. Compared with human visual performance, the present state of the art in hardware still falls far short, barely able to deliver an advanced search on the semantic level. Dr. Burkhardt predicted that it may take a hundred years before a system surpassing teraflop performance matches the power of the human eye.

Philosophers in ancient China already knew that a picture is worth a thousand words. Is that why we produce far more visual data than we are able to interpret? Visual information emerges from a variety of sources: satellite images, X-rays, CT and MRI tomography scans, and biological and chemical volume data sets. However, a real-time gap exists between the data and their interpretation, since so many details remain invisible to image processing. Data rates between keyboard and computer reach no more than 10 bytes per second, whereas the human eye can absorb 10 gigabytes per second, a gap of a factor of a billion, or three successive factors of 1000.
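The arithmetic behind that bandwidth gap is straightforward; a quick sketch, using the 10-byte and 10-gigabyte figures quoted in the talk:

```python
# Rough arithmetic behind the bandwidth gap cited in the talk.
keyboard_rate = 10           # bytes per second, typed input
eye_rate = 10 * 10**9        # bytes per second, human visual system
factor = eye_rate // keyboard_rate
print(factor == 1000**3)     # the gap is three factors of 1000
```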

Dr. Burkhardt explained how a simple task such as the correlation of two images already takes 11.5 days in a direct implementation. With fast, low-complexity algorithms this can be reduced to twenty seconds, and with parallel computation to twenty microseconds. The choice of hardware depends on the type of task to be performed: data dependencies in pattern recognition, with its feature extraction and interpretation stages, demand a different kind of power than pre-processing does. In image classification, the system has to deal with an unstructured data tree. Power requirements for typical image processing problems amount to 10 Mflops for amplitude equalisation, 250 Mflops for local filtering, 2.5 Tflops for the discrete Fourier transform, and 200 Mflops for the fast Fourier transform.
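The speed-up from fast algorithms comes from lower asymptotic complexity, not faster hardware. A rough operation-count sketch for correlating two images; the image size n is an assumption for illustration, and the day/second figures from the talk naturally depend on the machine used:

```python
import math

def direct_ops(n):
    # Naive 2D correlation of two n-by-n images: n*n output positions,
    # each summing over n*n products -> O(n^4) operations.
    return n**4

def fft_ops(n):
    # FFT-based correlation: three 2D FFTs plus a pointwise product,
    # roughly O(n^2 log n) operations (constant factors omitted).
    return 3 * n**2 * math.log2(n**2) + n**2

n = 1024  # assumed image size, for illustration only
print(f"direct: {direct_ops(n):.2e} ops, FFT-based: {fft_ops(n):.2e} ops, "
      f"speed-up ~{direct_ops(n) / fft_ops(n):.0f}x")
```

Even for this modest image size the operation count drops by four orders of magnitude, which is why algorithmic choice dominates raw hardware power here.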

Queries to image databases can be time-consuming and difficult. A team at the University of Freiburg has therefore developed a search engine, SIMBA, which allows users to select images by appearance from a reference database of 2,350 images in MPEG-7 format. Dr. Burkhardt showed how a query based on an example image does not necessarily return a set of images with exactly the same appearance as the original. Geometric transformations such as rotation divide the images into classes; the system is not interested in the transformation parameters themselves but works in the vector spaces of features to cover the different types of appearance.
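A query by appearance of this kind can be pictured as a nearest-neighbour search over precomputed feature vectors. The following sketch is purely illustrative; the file names, vectors, and `query` helper are invented and do not reflect SIMBA's actual interface:

```python
import math

# Hypothetical database mapping image names to precomputed feature vectors.
database = {
    "sunset.jpg": [0.9, 0.1, 0.3],
    "forest.jpg": [0.2, 0.8, 0.4],
    "beach.jpg":  [0.8, 0.2, 0.35],
}

def query(example_vec, db, k=2):
    # Rank database images by Euclidean distance between feature vectors
    # and return the k closest matches.
    return sorted(db, key=lambda name: math.dist(example_vec, db[name]))[:k]

print(query([0.85, 0.15, 0.3], database))  # ['sunset.jpg', 'beach.jpg']
```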

According to Dr. Burkhardt, there are in fact three methods to calculate the image invariants. The first consists of integrating over the mathematical group action, so that all different appearances of an image share the same representation. Unlike pixel-based correlation, which yields large differences between such images, this representation is also insensitive to size variations. The second approach works with partial differential equations, and the third, called normalisation, uses extremal points on the orbits in the vector space. The texture of a pixel's neighbourhood enters the search as well. As a result, SIMBA performed better than an IBM search engine that matched only colour histograms.
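The first method, integration over the group action, can be sketched for the small group of 90-degree rotations: averaging any feature over all rotations of an image yields a value that is identical for every rotated version, so all appearances in one class share one representation. The toy image and feature below are illustrative only:

```python
def rot90(img):
    # Rotate a square image (list of rows) by 90 degrees.
    return [list(row) for row in zip(*img[::-1])]

def invariant(img, f):
    # Integral over the group of 0/90/180/270-degree rotations:
    # averaging the feature f over all group actions gives the same
    # value for every rotated version of the image.
    total, g = 0.0, img
    for _ in range(4):
        total += f(g)
        g = rot90(g)
    return total / 4

# Toy feature: sum of products of horizontally adjacent pixels.
f = lambda im: sum(im[r][c] * im[r][c + 1]
                   for r in range(len(im)) for c in range(len(im) - 1))

img = [[1, 2], [3, 4]]
print(invariant(img, f) == invariant(rot90(img), f))  # True
```

The feature f itself changes under rotation, but its group average does not, which is exactly the property the database search exploits.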

Some excellent outcomes were achieved with the MICHELphilascope stamp collection database. SIMBA's advanced user interface was able to retrieve all the stamps belonging to the same series, for instance Grimm fairy tales, pop music artists, and depictions of Berlin's Brandenburg Gate. Another test involved a pollen classification project set up by the Swiss and German weather services. Some 800 pollen samples with very similar structures were scanned under a confocal laser and transformed into 3D volume data sets. Since the features do not depend on the position along the orbit, the feature vector stays constant. SIMBA achieved an indexing accuracy of no less than 97.4 percent.

Next, the speaker illustrated the concept of 3D navigation in natural scenes with the MOVIS project, in which electronic glasses for blind people are being designed. Street scenarios with simple objects are generated to guide the vision-impaired user through a system which tells them what is in the street, for instance a phone booth or a zebra crossing. Other initiatives include the interpretation of traffic scenes for driver support and for autonomous driving and navigation in vehicles, as well as navigation for clinical purposes based on endoscopic images for computer-assisted minimally invasive surgery. In all these cases, pattern recognition in images is performed with fast processors for simple objects. Dr. Burkhardt stressed that learning strategies for image and volume data sets will remain a permanent challenge for the next ten years, but today's functionality is at least a good start.