
Augmented Reality on Mobile Internet Devices

How Atom-based mobile Internet devices can access and provide information to users in a different manner than traditional web-based information sources

Experimental Results Extension

In this section, we look at how matching can be enhanced by removing duplicate matches, by analyzing the histograms of distances, and by adding images to the landmark databases.

Duplicates Removal. To test the impact of removing duplicate matches, we ran both the original and the new matching algorithms (see "Image Matching Algorithm" and "Matching Enhancement by Duplicates Removal") on our sample databases for various SURF thresholds. Part (a) of Figure 4 shows three measures, each averaged over the 10 databases: the top 1 precision (the precision of the top matched image), the top 5 precision (the precision of the top 5 matched images), and the average precision. Removing the duplicate matches clearly improves the matching performance.
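The duplicate-removal step is defined in the earlier section "Matching Enhancement by Duplicates Removal," which is not reproduced here. The minimal Python sketch below shows one plausible form of that step. It assumes that matches are (query keypoint, database keypoint, distance) tuples. It also assumes that a "duplicate" is a database keypoint matched by several query keypoints, and that only the closest of those matches is kept. The function name `remove_duplicate_matches` is hypothetical.

```python
from typing import List, Tuple

# A match links a query keypoint to a database keypoint at some descriptor distance.
Match = Tuple[int, int, float]  # (query_idx, db_idx, distance)

def remove_duplicate_matches(matches: List[Match]) -> List[Match]:
    """Keep at most one match per database keypoint: the one with the smallest distance.

    Assumption: a 'duplicate' means several query keypoints landing on the same
    database keypoint; the article's exact definition may differ.
    """
    best = {}
    for q, d, dist in matches:
        if d not in best or dist < best[d][2]:
            best[d] = (q, d, dist)
    # Sort by query index so the output order is deterministic.
    return sorted(best.values(), key=lambda m: m[0])

matches = [(0, 7, 0.30), (1, 7, 0.12), (2, 3, 0.25), (3, 7, 0.40)]
print(remove_duplicate_matches(matches))  # [(1, 7, 0.12), (2, 3, 0.25)]
```

If an image's score is the number of surviving matches, this filter stops a single database keypoint from being counted several times. That is one way duplicates could otherwise inflate the ranking of a wrong image.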

Matching Enhancement through Histograms of Distances. Part (b) of Figure 4 shows the top 10 images retrieved by the updated matching algorithm (with duplicates removal) for the Chiang Kai Shek Memorial Hall in Taiwan and the Capitol in Washington DC, using the database described in the section "Building a More Realistic Database." We then used the histograms of distances to refine the match further, as explained previously. Based on the histogram analysis, images marked with a red square in the figure are rejected, and those marked with a blue square are retained. Most of the queries are of buildings, so portraits and statues of people are false matches. These experiments show that such images can be clearly distinguished and rejected. They generally have a very symmetric histogram, and they may also have a large mean. We observed similar results for the other queries mentioned previously.
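The rejection rule is described only qualitatively above: rejected images tend to have a very symmetric histogram of match distances and may have a large mean. The minimal Python sketch below implements such a test, assuming that symmetry is measured with the sample skewness. The thresholds `max_abs_skew` and `max_mean` are hypothetical, as is the way the two criteria are combined. None of these are the article's actual values.

```python
import statistics
from typing import Sequence

def skewness(values: Sequence[float]) -> float:
    """Population skewness: 0 for a perfectly symmetric set of values."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def reject_candidate(distances: Sequence[float],
                     max_abs_skew: float = 0.5,     # hypothetical threshold
                     max_mean: float = 0.35) -> bool:  # hypothetical threshold
    """Reject a retrieved image whose match-distance histogram is near-symmetric or sits high."""
    mean = statistics.fmean(distances)
    return abs(skewness(distances)) < max_abs_skew or mean > max_mean

good = [0.05, 0.06, 0.07, 0.08, 0.10, 0.12, 0.30, 0.45]  # many small distances, long tail
symmetric = [0.30, 0.35, 0.40, 0.45, 0.50]                # evenly spread around a high mean
print(reject_candidate(good), reject_candidate(symmetric))  # False True
```

One plausible intuition for the rule is as follows. A correct match yields many small distances plus a tail of outliers, so its histogram is right-skewed. An unrelated image produces distances spread more evenly around a higher value.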

Matching Enhancement through Database Extension. For each of our 10 landmarks, we download the top 21 images returned by Google image search when queried with the title of the landmark's Wikipedia page, and we add them to the corresponding landmark database. Next, we apply our matching algorithm to the extended databases; part (c) of Figure 4 shows the results. The top 1 and top 5 precision values increase for almost all of the 10 databases described in the section "Building a More Realistic Database."
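The top 1 and top 5 precision values compared in this section can be computed from each query's ranked result list. The sketch below assumes that precision at k is the fraction of the top k retrieved images that show the query's landmark. The function name and the sample labels are illustrative only.

```python
from typing import Sequence

def precision_at_k(ranked_labels: Sequence[str], query_label: str, k: int) -> float:
    """Fraction of the top-k retrieved images whose landmark label matches the query."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_labels[:k]
    return sum(1 for label in top if label == query_label) / k

ranked = ["capitol", "capitol", "statue", "capitol", "portrait", "capitol"]
print(precision_at_k(ranked, "capitol", 1))  # 1.0
print(precision_at_k(ranked, "capitol", 5))  # 0.6
```

Averaging the per-query values over the queries of each database (for example with `statistics.fmean`) gives summary numbers like those compared with and without the Google extension.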


Figure 4: (a) Top 1, Top 5, and Average Precision Results for the 10 Landmarks vs. the SURF Threshold. (b) Matching Enhancement Based on Histograms of Distances. Images Before the Red Line Are the Queries. (c) Top 1 and Top 5 Precision without and with Google Database Extension for the 10 Landmarks (Source: Intel Corporation, 2010)

Implementation and Optimization

In this section, we discuss how to optimize the algorithms to achieve a reasonable response time on the MID and how to take advantage of the features of the Intel Atom processor.

Feature Extraction and Image Match. As described in the previous sections, image feature extraction and image match (when precached data are available on the MID) are performed on the MID client device. These tasks require significant computing power and memory.

Software Optimization. Software optimization includes the following components.

Image feature extraction. The original SURF-based image feature extraction code is based on the OpenCV implementation, as described earlier in this article. Using the VTune analyzer, we identified two hotspots: keypoint detection and keypoint descriptor generation. We applied several optimization techniques to these hotspots to speed up feature extraction. We multi-threaded keypoint detection and descriptor generation by using OpenMP, which gave a 1.6X speedup over the single-threaded version on an Intel Atom processor. Converting keypoint detection from floating-point to integer arithmetic provided an additional 15% speedup. We also quantized each keypoint descriptor from float (32 bit) to char (8 bit), which reduced the data storage requirements by 4X. Using integer operations improved performance without significantly degrading the quality of the results.
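The float-to-char quantization can be sketched in a few lines. The sketch assumes that descriptor components lie in [-1, 1], which holds for unit-length descriptors such as SURF's. It uses a hypothetical scale factor of 127 with rounding and clamping; the article does not give the scale it used. Python's `array` module makes the 4X storage difference visible through `itemsize`.

```python
from array import array

SCALE = 127  # hypothetical: maps [-1.0, 1.0] onto the signed 8-bit range

def quantize_descriptor(desc):
    """Quantize float components in [-1, 1] to signed 8-bit integers (clamped)."""
    return array("b", (max(-127, min(127, round(v * SCALE))) for v in desc))

def dequantize_descriptor(q):
    """Approximate reconstruction, useful for checking the quantization error."""
    return array("f", (v / SCALE for v in q))

desc = array("f", [0.5, -0.25, 0.125, 0.0] * 16)  # 64 components, like a SURF descriptor
q = quantize_descriptor(desc)
print(len(desc) * desc.itemsize, len(q) * q.itemsize)  # 256 64
```

Once descriptors are stored as 8-bit integers, distance computations can also use integer arithmetic. The reconstruction error per component is at most half a quantization step (0.5 / 127).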

Image match. Using the VTune analyzer again, we identified the distance calculations as the hotspot of image match. We multi-threaded the image match code by using OpenMP and achieved a 1.7X speedup over the single-threaded version on an Intel Atom processor. We also vectorized the distance calculation with SSE intrinsics to take advantage of the 4-way SIMD vector units in the Intel Atom processor, which provided a 2X speedup over the nonvectorized image match code.
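An SSE register holds four 32-bit floats. A vectorized distance loop therefore typically keeps four partial sums, one per lane, and combines them at the end. In C this would be written with intrinsics such as `_mm_sub_ps`, `_mm_mul_ps`, and `_mm_add_ps`. Python cannot issue SSE instructions, so the sketch below only mirrors the lane structure in plain Python to show the data layout, not the speedup. It assumes a squared Euclidean distance and a descriptor length that is a multiple of 4, such as 64. The function name is hypothetical.

```python
from typing import List, Sequence

LANES = 4  # an SSE register holds four 32-bit floats

def squared_distance_4lane(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance accumulated in four lane-wise partial sums.

    Mirrors an SSE loop: per group of four components, one subtract, one
    multiply and one add, followed by a horizontal sum of the lanes at the end.
    """
    if len(a) != len(b) or len(a) % LANES:
        raise ValueError("descriptors must have equal length, a multiple of 4")
    acc: List[float] = [0.0] * LANES          # stands in for one 4-float accumulator register
    for i in range(0, len(a), LANES):
        for lane in range(LANES):             # on SSE these four lanes run in one instruction
            diff = a[i + lane] - b[i + lane]
            acc[lane] += diff * diff
    return sum(acc)                           # horizontal add of the four lanes

a = [float(i) for i in range(64)]
b = [0.0] * 64
print(squared_distance_4lane(a, b))  # 85344.0
```

The per-lane accumulators are what let SIMD hardware avoid a serial dependency on a single running sum. Multi-threading, by contrast, splits the outer loop over query keypoints across threads.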

Performance on a Platform Based on the Atom Processor

We analyzed the software implementation on a single-core, hyper-threaded Intel Atom system (800 MHz, 256 MB RAM, 512 KB L2 cache) running Linux. Our performance analysis uses four datasets with different sizes and resolutions: 10 QVGA images, 10 VGA images, 100 QVGA images, and 100 VGA images. We chose these datasets for two reasons: a) most current MIDs take QVGA or VGA video input; b) for a given GPS location, the number of visible landmarks normally ranges from 10 to 100. Hence, with the help of GPS localization, we need to precache 10 to 100 landmarks for database comparison.

Figure 5(a) compares the total runtime on each dataset for pair-wise matching versus FLANN indexing. Figure 5(b) breaks the runtime down by component: keypoint detection, descriptor generation, and image match. At VGA resolution there are around 800 keypoints per image; at QVGA resolution, around 350. The figures show that the runtime of pair-wise matching increases linearly with the database size, whereas with FLANN indexing the runtime scales very well as the database grows. When a query image must be compared against a database of more than 10 images (a common case), FLANN indexing should be used instead of pair-wise matching to get a faster response. Overall, querying a VGA image against a database of 100 VGA images takes about one second.
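A back-of-the-envelope count shows why pair-wise matching scales linearly with the database size. The sketch assumes brute-force matching, where every query keypoint is compared with every keypoint of every database image. It also assumes that the query has the same resolution as the database images. It uses the approximate keypoint counts given above: about 350 per QVGA image and about 800 per VGA image.

```python
KEYPOINTS = {"QVGA": 350, "VGA": 800}  # approximate per-image counts from the text

def pairwise_comparisons(resolution: str, db_images: int) -> int:
    """Descriptor comparisons for one query matched brute-force against a whole database."""
    n = KEYPOINTS[resolution]
    return n * n * db_images  # query keypoints x keypoints per image x images

for res in ("QVGA", "VGA"):
    for size in (10, 100):
        print(res, size, pairwise_comparisons(res, size))
# QVGA 10 1225000
# QVGA 100 12250000
# VGA 10 6400000
# VGA 100 64000000
```

Growing the database tenfold multiplies the brute-force work tenfold. An approximate index such as FLANN's randomized kd-trees bounds the number of distance computations per query keypoint by the number of leaves it checks. Its cost therefore grows far more slowly, which is consistent with the trend reported for Figure 5(a).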

To speed up the image-based stabilization method on the Intel Atom system, we used:

A reduced linear system with gradients from only 200 pixels in the image instead of from all the pixels

SSE instructions for the pyramid construction and for solving the linear system

Only the coarsest levels of the pyramid to estimate the alignment
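The first optimization above, a linear system built from gradients at only a few hundred pixels, can be illustrated for the pure displacement model. The sketch below is a standard Lucas-Kanade-style least-squares translation estimate and is not necessarily the article's exact formulation. Several details are assumptions: the pixel-selection rule (strongest gradients), the function name, and the single-level, single-iteration structure. According to the performance description that follows, the real system iterates this kind of step on coarse pyramid levels.

```python
from typing import List, Tuple

Image = List[List[float]]

def estimate_displacement(img0: Image, img1: Image, num_pixels: int = 200) -> Tuple[float, float]:
    """Least-squares estimate of a pure translation (dx, dy) such that
    img1(x, y) ~= img0(x - dx, y - dy), using gradients at only `num_pixels` pixels.

    Assumption: the pixels with the strongest gradients are the ones kept.
    """
    h, w = len(img0), len(img0[0])
    samples = []
    for y in range(1, h - 1):              # skip the border: central differences need both neighbours
        for x in range(1, w - 1):
            ix = (img0[y][x + 1] - img0[y][x - 1]) / 2.0
            iy = (img0[y + 1][x] - img0[y - 1][x]) / 2.0
            it = img1[y][x] - img0[y][x]   # temporal difference
            samples.append((ix * ix + iy * iy, ix, iy, it))
    samples.sort(key=lambda s: s[0], reverse=True)
    sxx = sxy = syy = bx = by = 0.0
    for _, ix, iy, it in samples[:num_pixels]:
        # Linearised brightness constancy: ix*dx + iy*dy + it = 0 at each pixel.
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        bx -= ix * it
        by -= iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("gradients are too weak or too aligned to estimate a shift")
    # Cramer's rule for [[sxx, sxy], [sxy, syy]] @ [dx, dy] = [bx, by]
    return (syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det

N = 21  # synthetic 21x21 test image centred on (10, 10)

def quadratic_image(shift_x: float, shift_y: float) -> Image:
    """Bowl-shaped test image translated by (shift_x, shift_y)."""
    return [[(x - 10 - shift_x) ** 2 + (y - 10 - shift_y) ** 2 for x in range(N)]
            for y in range(N)]

img0 = quadratic_image(0.0, 0.0)
img1 = quadratic_image(0.2, 0.1)
print(estimate_displacement(img0, img1))  # close to (0.2, 0.1)
```

The system is only 2x2 regardless of how many pixels contribute. Cutting the pixel count mainly reduces the cost of the gradient accumulations, and those accumulations are also the part that SSE can vectorize.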

Performance was measured on the same Intel Atom system by using VGA (640x480) input video and different options. Figure 6 shows the measured frames per second (fps) for different models (displacement/camera rotation), estimation methods (robust/non-robust), pyramid levels, and iterations per level. For pure displacement models, using non-robust estimation and running five iterations in Levels 3 and 4 of the multi-resolution pyramid (Level 1 being the original resolution), the performance exceeds 80 fps.
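The text numbers pyramid levels from 1, the original resolution. Suppose each level halves the width and height by 2x2 averaging; this is a common choice, but the article does not specify its filter. Under that assumption, Levels 3 and 4 of a 640x480 input are 160x120 and 80x60. These hold 1/16 and 1/64 of the original pixels, which helps explain why restricting estimation to the coarse levels is so much cheaper. The function names below are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]

def downsample(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block (an odd trailing row/column is dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)] for y in range(h)]

def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Level 1 is the input image; each further level halves width and height."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """(width, height) of a 1-indexed pyramid level under 2x downsampling."""
    return width >> (level - 1), height >> (level - 1)

print([level_size(640, 480, lvl) for lvl in (1, 2, 3, 4)])
# [(640, 480), (320, 240), (160, 120), (80, 60)]
```

The inner 2x2 average is the kind of regular, data-parallel arithmetic that SSE instructions can speed up during pyramid construction.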


Figure 6: Performance Measured in Frames per Second (fps) of the Image-based Stabilization Method on an Intel Atom System (Source: Intel Corporation, 2010)

Conclusion

In [8], we presented our mobile augmented reality (MAR) system as a fully functional prototype on a MID with an Intel Atom processor inside. In this article, we described new improvements to the matching and tracking algorithms, in addition to the design of the system and its database. We also presented code optimizations and benchmark results for the Intel Atom processor. With all these improvements, MAR demonstrates the powerful capabilities that location sensors, network connectivity, and computational power bring to future mobile devices.

Acknowledgments

The authors would like to thank Igor Kozintsev, Oscar Nestares, and Horst Haussecker for their significant contributions to this work.

Maha El Choubassi and Yi Wu are Senior Research Scientists in the Vision and Image Processing Research group/FTR at Intel Labs.

This article and more on similar subjects may be found in the Intel Technology Journal, Volume 14, Issue 1, "Essential Computing: Simplifying And Enriching Our Work And Daily Life". More information can be found at http://intel.com/technology/itj.
