Abstract

This paper addresses robust pose estimation for planar targets in real-time mobile vision. To recognize targets robustly at very low computational cost, we employ feature-based methods built on local binary descriptors, which allow fast feature matching at run time. The resulting set of matches is then fed to a robust parameter estimation algorithm to obtain a reliable estimate of the homography. The robust estimation of the model parameters, in our case a 2D homographic transformation, is an essential part of the whole recognition process. We present a highly optimized, device-friendly implementation of homography estimation within a unified hypothesize-and-verify framework. This framework is specifically designed to meet the growing demand for fast and robust estimation on power-constrained platforms. Our focus is not only on developing fast algorithms for the recognition framework, but also on implementing them efficiently by accounting for the computing capacity of modern CPUs. Experiments show that the resulting homography estimation implementation achieves a speedup of 25\(\times\) over the regular OpenCV RANSAC homography estimation function.

Appendix: Homography estimation using Gaussian elimination

This appendix gives the exact row operations we use to bring the matrix shown in Eq. 4 to reduced row echelon form. The Gaussian elimination proceeds in two parts. In the first, identical row operations are applied to the top and bottom halves of the matrix; in the second, row operations are applied to the matrix as a whole.
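As a point of reference for the hand-scheduled reduction in this appendix, the following is a minimal sketch of a generic Gauss-Jordan solver for the same kind of system. It is not the optimized schedule described here: it uses ordinary partial pivoting and normalizes each pivot row immediately. The matrix layout (x-equations in rows 0 to 3, y-equations in rows 4 to 7, last homography entry fixed to 1, right-hand side in column 8) is an assumption about Eq. 4, and the names `solve_homography`, `project`, `H_true`, `src` and `dst` are hypothetical.

```python
def solve_homography(src, dst):
    """Generic Gauss-Jordan on an assumed 8x9 augmented system (h22 fixed to 1)."""
    # Rows 0-3: x-equations; rows 4-7: y-equations (assumed layout of Eq. 4).
    m = [[x, y, 1.0, 0.0, 0.0, 0.0, -x * X, -y * X, X]
         for (x, y), (X, Y) in zip(src, dst)]
    m += [[0.0, 0.0, 0.0, x, y, 1.0, -x * Y, -y * Y, Y]
          for (x, y), (X, Y) in zip(src, dst)]

    for col in range(8):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = max(range(col, 8), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate correspondences")
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row, then clear the column in every other row.
        inv = 1.0 / m[col][col]
        m[col] = [v * inv for v in m[col]]
        for r in range(8):
            if r != col and m[r][col] != 0.0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]

    # In reduced row echelon form, column 8 holds the eight unknowns.
    h = [m[i][8] for i in range(8)] + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(H, p):
    """Apply a 3x3 homography to a 2D point."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical ground-truth homography and four source points.
H_true = [[2.0, 0.5, 3.0], [-0.25, 1.5, -1.0], [0.001, 0.002, 1.0]]
src = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
dst = [project(H_true, p) for p in src]
H_est = solve_homography(src, dst)
```

A generic solver like this divides once per pivot and searches for pivots at run time; the schedule in this appendix instead fixes the order of operations in advance and postpones reciprocals.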

Starting from the matrix of Eq. 4, we subtract row 2 from rows 0, 1 and 3, and row 6 from rows 4, 5 and 7, thus eliminating all but one of the 1s in each of columns 2 and 5. Since we choose not to scale the rows containing the remaining 1s, those entries stay unchanged for the rest of the computation, and no storage needs to be reserved for them.

We note that at this stage, of the 72 potential floating-point values in the matrix, only 32 (excluding the two remaining 1s) are distinct and non-zero. These fit neatly in half of a vector register file of 16 four-lane registers (64 lanes in total), a common configuration on modern architectures.
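The count of 32 can be checked with a short sketch. It assumes the same layout for Eq. 4 as is usual for this system (x-equations in rows 0 to 3, y-equations in rows 4 to 7, right-hand side in column 8); the point coordinates and the names `src`, `dst`, `m`, `kept_ones`, `nonzero`, `shared` and `distinct` are hypothetical and chosen only for illustration.

```python
# Hypothetical correspondences (x, y) -> (X, Y) in general position.
src = [(1.0, 2.0), (5.0, 3.0), (7.0, 11.0), (2.0, 13.0)]
dst = [(17.0, 19.0), (23.0, 29.0), (31.0, 37.0), (41.0, 43.0)]

# Assumed 8x9 layout: rows 0-3 hold the x-equations, rows 4-7 the y-equations.
m = [[x, y, 1.0, 0.0, 0.0, 0.0, -x * X, -y * X, X]
     for (x, y), (X, Y) in zip(src, dst)]
m += [[0.0, 0.0, 0.0, x, y, 1.0, -x * Y, -y * Y, Y]
      for (x, y), (X, Y) in zip(src, dst)]

# First step: subtract row 2 from rows 0, 1, 3 and row 6 from rows 4, 5, 7.
for pivot, targets in ((2, (0, 1, 3)), (6, (4, 5, 7))):
    for r in targets:
        m[r] = [a - b for a, b in zip(m[r], m[pivot])]

# Columns 2 and 5 now contain a single 1 each (in rows 2 and 6).
col2 = [row[2] for row in m]
col5 = [row[5] for row in m]

# Count non-zero entries other than the two remaining 1s ...
kept_ones = {(2, 2), (6, 5)}
nonzero = sum(1 for r in range(8) for c in range(9)
              if m[r][c] != 0.0 and (r, c) not in kept_ones)
# ... and discount the bottom-half entries that duplicate the top half:
# columns 3-4 of rows 4-7 repeat columns 0-1 of rows 0-3.
shared = sum(1 for r in range(4) for c in range(2)
             if m[r][c] == m[r + 4][c + 3])
distinct = nonzero - shared
print(col2, col5, distinct)
```

Under this layout the tally is 20 values in the top half (five per row) plus 12 in the bottom half (the last three columns of each row), since the coordinate columns of the bottom half need no separate storage.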

For brevity, from this point on only the row operations are given. They were designed to delay the use of reciprocals as long as possible. The first part is applied identically to the top and bottom halves.

which concludes the first part. For the second part, we stop treating the matrix as two independent \(4 \times 9\) halves and instead consider the rightmost three columns as one \(8 \times 3\) matrix. We use the barren rows 3 and 7 to eliminate columns 6 and 7 as follows: first, we normalize row 7.