I am working on a pattern recognition problem: finding the two most similar rectangular regions in two given images. Specifically, I have two 2D (grayscale) images $I_A$ and $I_B$ of the same size. Denote an arbitrary rectangular region in an image $I$ as $R^I(x_0,y_0,w,h)$, where $(x_0,y_0)$ is the upper-left pixel of the rectangle of width $w$ and height $h$, and
$\forall i\in[0,w-1]$ and $j\in [0,h-1]$, we have $R^I(x_0,y_0,w,h)[i,j] = I[x_0+i,y_0+j]$.

My question is how to find two rectangular regions of the same size, one in each image (say $R^{I_A}(x^A_0,y^A_0,w,h)$ and $R^{I_B}(x^B_0,y^B_0,w,h)$), that are the most similar in terms of the mean squared error between them. Note that $(x^A_0,y^A_0)\neq(x^B_0,y^B_0)$ is allowed.
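Written out with the indexing convention above, the quantity to minimize for a fixed $(w,h)$ would be

$$\mathrm{MSE}(x^A_0,y^A_0,x^B_0,y^B_0;w,h)=\frac{1}{wh}\sum_{i=0}^{w-1}\sum_{j=0}^{h-1}\Big(I_A[x^A_0+i,\,y^A_0+j]-I_B[x^B_0+i,\,y^B_0+j]\Big)^2 .$$

For a fixed offset $(d_x,d_y)=(x^B_0-x^A_0,\;y^B_0-y^A_0)$, every summand is an entry of one difference image $D_d[x,y]=\big(I_A[x,y]-I_B[x+d_x,\,y+d_y]\big)^2$, so the MSE of every window at that offset is a box sum of $D_d$ divided by $wh$, which an integral image of $D_d$ delivers in constant time per window.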

Ideally, I want to answer this question for all possible combinations of $w$ and $h$, but it is very costly to compute even for a single pair $(w,h)$. So far, I have used the integral image technique; however, it still requires shifting the image pixels. I wonder whether there is a better technique. Can anyone help?
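A minimal NumPy sketch of the shift-plus-integral-image approach, assuming the images are 2D NumPy arrays indexed `I[y, x]` (row, column), unlike the `I[x, y]` notation above, and that positions are returned as `(x, y)`. The names `box_sums` and `min_mse_fixed_size` are hypothetical. For each offset it forms the squared-difference image over the overlap of $I_A$ and the shifted $I_B$, builds its summed-area table, and reads off every $w\times h$ window sum in constant time:

```python
import numpy as np


def box_sums(D, w, h):
    """Sum of D over every h-row by w-column window, via a summed-area table."""
    S = np.zeros((D.shape[0] + 1, D.shape[1] + 1))
    S[1:, 1:] = D.cumsum(axis=0).cumsum(axis=1)  # S[y, x] = D[:y, :x].sum()
    # Entry [y0, x0] is the sum over rows y0..y0+h-1 and columns x0..x0+w-1.
    return S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]


def min_mse_fixed_size(IA, IB, w, h):
    """Brute force over all offsets; returns (mse, (xA, yA), (xB, yB))."""
    IA = np.asarray(IA, dtype=np.float64)
    IB = np.asarray(IB, dtype=np.float64)
    H, W = IA.shape
    best = (np.inf, None, None)
    for dy in range(-(H - h), H - h + 1):
        for dx in range(-(W - w), W - w + 1):
            # Rows/columns of IA whose shifted partner (y + dy, x + dx) lies inside IB.
            ya0, ya1 = max(0, -dy), min(H, H - dy)
            xa0, xa1 = max(0, -dx), min(W, W - dx)
            A = IA[ya0:ya1, xa0:xa1]
            B = IB[ya0 + dy:ya1 + dy, xa0 + dx:xa1 + dx]
            ssd = box_sums((A - B) ** 2, w, h)  # SSD of every window at this offset
            r, c = np.unravel_index(np.argmin(ssd), ssd.shape)
            mse = ssd[r, c] / (w * h)
            if mse < best[0]:
                xA, yA = xa0 + c, ya0 + r
                best = (mse, (xA, yA), (xA + dx, yA + dy))
    return best
```

Each offset costs one $O(HW)$ pass, so a single $(w,h)$ needs roughly $O(H^2W^2)$ work. The summed-area table of an offset does not depend on $(w,h)$, so a loop over window sizes could be moved inside the offset loop to reuse it; that removes redundant table construction but not the quadratic number of offsets.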

2 Answers

I'm afraid the problem can only be solved by brute force (calculate the correlation for all $x_0$, $y_0$, $w$, $h$), so it is very important to make each calculation fast. There is a classic article on this: J. P. Lewis, "Fast Normalized Cross-Correlation".
Also look at the papers that cite this article.
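A minimal NumPy sketch applied to the question's criterion, assuming a single template $T$ (one fixed rectangle cut from $I_A$) searched in an image $I$ ($I_B$), both stored as arrays indexed `[y, x]`. Lewis's paper is about normalized cross-correlation; this adapts its two ingredients (FFT-based correlation for the product term, running-sum tables for the window energy) to the sum of squared differences via $\sum(I-T)^2=\sum I^2-2\sum IT+\sum T^2$. `ssd_map` and `best_position` are hypothetical names:

```python
import numpy as np


def ssd_map(I, T):
    """SSD between template T and every fully contained window of image I.

    SSD = sum(I_win**2) - 2*sum(I_win*T) + sum(T**2): the cross term is a
    cross-correlation computed with FFTs, the first term comes from an integral image.
    """
    I = np.asarray(I, dtype=np.float64)
    T = np.asarray(T, dtype=np.float64)
    H, W = I.shape
    h, w = T.shape
    # Full linear convolution of I with the flipped template (zero-padded FFT,
    # so there is no circular wrap-around); this equals cross-correlation with T.
    shape = (H + h - 1, W + w - 1)
    full = np.fft.irfft2(np.fft.rfft2(I, shape) * np.fft.rfft2(T[::-1, ::-1], shape), shape)
    cross = full[h - 1:H, w - 1:W]  # entry [y, x] <-> window with top-left (y, x)
    # Window sums of I**2 from a summed-area table.
    S = np.zeros((H + 1, W + 1))
    S[1:, 1:] = (I ** 2).cumsum(axis=0).cumsum(axis=1)
    energy = S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]
    return energy - 2.0 * cross + (T ** 2).sum()


def best_position(I, T):
    """Top-left (x, y) of the best match of T in I, and its mean squared error."""
    d = ssd_map(I, T)
    y, x = np.unravel_index(np.argmin(d), d.shape)
    return x, y, d[y, x] / T.size
```

Solving the original problem this way still means calling it once for every candidate rectangle of $I_A$, which is the brute force described above; the FFT and the integral image only make each call cheaper.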

[Edit] You can achieve an additional speed-up by using multiresolution techniques (an image pyramid), but this depends on the nature of your images.
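A rough coarse-to-fine sketch of that idea, assuming a simple 2×2 block average as the filter-and-subsample step (Gaussian filtering is the more common choice), a template taken from $I_A$ that is at least $2^{\text{levels}-1}$ pixels on each side, and arrays indexed `[y, x]`. All function names and the `levels`/`radius` parameters are hypothetical. Only the coarsest level is searched exhaustively; each finer level checks a small neighborhood around twice the previous position, so on images without much low-frequency structure it can miss the true optimum:

```python
import numpy as np


def downsample(I):
    """Halve the resolution by averaging 2x2 blocks (box filter + subsampling)."""
    H, W = (I.shape[0] // 2) * 2, (I.shape[1] // 2) * 2
    I = I[:H, :W]  # drop a trailing odd row/column
    return 0.25 * (I[0::2, 0::2] + I[1::2, 0::2] + I[0::2, 1::2] + I[1::2, 1::2])


def pyramid(I, levels):
    """List of images; pyr[0] is full resolution, pyr[-1] the coarsest."""
    pyr = [np.asarray(I, dtype=np.float64)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr


def ssd_at(I, T, y, x):
    """Sum of squared differences between T and the window of I at top-left (y, x)."""
    h, w = T.shape
    return float(((I[y:y + h, x:x + w] - T) ** 2).sum())


def coarse_to_fine(I, T, levels=3, radius=2):
    """Approximate top-left (x, y) of T in I via coarse-to-fine search."""
    PI, PT = pyramid(I, levels), pyramid(T, levels)
    Ic, Tc = PI[-1], PT[-1]
    h, w = Tc.shape
    # Exhaustive search at the coarsest level only.
    cands = [(y, x) for y in range(Ic.shape[0] - h + 1)
             for x in range(Ic.shape[1] - w + 1)]
    y, x = min(cands, key=lambda p: ssd_at(Ic, Tc, *p))
    # Refine: at each finer level, search around the doubled coarse position.
    for lvl in range(levels - 2, -1, -1):
        Il, Tl = PI[lvl], PT[lvl]
        h, w = Tl.shape
        cy, cx = 2 * y, 2 * x
        cands = [(yy, xx)
                 for yy in range(max(0, cy - radius), min(Il.shape[0] - h, cy + radius) + 1)
                 for xx in range(max(0, cx - radius), min(Il.shape[1] - w, cx + radius) + 1)]
        y, x = min(cands, key=lambda p: ssd_at(Il, Tl, *p))
    return x, y
```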

Is the image pyramid technique basically looking at the image at different resolutions (via wavelet transforms) and doing the correlation there?
–
Mohammad, Nov 14 '12 at 14:54

@geometrikal The template is a rectangle from the first image, and you search for it in the second image. I'm afraid the problem can only be solved by brute force (calculate the correlation for all $x_0$, $y_0$, $w$, $h$), so it is very important to make each calculation fast.
–
SergV, Nov 14 '12 at 15:03

@Mohammad Quoting en.wikipedia.org/wiki/Template_matching#Speeding_up_the_Process: "Another way of speeding up the matching process is through the use of an image pyramid. This is a series of images, at different scales, which are formed by repeatedly filtering and subsampling the original image in order to generate a sequence of reduced resolution images.[9] These lower resolution images can then be searched for the template (with a similarly reduced resolution), in order to yield possible start positions for searching at the larger scales..."
–
SergV, Nov 14 '12 at 15:06

As already stated, you cannot do much to speed up the comparison between any two chosen patches. To speed up the process, you need to focus on reducing the number of patch pairs you have to compare.

If the computation is as expensive as I presume it is, I have another suggestion in addition to the already mentioned image pyramid.

Extract local features and perform feature matching. This is the usual way to recognize similar images, and since it works for finding objects within whole images, it should also match features in smaller, locally similar patches.

Sivic and Zisserman ("Video Google: A Text Retrieval Approach to Object Matching in Videos") had a good idea about spatial consistency: in short, they check that groups of matches have a fairly similar geometric layout. They use a fairly loose criterion, but suggest ways to both loosen and tighten it. You still have to do straightforward brute-force matching between the feature descriptors, but that seems far less computationally expensive than directly calculating similarities between patches.
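A small NumPy sketch of a spatial-consistency vote in the spirit of that idea (not the paper's exact procedure), assuming features have already been detected, described, and matched so that `ptsA[i]` in $I_A$ corresponds to `ptsB[i]` in $I_B$. Each match gets one vote for every one of its `k` nearest matched neighbors in $I_A$ that is also among its `k` nearest neighbors in $I_B$; `spatial_votes`, `knn_indices`, and the default `k=10` are hypothetical choices:

```python
import numpy as np


def knn_indices(P, k):
    """Indices of the k nearest other points for each row of P (brute force)."""
    d = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]


def spatial_votes(ptsA, ptsB, k=10):
    """Votes per match: neighbors shared between its k-NN in image A and in image B."""
    ptsA = np.asarray(ptsA, dtype=np.float64)
    ptsB = np.asarray(ptsB, dtype=np.float64)
    k = min(k, len(ptsA) - 1)
    nA, nB = knn_indices(ptsA, k), knn_indices(ptsB, k)
    return np.array([len(set(a.tolist()) & set(b.tolist())) for a, b in zip(nA, nB)])
```

Matches with few votes can be discarded before looking for clusters. The all-pairs distance matrix is $O(N^2)$ in memory, which is acceptable for a few thousand matches; beyond that a k-d tree would be the usual replacement.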

This way, you will hopefully get clusters of matches. You could then limit the search to only the patches that contain a certain minimum number of matches. In my opinion, these patches would have a substantially higher probability of being the most similar. This is a somewhat complex add-on, but I think it could be worth a shot.
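One way to do that restriction, sketched under the assumption that the surviving matched keypoints of one image are given as `(x, y)` pixel coordinates: rasterize them into a count image, build its integral image, and keep the top-left corners of all $w\times h$ windows containing at least `min_matches` points. `windows_with_matches` and `min_matches` are hypothetical names:

```python
import numpy as np


def windows_with_matches(pts, shape, w, h, min_matches):
    """Top-left (x, y) of every w-by-h window of an image of `shape` (H, W)
    that contains at least `min_matches` of the keypoints `pts`."""
    H, W = shape
    C = np.zeros((H, W))
    for x, y in np.floor(np.asarray(pts, dtype=np.float64)).astype(int):
        C[y, x] += 1  # count image: matched keypoints per pixel
    S = np.zeros((H + 1, W + 1))
    S[1:, 1:] = C.cumsum(axis=0).cumsum(axis=1)
    counts = S[h:, w:] - S[:-h, w:] - S[h:, :-w] + S[:-h, :-w]
    ys, xs = np.nonzero(counts >= min_matches)
    return list(zip(xs.tolist(), ys.tolist()))
```

Only those windows (and their counterparts in the other image) would then go through the expensive MSE comparison.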

Thanks for the suggestion and the link. I think that any acceleration of the computation depends on the nature of the images. These methods will work very well for normal photos, but there are images for which they (the image pyramid and your suggestion) do not work. Therefore, it would be interesting to see the images used by the author of the question.
–
SergV, Nov 16 '12 at 2:17

@SergV Of course, the quality of the extracted features depends on the type of images; for low-texture images, good features usually cannot be found. It would be nice if you expanded a little on the image pyramid in your answer: a sentence or two of explanation instead of just a link/name.
–
penelope, Nov 16 '12 at 8:55