Inverted File System Structure

The keys are the SIFT descriptors of all known images; each value is an array of image entries.

An image entry consists of an image ID and possibly other refinement data, e.g. a Hamming Embedding signature [3], for improved matching accuracy.
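As a sketch of this structure (class and field names are illustrative assumptions, not the original implementation), the inverted file maps a codeword ID to a list of image entries:

```python
from collections import defaultdict

class ImageEntry:
    """One posting in the inverted file: an image ID plus optional
    refinement data, e.g. a Hamming Embedding signature [3]."""
    def __init__(self, image_id, hamming_signature=None):
        self.image_id = image_id
        self.hamming_signature = hamming_signature  # optional refinement data

# codeword id -> list of image entries; missing codewords yield empty lists
inverted_file = defaultdict(list)

def index_descriptor(codeword_id, entry):
    """File an image entry under the codeword its descriptor quantizes to."""
    inverted_file[codeword_id].append(entry)
```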

The k-means algorithm is used to cluster the descriptors to form codewords. Each codeword is then associated with multiple images. If no clustering were done, each codeword would be associated with only one image, so the array of image IDs would have only one element.
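The clustering step can be sketched with a minimal Lloyd's k-means in NumPy (a naive illustration with first-k initialization, not the original implementation):

```python
import numpy as np

def kmeans(descriptors, k, iters=10):
    """Minimal Lloyd's k-means: cluster descriptors into k codewords.
    Returns the cluster centers (codewords) and per-descriptor labels."""
    centers = descriptors[:k].astype(float)  # naive init: first k descriptors
    for _ in range(iters):
        # distance of every descriptor to every center, then nearest-center labels
        dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)  # move center to cluster mean
    return centers, labels
```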

To match a query descriptor to a codeword, a brute-force search is used to find the closest codeword.
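The brute-force step amounts to an exhaustive L2 nearest-neighbor search over the codewords, for example:

```python
import numpy as np

def nearest_codeword(query, codewords):
    """Brute force: L2 distance from the query descriptor to every codeword,
    returning the index of the closest one."""
    dists = np.linalg.norm(codewords - query, axis=1)
    return int(dists.argmin())
```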

A keypoint in an image is usually an extremum of the response of convolving the image with a finite-length filter:

Difference of Gaussian (DoG)

Laplacian of Gaussian (LoG)

Convolution is expensive, and repeated downsampling is often required to find the filter-response extrema over several image scales.
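As an illustration of the filter-response idea, a minimal DoG sketch in NumPy (the sigmas and the impulse test image are arbitrary choices; a real detector would run this over a full scale pyramid):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    g = np.exp(-x**2 / (2 * sigma**2))
    return g / g.sum()

def blur(img, sigma):
    """Separable Gaussian blur with edge padding (same-size output)."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    pad = len(k) // 2
    rows = np.apply_along_axis(
        lambda r: np.convolve(np.pad(r, pad, mode='edge'), k, 'valid'), 1, img)
    return np.apply_along_axis(
        lambda c: np.convolve(np.pad(c, pad, mode='edge'), k, 'valid'), 0, rows)

def dog_response(img, sigma1, sigma2):
    """Difference of Gaussians: approximates the Laplacian of Gaussian."""
    return blur(img, sigma1) - blur(img, sigma2)
```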

Motilal Agrawal et al. suggested using three integral images:

one square integral image,

two slanted integral images

and an octagon bi-level filter to approximate the LoG calculation [2]. Keypoints are found by applying octagon filters of different sizes to an image and finding the locations of extrema. If an extremum is above a certain threshold, its location is considered a keypoint.
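The square integral image that underlies such constant-time filters can be sketched as follows (the two slanted integral images follow the same idea along diagonal directions and are omitted here):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row/column prepended, so box sums
    need no bounds checks at the image border."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] in O(1) from the integral image: the
    building block of bi-level (two-weight) filter responses."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```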

An octagon shape is chosen as the filter’s shape because it is a close approximation to a circular LoG filter. Seven scales of the filter are applied to the image for scale invariance, with sizes set according to the original paper.

The CenSurE detector is used in place of the SIFT detector to find keypoints.

The SIFT descriptor is used to describe CenSurE keypoints.

The calculation of the SIFT descriptor involves computing statistics, i.e. binning, of the gradients of a small image patch in the vicinity of the keypoint.

The size of the image patch is determined by the keypoint's scale as given by the CenSurE detector. Usually the patch is 1.5 to 3 times the size of the keypoint.

After the gradient magnitude of the image patch is computed, it is weighted by a Gaussian window to emphasize samples closer to the keypoint.
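The binning and Gaussian weighting can be sketched as follows (simplified to a single orientation histogram rather than SIFT's 4×4 grid of histograms; the window sigma is an arbitrary choice):

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Gradient statistics of a patch: magnitudes weighted by a Gaussian
    window, binned by gradient orientation, then L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)
    # Gaussian window centered on the patch: closer samples count more
    h, w = patch.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sigma = 0.5 * max(h, w)
    weight = np.exp(-((yy - h / 2) ** 2 + (xx - w / 2) ** 2) / (2 * sigma ** 2))
    # assign each pixel's orientation to one of n_bins angular bins
    bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=(mag * weight).ravel(),
                       minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-12)
```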

40 sample images are resized to 256×256, indexed by their SIFT descriptors on the server machine, and stored in an inverted file system (IFS) in RAM.

For each image, four anchor points are added to the corners. When the client scans the image, the four corners are used as reference points for a perspective transform that corrects the image to its upright position.
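As a sketch of this rectification step, the 3×3 perspective transform can be recovered from the four detected anchor corners and the canonical 256×256 corner positions by solving a small linear system (a direct-linear-transform sketch with h33 fixed to 1; function names are illustrative):

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 perspective transform mapping 4 src points to
    4 dst points, e.g. detected anchors -> the upright 256x256 frame."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_h(H, pt):
    """Apply a homography to one 2-D point (homogeneous divide)."""
    v = H @ np.array([pt[0], pt[1], 1.0])
    return v[:2] / v[2]
```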

The client attempts to find the four anchors using per-frame template matching, then realigns the image to 256×256. Its CenSurE keypoints are then localized and their SIFT descriptors calculated. These descriptors are Base64-encoded and sent to the server using the HTTP POST method.
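A minimal sketch of the client-side encoding step (the payload layout and server endpoint are assumptions, not from the original system):

```python
import base64
import numpy as np

def encode_descriptors(descriptors):
    """Pack float32 SIFT descriptors into raw bytes and Base64-encode
    them for transport in an HTTP POST body."""
    return base64.b64encode(np.asarray(descriptors, np.float32).tobytes())

def decode_descriptors(payload, dim=128):
    """Server-side inverse: Base64 -> bytes -> (n, dim) float32 array."""
    flat = np.frombuffer(base64.b64decode(payload), np.float32)
    return flat.reshape(-1, dim)

# The POST itself could use the standard library, e.g. (URL hypothetical):
#   from urllib import request
#   request.urlopen(request.Request("http://server/match",
#                                   data=encode_descriptors(desc)))
```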

The server matches the received descriptors against its IFS using the brute-force method. The ID of the closest matching image (from 1 to 40) is returned to the client.
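The matching step might be sketched as voting over the inverted file, with entries reduced to bare image IDs for brevity (an assumption; the actual scoring may differ):

```python
import numpy as np
from collections import Counter

def match_query(query_descriptors, codewords, inverted_file):
    """Brute-force matching: quantize each query descriptor to its nearest
    codeword, vote for every image ID filed under that codeword, and
    return the ID with the most votes."""
    votes = Counter()
    for q in query_descriptors:
        cw = int(np.linalg.norm(codewords - q, axis=1).argmin())
        for image_id in inverted_file.get(cw, []):
            votes[image_id] += 1
    return votes.most_common(1)[0][0]
```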

A grayscale image I is generated by a scene of piecewise smooth (multiply-connected) surfaces S and their albedo. Nuisances are divided into those that form a group g (contrast transformations, local changes of viewpoint) and a non-invertible map (quantization, occlusions).

Deviations from this model (non-diffuse reflectance, mutual illumination, cast shadows, sensor noise) are not represented explicitly and are lumped into an additive error n:

As abstract “visual recognition” tasks we consider classifications (detection, localization, categorization and recognition) that boil down to learning and evaluating the likelihood p(I|c) of a class c that affects the data via a Markov chain c → S → I.