Approximate Nearest Neighbors (ANN) in high dimensional vector spaces is a fundamental, yet challenging
problem in many areas of computer science, including computer vision, data mining and robotics. In this work,
we investigate this problem from the perspective of compressive sensing, especially the dictionary learning aspect.
High dimensional feature vectors are seldom seen to be sparse in the feature domain; examples include, but not
limited to Scale Invariant Feature Transform (SIFT) descriptors, Histogram Of Gradients, Shape Contexts, etc.
Compressive sensing advocates that if a given vector has a dense support in a feature space, then there should
exist an alternative high dimensional subspace where the features are sparse. This idea is leveraged by dictionary
learning techniques through learning an overcomplete projection from the feature space so that the vectors are
sparse in the new space. The learned dictionary aids in refining the search for the nearest neighbors to a query
feature vector into the most likely subspace combination indexed by its non-zero active basis elements. Since
the size of the dictionary is generally very large, distinct feature vectors are most likely to have distinct non-zero
basis. Utilizing this observation, we propose a novel representation of the feature vectors as tuples of non-zero
dictionary indices, which then reduces the ANN search problem into hashing the tuples to an index table; thereby
dramatically improving the speed of the search. A drawback of this naive approach is that it is very sensitive
to feature perturbations. This can be due to two possibilities: (i) the feature vectors are corrupted by noise,
(ii) the true data vectors undergo perturbations themselves. Existing dictionary learning methods address the
first possibility. In this work we investigate the second possibility and approach it from a robust optimization
perspective. This boils down to the problem of learning a dictionary robust to feature perturbations, viz. paving
the way for a novel Robust Dictionary Learning (RDL) framework. In addition to the above model, we also
propose a novel LASSO based multi-regularization hashing algorithm which utilizes the consistency properties of
the non-zero active basis for increasing values of the regularization weights. Even though our algorithm is generic
and has wide coverage in different areas of scientific computing, the experiments in the current work are mainly
focused towards improving the speed and accuracy of ANN for SIFT descriptors, which are high-dimensional
(128D) and are one of the most widely used interest point detectors in computer vision. Preliminary results from
SIFT datasets show that our algorithm is far superior to the state-of-the-art techniques in ANN.