Friday, September 25, 2015

Metric Learning: Random Projections and Deep Learning Style

In this work, we study distance metric learning (DML) for high dimensional
data. A typical approach for DML with high dimensional data is to perform
dimensionality reduction before learning the distance metric. The main
shortcoming of this approach is that it may result in a suboptimal solution
because of the subspace removed by the dimensionality reduction method. In this
work, we present a dual random projection framework for DML with high
dimensional data that explicitly addresses this limitation.
The key idea is to first project all the data points into a low dimensional
space by random projection, compute the dual variables using the projected
vectors, and then reconstruct the distance metric in the original space using
the estimated dual variables. The proposed method, on one hand, enjoys the
low computational cost of random projection, and on the other hand, alleviates
the limitation of most dimensionality reduction methods. We verify both empirically
and theoretically the effectiveness of the proposed algorithm for high
dimensional DML.
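
To make the project / solve / reconstruct idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the triplet constraints, the logistic loss, the Frobenius-norm regularizer, the plain gradient descent solver, and every name in it (`triplet_matrices`, `dual_random_projection_dml`, `m`, `lam`) are assumptions made for illustration. The only part taken from the abstract is the overall flow: project the data with a random matrix, estimate one dual variable per constraint in the small space, then rebuild the metric from those dual variables in the original space.

```python
import numpy as np

def triplet_matrices(X, triplets):
    """A_t = (x_i - x_k)(x_i - x_k)^T - (x_i - x_j)(x_i - x_j)^T per triplet.

    Each triplet (i, j, k) says: x_i should be closer to x_j than to x_k.
    """
    i, j, k = triplets.T
    dn = X[i] - X[k]  # anchor minus dissimilar point
    dp = X[i] - X[j]  # anchor minus similar point
    return np.einsum("ta,tb->tab", dn, dn) - np.einsum("ta,tb->tab", dp, dp)

def dual_random_projection_dml(X, triplets, m=10, lam=1.0, steps=300, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    N = len(triplets)

    # 1) Random projection of all points into m dimensions.
    R = rng.normal(size=(d, m)) / np.sqrt(m)
    A_low = triplet_matrices(X @ R, triplets)  # shape (N, m, m)

    # 2) Solve the small problem
    #      min_M  lam/2 ||M||_F^2 + 1/N sum_t log(1 + exp(-<A_t, M>))
    #    by gradient descent, with a step size from the smoothness bound.
    L = lam + 0.25 * np.mean(np.sum(A_low ** 2, axis=(1, 2)))
    M_low = np.zeros((m, m))
    for _ in range(steps):
        margins = np.einsum("tab,ab->t", A_low, M_low)
        alpha = 0.5 * (1.0 - np.tanh(margins / 2.0))  # = 1 / (1 + exp(margin))
        grad = lam * M_low - np.einsum("t,tab->ab", alpha, A_low) / N
        M_low -= grad / L

    # Dual variables estimated from the low dimensional solution.
    margins = np.einsum("tab,ab->t", A_low, M_low)
    alpha = 0.5 * (1.0 - np.tanh(margins / 2.0))

    # 3) Reconstruct the metric in the ORIGINAL space from the dual variables.
    A_high = triplet_matrices(X, triplets)  # shape (N, d, d)
    M = np.einsum("t,tab->ab", alpha, A_high) / (lam * N)

    # Clip negative eigenvalues so that M defines a valid (pseudo-)metric.
    w, V = np.linalg.eigh(M)
    M = (V * np.clip(w, 0.0, None)) @ V.T
    return M, alpha
```

Note that the sketch materializes every d x d constraint matrix in step 3, which is only reasonable for small examples; a practical high dimensional implementation would need to avoid that.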

Many recent efforts have been devoted to designing sophisticated deep
learning structures, obtaining revolutionary results on benchmark datasets. The
success of these deep learning methods mostly relies on an enormous volume of
labeled training samples to learn a huge number of parameters in a network;
therefore, the generalization ability of a learned deep network cannot be
overlooked, especially when training is restricted to a small training set,
which is the case for many applications. In this paper, we propose a novel deep
learning objective formulation that unifies both the classification and metric
learning criteria. We then introduce a geometry-aware deep transform to enable
a non-linear discriminative and robust feature transform, which shows
competitive performance on small training sets for both synthetic and
real-world data. We further support the proposed framework with a formal
$(K,\epsilon)$-robustness analysis.
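
As a rough illustration of what an objective that unifies classification and metric learning criteria can look like, the sketch below adds a softmax cross-entropy term to a pairwise contrastive term computed on the learned features. This is a generic combination chosen for illustration, not the formulation from the paper: the contrastive loss, the `margin` and `weight` parameters, and all function names are assumptions.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy (classification criterion)."""
    z = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def contrastive_loss(features, labels, margin=1.0):
    """Pairwise contrastive loss (metric learning criterion).

    Same-class pairs are pulled together; different-class pairs are
    pushed at least `margin` apart.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pull = (dist ** 2)[same & off_diag]
    push = np.maximum(0.0, margin - dist[~same]) ** 2
    return (pull.sum() + push.sum()) / off_diag.sum()

def joint_objective(features, logits, labels, weight=0.5):
    """Classification term plus a weighted metric learning term."""
    return cross_entropy(logits, labels) + weight * contrastive_loss(features, labels)
```

In a deep network, `features` would be the output of the learned transform and `logits` the output of a classifier on top of it; `weight` trades off the two criteria, and setting it to zero recovers plain classification training.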