Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computational efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the vanishing-gradient difficulty in optimization with binary activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separate binarization step, which suffers from a substantial loss of retrieval quality. This paper presents HashNet, a novel deep architecture for deep learning to hash by continuation, which learns exactly binary hash codes from imbalanced similarity data where the number of similar pairs is much smaller than the number of dissimilar pairs. The key idea is to attack the vanishing gradient problem in optimizing deep networks with non-smooth binary activations by a continuation method: we begin by learning an easier network with a smoothed activation function and let it evolve during training, until it eventually becomes the original, difficult-to-optimize deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yields state-of-the-art multimedia retrieval performance on standard benchmarks.
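
A minimal sketch of the continuation idea, using tanh(βz) as a smooth surrogate for sign(z) and sharpening it over training stages; the schedule and β values here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def smoothed_sign(z, beta):
    """tanh(beta * z): a smooth surrogate for sign(z) that sharpens as beta grows."""
    return np.tanh(beta * z)

z = np.linspace(-2, 2, 9)
# Continuation: start from a small beta (easy, smooth network) and
# increase it across training stages so the activation approaches sign(z).
for beta in [1, 10, 100]:
    codes = smoothed_sign(z, beta)
    gap = np.abs(codes - np.sign(z)).max()
    print(f"beta={beta:>3}: max |tanh(beta*z) - sign(z)| = {gap:.4f}")
```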

Due to its storage and retrieval efficiency, hashing has been widely deployed for approximate nearest neighbor search in large-scale multimedia retrieval. Supervised hashing, which improves the quality of hash coding by exploiting the semantic similarity of data pairs, has received increasing attention recently. In most existing supervised hashing methods for image retrieval, an image is first represented as a vector of hand-crafted or machine-learned features, followed by a separate quantization step that generates binary codes. However, this may produce suboptimal hash codes, because the quantization error is not statistically minimized and the feature representation is not optimally compatible with binary coding. In this paper, we propose a novel Deep Hashing Network (DHN) architecture for supervised hashing, in which we jointly learn a good image representation tailored to hash coding and formally control the quantization error. The DHN model comprises four key components: (1) a sub-network with multiple convolution-pooling layers to capture image representations; (2) a fully connected hashing layer to generate compact binary hash codes; (3) a pairwise cross-entropy loss layer for similarity-preserving learning; and (4) a pairwise quantization loss for controlling hashing quality. Extensive experiments on standard image retrieval datasets show that the proposed DHN model yields substantial boosts over the latest state-of-the-art hashing methods.
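
A minimal numpy sketch of the two pairwise losses on continuous codes; the inner-product logistic form and the log-cosh quantization penalty are illustrative assumptions rather than the paper's exact formulation:

```python
import numpy as np

def pairwise_cross_entropy(h_i, h_j, s_ij):
    """Similarity-preserving loss on a pair of continuous codes.
    Uses the inner product as similarity in a logistic (cross-entropy) form,
    written in a numerically stable way; an illustrative choice only."""
    theta = np.dot(h_i, h_j)
    return np.log1p(np.exp(-np.abs(theta))) + max(theta, 0) - s_ij * theta

def quantization_loss(h):
    """Penalizes code entries far from {-1, +1}; here a smooth log-cosh penalty."""
    return np.sum(np.log(np.cosh(np.abs(h) - 1.0)))

h_i, h_j = np.random.randn(48), np.random.randn(48)   # 48-bit continuous codes
print(pairwise_cross_entropy(h_i, h_j, s_ij=1), quantization_loss(h_i))
```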

Deep convolutional networks are well-known for their high computational and memory demands. Given limited resources, how does one design a network that balances its size, training time, and prediction accuracy? A surprisingly effective approach to trade accuracy for size and speed is to simply reduce the number of channels in each convolutional layer by a fixed fraction and retrain the network. In many cases this leads to significantly smaller networks with only minimal changes to accuracy. In this paper, we take a step further by empirically examining a strategy for deactivating connections between filters in convolutional layers in a way that allows us to harvest savings both in run-time and memory for many network architectures. More specifically, we generalize 2D convolution to use a channel-wise sparse connection structure and show that this leads to significantly better results than the baseline approach for large networks including VGG and Inception V3.
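
A didactic sketch of channel-wise sparse connectivity, in which a binary mask deactivates individual filter-to-channel connections inside a naive 2D convolution; the mask density and tensor shapes below are arbitrary assumptions, not the paper's implementation:

```python
import numpy as np

def sparse_channel_conv2d(x, w, mask):
    """Naive valid-padding 2D convolution where mask[o, i] == 0 deactivates the
    connection from input channel i to output channel o.
    x: (C_in, H, W), w: (C_out, C_in, kH, kW), mask: (C_out, C_in) in {0, 1}."""
    c_out, c_in, kh, kw = w.shape
    _, h, wdt = x.shape
    out = np.zeros((c_out, h - kh + 1, wdt - kw + 1))
    for o in range(c_out):
        for i in range(c_in):
            if mask[o, i] == 0:
                continue  # this channel connection is deactivated (skipped entirely)
            for r in range(out.shape[1]):
                for c in range(out.shape[2]):
                    out[o, r, c] += np.sum(x[i, r:r+kh, c:c+kw] * w[o, i])
    return out

x = np.random.randn(4, 8, 8)
w = np.random.randn(6, 4, 3, 3)
mask = (np.random.rand(6, 4) < 0.5).astype(int)   # keep roughly half the channel connections
print(sparse_channel_conv2d(x, w, mask).shape)     # (6, 6, 6)
```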

Monday, February 27, 2017

Tonight we will have a special hors série session (Hors Série #7, Season 4) on Machine Learning for Arts with Gene Kogan. To register, please go here. The event will be hosted and sponsored by our good friends at Mobiskill.

This talk will examine the most recent wave of artistic, creative, and humorous projects applying machine learning in various domains, producing troves of machine-hallucinated text, images, sounds, and video, and demonstrating an affinity for imitating human style and sensibility.
These experimental works attempt to show the capacity of these machines for producing aesthetically and culturally meaningful art, while also challenging them to illuminate their most obscure and counterintuitive properties.
Additionally, a series of in-progress educational materials by the speaker will be shown, including demos, code samples, artistic works, and explanatory writings about the topic.

This article studies the Gram random matrix model G = (1/T) Σ^T Σ, with Σ = σ(WX), classically found in random neural networks, where X = [x_1, …, x_T] ∈ R^{p×T} is a (data) matrix of bounded norm, W ∈ R^{n×p} is a matrix of independent zero-mean unit-variance entries, and σ: R → R is a Lipschitz continuous (activation) function, σ(WX) being understood entry-wise. We prove that, as n, p, T grow large at the same rate, the resolvent Q = (G + γ I_T)^{-1}, for γ > 0, behaves similarly to that of sample covariance matrix models, involving notably the moment Φ = (T/n) E[G], which provides in passing a deterministic equivalent for the empirical spectral measure of G. This result, established by means of concentration-of-measure arguments, enables the estimation of the asymptotic performance of single-layer random neural networks. This in turn provides practical insights into the mechanisms at play in random neural networks, entailing several unexpected consequences, as well as a fast practical means to tune the network hyperparameters.
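
A rough numerical illustration of the model and its resolvent; the dimensions, activation, and γ below are arbitrary choices, not values from the article:

```python
import numpy as np

n, p, T, gamma = 400, 200, 600, 1.0
rng = np.random.default_rng(0)
X = rng.standard_normal((p, T)) / np.sqrt(p)    # data matrix of bounded norm
W = rng.standard_normal((n, p))                 # i.i.d. zero-mean unit-variance entries
Sigma = np.tanh(W @ X)                          # sigma applied entry-wise (Lipschitz)
G = Sigma.T @ Sigma / T                         # Gram matrix G = (1/T) Sigma^T Sigma
Q = np.linalg.inv(G + gamma * np.eye(T))        # resolvent Q = (G + gamma I_T)^{-1}
eigvals = np.linalg.eigvalsh(G)                 # empirical spectral measure of G
print(np.trace(Q) / T, eigvals.min(), eigvals.max())
```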

Interactive learning is a modern machine learning paradigm of significant practical and theoretical interest, in which the algorithm and the domain expert engage in a two-way dialog to facilitate more accurate learning from less data than the classical approach of passively observing labeled data. This workshop will explore several topics related to interactive learning broadly defined, including active learning, in which the learner chooses which examples it wants labeled; explanation-based learning, in which the human does not merely tell the machine whether its predictions are right or wrong, but provides reasons in a form that is meaningful to both parties; crowdsourcing, in which labels and other information are solicited from a crowd of non-experts; teaching and learning from demonstrations, in which a party that knows the concept being learned provides helpful examples or demonstrations; and connections and applications to recommender systems, automated tutoring, and robotics. Key questions we will explore include what the right learning models are in each case, what demands are placed on the learner and the human interlocutor, and what kinds of concepts and other structures can be learned. A main goal of the workshop is to foster connections between theory/algorithms and practice/applications.

Quantised random embeddings are an efficient dimensionality reduction technique that preserves the distances of low-complexity signals up to some controllable additive and multiplicative distortions. In this work, we instead focus on verifying when this technique preserves the separability of two disjoint closed convex sets, i.e., a quantised view of the "rare eclipse problem" introduced by Bandeira et al. in 2014. This separability would ensure exact classification of signals in such sets from the signatures output by this non-linear dimensionality reduction. We present a result relating the embedding's dimension, its quantiser resolution, and the sets' separation, as well as some numerically testable conditions to illustrate it. Experimental evidence is then provided in the special case of two ℓ2-balls, tracing the phase transition curves that ensure these sets' separability in the embedded domain.
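
A toy numerical check of this separability in the two-ball case, using a dithered uniform quantiser applied to a Gaussian random projection; all dimensions, the resolution δ, and the ball geometry below are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, delta = 50, 25, 0.5                                  # ambient dim, embedded dim, quantiser resolution
c1, c2, r = np.zeros(n), np.r_[4.0, np.zeros(n-1)], 1.0    # two disjoint l2-balls: centers and radius
A = rng.standard_normal((m, n)) / np.sqrt(m)               # Gaussian random projection
u = rng.uniform(0, delta, m)                               # dither

def embed(x):
    """Dithered, uniformly quantised random embedding."""
    return delta * np.floor((A @ x + u) / delta)

def sample_ball(center, radius, k):
    """Draw k points uniformly from an l2-ball."""
    v = rng.standard_normal((k, n))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return center + radius * v * rng.uniform(0, 1, (k, 1)) ** (1 / n)

E1 = np.array([embed(x) for x in sample_ball(c1, r, 200)])
E2 = np.array([embed(x) for x in sample_ball(c2, r, 200)])
gap = min(np.linalg.norm(a - b) for a in E1 for b in E2)
print("smallest inter-set distance after embedding:", gap)
```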