Thursday, December 21, 2017

Program / Streaming: The Future of Random Projections : a mini-workshop

Florent Krzakala and I are organizing a mini-workshop on The Future of Random Projections tomorrow.

Streaming will be live here:

Random Projections have proven useful in many areas ranging from signal processing to machine learning. In this informal mini-workshop, we aim to bring together researchers from different areas to discuss this exciting topic and its future use in the area of big data, heavy computations, and Deep Learning.

In this talk, I will discuss how advanced tools from random matrix theory allow us to better understand and improve the large-dimensional statistics of many standard machine learning methods, in particular non-linear random feature maps (RFMs). We will notably show that the performance of extreme learning machines (which can be seen as mere linear ridge regression on non-linear RFMs) is easily understood, particularly so when the input data arise from a mixture model.
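To make the "ridge regression on random features" view concrete, here is a minimal pure-Python sketch. The tanh non-linearity, the Gaussian random weights, and the names `elm_fit` and `elm_predict` are assumptions chosen for illustration; they are not taken from the talk.

```python
import math
import random


def feature_map(X, W, b):
    """Non-linear random feature map: phi(x) = tanh(W x + b)."""
    return [
        [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bj)
         for w, bj in zip(W, b)]
        for x in X
    ]


def solve(A, rhs):
    """Solve A z = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = s / M[r][r]
    return z


def elm_fit(X, y, n_features, lam, seed=0):
    """Draw a fixed random hidden layer, then fit the output weights by ridge regression."""
    rng = random.Random(seed)
    dim = len(X[0])
    # Hidden-layer weights are random and never trained (assumed Gaussian here).
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_features)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
    Phi = feature_map(X, W, b)
    # Normal equations of ridge regression: (Phi^T Phi + lam I) beta = Phi^T y.
    gram = [
        [sum(row[i] * row[j] for row in Phi) + (lam if i == j else 0.0)
         for j in range(n_features)]
        for i in range(n_features)
    ]
    rhs = [sum(row[i] * t for row, t in zip(Phi, y)) for i in range(n_features)]
    return W, b, solve(gram, rhs)


def elm_predict(X, W, b, beta):
    """Linear read-out of the random features."""
    return [sum(p * c for p, c in zip(row, beta)) for row in feature_map(X, W, b)]
```

Only the output weights `beta` are learned; the random layer stays fixed, which is why the whole model reduces to a linear ridge regression in feature space.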

Learning parameters from voluminous data can be prohibitive in terms of memory and computational requirements. An increasingly popular approach is to first compress the database into a representation called a linear sketch, which satisfies all the mentioned requirements, and then to learn the desired information using only this sketch. This can be significantly faster than using the full data if the sketch is small.

In this talk, we introduce a generic methodology to fit a mixture of probability distributions to the data, using only a sketch of the database. The sketch is defined by combining two notions from the reproducing kernel literature, namely kernel mean embeddings and random feature expansions. It is seen to correspond to linear measurements of the underlying probability distribution of the data, and the estimation problem is analyzed through the lens of Compressive Sensing (CS). We extend CS results to our infinite-dimensional framework, give generic conditions for successful estimation, and apply this analysis to many problems, with a focus on mixture model estimation. We base our method on the construction of random sketching operators, using kernel mean embeddings and random features, such that some Restricted Isometry Property (RIP) condition holds in the Banach space of finite signed measures, with high probability, for a number of random features that depends only on the complexity of the problem. We also describe a flexible heuristic greedy algorithm to estimate mixture models from a sketch, and apply it to synthetic and real data.
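The idea of a sketch as "linear measurements of the underlying probability distribution" can be illustrated with a small pure-Python example. It assumes complex exponential random features with Gaussian frequencies, which is one common choice and not necessarily the one used in the talk; the function names `draw_frequencies` and `sketch` are hypothetical.

```python
import cmath
import random


def draw_frequencies(m, dim, scale, seed=0):
    """Draw m random frequency vectors (assumed Gaussian with std `scale`)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(m)]


def sketch(data, freqs):
    """Empirical mean of the random features exp(i <w, x>) over the dataset.

    The result has one complex entry per frequency, so its size does not
    depend on the number of data points.
    """
    n = len(data)
    out = []
    for w in freqs:
        total = 0j
        for x in data:
            total += cmath.exp(1j * sum(wi * xi for wi, xi in zip(w, x)))
        out.append(total / n)
    return out
```

Because the sketch is an average, it is linear in the empirical distribution: the sketch of two merged datasets is the size-weighted average of their individual sketches, so it can be computed in one pass or in a distributed fashion.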

In this work, we revisit fast dimension reduction approaches, such as random projections and random sampling. Our goal is to summarize the data to decrease the computational cost and memory footprint of subsequent analysis. Such dimension reduction can be very efficient when the signals of interest have a strong structure, as is the case with images. We focus on this setting and investigate feature clustering schemes for data reduction that capture this structure. An impediment to fast dimension reduction is that good clustering comes with large algorithmic costs. We address it by contributing a linear-time agglomerative clustering scheme, Recursive Nearest Agglomeration (ReNA). Unlike existing fast agglomerative schemes, it avoids the creation of giant clusters. We empirically validate that it approximates the data as well as traditional variance-minimizing clustering schemes that have a quadratic complexity. In addition, we analyze signal approximation with feature clustering and show that it can remove noise, improving subsequent analysis steps. As a consequence, data reduction by clustering features with ReNA yields very fast and accurate models, making it possible to process large datasets on a budget. Our theoretical analysis is backed by extensive experiments on publicly available data that illustrate the computational efficiency and the denoising properties of the resulting dimension reduction scheme.
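The reduction step itself, once a clustering of the features is available, can be sketched in a few lines of pure Python. This is not ReNA: the cluster labels are taken as given, and the names `reduce_features` and `expand_features` are hypothetical.

```python
def reduce_features(x, labels, n_clusters):
    """Replace each cluster of features by its mean value.

    Assumes every cluster index in range(n_clusters) appears at least once
    in `labels`.
    """
    sums = [0.0] * n_clusters
    counts = [0] * n_clusters
    for value, k in zip(x, labels):
        sums[k] += value
        counts[k] += 1
    return [s / c for s, c in zip(sums, counts)]


def expand_features(z, labels):
    """Map the reduced signal back to the original feature space."""
    return [z[k] for k in labels]
```

For example, `reduce_features([1.0, 3.0, 10.0, 12.0], [0, 0, 1, 1], 2)` gives `[2.0, 11.0]`. Averaging k features that carry independent noise of equal variance divides that noise variance by k, which gives some intuition for the denoising effect mentioned above.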

We present a matrix-factorization algorithm that scales to input matrices with huge numbers of both rows and columns. Learned factors may be sparse or dense and/or nonnegative, which makes our algorithm suitable for dictionary learning, sparse component analysis, and nonnegative matrix factorization. Our algorithm streams matrix columns while subsampling them to iteratively learn the matrix factors. At each iteration, the row dimension of a new sample is reduced by subsampling, resulting in lower time complexity compared to a simple streaming algorithm. Our method comes with guarantees of convergence to a stationary point of the matrix-factorization problem. We demonstrate its efficiency on massive functional magnetic resonance imaging data (2 TB), and on patches extracted from hyperspectral images (103 GB). For both problems, which involve different penalties on rows and columns, we obtain significant speed-ups compared to state-of-the-art algorithms.