Data is continuously being generated from sources such as machines, network traffic, and application logs. Timely and accurate detection of anomalies in massive data streams has important applications in preventing machine failures, intrusion detection, and dynamic load balancing. In this paper, we introduce a novel anomaly detection algorithm that detects anomalies in a streaming fashion by making only one pass over the data while utilizing limited storage. The algorithm uses ideas from matrix sketching and randomized low-rank matrix approximations to maintain, in a streaming model, a small set of orthogonal vectors that form a good approximate basis for the data. Using this constructed orthogonal basis, anomalies in new incoming data are detected via a simple reconstruction-error test. We theoretically prove that our algorithm compares favorably with an offline approach based on global singular value decomposition updates. The experimental results show the effectiveness and efficiency of our approach over other popular fast anomaly detection methods.
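To get a feel for that reconstruction-error test, here is a minimal numpy sketch (ours, not the authors'): the basis below is faked with an exact SVD of past data, where the paper would maintain it in one pass with limited storage.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stream: normal points live near a rank-3 subspace of R^50.
U = np.linalg.qr(rng.standard_normal((50, 3)))[0]
normal = U @ rng.standard_normal((3, 200))     # 200 normal points
anomaly = rng.standard_normal((50, 5))         # 5 off-subspace points

# Approximate basis for past data (exact SVD here; the paper's algorithm
# would maintain this with a streaming matrix sketch instead).
Q = np.linalg.svd(normal, full_matrices=False)[0][:, :3]

def recon_error(x, Q):
    # Norm of the component of x orthogonal to span(Q).
    return np.linalg.norm(x - Q @ (Q.T @ x))

for x in np.concatenate([normal[:, :3].T, anomaly.T]):
    print(f"reconstruction error = {recon_error(x, Q):.3f}")
# Points near the subspace give ~0; off-subspace points give large errors.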

Massive data streams are continuously being generated from sources such as social media, broadcast news, etc., and typically these datapoints lie in high-dimensional spaces (such as the vocabulary space of a language). Timely and accurate feature subset selection in these massive data streams has important applications in model interpretation, computational/storage cost reduction, and generalization enhancement. In this paper, we introduce a novel unsupervised feature selection approach on data streams that selects important features by making only one pass over the data while utilizing limited storage. The proposed algorithm uses ideas from matrix sketching to efficiently maintain a low-rank approximation of the observed data and applies regularized regression on this approximation to identify the important features. We theoretically prove that our algorithm is close to an expensive offline approach based on global singular value decompositions. The experimental results on a variety of text and image datasets demonstrate the excellent ability of our approach to identify important features even in the presence of concept drifts, and also its efficiency over other popular scalable feature selection algorithms.
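To fix ideas on the sketching side, here is a minimal one-pass Frequent Directions routine of the kind such abstracts allude to (assuming ell <= d); the regularized-regression step that actually scores features is specific to the paper and omitted here.

import numpy as np

def frequent_directions(A, ell):
    # One pass over the rows of A; returns an ell x d sketch B with
    # A^T A ~ B^T B (spectral error at most ||A||_F^2 / ell).
    n, d = A.shape
    B = np.zeros((ell, d))
    next_zero = 0
    for row in A:
        if next_zero == ell:                  # sketch full: shrink it
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            delta = s[-1] ** 2
            B = np.sqrt(np.maximum(s**2 - delta, 0.0))[:, None] * Vt
            next_zero = ell - 1               # last row is now zero
        B[next_zero] = row
        next_zero += 1
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((5000, 50))
B = frequent_directions(A, ell=10)
err = np.linalg.norm(A.T @ A - B.T @ B, 2)
print(err <= np.linalg.norm(A, 'fro') ** 2 / 10)   # True, per the FD bound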

Friday, August 28, 2015

We have mentioned homomorphic encryption here on Nuit Blanche mostly because of Andrew McGregor et al.'s work on the subject (see references below). Today, we have a Machine Learning approach using this encoding strategy, which in effect is not really that far from the idea of homomorphic sketches or random projections for low-dimensional manifolds. Without further ado:

Recent advances in cryptography promise to enable secure statistical
computation on encrypted data, whereby a limited set of operations can be
carried out without the need to first decrypt. We review these homomorphic
encryption schemes in a manner accessible to statisticians and machine
learners, focusing on pertinent limitations inherent in the current state of
the art. These limitations restrict the kind of statistics and machine learning
algorithms which can be implemented and we review those which have been
successfully applied in the literature. Finally, we document a high performance
R package implementing a recent homomorphic scheme in a general framework.
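To see concretely what "a limited set of operations can be carried out without the need to first decrypt" means, here is a toy and utterly insecure illustration with textbook RSA, which is multiplicatively homomorphic (the FHE schemes reviewed in these papers support much richer operation sets):

# Toy parameters only; real deployments use ~2048-bit keys and padding.
p, q = 61, 53
N = p * q                      # 3233
e, d = 17, 413                 # e*d = 1 mod lcm(p-1, q-1) = 780

enc = lambda m: pow(m, e, N)
dec = lambda c: pow(c, d, N)

a, b = 7, 12
product_cipher = enc(a) * enc(b) % N     # multiply ciphertexts only
print(dec(product_cipher), a * b)        # both 84: Enc(a)*Enc(b) = Enc(a*b)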

We present two new statistical machine learning methods designed to learn on
fully homomorphic encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for applying extremely
random forests, involving a new cryptographic stochastic fraction estimator,
and na\"{i}ve Bayes, involving a semi-parametric model for the class decision
boundary, and show how they can be used to learn and predict from encrypted
data. We demonstrate that these techniques perform competitively on a variety
of classification data sets and provide detailed information about the
computational practicalities of these and other FHE methods.

While assembling the genome is important, with cheap and fast long reads the goalpost is now slowly moving toward the unsupervised learning of groups of genomes. That type of unsupervised learning can only be enabled with the right dimensionality reduction technique; today it is MinHash.
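For those who have not played with it, here is a minimal MinHash sketch estimating the Jaccard similarity of two k-mer sets; the salted-MD5 hashing below is an illustrative stand-in, not what any particular tool uses.

import hashlib

def kmers(seq, k=4):
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def minhash(s, num_hashes=128):
    # Salting with the index i stands in for independent hash functions.
    return [min(int(hashlib.md5(f"{i}{x}".encode()).hexdigest(), 16) for x in s)
            for i in range(num_hashes)]

a, b = kmers("ACGTACGTTAGCCGTA"), kmers("ACGTACGATAGCCGTA")
sig_a, sig_b = minhash(a), minhash(b)
est = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
print(f"estimated Jaccard: {est:.2f}, exact: {len(a & b) / len(a | b):.2f}")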

Let $\Phi \in \mathbb{R}^{m \times n}$ be a sparse Johnson-Lindenstrauss transform [KN14] with $s$ non-zeroes per column. For a subset $T$ of the unit sphere and given $\varepsilon \in (0, 1/2)$, we study the settings of $m, s$ required to ensure

$$\mathbb{E}_\Phi \sup_{x \in T} \left| \|\Phi x\|_2^2 - 1 \right| < \varepsilon,$$

i.e. so that $\Phi$ preserves the norm of every $x \in T$ simultaneously and multiplicatively up to $1 + \varepsilon$. We introduce a new complexity parameter, which depends on the geometry of $T$, and show that it suffices to choose $s$ and $m$ such that this parameter is small. Our result is a sparse analog of Gordon's theorem, which was concerned with a dense $\Phi$ having i.i.d. Gaussian entries. We qualitatively unify several results related to the Johnson-Lindenstrauss lemma, subspace embeddings, and Fourier-based restricted isometries. Our work also implies new results in using the sparse Johnson-Lindenstrauss transform in numerical linear algebra, classical and model-based compressed sensing, manifold learning, and constrained least squares problems such as the Lasso.
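A quick empirical check of the object being studied (sizes below are illustrative): each column of $\Phi$ gets $s$ nonzero entries of $\pm 1/\sqrt{s}$ in random rows, and norms of unit vectors are roughly preserved.

import numpy as np

rng = np.random.default_rng(1)
n, m, s = 1000, 200, 8

Phi = np.zeros((m, n))
for j in range(n):
    rows = rng.choice(m, size=s, replace=False)
    Phi[rows, j] = rng.choice([-1.0, 1.0], size=s) / np.sqrt(s)

for _ in range(5):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                  # a stand-in for points of T
    print(abs(np.linalg.norm(Phi @ x) ** 2 - 1))   # small distortions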

Hyperspectral images (HSI) contain extremely rich spectral and spatial information that offers great potential to discriminate between various land cover classes. The inherent high dimensionality and insufficient training samples in such images introduce the Hughes phenomenon. To deal with this issue, several preprocessing techniques have been integrated into the HSI processing chain prior to classification. Supervised feature extraction is one such method that mitigates the curse of dimensionality induced by the Hughes effect. In recent years, new strategies for feature extraction based on the scattering transform and Random Kitchen Sinks have been introduced, which can be used in the context of hyperspectral image classification. This paper presents a comparative analysis of scattering and random features in hyperspectral image classification. The classification is performed using a simple linear classifier, Regularized Least Squares (RLS), accessed through the Grand Unified Regularized Least Squares (GURLS) library. The proposed approach is tested on two standard hyperspectral datasets, the Salinas-A and Indian Pines subset scenes captured by NASA's AVIRIS sensor (Airborne Visible Infrared Imaging Spectrometer). To show the effectiveness of the proposed method, a comparative analysis is performed based on feature dimension, classification accuracy measures, and computational time. From this assessment, it is evident that classification using random features achieves excellent results with less computation time than raw pixels (without feature extraction) and scattering features on both datasets.
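To fix ideas on the Random Kitchen Sinks part, here is a minimal random-Fourier-features + regularized least squares sketch; the toy data and parameters are illustrative, not the paper's HSI setup.

import numpy as np

rng = np.random.default_rng(0)

def random_features(X, D=300, gamma=1.0):
    # D random Fourier features approximating exp(-gamma * ||x - y||^2).
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(X.shape[1], D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.standard_normal((400, 10))
y = np.sign(X[:, 0] * X[:, 1])            # a nonlinear toy labeling

Z = random_features(X)
lam = 1e-2                                 # RLS regularization
w = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)
print(f"training accuracy: {np.mean(np.sign(Z @ w) == y):.2f}")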

Graphical models use the intuitive and well-studied methods of graph theory to implicitly represent dependencies between variables in large systems. They can model the global behaviour of a complex system by specifying only local factors. This thesis studies inference in discrete graphical models from an algebraic perspective and the ways inference can be used to express and approximate NP-hard combinatorial problems.
We investigate the complexity and reducibility of various inference problems, in part by organizing them in an inference hierarchy. We then investigate tractable approximations for a subset of these problems using the distributive law in the form of message passing. The quality of the resulting message passing procedure, called Belief Propagation (BP), depends on the influence of loops in the graphical model. We contribute to three classes of approximations that improve BP for loopy graphs: A) loop correction techniques; B) survey propagation, another message passing technique that surpasses BP in some settings; and C) hybrid methods that interpolate between deterministic message passing and Markov Chain Monte Carlo inference.
We then review the existing message passing solutions and provide novel graphical models and inference techniques for combinatorial problems under three broad classes: A) constraint satisfaction problems such as satisfiability, coloring, packing, set/clique-cover and dominating/independent set and their optimization counterparts; B) clustering problems such as hierarchical clustering, K-median, K-clustering, K-center and modularity optimization; C) problems over permutations including assignment, graph morphisms and alignment, finding symmetries and the traveling salesman problem. In many cases we show that message passing is able to find solutions that are either near optimal or compare favourably with today's state-of-the-art approaches.
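As a reminder of what sum-product message passing computes, here is a minimal example on a 3-node chain x1 - x2 - x3 of binary variables, where BP is exact (the potentials are arbitrary and purely illustrative):

import numpy as np

rng = np.random.default_rng(0)
psi12 = rng.uniform(0.5, 2.0, size=(2, 2))   # potential on (x1, x2)
psi23 = rng.uniform(0.5, 2.0, size=(2, 2))   # potential on (x2, x3)

# Sum-product messages toward x2.
m1_to_2 = psi12.sum(axis=0)                  # marginalize out x1
m3_to_2 = psi23.sum(axis=1)                  # marginalize out x3
belief2 = m1_to_2 * m3_to_2
belief2 /= belief2.sum()

# Brute-force marginal of x2 for comparison.
joint = np.einsum('ij,jk->ijk', psi12, psi23)
marg2 = joint.sum(axis=(0, 2))
marg2 /= marg2.sum()

print(belief2, marg2)                        # identical: the chain is a tree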

Tuesday, August 25, 2015

Much like last week's "Deep Learning Approach to Structured Signal Recovery", we are beginning to see some folks use deep learning as a way of crafting reconstruction solvers. In the case presented below, I am not sure they are solving a true MMV problem, as they seem to solve an even tougher one (the elements have similar sparsity, as opposed to exactly the same sparsity pattern). Way to go!

We address the problem of compressed sensing with Multiple Measurement Vectors (MMVs) when the structure of the sparse vectors in different channels depends on each other: "the sparse vectors are not necessarily joint sparse". We capture this dependency by computing the conditional probability of each entry of each sparse vector being non-zero given the "residuals" of all previous sparse vectors. To compute these probabilities, we propose to use Long Short-Term Memory (LSTM) [1], a bottom-up, data-driven model for sequence modelling. To compute the model parameters we minimize a cross-entropy cost function. We propose a greedy solver that uses the above probabilities at the decoder. Through extensive experiments on two real-world datasets, we show that the proposed method significantly outperforms the general MMV solver Simultaneous Orthogonal Matching Pursuit (SOMP) and model-based Bayesian methods including Multitask Compressive Sensing [2] and Sparse Bayesian Learning for Temporally Correlated Sources [3]. We emphasize, however, that the proposed method is data driven, so the availability of training data is important; in many applications, training data is indeed available, e.g., recorded images or video.
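For reference, here is a minimal sketch of the SOMP baseline mentioned above (a standard algorithm; the toy data is ours): greedily pick the atom with the largest correlation energy summed across channels, then re-fit by least squares on the selected support.

import numpy as np

def somp(A, Y, k):
    # A: m x n dictionary, Y: m x L measurement channels, k: sparsity.
    support, R = [], Y.copy()
    for _ in range(k):
        scores = np.linalg.norm(A.T @ R, axis=1)   # energy across channels
        scores[support] = -np.inf                  # never re-pick an atom
        support.append(int(np.argmax(scores)))
        X_s, *_ = np.linalg.lstsq(A[:, support], Y, rcond=None)
        R = Y - A[:, support] @ X_s
    X = np.zeros((A.shape[1], Y.shape[1]))
    X[support] = X_s
    return X

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
A /= np.linalg.norm(A, axis=0)
X0 = np.zeros((100, 3))
X0[[5, 17, 60]] = rng.standard_normal((3, 3))
print(np.max(np.abs(somp(A, A @ X0, 3) - X0)))   # tiny in this noiseless toy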

Monday, August 24, 2015

Back in 1996, sparse coding made a big splash in science and engineering because the elements of a learned dictionary looked very much like the wavelet functions that had been discovered a few years earlier. For the first time, there was a sense that an algorithm could produce some simple insight into how the machinery of the visual cortex works.

All that is well, but yesterday Roelof mentioned that applying random noise to a particular GoogLeNet layer produced something other than dogs. Here are a few examples of what he calls "limited deep dreaming", which starts with random noise activating single units from a particular layer of GoogLeNet (3a output); all the other images are listed in this Flickr album.

They certainly look very natural to me: sometimes they look like structures found in electron microscopy, and sometimes like the structures found in numerical simulations of our universe.

Sparsity in the Fourier domain is an important property that enables the dense reconstruction of signals, such as 4D light fields, from a small set of samples. The sparsity of natural spectra is often derived from continuous arguments, but reconstruction algorithms typically work in the discrete Fourier domain. These algorithms usually assume that sparsity derived from continuous principles will hold under discrete sampling. This paper makes the critical observation that sparsity is much greater in the continuous Fourier spectrum than in the discrete spectrum. This difference is caused by a windowing effect. When we sample a signal over a finite window, we convolve its spectrum by an infinite sinc, which destroys much of the sparsity that was in the continuous domain. Based on this observation, we propose an approach to reconstruction that optimizes for sparsity in the continuous Fourier spectrum. We describe the theory behind our approach and discuss how it can be used to reduce sampling requirements and improve reconstruction quality. Finally, we demonstrate the power of our approach by showing how it can be applied to the task of recovering non-Lambertian light fields from a small number of 1D viewpoint trajectories.
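The windowing effect is easy to see numerically: a sinusoid that is 1-sparse in the continuous spectrum has a dense DFT as soon as its frequency falls off the grid. A toy illustration:

import numpy as np

N = 64
t = np.arange(N)
on_grid = np.cos(2 * np.pi * 8.0 * t / N)     # frequency on the DFT grid
off_grid = np.cos(2 * np.pi * 8.37 * t / N)   # still 1-sparse continuously

for name, x in [("on-grid", on_grid), ("off-grid", off_grid)]:
    mag = np.abs(np.fft.rfft(x))
    big = np.sum(mag > 0.01 * mag.max())
    print(f"{name}: {big} DFT bins above 1% of the peak")
# on-grid: a single bin; off-grid: many bins, i.e. sinc leakage.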

We introduce the C++ application and R package ranger. The software is a fast
implementation of random forests for high dimensional data. Ensembles of
classification, regression and survival trees are supported. We describe the
implementation, provide examples, validate the package with a reference
implementation, and compare runtime and memory usage with other
implementations. The new software proves to scale best with the number of
features, samples, trees, and features tried for splitting. Finally, we show
that ranger is the fastest and most memory efficient implementation of random
forests to analyze data on the scale of a genome-wide association study.

Group testing tackles the problem of identifying a population of $K$
defective items from a set of $n$ items by pooling groups of items efficiently
in order to cut down the number of tests needed. The result of a test for a
group of items is positive if any of the items in the group is defective and
negative otherwise. The goal is to judiciously group subsets of items such that
defective items can be reliably recovered using the minimum number of tests,
while also having a low-complexity decoding procedure.
We describe SAFFRON (Sparse-grAph codes Framework For gROup testiNg), a
non-adaptive group testing paradigm that recovers at least a
$(1-\epsilon)$-fraction (for any arbitrarily small $\epsilon > 0$) of $K$
defective items with high probability with $m=6C(\epsilon)K\log_2{n}$ tests,
where $C(\epsilon)$ is a precisely characterized constant that depends only on
$\epsilon$. For instance, it can provably recover at least $(1-10^{-6})K$
defective items with $m \simeq 68 K \log_2{n}$ tests. The computational
complexity of the decoding algorithm of SAFFRON is $\mathcal{O}(K\log n)$,
which is order-optimal. Further, we robustify SAFFRON such that it can reliably
recover the set of $K$ defective items even in the presence of erroneous or
noisy test results. We also propose Singleton-Only-SAFFRON, a variant of
SAFFRON, that recovers all the $K$ defective items with $m=2e(1+\alpha)K\log K
\log_2 n$ tests with probability
$1-\mathcal{O}\left(\frac{1}{K^\alpha}\right)$, where $\alpha>0$ is a
constant. By leveraging powerful design and analysis tools from modern
sparse-graph coding theory, SAFFRON is the first approach to reliable,
large-scale probabilistic group testing that offers both precisely
characterizable number of tests needed (down to the constants) together with
order-optimal decoding complexity.
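This is not SAFFRON itself, but here is a minimal simulation of the group testing model with the much simpler COMP decoder (rule out any item appearing in a negative test, declare the rest defective), with illustrative parameters:

import numpy as np

rng = np.random.default_rng(0)
n, K, m = 1000, 10, 400                     # items, defectives, tests

defective = np.zeros(n, dtype=bool)
defective[rng.choice(n, K, replace=False)] = True

# Each item joins each pool independently with probability ~1/K.
pools = rng.random((m, n)) < 1.0 / K
results = (pools & defective).any(axis=1)   # a pool is positive iff it
                                            # contains a defective item

ruled_out = (pools & ~results[:, None]).any(axis=0)
decoded = ~ruled_out
print((decoded == defective).all())         # True w.h.p. at these sizes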

Consider estimating an unknown, but structured, signal $x_0 \in \mathbb{R}^n$ from $m$ measurements $y_i = g_i(a_i^T x_0)$, where the $a_i$'s are the rows of a known measurement matrix $A$, and $g$ is a (potentially unknown) nonlinear and random link function. Such measurement functions could arise in applications where the measurement device has nonlinearities and uncertainties. They could also arise by design; e.g., $g_i(x) = \mathrm{sign}(x + z_i)$ corresponds to noisy 1-bit quantized measurements. Motivated by the classical work of Brillinger, and more recent work of Plan and Vershynin, we estimate $x_0$ by solving the Generalized LASSO for some regularization parameter $\lambda > 0$ and some (typically non-smooth) convex structure-inducing regularizer function. While this approach seems to naively ignore the nonlinear function $g$, both Brillinger (in the non-constrained case) and Plan and Vershynin have shown that, when the entries of $A$ are i.i.d. standard normal, this is a good estimator of $x_0$ up to a constant of proportionality $\mu$, which only depends on $g$. In this work, we considerably strengthen these results by obtaining explicit expressions for the squared error, for the regularized LASSO, that are asymptotically precise when $m$ and $n$ grow large. A main result is that the estimation performance of the Generalized LASSO with non-linear measurements is asymptotically the same as that of one whose measurements are linear, $y_i = \mu a_i^T x_0 + \sigma z_i$, with $\mu = \mathbb{E}[\gamma g(\gamma)]$ and $\sigma^2 = \mathbb{E}[(g(\gamma) - \mu\gamma)^2]$, where $\gamma$ is standard normal. To the best of our knowledge, the derived expressions on the estimation performance are the first known precise results in this context. One interesting consequence of our result is that the optimal quantizer of the measurements, the one that minimizes the estimation error of the LASSO, is the celebrated Lloyd-Max quantizer.
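A quick numerical illustration of the headline claim for $g = \mathrm{sign}$, where $\mu = \mathbb{E}[\gamma\,\mathrm{sign}(\gamma)] = \sqrt{2/\pi} \approx 0.80$; the plain ISTA solver and the sizes are illustrative.

import numpy as np

rng = np.random.default_rng(0)
m, n, k = 2000, 100, 5
x0 = np.zeros(n)
x0[:k] = 1.0 / np.sqrt(k)                   # unit-norm sparse signal
A = rng.standard_normal((m, n))
y = np.sign(A @ x0)                         # noiseless 1-bit measurements

lam = 10.0
L = np.linalg.norm(A, 2) ** 2               # Lipschitz constant of the grad
x = np.zeros(n)
for _ in range(300):                        # ISTA for the LASSO
    g = x - A.T @ (A @ x - y) / L
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)

mu = np.sqrt(2 / np.pi)
print(np.corrcoef(x, x0)[0, 1])             # ~1: same direction as x0
print(np.linalg.norm(x), mu)                # norm close to mu * ||x0|| = mu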