
Susceptibility of deep neural networks to adversarial attacks poses a major theoretical and practical challenge. All efforts to harden classifiers against such attacks have seen limited success to date. Two distinct categories of samples against which deep neural networks are vulnerable, "adversarial samples" and "fooling samples", have so far been tackled separately due to the difficulty of addressing both at once. In this work, we show how one can defend against them both under a unified framework. Our model has the form of a variational autoencoder with a Gaussian mixture prior on the latent variable, such that each mixture component corresponds to a single class. We show how selective classification can be performed using this model, thereby causing the adversarial objective to entail a conflict. The proposed method leads to the rejection of adversarial samples instead of misclassification, while maintaining high precision and recall on test data. It also inherently provides a way of learning a selective classifier in a semi-supervised scenario, which can similarly resist adversarial attacks. We further show how one can reclassify the detected adversarial samples by iterative optimization.
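As a rough illustration of the rejection mechanism such a model enables, the following is a minimal sketch, assuming a trained encoder that produces a latent code `z` and per-class mixture means `mu`; the names, the unit-variance components, and the thresholding rule are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch: selective classification under a class-conditional
# Gaussian mixture prior. `mu` has shape (num_classes, latent_dim);
# components are assumed to have unit variance for simplicity.
import numpy as np

def selective_classify(z, mu, threshold):
    # Log-density of latent code z under each Gaussian component,
    # up to an additive constant shared by all classes.
    log_density = -0.5 * np.sum((z - mu) ** 2, axis=1)
    k = int(np.argmax(log_density))
    # Reject inputs whose code is unlikely under every component:
    # adversarial and fooling inputs tend to land between components.
    return None if log_density[k] < threshold else k
```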

Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparatively low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well.
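For concreteness, here is a minimal sketch of how a learned pre-conditioner enters the update, assuming a matrix `P` approximating the inverse Hessian has already been inferred; the gradient oracle and all names are illustrative, not the paper's inference algorithm.

```python
# Sketch of one preconditioned stochastic gradient step. P rescales the
# noisy gradient so that poorly conditioned directions take steps of
# comparable size; a diagonal P amounts to per-parameter learning rates,
# matching the learning-rate-adaptation view in high dimensions.
import numpy as np

def preconditioned_sgd_step(w, grad, P, lr):
    return w - lr * (P @ grad(w))
```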

With the advent of deep neural networks, learning-based approaches for 3D reconstruction have gained popularity. However, unlike for images, in 3D there is no canonical representation which is both computationally and memory efficient yet allows for representing high-resolution geometry of arbitrary topology. Many of the state-of-the-art learning-based 3D reconstruction approaches can hence only represent very coarse 3D geometry or are limited to a restricted domain. In this paper, we propose Occupancy Networks, a new representation for learning-based 3D reconstruction methods. Occupancy networks implicitly represent the 3D surface as the continuous decision boundary of a deep neural network classifier. In contrast to existing approaches, our representation encodes a description of the 3D output at infinite resolution without excessive memory footprint. We validate that our representation can efficiently encode 3D structure and can be inferred from various kinds of input. Our experiments demonstrate competitive results, both qualitatively and quantitatively, for the challenging tasks of 3D reconstruction from single images, noisy point clouds and coarse discrete voxel grids. We believe that occupancy networks will become a useful tool in a wide variety of learning-based 3D tasks.
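To make the representation concrete, here is an illustrative sketch of an occupancy network as a conditioned classifier over 3D points; the plain-MLP architecture shown is an assumption for brevity and differs from the paper's conditional-batch-norm ResNet design.

```python
# Sketch: a network mapping (3D point, conditioning code) -> occupancy logit.
# The surface is the 0.5 level set of the predicted probability and can be
# extracted at arbitrary resolution (e.g., dense grid + marching cubes).
import torch
import torch.nn as nn

class OccupancyNetwork(nn.Module):
    def __init__(self, code_dim=256, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + code_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # occupancy logit per query point
        )

    def forward(self, points, code):
        # points: (B, N, 3) query locations; code: (B, code_dim) from an
        # input encoder (image, point cloud, or voxel grid).
        code = code.unsqueeze(1).expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, code], dim=-1)).squeeze(-1)
```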

Variational Autoencoders (VAEs) provide a theoretically backed framework for deep generative models. However, they often produce “blurry” images, which is linked to their training objective. Sampling in the most popular implementation, the Gaussian VAE, can be interpreted as simply injecting noise into the input of a deterministic decoder. In practice, this simply enforces a smooth latent space structure. We challenge the adoption of the full VAE framework on this specific point in favor of a simpler, deterministic one. Specifically, we investigate how substituting stochasticity with other explicit and implicit regularization schemes can lead to a meaningful latent space without having to force it to conform to an arbitrarily chosen prior. To retrieve a generative mechanism for sampling new data points, we propose to employ an efficient ex-post density estimation step that can readily be adopted both for the proposed deterministic autoencoders and to improve the sample quality of existing VAEs. We show in a rigorous empirical study that regularized deterministic autoencoding achieves state-of-the-art sample quality on the common MNIST, CIFAR-10 and CelebA datasets.
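The ex-post density estimation step can be illustrated with a short sketch, assuming trained `encode`/`decode` functions; fitting a mixture of Gaussians to the latent codes is one natural choice of density estimator, though the exact estimator and hyperparameters here are illustrative.

```python
# Sketch: fit a density to the latent codes of the training data, then
# sample from it and decode to generate new data points.
from sklearn.mixture import GaussianMixture

def fit_and_sample(encode, decode, X_train, n_components=10, n_samples=64):
    Z = encode(X_train)                                  # latent codes (N, d)
    gmm = GaussianMixture(n_components=n_components).fit(Z)
    Z_new, _ = gmm.sample(n_samples)                     # draw new latents
    return decode(Z_new)                                 # decode into samples
```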

In the multiple instance learning setting, each observation is a bag of feature vectors of which one or more vectors indicate membership in a class. The primary task is to identify whether any vectors in the bag indicate class membership while ignoring vectors that do not. We describe here a kernel-based technique that defines a parametric family of kernels via conformal transformations and jointly learns a discriminant function over bags together with the optimal parameter settings of the kernel. Learning a conformal transformation effectively amounts to weighting regions in the feature space according to their contribution to classification accuracy; regions that are discriminative are weighted more highly than regions that are not. This allows the classifier to focus on regions contributing to classification accuracy while ignoring regions that correspond to vectors found in both positive and negative bags. We show how the parameters of this transformation can be learned for support vector machines by posing the problem as a multiple kernel learning problem. The resulting multiple instance classifier gives competitive accuracy on several multi-instance benchmark datasets from different domains.
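The conformal transformation itself is simple to state: a base kernel k is rescaled as k̃(x, y) = c(x) k(x, y) c(y). The sketch below, with Gaussian bumps at illustrative centres `V` weighted by `w`, is one hypothetical parameterization of c, not necessarily the paper's.

```python
# Sketch: conformal transformation of an RBF base kernel. The weighting
# function c(x) = sum_i w_i exp(-gamma ||x - v_i||^2) magnifies regions
# of feature space that help discrimination.
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conformal_kernel(X, Y, V, w, gamma):
    c_x = rbf(X, V, gamma) @ w       # c evaluated on rows of X
    c_y = rbf(Y, V, gamma) @ w       # c evaluated on rows of Y
    return c_x[:, None] * rbf(X, Y, gamma) * c_y[None, :]
```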

The pedestal effect is the large improvement in the detectability of a sinusoidal signal grating observed when the signal is added to a masking or pedestal grating of the same spatial frequency, orientation, and phase. We measured the pedestal effect in both broadband and notched noise, i.e., noise from which a 1.5-octave band centred on the signal frequency had been removed. Although the pedestal effect persists in broadband noise, it almost disappears in the notched noise. Furthermore, the pedestal effect is substantial when either high- or low-pass masking noise is used. We conclude that the pedestal effect in the absence of notched noise results principally from the use of information derived from channels with peak sensitivities at spatial frequencies different from that of the signal and pedestal. The spatial-frequency components of the notched noise above and below the spatial frequency of the signal and pedestal prevent the use of information about changes in contrast carried in channels tuned to spatial frequencies very different from that of the signal and pedestal. Thus the pedestal or dipper effect measured without notched noise is not a characteristic of individual spatial-frequency-tuned channels.

We propose two statistical tests to determine if two samples are from different distributions. Our test statistic is in both cases the distance between the means of the two samples mapped into a reproducing kernel Hilbert space (RKHS). The first test is based on a large deviation bound for the test statistic, while the second is based on the asymptotic distribution of this statistic. We show that the test statistic can be computed in $O(m^2)$ time. We apply our approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where our test performs strongly. We also demonstrate excellent performance when comparing distributions over graphs, for which no alternative tests currently exist.
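As a concrete instance of the statistic, here is a minimal sketch of the biased quadratic-time estimator of the squared RKHS mean distance (the maximum mean discrepancy), with the Gaussian kernel and bandwidth as illustrative choices.

```python
# Sketch: biased MMD^2 estimate between samples X and Y, computable with
# O(m^2) kernel evaluations:
#   MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
import numpy as np

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    return (rbf(X, X, gamma).mean()
            - 2.0 * rbf(X, Y, gamma).mean()
            + rbf(Y, Y, gamma).mean())
```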

NIPS Workshop on New Problems and Methods in Computational Biology, December 2006 (talk)

Abstract

We propose a new boosting method that systematically combines graph mining and mathematical-programming-based machine learning. Informative and interpretable subgraph features are greedily found by a series of graph mining calls. Due to our mathematical programming formulation, subgraph features and pre-calculated real-valued features are seamlessly integrated. We tested our algorithm on a quantitative structure-activity relationship (QSAR) problem, which is a regression problem over a given set of chemical compounds. In benchmark experiments, the prediction accuracy of our method compared favorably with the best results reported on each dataset.

NIPS Workshop on Causality and Feature Selection, December 2006 (talk)

Abstract

We propose a new approach to infer the causal structure that has generated the observed statistical dependences among n random variables. The idea is that the factorization of the joint measure of cause and effect into P(cause)P(effect|cause) typically leads to simpler conditionals than non-causal factorizations. To evaluate the complexity of the conditionals we have tried two methods. First, we have compared them to those which maximize the conditional entropy subject to the observed first and second moments, since we consider the latter to be the simplest conditionals. Second, we have fitted the data with conditional probability measures that are exponentials of functions in an RKHS and defined the complexity by a Hilbert-space semi-norm. Such a complexity measure has several properties that are useful for our purpose. We describe some encouraging results with both methods applied to real-world data. Moreover, we have combined constraint-based approaches to causal discovery (i.e., methods using only information on conditional statistical dependences) with our method in order to distinguish between causal hypotheses which are equivalent with respect to the imposed independences. Furthermore, we compare the performance to Bayesian approaches to causal inference.

The availability of new and fast tools in structure determination has led to a more than exponential growth in the number of structures solved per year. It is therefore increasingly essential to assess the accuracy of new structures with reliable approaches that can assist validation. Here, we discuss a specific example in which the use of different complementary techniques, including Bayesian methods and small-angle scattering, proved essential for validating the two currently available structures of the Josephin domain of ataxin-3, a protein involved in the ubiquitin/proteasome pathway and responsible for the neurodegenerative spinocerebellar ataxia of type 3. Taken together, our results demonstrate that only one of the two structures is compatible with the experimental information. Based on the high precision of our refined structure, we show that Josephin contains an open cleft which could be directly implicated in the interaction with polyubiquitin chains and other partners.

Volterra and Wiener series are perhaps the best understood nonlinear system representations in signal processing. Although both approaches have enjoyed a certain popularity in the past, their application has been limited to rather low-dimensional and weakly nonlinear systems due to the exponential growth of the number of terms that have to be estimated. We show that Volterra and Wiener series can be represented implicitly as elements of a reproducing kernel Hilbert space by utilizing polynomial kernels. The estimation complexity of the implicit representation is linear in the input dimensionality and independent of the degree of nonlinearity. Experiments show performance advantages in terms of convergence, interpretability, and system sizes that can be handled.
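A minimal sketch of the implicit representation, assuming input windows X (one per row) and outputs y: kernel ridge regression with an inhomogeneous polynomial kernel implicitly spans all Volterra terms up to degree p at a cost independent of the number of expansion terms. The regression method and parameter names are illustrative.

```python
# Sketch: implicit Volterra-series estimation via the polynomial kernel
# k(x, x') = (1 + <x, x'>)^p, fitted with kernel ridge regression.
import numpy as np

def fit_volterra(X, y, p=3, lam=1e-3):
    K = (1.0 + X @ X.T) ** p                    # Gram matrix, (N, N)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    # Predictor: implicit Volterra series of degree p evaluated at new inputs
    return lambda X_new: ((1.0 + X_new @ X.T) ** p) @ alpha
```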

We propose a general framework for computing minimal set covers under a class of logical constraints. The underlying idea is to transform the problem into a mathematical program under linear constraints. In this sense it can be seen as a natural extension of the vector quantization algorithm proposed by Tipping and Schoelkopf. We show which classes of logical constraints can be cast and relaxed into linear constraints, and we give an algorithm for the transformation.
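To illustrate the transformation on the unconstrained core problem, here is a sketch of minimal set cover as an integer linear program; the incidence-matrix encoding and the solver are assumptions for illustration, and additional logical constraints would enter as further linear rows.

```python
# Sketch: minimal set cover as an integer linear program.
# A is a binary incidence matrix (rows: elements, columns: candidate sets);
# minimize sum_j x_j subject to A x >= 1 (every element covered), x binary.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

def min_set_cover(A):
    n_sets = A.shape[1]
    res = milp(c=np.ones(n_sets),
               constraints=LinearConstraint(A, lb=1),
               bounds=Bounds(0, 1),
               integrality=np.ones(n_sets))  # all variables integer (0/1)
    return res.x.round().astype(bool)        # indicator of selected sets
```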

In computational biology, it is common to represent domain knowledge using graphs. Frequently there exist multiple graphs for the same set of nodes, representing information from different sources, and no single graph is sufficient to predict class labels of unlabelled nodes reliably. One way to enhance reliability is to integrate multiple graphs, since individual graphs are partly independent of and partly complementary to each other for prediction. In this chapter, we describe an algorithm to assign weights to multiple graphs within graph-based semi-supervised learning. Both predicting class labels and searching for weights for combining multiple graphs are formulated as one convex optimization problem. The graph-combining method is applied to functional class prediction of yeast proteins. When compared with individual graphs, the combined graph with optimized weights performs significantly better than any single graph. When compared with the semidefinite programming-based support vector machine (SDP/SVM), it shows comparable accuracy in a remarkably short time. Compared with a combined graph with equal-valued weights, our method can select important graphs without loss of accuracy, which implies the desirable property of integration with selectivity.
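As a simplified illustration of the combined-graph prediction step, the sketch below propagates labels over a convex combination of graphs with fixed weights; in the chapter the weights are optimized jointly with the labels, and all names here are illustrative.

```python
# Sketch: semi-supervised label propagation on a weighted combination of
# graphs. Ws is a list of adjacency matrices over the same nodes; y holds
# +1/-1 for labelled nodes and 0 for unlabelled ones.
import numpy as np

def propagate(Ws, weights, y, reg=1.0):
    W = sum(w * Wk for w, Wk in zip(weights, Ws))   # combined adjacency
    L = np.diag(W.sum(axis=1)) - W                  # combined graph Laplacian
    # Regularized harmonic solution: smooth scores that stay close to y.
    return np.linalg.solve(np.eye(len(y)) + reg * L, y)
```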

A major challenge in applying machine learning methods to Brain-Computer Interfaces (BCIs) is to overcome possible nonstationarity between the data block a method is trained on and the block it is applied to. Assuming the joint distributions of the whitened signal and the class label to be identical in two blocks, where the whitening is done in each block independently, we propose a simple adaptation formula that is applicable to a broad class of spatial filtering methods, including ICA, CSP, and logistic regression classifiers. We characterize the class of linear transformations for which the above assumption holds. Experimental results on 60 BCI datasets show improved classification accuracy compared to (a) the fixed spatial filter approach (no adaptation) and (b) the fixed spatial pattern approach (proposed by Hill et al., 2006 [1]).
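A minimal sketch of the per-block whitening underlying this assumption follows; the shapes and names are illustrative, and the full adaptation formula for carrying filters across blocks is in the paper.

```python
# Sketch: estimate a whitening transform from one block of band-passed
# epochs X with shape (trials, channels, time). A spatial filter learned
# on whitened training data can then be applied to test data whitened
# with the test block's own transform.
import numpy as np

def whitener(X):
    # Average spatial covariance over trials.
    C = np.mean([x @ x.T / x.shape[1] for x in X], axis=0)
    evals, evecs = np.linalg.eigh(C)
    # Inverse matrix square root: W @ x decorrelates the channels.
    return evecs @ np.diag(evals ** -0.5) @ evecs.T
```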

Many real-world machine learning problems are situated on finite discrete sets, including dimensionality reduction, clustering, and transductive inference. A variety of approaches for learning from finite sets have been proposed from different motivations and for different problems. In most of those approaches, a finite set is modeled as a graph in which the edges encode pairwise relationships among the objects in the set. Consequently, many concepts and methods from graph theory are adopted. In particular, the graph Laplacian is widely used. In this chapter we present a systematic framework for learning from a finite set represented as a graph. We develop discrete analogues of a number of differential operators, and then construct a discrete analogue of classical regularization theory based on those discrete differential operators. The graph Laplacian based approaches are special cases of this general discrete regularization framework. An important implication of this framework is that we have a wide choice of regularizers on graphs in addition to the widely used graph Laplacian based one.
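For a concrete instance of the framework, here is a sketch of one discrete regularizer: the sum of squared edge derivatives under a degree-normalized discrete gradient, which reduces to the normalized graph Laplacian quadratic form. The normalization is one option among the wider family the chapter describes.

```python
# Sketch: discrete regularizer
#   S(f) = 1/2 * sum_{u,v} w_uv * (f_u / sqrt(d_u) - f_v / sqrt(d_v))^2
# for a function f on the nodes of a weighted graph with adjacency W;
# this equals f^T L_sym f for the normalized Laplacian L_sym.
import numpy as np

def graph_regularizer(W, f):
    d = W.sum(axis=1)                     # node degrees
    g = f / np.sqrt(d)                    # degree-normalized function values
    diff2 = (g[:, None] - g[None, :]) ** 2
    return 0.5 * np.sum(W * diff2)
```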

Journal of the European Ceramic Society, 26(15):3061-3065, November 2006 (article)

Abstract

Common approaches for the determination of Slow Crack Growth (SCG) parameters are the static and dynamic loading methods. Since materials with a small Weibull modulus show a large variability in strength, a correct statistical analysis of the data is indispensable. In this work we propose the use of the Maximum Likelihood method and a Bayesian analysis, which, in contrast to the standard procedures, take into account that failure strengths are Weibull distributed. The analysis provides estimates for the SCG parameters, the Weibull modulus, and the corresponding confidence intervals, and overcomes the necessity of manual differentiation between inert and fatigue strength data. We compare the methods to a Least Squares approach, which can be considered the standard procedure. The results for dynamic loading data from the glass sealing of MEMS devices show that the assumptions inherent to the standard approach lead to significantly different estimates.
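As an illustration of the maximum-likelihood ingredient, the sketch below fits a two-parameter Weibull distribution to a sample of fracture strengths; it is a generic MLE, not the paper's full joint analysis of the SCG parameters.

```python
# Sketch: maximum-likelihood fit of a two-parameter Weibull distribution
# to fracture strengths s; the fitted shape parameter is the Weibull
# modulus and the scale is the characteristic strength.
from scipy.stats import weibull_min

def fit_weibull(s):
    m, _, scale = weibull_min.fit(s, floc=0)   # floc=0: location fixed at 0
    return m, scale
```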

We present easy-to-use alternatives to the often-used two-stage Common Spatial Pattern + classifier approach for spatial filtering and classification of Event-Related Desynchronization signals in BCI. We report two algorithms that aim to optimize the spatial filters according to a criterion more directly related to the ability of the algorithms to generalize to unseen data. Both are based upon the idea of treating the spatial filter coefficients as hyperparameters of a kernel or covariance function. We then optimize these hyperparameters directly alongside the normal classifier parameters with respect to our chosen learning objective function. The two objectives considered are margin maximization, as used in Support Vector Machines, and the evidence maximization framework used in Gaussian Processes. Our experiments assessed generalization error as a function of the number of training points used, on 9 BCI competition data sets and 5 offline motor imagery data sets measured in Tübingen. Both our approaches show consistent improvements relative to the commonly used CSP + linear classifier combination. Strikingly, the improvement is most significant in the higher-noise cases, when either few trials are used for training or with the most poorly performing subjects. This is a reversal of the usual "rich get richer" effect in the development of CSP extensions, which tend to perform best when the signal is strong enough to accurately find their additional parameters. This makes our approach particularly suitable for clinical applications where high levels of noise are to be expected.

A promising new combination in multimodality imaging is MR-PET, where the high soft-tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET) are combined. Although many technical problems have recently been solved, it remains an open problem to determine the attenuation map from the available MR scan, as the MR intensities are not directly related to the attenuation values. One standard approach is atlas registration, where the atlas MR image is aligned with the patient MR, thus also yielding an attenuation image for the patient. We also propose another approach, which to our knowledge has not been tried before: using Support Vector Machines, we predict the attenuation value directly from the local image information. We train this well-established machine learning algorithm using small image patches. Although both approaches sometimes yielded acceptable results, they also showed their specific shortcomings: the registration often fails with large deformations, whereas the prediction approach is problematic when the local image structure is not characteristic enough. However, the failures often do not coincide, and integration of both information sources is promising. We therefore developed a combination method extending Support Vector Machines to use not only local image structure but also atlas-registered coordinates. We demonstrate the strength of this combination approach on a number of examples.
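A minimal sketch of the patch-based prediction idea, assuming paired training data of MR patches and attenuation values; appending registered atlas coordinates, as in the combination method, is shown as an optional argument. The regressor choice and all names are illustrative.

```python
# Sketch: train a support vector machine to map a small MR patch (plus,
# optionally, atlas-registered coordinates) to an attenuation value.
import numpy as np
from sklearn.svm import SVR

def train_attenuation_predictor(patches, mu_values, coords=None):
    X = patches.reshape(len(patches), -1)   # flatten each patch to a vector
    if coords is not None:                  # combination method: add the
        X = np.hstack([X, coords])          # registered coordinates as features
    return SVR(kernel='rbf').fit(X, mu_values)
```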

Motivation: In the detection of non-coding RNAs, it is often necessary to identify secondary structure motifs from a set of putative RNA sequences. Most existing algorithms aim to provide the best motif or a few good motifs, but biologists often need to inspect all the possible motifs thoroughly.
Results: Our method RNAmine employs a graph-theoretic representation of RNA sequences and detects all the possible motifs exhaustively using a graph mining algorithm. The motif detection problem boils down to finding frequently appearing patterns in a set of directed and labeled graphs. In the tasks of common secondary structure prediction and local motif detection from long sequences, our method compared favorably in both accuracy and efficiency with state-of-the-art methods such as CMFinder.

Objective. Despite many research efforts in recent decades, the major pathogenetic mechanisms of osteoarthritis (OA), including gene alterations occurring during OA cartilage degeneration, are poorly understood, and there is no disease-modifying treatment approach. The present study was therefore initiated in order to identify differentially expressed disease-related genes and potential therapeutic targets.
Methods. This investigation consisted of a large gene expression profiling study performed on 78 normal and disease samples, using a custom-made complementary DNA array covering >4,000 genes.
Results. Many differentially expressed genes were identified, including the expected up-regulation of anabolic and catabolic matrix genes. In particular, the down-regulation of important oxidative defense genes, i.e., the genes for superoxide dismutases 2 and 3 and glutathione peroxidase 3, was prominent. This indicates that continuous oxidative stress to the cells and the matrix is one major underlying pathogenetic mechanism in OA. Also, genes that are involved in the phenotypic stability of cells, a feature that is greatly reduced in OA cartilage, appeared to be suppressed.
Conclusion. Our findings provide a reference data set on gene alterations in OA cartilage and, importantly, indicate major mechanisms underlying central cell biologic alterations that occur during the OA disease process. These results identify molecular targets that can be further investigated in the search for therapeutic interventions.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.