<a href="http://arxiv.org/">arXiv.org</a>
<a href="http://arxiv.org/abs/1905.09282">Spatio-Temporal Deep Learning Models for Tip Force Estimation During Needle Insertion. (arXiv:1905.09282v1 [eess.IV])</a>
<p>Purpose. Precise placement of needles is a challenge in a number of clinical
applications such as brachytherapy or biopsy. Forces acting at the needle cause
tissue deformation and needle deflection which in turn may lead to misplacement
or injury. Hence, a number of approaches to estimate the forces at the needle
have been proposed. Yet, integrating sensors into the needle tip is challenging
and a careful calibration is required to obtain good force estimates.
</p>
<p>Methods. We describe a fiber-optical needle tip force sensor design using a
single OCT fiber for measurement. The fiber images the deformation of an epoxy
layer placed below the needle tip, which results in a stream of 1D depth
profiles. We study different deep learning approaches to facilitate calibration
between this spatio-temporal image data and the related forces. In particular,
we propose a novel convGRU-CNN architecture for simultaneous spatial and
temporal data processing.
</p>
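The convGRU-CNN architecture is defined in the full paper, not here; as a hedged, illustrative sketch of the idea — convolutional GRU gating over a stream of 1D depth profiles, followed by a small pooled regression head — with toy shapes and random weights (all names and sizes below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def conv1d_same(x, w):
    """'Same'-padded 1-D convolution. x: (C_in, W), w: (C_out, C_in, K)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for o in range(c_out):
        for t in range(x.shape[1]):
            out[o, t] = np.sum(xp[:, t:t + k] * w[o])
    return out

def convgru_step(x, h, wz, wr, wh):
    """One convolutional GRU update on a 1-D feature map."""
    xh = np.concatenate([x, h], axis=0)
    z = sigmoid(conv1d_same(xh, wz))             # update gate
    r = sigmoid(conv1d_same(xh, wr))             # reset gate
    cand = np.tanh(conv1d_same(np.concatenate([x, r * h], axis=0), wh))
    return (1.0 - z) * h + z * cand

# Toy stream: T depth profiles of width W, C input channels, H hidden channels.
T, C, H, W, K = 5, 1, 4, 16, 3
wz = rng.normal(0, 0.1, (H, C + H, K))
wr = rng.normal(0, 0.1, (H, C + H, K))
wh = rng.normal(0, 0.1, (H, C + H, K))
w_out = rng.normal(0, 0.1, H)

h = np.zeros((H, W))
for _ in range(T):
    x = rng.normal(size=(C, W))                  # one 1-D depth profile
    h = convgru_step(x, h, wz, wr, wh)

# Stand-in for the CNN head: global average pool + linear readout -> scalar force.
force = float(w_out @ h.mean(axis=1))
```

The point of the gating structure is that the hidden state aggregates temporal deformation history while the convolutions preserve the spatial layout of the depth profile; the paper's trained head will of course differ from this pooled stand-in.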
<p>Results. The needle can be adapted to different operating ranges by changing
the stiffness of the epoxy layer. Likewise, calibration can be adapted by
training the deep learning models. Our novel convGRU-CNN architecture results
in the lowest mean absolute error of 1.59 &#xb1; 1.3 mN and a cross-correlation
coefficient of 0.9997, and clearly outperforms the other methods. Ex vivo
experiments in human prostate tissue demonstrate the needle's application.
</p>
<p>Conclusions. Our OCT-based fiber-optical sensor presents a viable alternative
for needle tip force estimation. The results indicate that the rich
spatio-temporal information included in the stream of images showing the
deformation throughout the epoxy layer can be effectively used by deep learning
models. Particularly, we demonstrate that the convGRU-CNN architecture performs
favorably, making it a promising approach for other spatio-temporal learning
problems.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Gessert_N/0/1/0/all/0/1">Nils Gessert</a>, <a href="http://arxiv.org/find/eess/1/au:+Priegnitz_T/0/1/0/all/0/1">Torben Priegnitz</a>, <a href="http://arxiv.org/find/eess/1/au:+Saathoff_T/0/1/0/all/0/1">Thore Saathoff</a>, <a href="http://arxiv.org/find/eess/1/au:+Antoni_S/0/1/0/all/0/1">Sven-Thomas Antoni</a>, <a href="http://arxiv.org/find/eess/1/au:+Meyer_D/0/1/0/all/0/1">David Meyer</a>, <a href="http://arxiv.org/find/eess/1/au:+Hamann_M/0/1/0/all/0/1">Moritz Franz Hamann</a>, <a href="http://arxiv.org/find/eess/1/au:+Junemann_K/0/1/0/all/0/1">Klaus-Peter J&#xfc;nemann</a>, <a href="http://arxiv.org/find/eess/1/au:+Otte_C/0/1/0/all/0/1">Christoph Otte</a>, <a href="http://arxiv.org/find/eess/1/au:+Schlaefer_A/0/1/0/all/0/1">Alexander Schlaefer</a>
<a href="http://arxiv.org/abs/1905.09292">Power-optimal, stabilized entangling gate between trapped-ion qubits. (arXiv:1905.09292v1 [quant-ph])</a>
<p>To achieve scalable quantum computing, improving entangling-gate fidelity and
implementation efficiency is of utmost importance. We present here a
linear method to construct provably power-optimal entangling gates on an
arbitrary pair of qubits on a trapped-ion quantum computer. This method
leverages simultaneous modulation of amplitude, frequency, and phase of the
beams that illuminate the ions and, unlike the state of the art, does not
require any search in the parameter space. The linear method is extensible,
enabling stabilization against external parameter fluctuations to an arbitrary
order at a cost linear in the order.
</p>
<a href="http://arxiv.org/find/quant-ph/1/au:+Blumel_R/0/1/0/all/0/1">Reinhold Blumel</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Grzesiak_N/0/1/0/all/0/1">Nikodem Grzesiak</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Nam_Y/0/1/0/all/0/1">Yunseong Nam</a>
<a href="http://arxiv.org/abs/1905.09294">Efficient Arbitrary Simultaneously Entangling Gates on a trapped-ion quantum computer. (arXiv:1905.09294v1 [quant-ph])</a>
<p>Efficiently entangling pairs of qubits is essential to fully harness the
power of quantum computing. Here, we devise an exact protocol that
simultaneously entangles arbitrary pairs of qubits on a trapped-ion quantum
computer. The protocol requires classical computational resources polynomial in
the system size, and very little overhead in the quantum control compared to a
single-pair case. We demonstrate an exponential improvement in both classical
and quantum resources over the current state of the art. We implement the
protocol on a software-defined trapped-ion quantum computer, where we
reconfigure the quantum computer architecture on demand. Together with the
all-to-all connectivity available in trapped-ion quantum computers, our results
establish that trapped ions are a prime candidate for a scalable quantum
computing platform with minimal quantum latency.
</p>
<a href="http://arxiv.org/find/quant-ph/1/au:+Grzesiak_N/0/1/0/all/0/1">Nikodem Grzesiak</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Blumel_R/0/1/0/all/0/1">Reinhold Bl&#xfc;mel</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Beck_K/0/1/0/all/0/1">Kristin Beck</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Wright_K/0/1/0/all/0/1">Kenneth Wright</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Chaplin_V/0/1/0/all/0/1">Vandiver Chaplin</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Amini_J/0/1/0/all/0/1">Jason M. Amini</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Pisenti_N/0/1/0/all/0/1">Neal C. Pisenti</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Debnath_S/0/1/0/all/0/1">Shantanu Debnath</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Chen_J/0/1/0/all/0/1">Jwo-Sy Chen</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Nam_Y/0/1/0/all/0/1">Yunseong Nam</a>
<a href="http://arxiv.org/abs/1905.09304">PoseRBPF: A Rao-Blackwellized Particle Filter for 6D Object Pose Tracking. (arXiv:1905.09304v1 [cs.CV])</a>
<p>Tracking 6D poses of objects from videos provides rich information to a robot
in performing different tasks such as manipulation and navigation. In this
work, we formulate the 6D object pose tracking problem in the Rao-Blackwellized
particle filtering framework, where the 3D rotation and the 3D translation of
an object are decoupled. This factorization allows our approach, called
PoseRBPF, to efficiently estimate the 3D translation of an object along with
the full distribution over the 3D rotation. This is achieved by discretizing
the rotation space in a fine-grained manner, and training an auto-encoder
network to construct a codebook of feature embeddings for the discretized
rotations. As a result, PoseRBPF can track objects with arbitrary symmetries
while still maintaining adequate posterior distributions. Our approach achieves
state-of-the-art results on two 6D pose estimation benchmarks. A video showing
the experiments can be found at https://youtu.be/lE5gjzRKWuA
</p>
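The abstract gives the idea but not the equations; a hedged toy sketch of the Rao-Blackwellized structure — particles for the translation, and a discrete rotation distribution marginalized analytically over a codebook of embeddings rather than sampled — could look like this (the `encode` stand-in and all sizes are hypothetical, not the paper's network):

```python
import numpy as np

rng = np.random.default_rng(1)

N, K, D = 200, 36, 8                      # particles, rotation bins, embed dim
codebook = rng.normal(size=(K, D))        # auto-encoder codebook (here random)
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

true_t, true_rot = np.array([2.0, 3.0]), 7

def encode(translation):
    """Stand-in for crop-at-translation + auto-encoder: the embedding drifts
    away from the true rotation's code as the translation hypothesis worsens."""
    err = np.linalg.norm(translation - true_t)
    e = codebook[true_rot] + err * rng.normal(size=D)
    return e / np.linalg.norm(e)

# Propagate particles (random-walk motion model), then weight each particle by
# how well the codebook explains its observation. Rao-Blackwellization: the
# rotation is kept as a full per-particle distribution over the K bins.
particles = true_t + rng.normal(0, 1.0, size=(N, 2))
weights = np.zeros(N)
rot_post = np.zeros((N, K))
for i, p in enumerate(particles):
    sims = codebook @ encode(p)                    # cosine similarities
    post = np.exp(5.0 * sims)
    rot_post[i] = post / post.sum()                # per-particle rotation pdf
    weights[i] = post.sum()                        # particle likelihood
weights /= weights.sum()

t_est = weights @ particles                        # posterior-mean translation
rot_est = int(np.argmax(weights @ rot_post))       # MAP rotation bin
```

Keeping the full rotation distribution per particle, instead of a point estimate, is what lets a filter of this shape represent symmetric objects with multi-modal rotation posteriors.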
<a href="http://arxiv.org/find/cs/1/au:+Deng_X/0/1/0/all/0/1">Xinke Deng</a>, <a href="http://arxiv.org/find/cs/1/au:+Mousavian_A/0/1/0/all/0/1">Arsalan Mousavian</a>, <a href="http://arxiv.org/find/cs/1/au:+Xiang_Y/0/1/0/all/0/1">Yu Xiang</a>, <a href="http://arxiv.org/find/cs/1/au:+Xia_F/0/1/0/all/0/1">Fei Xia</a>, <a href="http://arxiv.org/find/cs/1/au:+Bretl_T/0/1/0/all/0/1">Timothy Bretl</a>, <a href="http://arxiv.org/find/cs/1/au:+Fox_D/0/1/0/all/0/1">Dieter Fox</a>
<a href="http://arxiv.org/abs/1905.09314">Kernel Wasserstein Distance. (arXiv:1905.09314v1 [cs.LG])</a>
<p>The Wasserstein distance is a powerful metric based on the theory of optimal
transport. It gives a natural measure of the distance between two distributions
with a wide range of applications. In contrast to a number of the common
divergences on distributions such as Kullback-Leibler or Jensen-Shannon, it is
(weakly) continuous, and thus ideal for analyzing corrupted data. To date,
however, no kernel method based on the Wasserstein distance has been proposed
for dealing with nonlinear data. In this work, we develop a novel method to
compute the L2-Wasserstein distance in a kernel space implemented using the
kernel trick. The latter is a general method in machine learning employed to
handle data in a nonlinear manner. We evaluate the proposed approach in
identifying computerized tomography (CT) slices with dental artifacts in head
and neck cancer, performing unsupervised hierarchical clustering on the
resulting Wasserstein distance matrix that is computed on imaging texture
features extracted from each CT slice. Our experiments show that the kernel
approach outperforms classical non-kernel approaches in identifying CT slices
with artifacts.
</p>
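The kernelized construction is the paper's contribution and is not reproduced in the abstract. As background, the classical L2-Wasserstein distance between two equal-size 1-D empirical samples reduces to matching sorted samples (the quantile coupling); the kernel trick would replace these raw samples with implicit feature-space representations:

```python
import numpy as np

def w2_empirical_1d(x, y):
    """L2-Wasserstein distance between equal-size 1-D empirical samples.
    In 1-D, the optimal transport plan pairs the sorted samples."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((xs - ys) ** 2)))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
d_shift = w2_empirical_1d(x, x + 3.0)   # a pure shift by c costs exactly c
```

The shift example illustrates why Wasserstein behaves well on corrupted or translated data where KL-type divergences blow up or saturate.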
<a href="http://arxiv.org/find/cs/1/au:+Oh_J/0/1/0/all/0/1">Jung Hun Oh</a>, <a href="http://arxiv.org/find/cs/1/au:+Pouryahya_M/0/1/0/all/0/1">Maryam Pouryahya</a>, <a href="http://arxiv.org/find/cs/1/au:+Iyer_A/0/1/0/all/0/1">Aditi Iyer</a>, <a href="http://arxiv.org/find/cs/1/au:+Apte_A/0/1/0/all/0/1">Aditya P. Apte</a>, <a href="http://arxiv.org/find/cs/1/au:+Tannenbaum_A/0/1/0/all/0/1">Allen Tannenbaum</a>, <a href="http://arxiv.org/find/cs/1/au:+Deasy_J/0/1/0/all/0/1">Joseph O. Deasy</a>
<a href="http://arxiv.org/abs/1905.09320">Solving Random Systems of Quadratic Equations with Tanh Wirtinger Flow. (arXiv:1905.09320v1 [math.OC])</a>
<p>Solving quadratic systems of equations in $n$ variables from $m$ measurements of
the form $y_i = |a^T_i x|^2$, $i = 1, \ldots, m$, $x \in R^n$, which is also
known as phase retrieval, is a hard nonconvex problem. In the case of standard
Gaussian measurement vectors, the Wirtinger flow algorithm of Chen and Candes
(2015) is an efficient solution. In this paper, we propose a new form of
Wirtinger flow and a new spectral initialization method based on this new
algorithm. We prove that the new Wirtinger flow and initialization method
achieve linear sample and computational complexities. We further extend the
new phasing algorithm by combining it with other existing methods. Finally, via
numerical tests, we demonstrate the effectiveness of our new method in low
data-to-parameter ratio settings, where the number of measurements is less than
the information-theoretic limit, namely, $m &lt; 2n$. For
instance, our method can solve quadratic systems of equations with Gaussian
measurement vectors with probability $\ge 97\%$ when $m/n = 1.7$ and $n = 1000$,
and with probability $\approx 60\%$ when $m/n = 1.5$ and $n = 1000$.
</p>
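The tanh variant is not specified in the abstract; a minimal sketch of the classical real-valued Wirtinger flow it builds on — spectral initialization followed by gradient descent on the quartic loss — with toy sizes well above the information-theoretic limit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200
A = rng.normal(size=(m, n))                 # Gaussian measurement vectors a_i
x = rng.normal(size=n); x /= np.linalg.norm(x)
y = (A @ x) ** 2                            # phaseless measurements y_i = (a_i^T x)^2

# Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i^T,
# scaled to the estimated signal norm sqrt(mean(y)).
Y = (A.T * y) @ A / m
w, V = np.linalg.eigh(Y)
z = V[:, -1] * np.sqrt(np.mean(y))

# Wirtinger flow: gradient descent on f(z) = (1/2m) sum_i ((a_i^T z)^2 - y_i)^2.
lr = 0.05
for _ in range(2000):
    Az = A @ z
    grad = A.T @ ((Az ** 2 - y) * Az) / m
    z -= lr * grad

dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))   # global sign ambiguity
```

The sign ambiguity in the last line is intrinsic to phase retrieval: $x$ and $-x$ produce identical measurements.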
<a href="http://arxiv.org/find/math/1/au:+Luo_Z/0/1/0/all/0/1">Zhenwei Luo</a>, <a href="http://arxiv.org/find/math/1/au:+Zhang_Y/0/1/0/all/0/1">Ye Zhang</a>
<a href="http://arxiv.org/abs/1905.09321">A Simple Receive Diversity Technique for Distributed Beamforming. (arXiv:1905.09321v1 [cs.IT])</a>
<p>A simple method is proposed for use in a scenario involving a single-antenna
source node communicating with a destination node that is equipped with two
antennas via multiple single-antenna relay nodes, where each relay is subject
to an individual power constraint. Furthermore, ultra-reliable, low-latency
communication is desired. The latter requirement translates to considering
only schemes that make use of local channel state information. Whereas
distributed beamforming is a well-known and adequate solution for a receiver
equipped with a single antenna, no straightforward extension to a two-antenna
receiver is known. In this
paper, a scheme is proposed based on a space-time diversity transformation that
is applied as a front-end operation at the destination node. This results in an
effective unitary channel matrix replacing the scalar coefficient corresponding
to each user. Each relay node then inverts its associated channel matrix, which
is the generalization of undoing the channel phase in the classical case of
distributed beamforming to a single-antenna receiver, and then repeats the
message over the resulting "gain-only" channel. In comparison to a
single-antenna destination node, the method doubles the diversity order without
requiring any channel state information at the receiver while at the same time
retaining the array gain offered by the relays.
</p>
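The abstract does not spell out the space-time transformation; one construction with exactly the stated property is Alamouti-style combining, where a relay's two receive-antenna coefficients become an effective scaled-unitary 2x2 matrix that the relay can undo using only its local channel knowledge (a sketch under that assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# Local channel from one relay to the two destination antennas.
h1 = rng.normal() + 1j * rng.normal()
h2 = rng.normal() + 1j * rng.normal()

# Alamouti-style effective channel matrix produced by a space-time front end.
H = np.array([[h1, h2],
              [-np.conj(h2), np.conj(h1)]])

g = abs(h1) ** 2 + abs(h2) ** 2
assert np.allclose(H.conj().T @ H, g * np.eye(2))   # scaled unitary

# The relay "inverts" its matrix (the analogue of undoing the channel phase in
# classical distributed beamforming); the link becomes gain-only: sqrt(g) * I.
P = H.conj().T / np.sqrt(g)
effective = H @ P
```

Because the residual channel is a positive gain times the identity, the relays' contributions add coherently at the destination without any receiver-side channel state information, which is the array-gain claim in the abstract.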
<a href="http://arxiv.org/find/cs/1/au:+Domanovitz_E/0/1/0/all/0/1">Elad Domanovitz</a>, <a href="http://arxiv.org/find/cs/1/au:+Erez_U/0/1/0/all/0/1">Uri Erez</a>
<a href="http://arxiv.org/abs/1905.09324">Learning Fast Magnetic Resonance Imaging. (arXiv:1905.09324v1 [eess.IV])</a>
<p>Magnetic Resonance Imaging (MRI) is considered today the gold-standard
modality for soft tissue. Its long acquisition times, however, make it more
prone to motion artifacts and contribute to the relatively high cost of
this examination. Over the years, multiple studies have concentrated on designing
reduced measurement schemes and image reconstruction schemes for MRI; however,
these problems have so far been addressed separately. On the other hand, recent
works in optical computational imaging have demonstrated growing success of the
simultaneous learning-based design of the acquisition and reconstruction
schemes manifesting significant improvement in the reconstruction quality with
a constrained time budget. Inspired by these successes, in this work, we
propose to learn accelerated MR acquisition schemes (in the form of Cartesian
trajectories) jointly with the image reconstruction operator. To this end, we
propose an algorithm for training the combined acquisition-reconstruction
pipeline end-to-end in a differentiable way. We demonstrate the significance of
using the learned Cartesian trajectories at different speed up rates.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Weiss_T/0/1/0/all/0/1">Tomer Weiss</a>, <a href="http://arxiv.org/find/eess/1/au:+Vedula_S/0/1/0/all/0/1">Sanketh Vedula</a>, <a href="http://arxiv.org/find/eess/1/au:+Senouf_O/0/1/0/all/0/1">Ortal Senouf</a>, <a href="http://arxiv.org/find/eess/1/au:+Bronstein_A/0/1/0/all/0/1">Alex Bronstein</a>, <a href="http://arxiv.org/find/eess/1/au:+Michailovich_O/0/1/0/all/0/1">Oleg Michailovich</a>, <a href="http://arxiv.org/find/eess/1/au:+Zibulevsky_M/0/1/0/all/0/1">Michael Zibulevsky</a>
<a href="http://arxiv.org/abs/1905.09325">Self-supervised learning of inverse problem solvers in medical imaging. (arXiv:1905.09325v1 [eess.IV])</a>
<p>In the past few years, deep learning-based methods have demonstrated enormous
success for solving inverse problems in medical imaging. In this work, we
address the following question: given a set of measurements obtained
from real imaging experiments, what is the best way to use a learnable model
and the physics of the modality to solve the inverse problem and reconstruct
the latent image? Standard supervised learning-based methods approach this
problem by collecting data sets of known latent images and their corresponding
measurements. However, these methods are often impractical due to the lack of
availability of appropriately sized training sets, and, more generally, due to
the inherent difficulty of measuring the "ground-truth" latent image. In light
of this, we propose a self-supervised approach to training inverse models in
medical imaging in the absence of aligned data. Our method requires access only
to the measurements and the forward model at training time. We showcase its
effectiveness on inverse problems arising in accelerated magnetic resonance
imaging (MRI).
</p>
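The training objective can be illustrated with a hedged toy example: fit a reconstructor using only measurements and the known forward model, by minimizing a measurement-consistency loss ||A f(y) - y||^2 with no ground-truth images (the linear model and all sizes here are assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, batch = 20, 12, 256

A = rng.normal(size=(m, n)) / np.sqrt(m)   # known forward model (hypothetical)
X_latent = rng.normal(size=(batch, n))     # latent images: never used in training
Y = X_latent @ A.T                         # the only available data: measurements

def msr_loss(W):
    """Measurement-consistency loss ||A W y - y||^2, averaged over the batch."""
    R = Y @ W.T @ A.T - Y
    return float(np.mean(np.sum(R ** 2, axis=1)))

# Train a linear reconstructor x_hat = W y using measurements alone.
W = np.zeros((n, m))
lr = 0.01
loss0 = msr_loss(W)
for _ in range(1000):
    R = Y @ W.T @ A.T - Y                  # residuals in measurement space
    grad = 2.0 * A.T @ R.T @ Y / batch     # gradient of the consistency loss in W
    W -= lr * grad
loss1 = msr_loss(W)
```

Note that measurement consistency alone leaves the null space of A unconstrained; practical self-supervised methods add priors or regularization on top of an objective of this shape.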
<a href="http://arxiv.org/find/eess/1/au:+Senouf_O/0/1/0/all/0/1">Ortal Senouf</a>, <a href="http://arxiv.org/find/eess/1/au:+Vedula_S/0/1/0/all/0/1">Sanketh Vedula</a>, <a href="http://arxiv.org/find/eess/1/au:+Weiss_T/0/1/0/all/0/1">Tomer Weiss</a>, <a href="http://arxiv.org/find/eess/1/au:+Bronstein_A/0/1/0/all/0/1">Alex Bronstein</a>, <a href="http://arxiv.org/find/eess/1/au:+Michailovich_O/0/1/0/all/0/1">Oleg Michailovich</a>, <a href="http://arxiv.org/find/eess/1/au:+Zibulevsky_M/0/1/0/all/0/1">Michael Zibulevsky</a>
<a href="http://arxiv.org/abs/1905.09334">The Journey is the Reward: Unsupervised Learning of Influential Trajectories. (arXiv:1905.09334v1 [cs.LG])</a>
<p>Unsupervised exploration and representation learning become increasingly
important when learning in diverse and sparse environments. The
information-theoretic principle of empowerment formalizes an unsupervised
exploration objective through an agent trying to maximize its influence on the
future states of its environment. Previous approaches carry certain limitations
in that they either do not employ closed-loop feedback or do not have an
internal state. As a consequence, a privileged final state is taken as an
influence measure, rather than the full trajectory. We provide a model-free
method which takes into account the whole trajectory while still offering the
benefits of option-based approaches. We successfully apply our approach to
settings with large action spaces, where discovery of meaningful action
sequences is particularly difficult.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Binas_J/0/1/0/all/0/1">Jonathan Binas</a>, <a href="http://arxiv.org/find/cs/1/au:+Ozair_S/0/1/0/all/0/1">Sherjil Ozair</a>, <a href="http://arxiv.org/find/cs/1/au:+Bengio_Y/0/1/0/all/0/1">Yoshua Bengio</a>
<a href="http://arxiv.org/abs/1905.09335">Imitation Learning from Video by Leveraging Proprioception. (arXiv:1905.09335v1 [cs.LG])</a>
<p>Classically, imitation learning algorithms have been developed for idealized
situations, e.g., the demonstrations are often required to be collected in the
exact same environment and usually include the demonstrator's actions.
Recently, however, the research community has begun to address some of these
shortcomings by offering algorithmic solutions that enable imitation learning
from observation (IfO), e.g., learning to perform a task from visual
demonstrations that may be in a different environment and do not include
actions. Motivated by the fact that agents often also have access to their own
internal states (i.e., proprioception), we propose and study an IfO algorithm
that leverages this information in the policy learning process. The proposed
architecture learns policies over proprioceptive state representations and
compares the resulting trajectories visually to the demonstration data. We
experimentally test the proposed technique on several MuJoCo domains and show
that it outperforms other imitation from observation algorithms by a large
margin.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Torabi_F/0/1/0/all/0/1">Faraz Torabi</a>, <a href="http://arxiv.org/find/cs/1/au:+Warnell_G/0/1/0/all/0/1">Garrett Warnell</a>, <a href="http://arxiv.org/find/cs/1/au:+Stone_P/0/1/0/all/0/1">Peter Stone</a>
<a href="http://arxiv.org/abs/1905.09336">Simulation-Based Cyber Data Collection Efficacy. (arXiv:1905.09336v1 [cs.CR])</a>
<p>Building upon previous research in honeynets and simulations, we present
efforts from a two-and-a-half-year study using a representative simulation to
collect cybersecurity data. Unlike traditional honeypots or honeynets, our
experiment utilizes a full-scale operational network to model a small business
environment. The simulation uses default security configurations to defend the
network, testing the assumption that, given a standard security baseline, devices
networked to the public Internet will necessarily be hacked. Given network
activity appropriate for its context, the results support the conclusion that no
actors were able to break in, despite only default security settings.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Thaw_D/0/1/0/all/0/1">David Thaw</a>, <a href="http://arxiv.org/find/cs/1/au:+Barkley_B/0/1/0/all/0/1">Bret Barkley</a>, <a href="http://arxiv.org/find/cs/1/au:+Bella_G/0/1/0/all/0/1">Gerry Bella</a>, <a href="http://arxiv.org/find/cs/1/au:+Gardner_C/0/1/0/all/0/1">Carrie Gardner</a>
<a href="http://arxiv.org/abs/1905.09339">Automating Whole Brain Histology to MRI Registration: Implementation of a Computational Pipeline. (arXiv:1905.09339v1 [cs.CV])</a>
<p>Although the latest advances in MRI technology have allowed the acquisition
of higher resolution images, reliable delineation of cytoarchitectural or
subcortical nuclei boundaries is not possible. As a result, histological images
are still required to identify the exact limits of neuroanatomical structures.
However, histological processing is associated with tissue distortion and
fixation artifacts, which prevent a direct comparison between the two
modalities. Our group has previously proposed a histological procedure based on
celloidin embedding that reduces the amount of artifacts and yields high
quality whole brain histological slices. Celloidin embedded tissue,
nevertheless, still bears distortions that must be corrected. We propose a
computational pipeline designed to semi-automatically process celloidin-embedded
histological slices and register them to their MRI counterparts. In this paper
we report the accuracy of our pipeline on two whole-brain volumes from the
Brain Bank of the Brazilian Aging Brain Study Group (BBBABSG). Results were
assessed by comparison of manual segmentations from two experts in both MRIs
and the registered histological volumes. The two whole brain histology/MRI
datasets were successfully registered using minimal user interaction. We also
point to possible improvements based on recent implementations that could be
added to this pipeline, potentially allowing for higher precision and further
performance gains.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Alegro_M/0/1/0/all/0/1">Maryana Alegro</a>, <a href="http://arxiv.org/find/cs/1/au:+Alho_E/0/1/0/all/0/1">Eduardo J. L. Alho</a>, <a href="http://arxiv.org/find/cs/1/au:+Martin_M/0/1/0/all/0/1">Maria da Graca Morais Martin</a>, <a href="http://arxiv.org/find/cs/1/au:+Grinberg_L/0/1/0/all/0/1">Lea Teneholz Grinberg</a>, <a href="http://arxiv.org/find/cs/1/au:+Heinsen_H/0/1/0/all/0/1">Helmut Heinsen</a>, <a href="http://arxiv.org/find/cs/1/au:+Lopes_R/0/1/0/all/0/1">Roseli de Deus Lopes</a>, <a href="http://arxiv.org/find/cs/1/au:+Edson_E/0/1/0/all/0/1">Edson Amaro-Jr</a>, <a href="http://arxiv.org/find/cs/1/au:+Zollei_L/0/1/0/all/0/1">Lilla Z&#xf6;llei</a>
<a href="http://arxiv.org/abs/1905.09340">Generative Imputation and Stochastic Prediction. (arXiv:1905.09340v1 [cs.LG])</a>
<p>In many machine learning applications, we are faced with incomplete datasets.
In the literature, missing data imputation techniques have been mostly
concerned with filling missing values. However, the existence of missing values
is synonymous with uncertainties not only over the distribution of missing
values but also over target class assignments that require careful
consideration. The objectives of this paper are twofold. First, we propose a
method for generating imputations from the conditional distribution of missing
values given observed values. Second, we use the generated samples to estimate
the distribution of target assignments given incomplete data. In order to
generate imputations, we train a simple and effective generator network to
generate imputations that a discriminator network is tasked to distinguish.
Following this, a predictor network is trained using imputed samples from the
generator network to capture the classification uncertainties and make
predictions accordingly. The proposed method is evaluated on the CIFAR-10 image
dataset as well as on two real-world tabular classification datasets, under
various missingness rates and structures. Our experimental results show the
effectiveness of the proposed method in generating imputations, as well as
providing estimates for the class uncertainties in a classification task when
faced with missing values.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kachuee_M/0/1/0/all/0/1">Mohammad Kachuee</a>, <a href="http://arxiv.org/find/cs/1/au:+Karkkainen_K/0/1/0/all/0/1">Kimmo Karkkainen</a>, <a href="http://arxiv.org/find/cs/1/au:+Goldstein_O/0/1/0/all/0/1">Orpaz Goldstein</a>, <a href="http://arxiv.org/find/cs/1/au:+Darabi_S/0/1/0/all/0/1">Sajad Darabi</a>, <a href="http://arxiv.org/find/cs/1/au:+Sarrafzadeh_M/0/1/0/all/0/1">Majid Sarrafzadeh</a>
<a href="http://arxiv.org/abs/1905.09341">Interdependent Strategic Security Risk Management with Bounded Rationality in the Internet of Things. (arXiv:1905.09341v1 [cs.GT])</a>
<p>With the increasing connectivity enabled by the Internet of Things (IoT),
security becomes a critical concern, and users should invest to secure
their IoT applications. Due to the massive number of devices in the IoT network, a
user cannot be aware of the security policies adopted by all of his connected neighbors.
Instead, a user makes security decisions based on the cyber risks he perceives
by observing a selected number of nodes. To this end, we propose a model which
incorporates the limited attention or bounded rationality nature of players in
the IoT. Specifically, each individual builds a sparse cognitive network of
nodes to respond to. Based on this simplified cognitive network representation,
each user then determines his security management policy by minimizing his own
real-world security cost. The bounded-rational decision-making of the players and
their cognitive network formation are interdependent and thus should be
addressed in a holistic manner. We establish a games-in-games framework and
propose a Gestalt Nash equilibrium (GNE) solution concept to characterize the
decisions of agents, and quantify their risk of bounded perception due to the
limited attention. In addition, we design a proximal-based iterative algorithm
to compute the GNE. With case studies of smart communities, the designed
algorithm can successfully identify the critical users whose decisions need to
be taken into account by the other users during the security management.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chen_J/0/1/0/all/0/1">Juntao Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhu_Q/0/1/0/all/0/1">Quanyan Zhu</a>
<a href="http://arxiv.org/abs/1905.09342">Reachable Space Characterization of Markov Decision Processes with Time Variability. (arXiv:1905.09342v1 [cs.RO])</a>
<p>We propose a solution to a time-varying variant of Markov Decision Processes
which can be used to address decision-theoretic planning problems for
autonomous systems operating in unstructured outdoor environments. We explore
the time variability property of the planning stochasticity and investigate the
state reachability, based on which we then develop an efficient iterative
method that offers a good trade-off between solution optimality and time
complexity. The reachability space is constructed by analyzing the means and
variances of the states' reaching times in the future. We validate our algorithm
through extensive simulations using ocean data, and the results show that our
method achieves strong performance in terms of both solution quality and
computing time.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Xu_J/0/1/0/all/0/1">Junhong Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Kai_Y/0/1/0/all/0/1">Yin Kai</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_L/0/1/0/all/0/1">Lantao Liu</a>
<a href="http://arxiv.org/abs/1905.09345">KPynq: A Work-Efficient Triangle-Inequality based K-means on FPGA. (arXiv:1905.09345v1 [cs.DC])</a>
<p>K-means is a popular but computation-intensive algorithm for unsupervised
learning. To address this issue, we present KPynq, a work-efficient
triangle-inequality based K-means on FPGA for handling large-size,
high-dimension datasets. KPynq leverages an algorithm-level optimization to
balance the performance and computation irregularity, and a hardware
architecture design to fully exploit the pipeline and parallel processing
capability of various FPGAs. In our experiments, KPynq consistently outperforms
the CPU-based standard K-means in terms of speedup (up to 4.2x) and
energy efficiency (up to 218x).
</p>
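The FPGA design itself cannot be reproduced from the abstract, but the triangle-inequality pruning it builds on can be sketched: if d(c_j, c_best) >= 2 d(x, c_best), then d(x, c_j) >= d(c_j, c_best) - d(x, c_best) >= d(x, c_best), so the distance to c_j need not be computed. A minimal software sketch of one assignment pass:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
k = 8
centers = X[rng.choice(len(X), k, replace=False)].copy()

def assign_pruned(X, centers):
    """Assignment pass with triangle-inequality pruning: a center c_j is
    skipped whenever d(c_j, c_best) >= 2 * d(x, c_best)."""
    cc = np.linalg.norm(centers[:, None] - centers[None], axis=2)  # center pairs
    labels = np.empty(len(X), dtype=int)
    skipped = 0
    for i, x in enumerate(X):
        best = 0
        d_best = np.linalg.norm(x - centers[0])
        for j in range(1, len(centers)):
            if cc[j, best] >= 2.0 * d_best:
                skipped += 1          # provably not closer; distance not computed
                continue
            d = np.linalg.norm(x - centers[j])
            if d < d_best:
                best, d_best = j, d
        labels[i] = best
    return labels, skipped

labels, skipped = assign_pruned(X, centers)
naive = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
```

The pruned assignment is provably identical to the brute-force one (up to exact ties), which is what makes this kind of optimization work-efficient rather than approximate.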
<a href="http://arxiv.org/find/cs/1/au:+Wang_Y/0/1/0/all/0/1">Yuke Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zeng_Z/0/1/0/all/0/1">Zhaorui Zeng</a>, <a href="http://arxiv.org/find/cs/1/au:+Feng_B/0/1/0/all/0/1">Boyuan Feng</a>, <a href="http://arxiv.org/find/cs/1/au:+Deng_L/0/1/0/all/0/1">Lei Deng</a>, <a href="http://arxiv.org/find/cs/1/au:+Ding_Y/0/1/0/all/0/1">Yufei Ding</a>
<a href="http://arxiv.org/abs/1905.09347">Approximately Maximizing the Broker's Profit in a Two-sided Market. (arXiv:1905.09347v1 [cs.GT])</a>
<p>We study how to maximize the broker's (expected) profit in a two-sided
market, where she buys items from a set of sellers and resells them to a set of
buyers. Each seller has a single item to sell and holds a private value on her
item, and each buyer has a valuation function over the bundles of the sellers'
items. We consider the Bayesian setting where the agents' values are
independently drawn from prior distributions, and aim at designing
dominant-strategy incentive-compatible (DSIC) mechanisms that are approximately
optimal.
</p>
<p>Production-cost markets, where each item has a publicly known cost to be
produced, provide a platform for us to study two-sided markets. Briefly, we
show how to convert a mechanism for production-cost markets into a mechanism for
the broker, whenever the former satisfies cost-monotonicity. This reduction
holds even when buyers have general combinatorial valuation functions. When the
buyers' valuations are additive, we generalize an existing mechanism to
production-cost markets in an approximation-preserving way. We then show that
the resulting mechanism is cost-monotone and thus can be converted into an
8-approximation mechanism for two-sided markets.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chen_J/0/1/0/all/0/1">Jing Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_B/0/1/0/all/0/1">Bo Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_Y/0/1/0/all/0/1">Yingkai Li</a>
<a href="http://arxiv.org/abs/1905.09349">Toward Optimal Performance with Network Assisted TCP at Mobile Edge. (arXiv:1905.09349v1 [cs.NI])</a>
<p>In contrast to the classic approach of designing distributed end-to-end (e2e)
TCP schemes for cellular networks (CN), we explore another design space by
having the CN assist the task of transport control. We show that in the
emerging cellular architectures such as mobile/multi-access edge computing
(MEC), where the servers are located close to the radio access network (RAN),
significant improvements can be achieved by leveraging the nature of the
logically centralized network measurements at the RAN and passing information
such as its minimum e2e delay and access link capacity to each server.
Particularly, a Network Assistance module (located at the mobile edge) pairs
up with the wireless scheduler to provide feedback information to each server
and facilitate the task of congestion control. To that end, we present two
Network Assisted schemes called NATCP (a clean-slate design replacing TCP at
end-hosts) and NACubic (a backward compatible design requiring no change for
TCP at end-hosts). Our preliminary evaluations using real cellular traces show
that both schemes dramatically outperform existing schemes both in single-flow
and multi-flow scenarios.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Abbasloo_S/0/1/0/all/0/1">Soheil Abbasloo</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_Y/0/1/0/all/0/1">Yang Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Chao_H/0/1/0/all/0/1">H. Jonathon Chao</a>, <a href="http://arxiv.org/find/cs/1/au:+Shi_H/0/1/0/all/0/1">Hang Shi</a>, <a href="http://arxiv.org/find/cs/1/au:+Kozat_U/0/1/0/all/0/1">Ulas C. Kozat</a>, <a href="http://arxiv.org/find/cs/1/au:+Ye_Y/0/1/0/all/0/1">Yinghua Ye</a>
<a href="http://arxiv.org/abs/1905.09350">The tradeoff between the utility and risk of location data and implications for public good. (arXiv:1905.09350v1 [cs.CY])</a>
<p>High-resolution individual geolocation data passively collected from mobile
phones is increasingly sold in private markets and shared with researchers.
This data poses significant security, privacy, and ethical risks: it has been
shown that users can be re-identified in such datasets, and its collection
rarely involves their full consent or knowledge. This data is valuable to
private firms (e.g. targeted marketing) but also presents clear value as a
public good. Recent public interest research has demonstrated that
high-resolution location data can more accurately measure segregation in cities
and provide inexpensive transit modeling. But as data is aggregated to mitigate
its re-identifiability risk, its value as a good diminishes. How do we rectify
the clear security and safety risks of this data, its high market value, and
its potential as a resource for public good? We extend the recently proposed
concept of a tradeoff curve that illustrates the relationship between dataset
utility and privacy. We then hypothesize how this tradeoff differs between
private market use and its potential use for public good. We further provide
real-world examples of how high resolution location data, aggregated to varying
degrees of privacy protection, can be used in the public sphere and how it is
currently used by private firms.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Calacci_D/0/1/0/all/0/1">Dan Calacci</a>, <a href="http://arxiv.org/find/cs/1/au:+Berke_A/0/1/0/all/0/1">Alex Berke</a>, <a href="http://arxiv.org/find/cs/1/au:+Larson_K/0/1/0/all/0/1">Kent Larson</a>, <a href="http://arxiv.org/find/cs/1/au:+Alex/0/1/0/all/0/1">Alex</a> (Sandy) <a href="http://arxiv.org/find/cs/1/au:+Pentland/0/1/0/all/0/1">Pentland</a>Hey Google, What Exactly Do Your Security Patches Tell Us? A Large-Scale Empirical Study on Android Patched Vulnerabilities. (arXiv:1905.09352v1 [cs.CR])http://arxiv.org/abs/1905.09352
<p>In this paper, we perform a comprehensive study of 2,470 patched Android
vulnerabilities that we collect from different data sources such as Android
security bulletins, CVEDetails, Qualcomm Code Aurora, AOSP Git repository, and
Linux Patchwork. In our data analysis, we focus on determining the affected
layers, OS versions, severity levels, and common weakness enumerations (CWE)
associated with the patched vulnerabilities. Further, we assess the timeline of
each vulnerability, including discovery and patch dates. We find that (i) even
though the number of patched vulnerabilities changes considerably from month to
month, the relative number of patched vulnerabilities for each severity level
remains stable over time, (ii) there is a significant delay in patching
vulnerabilities that originate from the Linux community or concern Qualcomm
components, even though Linux and Qualcomm provide and release their own
patches earlier, (iii) different AOSP versions receive security updates for
different periods of time, (iv) for 94% of patched Android vulnerabilities, the
date of disclosure in public datasets is not before the patch release date, (v)
there exist some inconsistencies among public vulnerability data sources, e.g.,
some CVE IDs are listed in Android Security bulletins with detailed
information, but in CVEDetails they are listed as unknown, (vi) many patched
vulnerabilities for newer Android versions likely also affect older versions
that do not receive security patches due to end-of-life.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Farhang_S/0/1/0/all/0/1">Sadegh Farhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Kirdan_M/0/1/0/all/0/1">Mehmet Bahadir Kirdan</a>, <a href="http://arxiv.org/find/cs/1/au:+Laszka_A/0/1/0/all/0/1">Aron Laszka</a>, <a href="http://arxiv.org/find/cs/1/au:+Grossklags_J/0/1/0/all/0/1">Jens Grossklags</a>Minimizing the Negative Side Effects of Planning with Reduced Models. (arXiv:1905.09355v1 [cs.AI])http://arxiv.org/abs/1905.09355
<p>Reduced models of large Markov decision processes accelerate planning by
considering a subset of outcomes for each state-action pair. This reduction in
reachable states leads to replanning when the agent encounters states without a
precomputed action during plan execution. However, not all states are suitable
for replanning. In the worst case, the agent may not be able to reach the goal
from the newly encountered state. Agents should be better prepared to handle
such risky situations and avoid replanning in risky states. Hence, we consider
replanning in states that are unsafe for deliberation as a negative side effect
of planning with reduced models. While the negative side effects can be
minimized by always using the full model, this defeats the purpose of using
reduced models. The challenge is to plan with reduced models, but somehow
account for the possibility of encountering risky situations. An agent should
thus only replan in states that the user has approved as safe for replanning.
To that end, we propose planning using a portfolio of reduced models, a
planning paradigm that minimizes the negative side effects of planning using
reduced models by alternating between different outcome selection approaches.
We empirically demonstrate the effectiveness of our approach on three domains:
an electric vehicle charging domain using real-world data from a university
campus and two benchmark planning problems.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Saisubramanian_S/0/1/0/all/0/1">Sandhya Saisubramanian</a>, <a href="http://arxiv.org/find/cs/1/au:+Zilberstein_S/0/1/0/all/0/1">Shlomo Zilberstein</a>Convergence Analyses of Online ADAM Algorithm in Convex Setting and Two-Layer ReLU Neural Network. (arXiv:1905.09356v1 [cs.LG])http://arxiv.org/abs/1905.09356
<p>Nowadays, online learning is an appealing learning paradigm, which is of
great interest in practice due to the recent emergence of large-scale
applications such as online advertising placement and online web ranking.
Standard online learning assumes a finite number of samples while in practice
data is streamed infinitely. In such a setting, gradient descent with a
diminishing learning rate does not work. We first introduce regret with rolling
window, a new performance metric for online streaming learning, which measures
the performance of an algorithm on every fixed number of contiguous samples. At
the same time, we propose a family of algorithms based on gradient descent with
a constant or adaptive learning rate, and provide rigorous analyses
establishing regret bounds for the algorithms. We cover the convex
setting showing the regret of the order of the square root of the size of the
window in the constant and dynamic learning rate scenarios. Our proof is
applicable also to the standard online setting where we provide the first
analysis of the same regret order (the previous proofs have flaws). We also
study a two-layer neural network setting with ReLU activation. In this case we
establish that if initial weights are close to a stationary point, the same
square root regret bound is attainable. We conduct computational experiments
demonstrating a superior performance of the proposed algorithms.
</p>
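As a toy illustration of the rolling-window metric (our own sketch, not the paper's formal definition): given per-round losses for the algorithm and a comparator, the regret over every window of w contiguous rounds can be evaluated with prefix sums, taking the worst window.

```python
import numpy as np

def rolling_window_regret(losses_alg, losses_comp, w):
    """Worst cumulative loss gap over any window of w contiguous rounds."""
    diff = np.asarray(losses_alg, dtype=float) - np.asarray(losses_comp, dtype=float)
    c = np.concatenate([[0.0], np.cumsum(diff)])  # prefix sums for O(n) windows
    return max(c[i + w] - c[i] for i in range(len(diff) - w + 1))
```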
<a href="http://arxiv.org/find/cs/1/au:+Fang_B/0/1/0/all/0/1">Biyi Fang</a>, <a href="http://arxiv.org/find/cs/1/au:+Klabjan_D/0/1/0/all/0/1">Diego Klabjan</a>Towards Global Asset Management in Blockchain Systems. (arXiv:1905.09359v1 [cs.DB])http://arxiv.org/abs/1905.09359
<p>Permissionless blockchains (e.g., Bitcoin, Ethereum, etc.) have shown wide
success in implementing global-scale peer-to-peer cryptocurrency systems. In
such blockchains, new currency units are generated through the mining process
and are used in addition to transaction fees to incentivize miners to maintain
the blockchain. Although it is clear how currency units are generated and
transacted on, it is unclear how to use the infrastructure of permissionless
blockchains to manage assets other than the blockchain's currency units (e.g.,
cars, houses, etc.). In this paper, we propose a global asset management system
by unifying permissioned and permissionless blockchains. A governmental
permissioned blockchain authenticates the registration of end-user assets
through smart contract deployments on a permissionless blockchain. Afterwards,
end-users can transact on their assets through smart contract function calls
(e.g., sell a car, rent a room in a house, etc.). In return, end-users get paid
in currency units of the same blockchain or other blockchains through atomic
cross-chain transactions and governmental offices receive taxes on these
transactions in cryptocurrency units.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zakhary_V/0/1/0/all/0/1">Victor Zakhary</a>, <a href="http://arxiv.org/find/cs/1/au:+Amiri_M/0/1/0/all/0/1">Mohammad Javad Amiri</a>, <a href="http://arxiv.org/find/cs/1/au:+Maiyya_S/0/1/0/all/0/1">Sujaya Maiyya</a>, <a href="http://arxiv.org/find/cs/1/au:+Agrawal_D/0/1/0/all/0/1">Divyakant Agrawal</a>, <a href="http://arxiv.org/find/cs/1/au:+Abbadi_A/0/1/0/all/0/1">Amr El Abbadi</a>FQL: An Extensible Feature Query Language and Toolkit on Searching Software Characteristics for HPC Applications. (arXiv:1905.09364v1 [cs.SE])http://arxiv.org/abs/1905.09364
<p>The amount of large-scale scientific computing software is dramatically
increasing. In this work, we designed a new language, named feature query
language (FQL), to collect and extract software features from a quick static
code analysis. We designed and implemented an FQL toolkit to automatically
detect and present the software features using an extensible query repository.
Several large-scale, high performance computing (HPC) scientific codes have
been used in the paper to demonstrate the HPC-related feature extraction and
information collection. Although we emphasized the HPC features in the study,
the toolkit can be easily extended to answer general software feature
questions, such as coding patterns and hardware dependencies.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zheng_W/0/1/0/all/0/1">Weijian Zheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_D/0/1/0/all/0/1">Dali Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Song_F/0/1/0/all/0/1">Fengguang Song</a>Outlier Robust Extreme Learning Machine for Multi-Target Regression. (arXiv:1905.09368v1 [cs.LG])http://arxiv.org/abs/1905.09368
<p>The popularity of algorithms based on Extreme Learning Machine (ELM), which
can be used to train Single Layer Feedforward Neural Networks (SLFN), has
increased in the past years. They have been successfully applied to a wide
range of classification and regression tasks. The most commonly used methods
are the ones based on minimizing the $\ell_2$ norm of the error, which is not
suitable for dealing with outliers, especially in regression tasks. The use of
the $\ell_1$ norm was proposed in Outlier Robust ELM (OR-ELM), which is defined
for one-dimensional outputs. In this paper, we generalize OR-ELM to deal with
multi-target regression problems, using the error $\ell_{2,1}$ norm and
Elastic Net theory, which can yield a sparser network, resulting in our
method, Generalized Outlier Robust ELM (GOR-ELM). We use Alternating Direction
Method of Multipliers (ADMM) to solve the resulting optimization problem. An
incremental version of GOR-ELM is also proposed. We chose 15 public real-world
multi-target regression datasets to test our methods. Our experiments show
that they are statistically better than other ELM-based techniques when
considering data contaminated with outliers, and equivalent to them otherwise.
</p>
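For context, a plain $\ell_2$-fit ELM for multi-target regression can be sketched as follows (illustrative only; GOR-ELM replaces this least-squares output fit with an $\ell_{2,1}$/Elastic Net objective solved via ADMM):

```python
import numpy as np

def elm_fit(X, Y, n_hidden=50, seed=0):
    """Train a basic ELM: random fixed hidden layer, l2 least-squares output fit."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, Y, rcond=None)     # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```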
<a href="http://arxiv.org/find/cs/1/au:+Silva_B/0/1/0/all/0/1">Bruno L&#xe9;gora Souza da Silva</a>, <a href="http://arxiv.org/find/cs/1/au:+Inaba_F/0/1/0/all/0/1">Fernando Kentaro Inaba</a>, <a href="http://arxiv.org/find/cs/1/au:+Salles_E/0/1/0/all/0/1">Evandro Ottoni Teatini Salles</a>, <a href="http://arxiv.org/find/cs/1/au:+Ciarelli_P/0/1/0/all/0/1">Patrick Marques Ciarelli</a>Sparse Equisigned PCA: Algorithms and Performance Bounds in the Noisy Rank-1 Setting. (arXiv:1905.09369v1 [math.ST])http://arxiv.org/abs/1905.09369
<p>Singular value decomposition (SVD) based principal component analysis (PCA)
breaks down in the high-dimensional and limited sample size regime below a
certain critical eigen-SNR that depends on the dimensionality of the system and
the number of samples. Below this critical eigen-SNR, the estimates returned by
the SVD are asymptotically uncorrelated with the latent principal components.
We consider a setting where the left singular vector of the underlying rank one
signal matrix is assumed to be sparse and the right singular vector is assumed
to be equisigned, that is, having either only nonnegative or only nonpositive
entries. We consider six different algorithms for estimating the sparse
principal component based on different statistical criteria and prove that by
exploiting sparsity, we recover consistent estimates in the low eigen-SNR
regime where the SVD fails. Our analysis reveals conditions under which a
coordinate selection scheme based on a \textit{sum-type decision statistic}
outperforms schemes that utilize the $\ell_1$ and $\ell_2$ norm-based
statistics. We derive lower bounds on the size of detectable coordinates of the
principal left singular vector and utilize these lower bounds to derive lower
bounds on the worst-case risk. Finally, we verify our findings with numerical
simulations and illustrate the performance with a video data example, where the
interest is in identifying objects.
</p>
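A minimal sketch of sum-type coordinate selection (our illustration; the paper's thresholds and scaling differ): because the right singular vector is equisigned, the signal adds coherently in each row sum, so support coordinates of the sparse left singular vector stand out against the noise.

```python
import numpy as np

def select_support_sum_stat(Y, tau):
    """Select coordinates whose normalized row-sum statistic exceeds tau.
    For an equisigned right singular vector, signal rows sum coherently."""
    n = Y.shape[1]
    s = Y.sum(axis=1) / np.sqrt(n)  # per-coordinate sum-type decision statistic
    return np.nonzero(np.abs(s) > tau)[0]
```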
<a href="http://arxiv.org/find/math/1/au:+Prasadan_A/0/1/0/all/0/1">Arvind Prasadan</a>, <a href="http://arxiv.org/find/math/1/au:+Nadakuditi_R/0/1/0/all/0/1">Raj Rao Nadakuditi</a>, <a href="http://arxiv.org/find/math/1/au:+Paul_D/0/1/0/all/0/1">Debashis Paul</a>Lexicase Selection of Specialists. (arXiv:1905.09372v1 [cs.NE])http://arxiv.org/abs/1905.09372
<p>Lexicase parent selection filters the population by considering one random
training case at a time, eliminating any individuals with errors for the
current case that are worse than the best error in the selection pool, until a
single individual remains. This process often stops before considering all
training cases, meaning that it will ignore the error values on any cases that
were not yet considered. Lexicase selection can therefore select specialist
individuals that have poor errors on some training cases, provided they have
excellent errors on others and those cases come near the start of the random list of
cases used for the parent selection event in question. We hypothesize here that
selecting these specialists, which may have poor total error, plays an
important role in lexicase selection's observed performance advantages over
error-aggregating parent selection methods such as tournament selection, which
select specialists much less frequently. We conduct experiments examining this
hypothesis, and find that lexicase selection's performance and diversity
maintenance degrade when we deprive it of the ability to select specialists.
These findings help explain the improved performance of lexicase selection
compared to tournament selection, and suggest that specialists help drive
evolution under lexicase selection toward global solutions.
</p>
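The filtering procedure described above can be sketched directly (a minimal version of lexicase parent selection; `pop_errors[i][c]` is individual i's error on training case c):

```python
import random

def lexicase_select(pop_errors, rng=random):
    """Lexicase parent selection: filter the pool one random case at a time,
    keeping only individuals with the best error on the current case."""
    candidates = list(range(len(pop_errors)))
    cases = list(range(len(pop_errors[0])))
    rng.shuffle(cases)                       # fresh random case order per event
    for c in cases:
        best = min(pop_errors[i][c] for i in candidates)
        candidates = [i for i in candidates if pop_errors[i][c] == best]
        if len(candidates) == 1:             # often stops before all cases
            break
    return rng.choice(candidates)
```

Note how a specialist with the best error on an early-drawn case survives the filter regardless of its total error, which is the behavior the paper studies.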
<a href="http://arxiv.org/find/cs/1/au:+Helmuth_T/0/1/0/all/0/1">Thomas Helmuth</a>, <a href="http://arxiv.org/find/cs/1/au:+Pantridge_E/0/1/0/all/0/1">Edward Pantridge</a>, <a href="http://arxiv.org/find/cs/1/au:+Spector_L/0/1/0/all/0/1">Lee Spector</a>Comparing and Combining Lexicase Selection and Novelty Search. (arXiv:1905.09374v1 [cs.NE])http://arxiv.org/abs/1905.09374
<p>Lexicase selection and novelty search, two parent selection methods used in
evolutionary computation, emphasize exploring widely in the search space more
than traditional methods such as tournament selection. However, lexicase
selection is not explicitly driven to select for novelty in the population, and
novelty search suffers from lack of direction toward a goal, especially in
unconstrained, high-dimensional spaces. We combine the strengths of lexicase
selection and novelty search by creating a novelty score for each test case,
and adding those novelty scores to the normal error values used in lexicase
selection. We use this new novelty-lexicase selection to solve automatic
program synthesis problems, and find it significantly outperforms both novelty
search and lexicase selection. Additionally, we find that novelty search has
very little success in the problem domain of program synthesis. We explore the
effects of each of these methods on population diversity and long-term problem
solving performance, and give evidence to support the hypothesis that
novelty-lexicase selection resists converging to local optima better than
lexicase selection.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Jundt_L/0/1/0/all/0/1">Lia Jundt</a>, <a href="http://arxiv.org/find/cs/1/au:+Helmuth_T/0/1/0/all/0/1">Thomas Helmuth</a>Critical Review of BugSwarm for Fault Localization and Program Repair. (arXiv:1905.09375v1 [cs.SE])http://arxiv.org/abs/1905.09375
<p>Benchmarks play an important role in evaluating the efficiency and
effectiveness of solutions to automate several phases of the software
development lifecycle. Moreover, if well designed, they also serve as an
important artifact for comparing different approaches.
BugSwarm is a benchmark that has been recently published, which contains 3,091
pairs of failing and passing continuous integration builds. According to the
authors, the benchmark has been designed with the automatic program repair and
fault localization communities in mind. Given that a benchmark targeting these
communities ought to have several characteristics (e.g., a buggy statement
needs to be present), we have dissected the benchmark to fully understand
whether the benchmark suits these communities well. Our critical analysis has
found several limitations in the benchmark: only 112/3,091 (3.6%) are suitable
to evaluate techniques for automatic fault localization or program repair.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Durieux_T/0/1/0/all/0/1">Thomas Durieux</a>, <a href="http://arxiv.org/find/cs/1/au:+Abreu_R/0/1/0/all/0/1">Rui Abreu</a>Learning to Prove Theorems via Interacting with Proof Assistants. (arXiv:1905.09381v1 [cs.LO])http://arxiv.org/abs/1905.09381
<p>Humans prove theorems by relying on substantial high-level reasoning and
problem-specific insights. Proof assistants offer a formalism that resembles
human mathematical reasoning, representing theorems in higher-order logic and
proofs as high-level tactics. However, human experts have to construct proofs
manually by entering tactics into the proof assistant. In this paper, we study
the problem of using machine learning to automate the interaction with proof
assistants. We construct CoqGym, a large-scale dataset and learning environment
containing 71K human-written proofs from 123 projects developed with the Coq
proof assistant. We develop ASTactic, a deep learning-based model that
generates tactics as programs in the form of abstract syntax trees (ASTs).
Experiments show that ASTactic trained on CoqGym can generate effective tactics
and can be used to prove new theorems not previously provable by automated
methods. Code is available at https://github.com/princeton-vl/CoqGym.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Yang_K/0/1/0/all/0/1">Kaiyu Yang</a>, <a href="http://arxiv.org/find/cs/1/au:+Deng_J/0/1/0/all/0/1">Jia Deng</a>An Optimal Private Stochastic-MAB Algorithm Based on an Optimal Private Stopping Rule. (arXiv:1905.09383v1 [stat.ML])http://arxiv.org/abs/1905.09383
<p>We present a provably optimal differentially private algorithm for the
stochastic multi-armed bandit problem, as opposed to the private analogue of the
UCB algorithm [Mishra and Thakurta, 2015; Tossou and Dimitrakakis, 2016], which
does not meet the recently discovered lower bound of $\Omega
\left(\frac{K\log(T)}{\epsilon} \right)$ [Shariff and Sheffet, 2018]. Our
construction is based on a different algorithm, Successive Elimination
[Even-Dar et al. 2002], that repeatedly pulls all remaining arms until an arm
is found to be suboptimal and is then eliminated. In order to devise a private
analogue of Successive Elimination, we visit the problem of a private stopping
rule, which takes as input a stream of i.i.d. samples from an unknown
distribution and returns a multiplicative $(1 \pm \alpha)$-approximation of the
distribution's mean, and prove the optimality of our private stopping rule. We
then present the private Successive Elimination algorithm which meets both the
non-private lower bound [Lai and Robbins, 1985] and the above-mentioned private
lower bound. We also compare empirically the performance of our algorithm with
the private UCB algorithm.
</p>
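The non-private base algorithm is easy to sketch (illustrative; the paper's contribution is the private variant with calibrated noise and an optimal private stopping rule). Here we use a standard Hoeffding confidence radius:

```python
import math

def successive_elimination(arms, pulls_per_round=100, rounds=200, delta=0.05):
    """arms: callables returning stochastic rewards in [0, 1].
    Repeatedly pull all remaining arms; eliminate any arm whose upper
    confidence bound falls below the best lower confidence bound."""
    k = len(arms)
    active = list(range(k))
    sums = [0.0] * k
    counts = [0] * k
    for _ in range(rounds):
        for i in active:
            for _ in range(pulls_per_round):
                sums[i] += arms[i]()
                counts[i] += 1
        rad = {i: math.sqrt(math.log(2 * k * counts[i] / delta) / (2 * counts[i]))
               for i in active}
        best_lcb = max(sums[i] / counts[i] - rad[i] for i in active)
        active = [i for i in active if sums[i] / counts[i] + rad[i] >= best_lcb]
        if len(active) == 1:
            break
    return active
```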
<a href="http://arxiv.org/find/stat/1/au:+Sajed_T/0/1/0/all/0/1">Touqir Sajed</a>, <a href="http://arxiv.org/find/stat/1/au:+Sheffet_O/0/1/0/all/0/1">Or Sheffet</a>A new secure multi-hop untrusted relaying scheme. (arXiv:1905.09384v1 [eess.SP])http://arxiv.org/abs/1905.09384
<p>Cooperative relaying is utilized as an efficient method for data
communication in wireless sensor networks and Internet of Things (IoT).
However, when multi-hop relaying is necessary in such communication
networks, it is challenging to guarantee the secrecy of cooperative
transmissions, since we may face the untrusted relaying scenario in which the
relays are both necessary helpers and potential eavesdroppers. To obviate this issue, a
new cooperative jamming scheme is proposed in this paper, in which the data can
be confidentially communicated from the source to the destination through
multi-hop untrusted relays. Toward this end, we first consider a network of two
successive untrusted relays, i.e., a three-hop communication network. In our
proposed secure transmission scheme, all the legitimate nodes contribute to
provide secure communication by smartly injecting artificial noises to the
network in different communication phases. Given this system model, a novel
closed-form expression is presented in the high signal-to-noise ratio (SNR)
regime for the ergodic secrecy rate (ESR). Furthermore, we evaluate the high
SNR slope and power offset of the ESR for a basic comparison of the
proposed secure transmission scheme with the state of the art. Our numerical
results highlight that the proposed scheme provides a better
secrecy rate than the two-hop untrusted relaying scheme as well as the
direct transmission scheme.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Kuhestani_A/0/1/0/all/0/1">Ali Kuhestani</a>, <a href="http://arxiv.org/find/eess/1/au:+Mamaghani_M/0/1/0/all/0/1">Milad Tatar Mamaghani</a>, <a href="http://arxiv.org/find/eess/1/au:+Behroozi_H/0/1/0/all/0/1">Hamid Behroozi</a>A Sub-mm Ultrasonic Free-floating Implant for Multi-mote Neural Recording. (arXiv:1905.09386v1 [eess.SP])http://arxiv.org/abs/1905.09386
<p>A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording
implant is presented. The device comprises only a 0.25 mm$^2$ recording
IC and a single piezoceramic resonator that is used for both power harvesting
and data transmission. Uplink data transmission is performed by amplitude
modulation of the ultrasound echo. A technique to linearize the echo amplitude
is introduced, resulting in &lt;1.2% static nonlinearity of the received signal
over a $\pm$10 mV input range. The IC dissipates 37.7 $\mu$W, while the neural
recording front-end consumes 4 $\mu$W and achieves a noise floor of 5.3
$\mu$Vrms in a 5 kHz bandwidth. This work improves sub-mm recording mote depth
by &gt;2.5x, resulting in the highest measured depth/volume ratio by ~3x.
Orthogonal subcarrier modulation enables simultaneous operation of multiple
implants, using a single-element ultrasound external transducer. Dual-mote
simultaneous power up and data transmission is demonstrated at a rate of 4.7
kS/s at the depth of 45 mm.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Ghanbari_M/0/1/0/all/0/1">Mohammad Meraj Ghanbari</a>, <a href="http://arxiv.org/find/eess/1/au:+Piech_D/0/1/0/all/0/1">David K. Piech</a>, <a href="http://arxiv.org/find/eess/1/au:+Shen_K/0/1/0/all/0/1">Konlin Shen</a>, <a href="http://arxiv.org/find/eess/1/au:+Alamouti_S/0/1/0/all/0/1">Sina Faraji Alamouti</a>, <a href="http://arxiv.org/find/eess/1/au:+Yalcin_C/0/1/0/all/0/1">Cem Yalcin</a>, <a href="http://arxiv.org/find/eess/1/au:+Johnson_B/0/1/0/all/0/1">Benjamin C. Johnson</a>, <a href="http://arxiv.org/find/eess/1/au:+Carmena_J/0/1/0/all/0/1">Jose M. Carmena</a>, <a href="http://arxiv.org/find/eess/1/au:+Maharbiz_M/0/1/0/all/0/1">Michel M. Maharbiz</a>, <a href="http://arxiv.org/find/eess/1/au:+Muller_R/0/1/0/all/0/1">Rikky Muller</a>Robust Wireless Fingerprinting via Complex-Valued Neural Networks. (arXiv:1905.09388v1 [eess.SP])http://arxiv.org/abs/1905.09388
<p>A "wireless fingerprint" which exploits hardware imperfections unique to each
device is a potentially powerful tool for wireless security. Such a fingerprint
should be able to distinguish between devices sending the same message, and
should be robust against standard spoofing techniques. Since the information in
wireless signals resides in complex baseband, in this paper, we explore the use
of neural networks with complex-valued weights to learn fingerprints using
supervised learning. We demonstrate that, while there are potential benefits to
using sections of the signal beyond just the preamble to learn fingerprints,
the network cheats when it can, using information such as transmitter ID (which
can be easily spoofed) to artificially inflate performance. We also show that
noise augmentation by inserting additional white Gaussian noise can lead to
significant performance gains, which indicates that this counter-intuitive
strategy helps in learning more robust fingerprints. We provide results for two
different wireless protocols, WiFi and ADS-B, demonstrating the effectiveness
of the proposed method.
</p>
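The noise-augmentation strategy mentioned above amounts to adding complex AWGN at a chosen SNR to the baseband samples before training (a generic sketch, not the authors' exact pipeline):

```python
import numpy as np

def awgn_augment(x, snr_db, rng=None):
    """Add complex white Gaussian noise to baseband samples x at a target SNR (dB)."""
    rng = rng or np.random.default_rng()
    p_sig = np.mean(np.abs(x) ** 2)
    p_noise = p_sig / (10 ** (snr_db / 10))  # noise power for the target SNR
    n = np.sqrt(p_noise / 2) * (rng.standard_normal(x.shape)
                                + 1j * rng.standard_normal(x.shape))
    return x + n
```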
<a href="http://arxiv.org/find/eess/1/au:+Gopalakrishnan_S/0/1/0/all/0/1">Soorya Gopalakrishnan</a>, <a href="http://arxiv.org/find/eess/1/au:+Cekic_M/0/1/0/all/0/1">Metehan Cekic</a>, <a href="http://arxiv.org/find/eess/1/au:+Madhow_U/0/1/0/all/0/1">Upamanyu Madhow</a>The Stabilized Explicit Variable-Load Solver with Machine Learning Acceleration for the Rapid Solution of Stiff Chemical Kinetics. (arXiv:1905.09395v1 [physics.comp-ph])http://arxiv.org/abs/1905.09395
<p>Numerical solutions to differential equations are at the core of
computational fluid dynamics calculations. As the size and complexity of the
simulations grow, so does the need for computational power and time. Solving
the equations in parallel can dramatically reduce the
time to solution. While traditionally done on CPU, unlocking the massive number
of computational cores on GPU is highly desirable. Many efforts have been made
to implement stiff chemistry solvers on GPUs but have not been highly
successful because of the logical divergence in traditional stiff algorithms
like CVODE or LSODE. This study will demonstrate a machine-learned hybrid
algorithm implemented in TensorFlow for stiff problems and the speed gains
relative to the traditional LSODE solver used in the Multiphase Flow with
Interphase eXchanges (MFiX) Computational Fluid Dynamics (CFD) code. The
results will show a dramatic decrease in total simulation time while
maintaining the same degree of accuracy.
</p>
<a href="http://arxiv.org/find/physics/1/au:+Buchheit_K/0/1/0/all/0/1">Kyle Buchheit</a>, <a href="http://arxiv.org/find/physics/1/au:+Owoyele_O/0/1/0/all/0/1">Opeoluwa Owoyele</a>, <a href="http://arxiv.org/find/physics/1/au:+Jordan_T/0/1/0/all/0/1">Terry Jordan</a>, <a href="http://arxiv.org/find/physics/1/au:+Essendelft_D/0/1/0/all/0/1">Dirk Van Essendelft</a>Predictive Control for Chasing a Ground Vehicle using a UAV. (arXiv:1905.09396v1 [cs.RO])http://arxiv.org/abs/1905.09396
<p>We propose a high-level planner for a multirotor to chase a ground vehicle,
while simultaneously respecting various state and input constraints. Assuming a
minimal kinematic model for the ground vehicle, we use data collected online to
generate predictions for our planner within a model predictive control
framework. Our solution is demonstrated both in simulation and in experiments
on a stable quadcopter platform.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Byun_J/0/1/0/all/0/1">Jaeseung Byun</a>, <a href="http://arxiv.org/find/cs/1/au:+Jain_K/0/1/0/all/0/1">Karan P. Jain</a>, <a href="http://arxiv.org/find/cs/1/au:+Nair_S/0/1/0/all/0/1">Siddharth H. Nair</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_H/0/1/0/all/0/1">Haoyun Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Zha_J/0/1/0/all/0/1">Jiaming Zha</a>Cognitive Model Priors for Predicting Human Decisions. (arXiv:1905.09397v1 [cs.LG])http://arxiv.org/abs/1905.09397
<p>Human decision-making underlies all economic behavior. For the past four
decades, human decision-making under uncertainty has continued to be explained
by theoretical models based on prospect theory, a framework that was awarded
the Nobel Prize in Economic Sciences. However, theoretical models of this kind
have developed slowly, and robust, high-precision predictive models of human
decisions remain a challenge. While machine learning is a natural candidate for
solving these problems, it is currently unclear to what extent it can improve
predictions obtained by current theories. We argue that this is mainly due to
data scarcity, since noisy human behavior requires massive sample sizes to be
accurately captured by off-the-shelf machine learning methods. To solve this
problem, what is needed are machine learning models with appropriate inductive
biases for capturing human behavior, and larger datasets. We offer two
contributions towards this end: first, we construct "cognitive model priors" by
pretraining neural networks with synthetic data generated by cognitive models
(i.e., theoretical models developed by cognitive psychologists). We find that
fine-tuning these networks on small datasets of real human decisions results in
unprecedented state-of-the-art improvements on two benchmark datasets. Second,
we present the first large-scale dataset for human decision-making, containing
over 240,000 human judgments across over 13,000 decision problems. This dataset
reveals the circumstances where cognitive model priors are useful, and provides
a new standard for benchmarking prediction of human decisions under
uncertainty.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Bourgin_D/0/1/0/all/0/1">David D. Bourgin</a>, <a href="http://arxiv.org/find/cs/1/au:+Peterson_J/0/1/0/all/0/1">Joshua C. Peterson</a>, <a href="http://arxiv.org/find/cs/1/au:+Reichman_D/0/1/0/all/0/1">Daniel Reichman</a>, <a href="http://arxiv.org/find/cs/1/au:+Griffiths_T/0/1/0/all/0/1">Thomas L. Griffiths</a>, <a href="http://arxiv.org/find/cs/1/au:+Russell_S/0/1/0/all/0/1">Stuart J. Russell</a>AttentionRNN: A Structured Spatial Attention Mechanism. (arXiv:1905.09400v1 [cs.CV])http://arxiv.org/abs/1905.09400
<p>Visual attention mechanisms have proven to be integrally important
constituent components of many modern deep neural architectures. They provide
an efficient and effective way to utilize visual information selectively, which
has been shown to be especially valuable in multi-modal learning tasks. However, all
prior attention frameworks lack the ability to explicitly model structural
dependencies among attention variables, making it difficult to predict
consistent attention masks. In this paper we develop a novel structured spatial
attention mechanism which is end-to-end trainable and can be integrated with
any feed-forward convolutional neural network. This proposed AttentionRNN layer
explicitly enforces structure over the spatial attention variables by
sequentially predicting attention values in the spatial mask in a
bi-directional raster-scan and inverse raster-scan order. As a result, each
attention value depends not only on local image or contextual information, but
also on the previously predicted attention values. Our experiments show
consistent quantitative and qualitative improvements on a variety of
recognition tasks and datasets, including image categorization, question
answering and image generation.
</p>
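As a toy illustration of the sequential dependency structure described above (the details here are illustrative assumptions, not the authors' architecture), attention values can be predicted one cell at a time in raster-scan and inverse raster-scan order, each conditioned on the values already produced:

```python
import numpy as np

def raster_order(h, w, inverse=False):
    """Raster-scan (or inverse raster-scan) visit order over an h x w grid."""
    order = [(i, j) for i in range(h) for j in range(w)]
    return order[::-1] if inverse else order

def sequential_attention(features, order):
    """Toy structured attention: each value is a sigmoid of the local feature
    plus the mean of previously predicted values (a stand-in for RNN state)."""
    att = np.zeros_like(features, dtype=float)
    history = []
    for i, j in order:
        prev = np.mean(history) if history else 0.0
        att[i, j] = 1.0 / (1.0 + np.exp(-(features[i, j] + 0.5 * prev)))
        history.append(att[i, j])
    return att

feats = np.random.default_rng(0).normal(size=(4, 4))
fwd = sequential_attention(feats, raster_order(4, 4))
bwd = sequential_attention(feats, raster_order(4, 4, inverse=True))
mask = 0.5 * (fwd + bwd)  # combine both directional passes
```

In the actual model an RNN carries the history; the running mean here is only a stand-in for that recurrent state.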
<a href="http://arxiv.org/find/cs/1/au:+Khandelwal_S/0/1/0/all/0/1">Siddhesh Khandelwal</a>, <a href="http://arxiv.org/find/cs/1/au:+Sigal_L/0/1/0/all/0/1">Leonid Sigal</a>Optimum Low-Complexity Decoder for Spatial Modulation. (arXiv:1905.09401v1 [cs.IT])http://arxiv.org/abs/1905.09401
<p>In this paper, a novel low-complexity detection algorithm for spatial
modulation (SM), referred to as the minimum-distance of maximum-length (m-M)
algorithm, is proposed and analyzed. The proposed m-M algorithm is a smart
searching method applied to SM tree-search decoders. The behavior
of the m-M algorithm is studied for three different scenarios: i) perfect
channel state information at the receiver side (CSIR), ii) imperfect CSIR of a
fixed channel estimation error variance, and iii) imperfect CSIR of a variable
channel estimation error variance. Moreover, the complexity of the m-M
algorithm is considered as a random variable, which is carefully analyzed for
all scenarios, using probabilistic tools. Based on a combination of the sphere
decoder (SD) and ordering concepts, the m-M algorithm is guaranteed to find
the maximum-likelihood (ML) solution with a significant reduction in decoding
complexity compared to the SM-ML and existing SM-SD algorithms; it reduces
complexity by up to 94% in the perfect CSIR scenario and by up to 85% in the
worst imperfect CSIR scenario, relative to the SM-ML decoder. Monte Carlo
simulation results are provided to support our findings as well as the derived
analytical complexity reduction expressions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Al_Nahhal_I/0/1/0/all/0/1">Ibrahim Al-Nahhal</a>, <a href="http://arxiv.org/find/cs/1/au:+Basar_E/0/1/0/all/0/1">Ertugrul Basar</a>, <a href="http://arxiv.org/find/cs/1/au:+Dobre_O/0/1/0/all/0/1">Octavia A. Dobre</a>, <a href="http://arxiv.org/find/cs/1/au:+Ikki_S/0/1/0/all/0/1">Salama Ikki</a>Detecting Events of Daily Living Using Multimodal Data. (arXiv:1905.09402v1 [cs.HC])http://arxiv.org/abs/1905.09402
<p>Events are fundamental for understanding how people experience their lives.
It is challenging, however, to automatically record all events in daily life.
Understanding multimedia signals makes it possible to recognize events of
daily living and extract their attributes largely automatically. In this
paper, we consider the problem of recognizing a daily event by employing the
commonly used multimedia data obtained from a smartphone and wearable device.
We develop an unobtrusive approach to obtain latent semantic information from
the data, and, building on it, an approach for daily event recognition based on
semantic context enrichment. We represent the enrichment process through an
event knowledge graph that semantically enriches a daily event from a low-level
daily activity. To show a concrete example of this enrichment, we perform an
experiment with eating activity, which may be one of the most complex events,
by using 14 months of data from three users. In this process, to unobtrusively
compensate for missing semantic information, we propose a new food
recognition/classification method that focuses only on the physical response
to food consumption. Experimental results indicate that our approach
automatically abstracts life experience. These daily events can then be
used to create a personal model that can capture how a person reacts to
different stimuli under specific conditions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Oh_H/0/1/0/all/0/1">Hyungik Oh</a>, <a href="http://arxiv.org/find/cs/1/au:+Jain_R/0/1/0/all/0/1">Ramesh Jain</a>Quantifying Long Range Dependence in Language and User Behavior to improve RNNs. (arXiv:1905.09414v1 [cs.LG])http://arxiv.org/abs/1905.09414
<p>Characterizing temporal dependence patterns is a critical step in
understanding the statistical properties of sequential data. Long Range
Dependence (LRD), referring to long-range correlations that decay as a power
law rather than exponentially with distance, demands a different set of tools
for modeling the underlying dynamics of sequential data. While it has
been widely conjectured that LRD is present in language modeling and sequential
recommendation, the amount of LRD in the corresponding sequential datasets has
not yet been quantified in a scalable and model-independent manner. We propose
a principled estimation procedure of LRD in sequential datasets based on
established LRD theory for real-valued time series and apply it to sequences of
symbols with million-item-scale dictionaries. In our measurements, the
procedure reliably estimates the LRD in the behavior of users as they write
Wikipedia articles and as they interact with YouTube. We further show that
measuring LRD better informs modeling decisions in particular for RNNs whose
ability to capture LRD is still an active area of research. The quantitative
measure informs new Evolutive Recurrent Neural Networks (EvolutiveRNNs)
designs, leading to state-of-the-art results on language understanding and
sequential recommendation tasks at a fraction of the computational cost.
</p>
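For intuition, the classical rescaled-range statistic gives a simple, model-independent LRD estimate for a real-valued series; this sketch is a generic textbook estimator, not necessarily the procedure proposed in the paper:

```python
import numpy as np

def hurst_rs(x):
    """Estimate the Hurst exponent of a 1-D series via rescaled-range (R/S)
    analysis; H > 0.5 suggests long-range dependence, H ~ 0.5 short memory."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sizes = [2 ** k for k in range(3, 20) if 2 ** k <= n // 2]
    log_s, log_rs = [], []
    for s in sizes:
        rs = []
        for start in range(0, n - s + 1, s):  # non-overlapping windows
            w = x[start:start + s]
            z = np.cumsum(w - w.mean())       # demeaned cumulative sum
            sd = w.std()
            if sd > 0:
                rs.append((z.max() - z.min()) / sd)
        if rs:
            log_s.append(np.log(s))
            log_rs.append(np.log(np.mean(rs)))
    slope, _ = np.polyfit(log_s, log_rs, 1)   # power-law exponent of R/S growth
    return slope

h_white = hurst_rs(np.random.default_rng(0).normal(size=4096))
```

For white noise the estimate comes out near 0.5; a value well above 0.5 on a symbol sequence (after mapping symbols to reals) would indicate LRD of the kind the abstract discusses.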
<a href="http://arxiv.org/find/cs/1/au:+Belletti_F/0/1/0/all/0/1">Francois Belletti</a>, <a href="http://arxiv.org/find/cs/1/au:+Chen_M/0/1/0/all/0/1">Minmin Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Chi_E/0/1/0/all/0/1">Ed H. Chi</a>Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. (arXiv:1905.09418v1 [cs.CL])http://arxiv.org/abs/1905.09418
<p>Multi-head self-attention is a key component of the Transformer, a
state-of-the-art architecture for neural machine translation. In this work we
evaluate the contribution made by individual attention heads in the encoder to
the overall performance of the model and analyze the roles they play. We
find that the most important and confident heads play consistent and often
linguistically-interpretable roles. When pruning heads using a method based on
stochastic gates and a differentiable relaxation of the L0 penalty, we observe
that specialized heads are last to be pruned. Our novel pruning method removes
the vast majority of heads without seriously affecting performance. For
example, on the English-Russian WMT dataset, pruning 38 out of 48 encoder heads
results in a drop of only 0.15 BLEU.
</p>
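The pruning mechanism described above relies on stochastic gates with a differentiable relaxation of the L0 penalty; a minimal sketch in the style of the hard-concrete distribution (the stretch parameters and shapes below are conventional defaults, assumed rather than taken from the paper):

```python
import numpy as np

BETA, GAMMA, ZETA = 2.0 / 3.0, -0.1, 1.1  # stretch parameters of the relaxation

def sample_gates(log_alpha, rng):
    """Sample stretched hard-concrete gates in [0, 1]; exact zeros prune heads."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(log_alpha))
    s = 1.0 / (1.0 + np.exp(-(np.log(u) - np.log(1 - u) + log_alpha) / BETA))
    return np.clip(s * (ZETA - GAMMA) + GAMMA, 0.0, 1.0)

def expected_l0(log_alpha):
    """Differentiable L0 penalty: expected number of non-zero gates."""
    return np.sum(1.0 / (1.0 + np.exp(-(log_alpha - BETA * np.log(-GAMMA / ZETA)))))

rng = np.random.default_rng(0)
open_gates = sample_gates(np.full(48, 4.0), rng)   # large log-alpha: heads kept
shut_gates = sample_gates(np.full(48, -4.0), rng)  # small log-alpha: heads pruned
```

Training pushes `log_alpha` down for heads that contribute little, so the specialized heads the authors identify are the last whose gates close.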
<a href="http://arxiv.org/find/cs/1/au:+Voita_E/0/1/0/all/0/1">Elena Voita</a>, <a href="http://arxiv.org/find/cs/1/au:+Talbot_D/0/1/0/all/0/1">David Talbot</a>, <a href="http://arxiv.org/find/cs/1/au:+Moiseev_F/0/1/0/all/0/1">Fedor Moiseev</a>, <a href="http://arxiv.org/find/cs/1/au:+Sennrich_R/0/1/0/all/0/1">Rico Sennrich</a>, <a href="http://arxiv.org/find/cs/1/au:+Titov_I/0/1/0/all/0/1">Ivan Titov</a>Effect of shapes of activation functions on predictability in the echo state network. (arXiv:1905.09419v1 [cs.NE])http://arxiv.org/abs/1905.09419
<p>We investigate the time-series prediction accuracy of echo state networks
with respect to several kinds of activation functions. We find that activation
functions with an appropriate nonlinearity achieve higher performance than the
conventional sigmoid function.
</p>
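A minimal echo state network makes the experiment concrete; reservoir size, spectral radius, and input scaling below are arbitrary illustrative choices, with only the activation function left as the variable of interest:

```python
import numpy as np

def esn_one_step(u, activation, n_res=100, rho=0.9, washout=50, seed=0):
    """Minimal echo state network: fixed random reservoir, least-squares
    readout, one-step-ahead prediction of the input series."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=n_res)
    w_res = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w_res *= rho / np.max(np.abs(np.linalg.eigvals(w_res)))  # set spectral radius
    x, states = np.zeros(n_res), []
    for ut in u:
        x = activation(w_res @ x + w_in * ut)  # reservoir update
        states.append(x.copy())
    s = np.array(states[washout:-1])  # drop the transient, align with targets
    y = u[washout + 1:]
    w_out = np.linalg.lstsq(s, y, rcond=None)[0]  # train only the readout
    return np.mean((s @ w_out - y) ** 2)          # one-step prediction MSE

u = np.sin(0.2 * np.arange(600))
mse_tanh = esn_one_step(u, np.tanh)
```

Swapping `np.tanh` for other activations and comparing the resulting MSEs reproduces the kind of comparison the abstract describes.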
<a href="http://arxiv.org/find/cs/1/au:+Chang_H/0/1/0/all/0/1">Hanten Chang</a>, <a href="http://arxiv.org/find/cs/1/au:+Nakaoka_S/0/1/0/all/0/1">Shinji Nakaoka</a>, <a href="http://arxiv.org/find/cs/1/au:+Ando_H/0/1/0/all/0/1">Hiroyasu Ando</a>Elliptical Perturbations for Differential Privacy. (arXiv:1905.09420v1 [cs.CR])http://arxiv.org/abs/1905.09420
<p>We study elliptical distributions in locally convex vector spaces, and
determine conditions when they can or cannot be used to satisfy differential
privacy (DP). A requisite condition for a sanitized statistical summary to
satisfy DP is that the corresponding privacy mechanism must induce equivalent
measures for all possible input databases. We show that elliptical
distributions with the same dispersion operator, $C$, are equivalent if the
difference of their means lies in the Cameron-Martin space of $C$. In the case
of releasing finite-dimensional projections using elliptical perturbations, we
show that the privacy parameter $\epsilon$ can be computed in terms of a
one-dimensional maximization problem. We apply this result to consider
multivariate Laplace, $t$, Gaussian, and $K$-norm noise. Surprisingly, we show
that the multivariate Laplace noise does not achieve $\epsilon$-DP in any
dimension greater than one. Finally, we show that when the dimension of the
space is infinite, no elliptical distribution can be used to give
$\epsilon$-DP; only $(\epsilon,\delta)$-DP is possible.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Reimherr_M/0/1/0/all/0/1">Matthew Reimherr</a>, <a href="http://arxiv.org/find/cs/1/au:+Awan_J/0/1/0/all/0/1">Jordan Awan</a>Set Constraints, Pattern Match Analysis, and SMT. (arXiv:1905.09423v1 [cs.PL])http://arxiv.org/abs/1905.09423
<p>Set constraints provide a highly general way to formulate program analyses.
However, solving arbitrary boolean combinations of set constraints is
NEXPTIME-complete. Moreover, while theoretical algorithms to solve arbitrary
set constraints exist, they are either too complex to implement, or too slow to
ever run.
</p>
<p>We present a translation that converts a set constraint formula into an SMT
problem. Our technique allows for arbitrary boolean combinations of positive or
negative set constraints, and leverages the performance of modern solvers such
as Z3. To show the usefulness of unrestricted set constraints, we use them to
devise a pattern match analysis for functional languages. This analysis ensures
that missing cases of pattern matches are always unreachable. We implement our
analysis in the Elm compiler and show that our translation is fast enough
to be used in practical verification.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Eremondi_J/0/1/0/all/0/1">Joseph Eremondi</a>Bounding the State Covariance Matrix for a Randomly Switching Linear System with Noise. (arXiv:1905.09427v1 [math.DS])http://arxiv.org/abs/1905.09427
<p>The propagation of a state vector is governed by a set of time-invariant
state transition matrices that switch arbitrarily between two values. The
evolution of the state is also perturbed by white Gaussian noise with a
variance that switches randomly with the state transition relation. The
behavior of this system can be characterized by the covariance matrix of the
state vector, which is time varying. However, we can bound the set of
covariances by comparing the switching system to an augmented system derived
with Kronecker algebra. We formulate a matrix optimization problem to compute
an ellipsoid that bounds the covariance dynamics, which in turn bounds the
state covariance of the set of switching systems subject to white noise. In
developing this approach, an invariant ellipsoid for a linear switching affine
system is computed along the way.
</p>
<a href="http://arxiv.org/find/math/1/au:+Yoon_Y/0/1/0/all/0/1">Yongeun Yoon</a>, <a href="http://arxiv.org/find/math/1/au:+Klett_C/0/1/0/all/0/1">Corbin Klett</a>, <a href="http://arxiv.org/find/math/1/au:+Feron_E/0/1/0/all/0/1">Eric Feron</a>Learning Discrete and Continuous Factors of Data via Alternating Disentanglement. (arXiv:1905.09432v1 [cs.LG])http://arxiv.org/abs/1905.09432
<p>We address the problem of unsupervised disentanglement of discrete and
continuous explanatory factors of data. We first show a simple procedure for
minimizing the total correlation of the continuous latent variables without
having to use a discriminator network or perform importance sampling, via
cascading the information flow in the $\beta$-VAE framework. Furthermore, we
propose a method which avoids offloading the entire burden of jointly modeling
the continuous and discrete factors to the variational encoder by employing a
separate discrete inference procedure.
</p>
<p>This leads to an interesting alternating minimization problem which switches
between finding the most likely discrete configuration given the continuous
factors and updating the variational encoder based on the computed discrete
factors. Experiments show that the proposed method clearly disentangles
discrete factors and significantly outperforms current disentanglement methods
based on the disentanglement score and inference network classification score.
The source code is available at
https://github.com/snu-mllab/DisentanglementICML19.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Jeong_Y/0/1/0/all/0/1">Yeonwoo Jeong</a>, <a href="http://arxiv.org/find/cs/1/au:+Song_H/0/1/0/all/0/1">Hyun Oh Song</a>FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction. (arXiv:1905.09433v1 [cs.LG])http://arxiv.org/abs/1905.09433
<p>Advertising and feed ranking are essential to many Internet companies such as
Facebook and Sina Weibo. Among many real-world advertising and feed ranking
systems, click-through rate (CTR) prediction plays a central role. Many models
have been proposed in this field, such as logistic regression, tree-based
models, factorization-machine-based models, and deep-learning-based CTR models.
However, many current works compute feature interactions in a simple way, such
as the Hadamard product or inner product, and pay little attention to the
importance of features. In this paper, a new model named FiBiNET, an
abbreviation for Feature Importance and Bilinear feature Interaction NETwork,
is proposed to dynamically learn feature importance and fine-grained feature
interactions. On the one hand, the FiBiNET can dynamically learn the importance
of features via the Squeeze-Excitation network (SENET) mechanism; on the other
hand, it effectively learns feature interactions via a bilinear function. We
conduct extensive experiments on two real-world datasets and show
that our shallow model outperforms other shallow models such as factorization
machine (FM) and the field-aware factorization machine (FFM). To further
improve performance, we combine a classical deep neural network (DNN) component
with the shallow model to form a deep model. The deep FiBiNET consistently
outperforms other state-of-the-art deep models such as DeepFM and the extreme
deep factorization machine (xDeepFM).
</p>
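The two building blocks can be sketched in a few lines of NumPy; all dimensions and weights here are made-up placeholders rather than the paper's configuration:

```python
import numpy as np

def senet_reweight(emb, w1, w2):
    """SENET-style field importance: squeeze each field embedding to a scalar,
    pass it through a two-layer bottleneck, and rescale the fields."""
    z = emb.mean(axis=1)          # squeeze: one scalar per field
    a = np.maximum(0.0, w1 @ z)   # excitation, reduction layer (ReLU)
    s = np.maximum(0.0, w2 @ a)   # excitation, restore layer: field weights
    return emb * s[:, None]       # re-weight each field embedding

def bilinear_interaction(vi, vj, w):
    """Bilinear interaction: a learned matrix combines the inner-product and
    Hadamard-product views of a feature pair."""
    return (vi @ w) * vj          # element-wise product after projection

rng = np.random.default_rng(0)
num_fields, dim, red = 6, 8, 3
emb = rng.normal(size=(num_fields, dim))
w1 = rng.normal(size=(red, num_fields))
w2 = rng.normal(size=(num_fields, red))
reweighted = senet_reweight(emb, w1, w2)
pair = bilinear_interaction(emb[0], emb[1], rng.normal(size=(dim, dim)))
```

The bilinear form reduces to the Hadamard product when `w` is the identity and to a weighted inner product when `w` is diagonal, which is why it subsumes the simpler interactions the abstract criticizes.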
<a href="http://arxiv.org/find/cs/1/au:+Huang_T/0/1/0/all/0/1">Tongwen Huang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Z/0/1/0/all/0/1">Zhiqi Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_J/0/1/0/all/0/1">Junlin Zhang</a>Automated Process Planning for Turning: A Feature-Free Approach. (arXiv:1905.09434v1 [cs.CG])http://arxiv.org/abs/1905.09434
<p>Turning is the most commonly available and least expensive machining
operation, in terms of both machine-hour rates and tool insert prices. A
practical CNC process planner has to maximize the utilization of turning, not
only to attain precision requirements for turnable surfaces, but also to
minimize the machining cost, while non-turnable features can be left for other
processes such as milling. Most existing methods rely on separation of surface
features and lack guarantees when analyzing complex parts with interacting
features. In a previous study, we demonstrated successful implementation of a
feature-free milling process planner based on configuration space methods used
for spatial reasoning and AI search for planning. This paper extends the
feature-free method to include turning process planning. It opens up the
opportunity for seamless integration of turning actions into a mill-turn
process planner that can handle arbitrarily complex shapes with or without a
priori knowledge of feature semantics.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Behandish_M/0/1/0/all/0/1">Morad Behandish</a>, <a href="http://arxiv.org/find/cs/1/au:+Nelaturi_S/0/1/0/all/0/1">Saigopal Nelaturi</a>, <a href="http://arxiv.org/find/cs/1/au:+Verma_C/0/1/0/all/0/1">Chaman Singh Verma</a>, <a href="http://arxiv.org/find/cs/1/au:+Allard_M/0/1/0/all/0/1">Mats Allard</a>MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling. (arXiv:1905.09435v1 [cs.LG])http://arxiv.org/abs/1905.09435
<p>The trade-off between convergence error and communication delays in
decentralized stochastic gradient descent~(SGD) is dictated by the sparsity of
the inter-worker communication graph. In this paper, we propose MATCHA, a
decentralized SGD method where we use matching decomposition sampling of the
base graph to parallelize inter-worker information exchange so as to
significantly reduce communication delay. At the same time, despite the
significant reduction in communication delay, MATCHA maintains the same
convergence rate as the state-of-the-art in terms of epochs, under standard
assumptions and for any general topology. Experiments on a suite of datasets
and deep neural networks validate the theoretical analysis and demonstrate the
effectiveness of the proposed scheme in reducing communication delays.
</p>
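The core idea, decomposing the communication graph into matchings and activating only a random subset of them each iteration, can be sketched as follows (the greedy decomposition and the activation probability are illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def greedy_matchings(edges):
    """Decompose an edge list into vertex-disjoint matchings (greedy coloring)."""
    matchings = []
    for u, v in edges:
        for m in matchings:
            if all(u not in e and v not in e for e in m):
                m.append((u, v))
                break
        else:
            matchings.append([(u, v)])
    return matchings

def sample_topology(matchings, p, rng):
    """Activate each matching independently with probability p; workers only
    exchange over the activated matchings this iteration."""
    return [e for m in matchings if rng.random() < p for e in m]

ring = [(i, (i + 1) % 6) for i in range(6)]  # 6-worker ring topology
ms = greedy_matchings(ring)
active = sample_topology(ms, 0.5, np.random.default_rng(0))
```

Because the edges inside a matching share no vertices, all of its pairwise exchanges can run in parallel, which is what cuts the per-iteration communication delay.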
<a href="http://arxiv.org/find/cs/1/au:+Wang_J/0/1/0/all/0/1">Jianyu Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Sahu_A/0/1/0/all/0/1">Anit Kumar Sahu</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_Z/0/1/0/all/0/1">Zhouyi Yang</a>, <a href="http://arxiv.org/find/cs/1/au:+Joshi_G/0/1/0/all/0/1">Gauri Joshi</a>, <a href="http://arxiv.org/find/cs/1/au:+Kar_S/0/1/0/all/0/1">Soummya Kar</a>KNG: The K-Norm Gradient Mechanism. (arXiv:1905.09436v1 [cs.CR])http://arxiv.org/abs/1905.09436
<p>This paper presents a new mechanism for producing sanitized statistical
summaries that achieve \emph{differential privacy}, called the \emph{K-Norm
Gradient} Mechanism, or KNG. This new approach maintains the strong flexibility
of the exponential mechanism, while achieving the powerful utility performance
of objective perturbation. KNG starts with an inherent objective function
(often an empirical risk), and promotes summaries that are close to minimizing
the objective by weighting according to how far the gradient of the objective
function is from zero. Working with the gradient instead of the original
objective function allows for additional flexibility as one can penalize using
different norms. We show that, unlike the exponential mechanism, the noise
added by KNG is asymptotically negligible compared to the statistical error for
many problems. In addition to theoretical guarantees on privacy and utility, we
confirm the utility of KNG empirically in the settings of linear and quantile
regression through simulations.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Reimherr_M/0/1/0/all/0/1">Matthew Reimherr</a>, <a href="http://arxiv.org/find/cs/1/au:+Awan_J/0/1/0/all/0/1">Jordan Awan</a>Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal. (arXiv:1905.09438v1 [cs.CL])http://arxiv.org/abs/1905.09438
<p>Reading Comprehension has received significant attention in recent years as
high quality Question Answering (QA) datasets have become available. Despite
state-of-the-art methods achieving strong overall accuracy, Multi-Hop (MH)
reasoning remains particularly challenging. To address MH-QA specifically, we
propose a Deep Reinforcement Learning based method capable of learning
sequential reasoning across large collections of documents so as to pass a
query-aware, fixed-size context subset to existing models for answer
extraction. Our method consists of two stages: a linker, which decomposes
the provided support documents into a graph of sentences, and an extractor,
which learns where to look based on the current question and already-visited
sentences. The result of the linker is a novel graph structure at the sentence
level that preserves logical flow while still allowing rapid movement between
documents. Importantly, we demonstrate that the sparsity of the resultant graph
is invariant to context size. This translates to fewer decisions required from
the Deep-RL trained extractor, allowing the system to scale effectively to
large collections of documents.
</p>
<p>The importance of sequential decision making in the document traversal step
is demonstrated by comparison to standard IE methods, and we additionally
introduce a BM25-based IR baseline that retrieves documents relevant to the
query only. We examine the integration of our method with existing models on
the recently proposed QAngaroo benchmark and achieve consistent increases in
accuracy across the board, as well as a 2-3x reduction in training time.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Long_A/0/1/0/all/0/1">Alex Long</a>, <a href="http://arxiv.org/find/cs/1/au:+Mason_J/0/1/0/all/0/1">Joel Mason</a>, <a href="http://arxiv.org/find/cs/1/au:+Blair_A/0/1/0/all/0/1">Alan Blair</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_W/0/1/0/all/0/1">Wei Wang</a>GWU NLP Lab at SemEval-2019 Task 3: EmoContext: Effective Contextual Information in Models for Emotion Detection in Sentence-level in a Multigenre Corpus. (arXiv:1905.09439v1 [cs.CL])http://arxiv.org/abs/1905.09439
<p>In this paper we present an emotion classifier model submitted to the
SemEval-2019 Task 3: EmoContext. The task objective is to classify emotion
(i.e. happy, sad, angry) in a 3-turn conversational data set. We formulate the
task as a classification problem and introduce a gated recurrent unit (GRU)
model with an attention layer, which is bootstrapped with contextual
information and trained on a multigenre corpus. We evaluate different word
embeddings to empirically select the one best suited to represent our features.
We train the model on a multigenre emotion corpus to leverage all available
training sets. We achieved an overall F1-score of 56.05% and placed 144th.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Tafreshi_S/0/1/0/all/0/1">Shabnam Tafreshi</a>, <a href="http://arxiv.org/find/cs/1/au:+Diab_M/0/1/0/all/0/1">Mona Diab</a>One-bit LFMCW Radar: Spectrum Analysis and Target Detection. (arXiv:1905.09440v1 [eess.SP])http://arxiv.org/abs/1905.09440
<p>One-bit radar, which uses direct one-bit sampling, is a promising technology
for many civilian applications due to its low cost and low power consumption.
In this paper, problems encountered by one-bit LFMCW radar are studied and a
two-stage target detection approach termed DR-GAMP is proposed. First, the
spectrum of the one-bit signal in a multiple-target scenario is analyzed. The
analysis indicates that high-order harmonics may cause false alarms (FAs) and
cannot be neglected. Second, DR-GAMP is used to suppress the high-order
harmonics.
Specifically, linear preprocessing and predetection are proposed to perform
dimension reduction (DR), and then, generalized approximate message passing
(GAMP) is utilized to suppress high-order harmonics. Finally, numerical
simulations are conducted to evaluate the performance of one-bit LFMCW radar
with typical parameters. It is shown that, compared to conventional radar with
a linear processing approach, one-bit LFMCW radar has a $0.5$ dB performance gain
when the input signal-to-noise ratios (SNRs) of targets are low. Moreover, it
has $1.6$ dB performance loss in a scenario with an additional high SNR target.
</p>
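The harmonic problem is easy to reproduce: hard-limiting a sinusoid to one bit yields a square wave whose odd harmonics, at 1/3, 1/5, ... of the fundamental amplitude, can masquerade as additional targets. A toy spectrum check, unrelated to the paper's actual radar parameters:

```python
import numpy as np

n, f = 1024, 64  # samples and fundamental bin (cycles per window)
t = np.arange(n)
x = np.sin(2 * np.pi * f * t / n)
spec_lin = np.abs(np.fft.rfft(x)) / n             # conventional (linear) sampling
spec_1bit = np.abs(np.fft.rfft(np.sign(x))) / n   # one-bit (sign) sampling
fund = spec_1bit[f]        # fundamental of the hard-limited signal
third = spec_1bit[3 * f]   # spurious 3rd harmonic introduced by quantization
```

With linear sampling the spectrum is clean at `3 * f`; after one-bit quantization the third harmonic appears at roughly one third of the fundamental, which is exactly the kind of spurious peak that causes the false alarms discussed above.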
<a href="http://arxiv.org/find/eess/1/au:+Jin_B/0/1/0/all/0/1">Benzhou Jin</a>, <a href="http://arxiv.org/find/eess/1/au:+Zhu_J/0/1/0/all/0/1">Jiang Zhu</a>, <a href="http://arxiv.org/find/eess/1/au:+Wu_Q/0/1/0/all/0/1">Qihui Wu</a>, <a href="http://arxiv.org/find/eess/1/au:+Zhang_Y/0/1/0/all/0/1">Yuhong Zhang</a>, <a href="http://arxiv.org/find/eess/1/au:+Xu_Z/0/1/0/all/0/1">Zhiwei Xu</a>Depth Estimation on Underwater Omni-directional Images Using a Deep Neural Network. (arXiv:1905.09441v1 [cs.CV])http://arxiv.org/abs/1905.09441
<p>In this work, we exploit a depth estimation Fully Convolutional Residual
Neural Network (FCRN) for in-air perspective images to estimate the depth of
underwater perspective and omni-directional images. We train one conventional
and one spherical FCRN for underwater perspective and omni-directional images,
respectively. The spherical FCRN is derived from the perspective FCRN via a
spherical longitude-latitude mapping. For that, the omni-directional camera is
modeled as a sphere, while images captured by it are displayed in the
longitude-latitude form. Due to the lack of underwater datasets, we synthesize
images in both data-driven and theoretical ways, which are used in training and
testing. Finally, experiments are conducted on these synthetic images and
results are presented in both qualitative and quantitative ways. The comparison
between ground truth and the estimated depth map indicates the effectiveness of
our method.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kuang_H/0/1/0/all/0/1">Haofei Kuang</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_Q/0/1/0/all/0/1">Qingwen Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Schwertfeger_S/0/1/0/all/0/1">S&#xf6;ren Schwertfeger</a>Causal Discovery with Cascade Nonlinear Additive Noise Models. (arXiv:1905.09442v1 [cs.LG])http://arxiv.org/abs/1905.09442
<p>Identification of the causal direction between a cause-effect pair from observed
data has recently attracted much attention. Various methods based on functional
causal models have been proposed to solve this problem, by assuming the causal
process satisfies some (structural) constraints and showing that the reverse
direction violates such constraints. The nonlinear additive noise model has
been demonstrated to be effective for this purpose, but the model class is not
transitive: even if each direct causal relation follows this model, indirect
causal influences, which result from omitted intermediate causal variables and
are frequently encountered in practice, do not necessarily follow the model
constraints; as a consequence, the nonlinear additive noise model may fail to
correctly discover the causal direction. In this work, we propose a cascade
nonlinear additive noise model to represent such causal influences, in which
each direct causal relation follows the nonlinear additive noise model but we
observe only the initial cause and final effect. We further propose a method to
estimate the
model, including the unmeasured intermediate variables, from data, under the
variational auto-encoder framework. Our theoretical results show that with our
model, causal direction is identifiable under suitable technical conditions on
the data generation process. Simulation results illustrate the power of the
proposed method in identifying indirect causal relations across various
settings, and experimental results on real data suggest that the proposed model
and method greatly extend the applicability of causal discovery based on
functional causal models in nonlinear cases.
</p>
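The cascade structure is straightforward to simulate; the functional forms and noise levels below are arbitrary illustrations, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                         # initial cause (observed)
m = np.tanh(2 * x) + 0.2 * rng.normal(size=n)  # intermediate variable (unobserved)
y = m ** 3 + 0.2 * rng.normal(size=n)          # final effect (observed)
# Each step is a nonlinear additive noise model, but the intermediate noise
# passes through the second nonlinearity, so the observed x -> y relation
# need not satisfy the additive noise constraint.
corr = np.corrcoef(x, y)[0, 1]
```

Only `(x, y)` would be handed to a causal discovery method; recovering the direction here requires a model, like the proposed cascade one, that accounts for the unmeasured intermediate step.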
<a href="http://arxiv.org/find/cs/1/au:+Cai_R/0/1/0/all/0/1">Ruichu Cai</a>, <a href="http://arxiv.org/find/cs/1/au:+Qiao_J/0/1/0/all/0/1">Jie Qiao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_K/0/1/0/all/0/1">Kun Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Z/0/1/0/all/0/1">Zhenjie Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Hao_Z/0/1/0/all/0/1">Zhifeng Hao</a>Rate-Distortion-Memory Trade-offs in Heterogeneous Caching Networks. (arXiv:1905.09446v1 [cs.IT])http://arxiv.org/abs/1905.09446
<p>Mobile network operators are considering caching as one of the strategies to
keep up with the increasing demand for high-definition wireless video
streaming. By prefetching popular content into memory at wireless access points
or end user devices, requests can be served locally, relieving strain on
expensive backhaul. In addition, using network coding allows the simultaneous
serving of distinct cache misses via common coded multicast transmissions,
resulting in significantly larger load reductions compared to those achieved
with traditional delivery schemes. Most prior works do not exploit the
properties of video and simply treat content as fixed-size files that users
would like to fully download. Our work is motivated by the fact that video can
be coded in a scalable fashion and that the decoded video quality depends on
the number of layers a user is able to receive in sequence. Using a Gaussian
source model, caching and coded delivery methods are designed to minimize the
squared error distortion at end user devices in a rate-limited caching network.
Our framework is very general, and accounts for heterogeneous cache sizes,
video popularity distributions and user-file play-back qualities. As part of
our solution, a new decentralized scheme for lossy cache-aided delivery subject
to a given set of preset user distortion targets is proposed, which further
generalizes prior literature to a setting with file heterogeneity.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Hassanzadeh_P/0/1/0/all/0/1">Parisa Hassanzadeh</a>, <a href="http://arxiv.org/find/cs/1/au:+Tulino_A/0/1/0/all/0/1">Antonia M. Tulino</a>, <a href="http://arxiv.org/find/cs/1/au:+Llorca_J/0/1/0/all/0/1">Jaime Llorca</a>, <a href="http://arxiv.org/find/cs/1/au:+Erkip_E/0/1/0/all/0/1">Elza Erkip</a>Prototype Reminding for Continual Learning. (arXiv:1905.09447v1 [cs.CV])http://arxiv.org/abs/1905.09447
<p>Continual learning is a critical ability of continually acquiring and
transferring knowledge without catastrophically forgetting previously learned
knowledge. However, enabling continual learning for AI remains a long-standing
challenge. In this work, we propose a novel method, Prototype Reminding, that
efficiently embeds and recalls previously learned knowledge to tackle the
catastrophic forgetting issue. In particular, we consider continual learning in
classification tasks. For each classification task, our method learns a metric
space containing a set of prototypes, where embeddings of samples from the
same class cluster around their prototype and prototypes of different classes
are kept well separated. To alleviate catastrophic forgetting, our method
preserves the
embedding function from the samples to the previous metric space, through our
proposed prototype reminding from previous tasks. Specifically, the reminding
process is implemented by replaying a small number of samples from previous
tasks and correspondingly matching their embedding to their nearest
class-representative prototypes. Compared with recent continual learning
methods, our contributions are fourfold: first, our method achieves the best
memory retention capability while adapting quickly to new tasks. Second, our
method uses metric learning for classification and does not require adding
new neurons for new object classes. Third, our method is more memory
efficient since only class-representative prototypes need to be recalled.
Fourth, our method suggests a promising solution for few-shot continual
learning. Without tampering with the performance on initial tasks, our method
learns novel concepts given a few training examples of each class in new tasks.
</p>
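The nearest-prototype classification idea can be sketched generically; the embeddings below are synthetic stand-ins for the learned metric space, not the paper's model:

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Mean embedding per class: the prototypes of a learned metric space."""
    classes = np.unique(labels)
    return classes, np.stack([embeddings[labels == c].mean(axis=0)
                              for c in classes])

def nearest_prototype(query, classes, protos):
    """Classify by distance to the nearest class prototype; adding a class
    means adding a prototype, with no new output neurons."""
    d = np.linalg.norm(protos - query, axis=1)
    return classes[np.argmin(d)]

rng = np.random.default_rng(0)
emb = np.concatenate([rng.normal(0, 0.1, size=(20, 4)),   # class 0 cluster
                      rng.normal(2, 0.1, size=(20, 4))])  # class 1 cluster
lab = np.array([0] * 20 + [1] * 20)
classes, protos = class_prototypes(emb, lab)
pred = nearest_prototype(rng.normal(2, 0.1, size=4), classes, protos)
```

Replaying a few stored samples and matching them back to their prototypes, as the abstract describes, amounts to keeping this embedding-to-prototype assignment stable across tasks.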
<a href="http://arxiv.org/find/cs/1/au:+Zhang_M/0/1/0/all/0/1">Mengmi Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_T/0/1/0/all/0/1">Tao Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Lim_J/0/1/0/all/0/1">Joo Hwee Lim</a>, <a href="http://arxiv.org/find/cs/1/au:+Feng_J/0/1/0/all/0/1">Jiashi Feng</a>Parsimonious Deep Learning: A Differential Inclusion Approach with Global Convergence. (arXiv:1905.09449v1 [cs.LG])http://arxiv.org/abs/1905.09449
<p>Over-parameterization is ubiquitous nowadays in training neural networks to
benefit both optimization in seeking global optima and generalization in
reducing prediction error. However, compressive networks are desired in many
real world applications and direct training of small networks may be trapped in
local optima. In this paper, instead of pruning or distilling an
over-parameterized model to compressive ones, we propose a parsimonious
learning approach based on differential inclusions of inverse scale spaces,
that generates a family of models, from simple to complex, with better
efficiency and interpretability than stochastic gradient descent in exploring
the model space. It enjoys a simple discretization, the Split Linearized
Bregman Iterations, with provable global convergence: from any
initialization, the algorithmic iterates converge to a critical point of the
empirical risk. One may exploit the proposed method to boost the complexity of
neural networks progressively. Numerical experiments with MNIST, Cifar-10/100,
and ImageNet are conducted to show the method is promising in training large
scale models with favorable interpretability.
</p>
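<p>For intuition, a plain linearized Bregman iteration on a toy sparse regression problem looks as follows. This is a sketch of the classical iteration only, not the paper's Split LBI; the step sizes and problem setup are illustrative.</p>

```python
import numpy as np

def shrink(z, lam=1.0):
    # Soft-thresholding: the source of sparsity in inverse scale space methods.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def linearized_bregman(A, b, kappa=5.0, n_iter=3000):
    # The iterates trace a path from simple (very sparse) to complex models:
    # early on only a few coordinates pass the threshold, later ones fit b.
    tau = 1.0 / (kappa * np.linalg.norm(A, 2) ** 2)  # step size for stability
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = kappa * shrink(z)
        z -= tau * A.T @ (A @ x - b)
    return kappa * shrink(z)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 10))
x_true = np.zeros(10)
x_true[:3] = [3.0, -2.0, 1.5]          # a sparse ground truth
b = A @ x_true
x_hat = linearized_bregman(A, b)
print(np.round(x_hat, 2))
```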
<a href="http://arxiv.org/find/cs/1/au:+Fu_Y/0/1/0/all/0/1">Yanwei Fu</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_C/0/1/0/all/0/1">Chen Liu</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_D/0/1/0/all/0/1">Donghao Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Sun_X/0/1/0/all/0/1">Xinwei Sun</a>, <a href="http://arxiv.org/find/cs/1/au:+Zeng_J/0/1/0/all/0/1">Jinshan Zeng</a>, <a href="http://arxiv.org/find/cs/1/au:+Yao_Y/0/1/0/all/0/1">Yuan Yao</a>Lewisian Fixed Points I: Two Incomparable Constructions. (arXiv:1905.09450v1 [cs.LO])http://arxiv.org/abs/1905.09450
<p>Our paper is the first study of what one might call "reverse mathematics of
explicit fixpoints". We study two methods of constructing such fixpoints for
formulas whose principal connective is the intuitionistic Lewis arrow. Our main
motivation comes from the metatheory of constructive arithmetic, but the systems in
question allow several natural semantics. The first of these methods, inspired
by de Jongh and Visser, turns out to yield a well-understood modal system. The
second one by de Jongh and Sambin, seemingly simpler, leads to a modal theory
that proves harder to axiomatize in an elegant way. Apart from showing that
both theories are incomparable, we axiomatize their join and investigate
several subtheories, whose axioms are obtained as fixpoints of simple formulas.
We also show that they are extension stable, that is, their validity in the
corresponding preservativity logic of a given arithmetical theory transfers to
its finite extensions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Litak_T/0/1/0/all/0/1">Tadeusz Litak</a>, <a href="http://arxiv.org/find/cs/1/au:+Visser_A/0/1/0/all/0/1">Albert Visser</a>Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network. (arXiv:1905.09453v1 [cs.LG])http://arxiv.org/abs/1905.09453
<p>Two main obstacles preventing the widespread adoption of variational Bayesian
neural networks are the high parameter overhead that makes them infeasible on
large networks, and the difficulty of implementation, which can be thought of
as "programming overhead." MC dropout [Gal and Ghahramani, 2016] is popular
because it sidesteps these obstacles. Nevertheless, dropout is often harmful to
model performance when used in networks with batch normalization layers [Li et
al., 2018], which are an indispensable part of modern neural networks. We
construct a general variational family for ensemble-based Bayesian neural
networks that encompasses dropout as a special case. We further present two
specific members of this family that work well with batch normalization layers,
while retaining the benefits of low parameter and programming overhead,
comparable to non-Bayesian training. Our proposed methods improve predictive
accuracy and achieve almost perfect calibration on a ResNet-18 trained with
ImageNet.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chang_O/0/1/0/all/0/1">Oscar Chang</a>, <a href="http://arxiv.org/find/cs/1/au:+Yao_Y/0/1/0/all/0/1">Yuling Yao</a>, <a href="http://arxiv.org/find/cs/1/au:+Williams_King_D/0/1/0/all/0/1">David Williams-King</a>, <a href="http://arxiv.org/find/cs/1/au:+Lipson_H/0/1/0/all/0/1">Hod Lipson</a>Time-Domain Mixed-Signal Vector-by-Matrix Multiplier Exploiting 1T-1R Array. (arXiv:1905.09454v1 [eess.SP])http://arxiv.org/abs/1905.09454
<p>The emerging mobile devices in this era of internet-of-things (IoT) require a
dedicated processor to enable computationally intensive applications such as
neuromorphic computing and signal processing. Vector-by-matrix multiplication
(VMM) is the most prominent operation in these applications. Therefore, compact
and power-efficient VMM blocks are required to perform resource-intensive
computations. To this end, in this work, for the first time, we propose a
time-domain mixed-signal VMM exploiting a modified configuration of 1 MOSFET-1
RRAM (1T-1R) array which overcomes the energy inefficiency of the current-mode
VMM approaches based on RRAMs. In the proposed approach, the inputs and outputs
are encoded in the digital domain as the duration of the pulses while the
weights are realized as programmable current sinks utilizing the modified 1T-1R
blocks in the analog domain. We perform a rigorous analysis of the different
factors such as channel length modulation (CLM), drain-induced barrier lowering
(DIBL), capacitive coupling, etc., which may degrade the compute precision of the
proposed VMM approach. We show that there exists a trade-off between the
compute precision, dynamic range and the energy efficiency in the modified
1T-1R array based VMM approach. Therefore, we also provide the necessary design
guidelines for optimizing the performance of this implementation. The
preliminary results show that an effective compute precision greater than
8-bits is achievable owing to the inherent compensation effect with an energy
efficiency of ~42 TOps/J considering the input/output (I/O) circuitry for a
200x200 VMM utilizing the proposed approach.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Sahay_S/0/1/0/all/0/1">Shubham Sahay</a>, <a href="http://arxiv.org/find/eess/1/au:+Bavandpour_M/0/1/0/all/0/1">Mohammad Bavandpour</a>, <a href="http://arxiv.org/find/eess/1/au:+Mahmoodi_M/0/1/0/all/0/1">Mohammad Reza Mahmoodi</a>, <a href="http://arxiv.org/find/eess/1/au:+Strukov_D/0/1/0/all/0/1">Dmitri Strukov</a>Approximate String Matching for DNS Anomaly Detection. (arXiv:1905.09455v1 [cs.CR])http://arxiv.org/abs/1905.09455
<p>In this paper we propose a novel approach to identify anomalies in DNS
traffic. The traffic time-point data is transformed into a string, which is
processed by a new fast approximate string matching algorithm to detect
anomalies. Our approach is generic in nature and allows fast adaptation to
different types of traffic. We evaluate the approach on a large public dataset
covering 10 days of DNS traffic, discovering more than an order of magnitude
more DNS attacks than an auto-regression baseline. Moreover, additional
comparisons with other common regressors, such as Linear Regression, Lasso,
Random Forest, and kNN, all confirm the superiority of our approach.
</p>
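<p>The general idea of string-based anomaly detection can be illustrated as follows: quantize per-interval traffic counts into symbols and flag windows whose edit distance from a normal-profile string is large. This is a toy illustration only; the symbol bins, window size, and threshold are invented and the paper's matching algorithm is different (and faster).</p>

```python
import numpy as np

def to_string(counts, bins=(10, 50, 200)):
    # Quantize per-interval request counts into symbols: a < b < c < d.
    return "".join("abcd"[np.searchsorted(bins, c)] for c in counts)

def levenshtein(s, t):
    # Classic dynamic-programming edit distance with a rolling row.
    d = np.arange(len(t) + 1)
    for i, cs in enumerate(s, 1):
        prev, d[0] = d[0], i
        for j, ct in enumerate(t, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (cs != ct))
    return int(d[-1])

def anomalies(counts, profile, win=6, thresh=3):
    # Slide a window over the traffic string; a large edit distance from
    # the normal-profile string signals an anomaly.
    s = to_string(counts)
    return [i for i in range(len(s) - win + 1)
            if levenshtein(s[i:i + win], profile) > thresh]

normal = to_string([20, 30, 25, 40, 35, 30])     # a typical quiet window
traffic = [20, 30, 25, 40, 35, 30, 900, 950, 980, 990, 40, 30]
print(anomalies(traffic, normal))
```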
<a href="http://arxiv.org/find/cs/1/au:+Mateless_R/0/1/0/all/0/1">Roni Mateless</a>, <a href="http://arxiv.org/find/cs/1/au:+Segal_M/0/1/0/all/0/1">Michael Segal</a>Formalizing Time4sys using parametric timed automata. (arXiv:1905.09458v1 [cs.SE])http://arxiv.org/abs/1905.09458
<p>Critical real-time systems must be verified to avoid the risk of dramatic
consequences in case of failure. Thales developed an open formalism Time4sys to
model real-time systems, with expressive features such as periodic or sporadic
tasks, task dependencies, distributed systems, etc. However, Time4sys does not
natively allow for a formal reasoning. In this work, we present a translation
from Time4sys to (parametric) timed automata, so as to allow for a formal
verification.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Andre_E/0/1/0/all/0/1">&#xc9;tienne Andr&#xe9;</a>On the Critical Difference of Almost Bipartite Graphs. (arXiv:1905.09462v1 [cs.DM])http://arxiv.org/abs/1905.09462
<p>A set $S\subseteq V$ is \textit{independent} in a graph $G=\left( V,E\right)
$ if no two vertices from $S$ are adjacent. The \textit{independence number}
$\alpha(G)$ is the cardinality of a maximum independent set, while $\mu(G)$ is
the size of a maximum matching in $G$. If $\alpha(G)+\mu(G)$ equals the order
of $G$, then $G$ is called a \textit{K\"{o}nig-Egerv\'{a}ry graph
}\cite{dem,ster}. The number $d\left( G\right) =\max\{\left\vert A\right\vert
-\left\vert N\left( A\right) \right\vert :A\subseteq V\}$ is called the
\textit{critical difference} of $G$ \cite{Zhang} (where $N\left( A\right)
=\left\{ v:v\in V,N\left( v\right) \cap A\neq\emptyset\right\} $). It is known
that $\alpha(G)-\mu(G)\leq d\left( G\right) $ holds for every graph
\cite{Levman2011a,Lorentzen1966,Schrijver2003}. In \cite{LevMan5} it was shown
that $d(G)=\alpha(G)-\mu(G)$ is true for every K\"{o}nig-Egerv\'{a}ry graph.
</p>
<p>A graph $G$ is \textit{(i)} \textit{unicyclic} if it has a unique cycle,
\textit{(ii)} \textit{almost bipartite} if it has only one odd cycle. It was
conjectured in \cite{LevMan2012a,LevMan2013a} and validated in
\cite{Bhattacharya2018} that $d(G)=\alpha(G)-\mu(G)$ holds for every unicyclic
non-K\"{o}nig-Egerv\'{a}ry graph $G$.
</p>
<p>In this paper we prove that if $G$ is an almost bipartite graph of order
$n\left( G\right) $, then $\alpha(G)+\mu(G)\in\left\{ n\left( G\right)
-1,n\left( G\right) \right\} $. Moreover, for each of these two values, we
characterize the corresponding graphs. Further, using these findings, we show
that the critical difference of an almost bipartite graph $G$ satisfies \[
d(G)=\alpha(G)-\mu(G)=\left\vert \mathrm{core}(G)\right\vert -\left\vert
N(\mathrm{core}(G))\right\vert , \] where by \textrm{core}$\left( G\right) $ we
mean the intersection of all maximum independent sets.
</p>
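<p>The quantities above can be checked by brute force on a tiny graph. The following sketch verifies $d(G)=\alpha(G)-\mu(G)=|\mathrm{core}(G)|-|N(\mathrm{core}(G))|$ on the star $K_{1,3}$ (a K\"{o}nig-Egerv\'{a}ry graph); the enumeration is exponential and only meant for illustration.</p>

```python
from itertools import combinations

# Star K_{1,3}: center 0 joined to leaves 1, 2, 3.
V = [0, 1, 2, 3]
E = [(0, 1), (0, 2), (0, 3)]
adj = {v: {u for e in E for u in e if v in e and u != v} for v in V}

def N(A):
    # Neighborhood: vertices having at least one neighbor in A.
    return {v for v in V if adj[v] & A}

def subsets(S):
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Critical difference d(G) = max |A| - |N(A)| over all A subsets of V.
d = max(len(A) - len(N(A)) for A in subsets(V))

# Maximum independent sets and their intersection, core(G).
indep = [A for A in subsets(V)
         if all(u not in adj[v] for u in A for v in A)]
alpha = max(len(A) for A in indep)
core = set(V).intersection(*[A for A in indep if len(A) == alpha])

# Maximum matching size mu(G), by brute force over edge subsets.
mu = max(len(M) for M in subsets(E)
         if len({v for e in M for v in e}) == 2 * len(M))

print(d, alpha, mu, core)
```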
<a href="http://arxiv.org/find/cs/1/au:+Levit_V/0/1/0/all/0/1">Vadim E. Levit</a>, <a href="http://arxiv.org/find/cs/1/au:+Mandrescu_E/0/1/0/all/0/1">Eugen Mandrescu</a>Improved EEG Classification by factoring in sensor topography. (arXiv:1905.09472v1 [eess.SP])http://arxiv.org/abs/1905.09472
<p>Electroencephalography (EEG) serves as an effective diagnostic tool for
mental disorders and neurological abnormalities. Enhanced analysis and
classification of EEG signals can help improve detection performance. This work
presents a new approach that seeks to exploit the knowledge of EEG sensor
spatial configuration to achieve higher detection accuracy. Two classification
models, one which ignores the configuration (model 1) and one that exploits it
with different interpolation methods (model 2), are studied. The analysis is
based on the information content of these signals represented in two different
ways: concatenation of the channels of the frequency bands and an image-like 2D
representation of the EEG channel locations. Performance of these models is
examined on two tasks, social anxiety disorder (SAD) detection, and emotion
recognition using DEAP dataset. Validity of our hypothesis that model 2 will
significantly outperform model 1 is borne out in the results, with accuracy
$5$--$8\%$ higher for model 2 for each machine learning algorithm we
investigated. Convolutional Neural Networks (CNN) were found to provide much
better performance than SVM and kNNs.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Mokatren_L/0/1/0/all/0/1">Lubna Shibly Mokatren</a>, <a href="http://arxiv.org/find/eess/1/au:+Ansari_R/0/1/0/all/0/1">Rashid Ansari</a>, <a href="http://arxiv.org/find/eess/1/au:+Cetin_A/0/1/0/all/0/1">Ahmet Enis Cetin</a>, <a href="http://arxiv.org/find/eess/1/au:+Leow_A/0/1/0/all/0/1">Alex D Leow</a>, <a href="http://arxiv.org/find/eess/1/au:+Klumpp_H/0/1/0/all/0/1">Heide Klumpp</a>, <a href="http://arxiv.org/find/eess/1/au:+Ajilore_O/0/1/0/all/0/1">Olusola Ajilore</a>, <a href="http://arxiv.org/find/eess/1/au:+Vural_F/0/1/0/all/0/1">Fatos Yarman Vural</a>Private Queries on Public Certificate Transparency Data. (arXiv:1905.09478v1 [cs.CR])http://arxiv.org/abs/1905.09478
<p>Despite increasing advancements in today's information exchange
infrastructure, the preservation of user data and privacy still remains a
problem. Both insecure baselines and secure solutions leak user data. For
example, Certificate Transparency (CT) promises significant security
improvements to existing Public Key Infrastructure solutions that up-to-now
have solely relied on the Certificate Authority hierarchy. CT provides a robust
auditing layer and transparency solution to quickly detect such compromises,
but introduces the requirement that client browsers interact with third-party
servers when validating a site certificate. In the existing CT system, these
requests leak information about each user's browsing habits to the hosting
server. It is not a stretch to think that this valuable data could be collected
and exploited, as corporations and governments have plenty of financial and
political incentive to do so. In this project, we seek to address this problem
by using an oblivious file sharing system with strong anonymity properties, to
provide a more scalable, performant solution to privacy-preserving queries.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Phan_V/0/1/0/all/0/1">Vy-An Phan</a>Constrained Design of Deep Iris Networks. (arXiv:1905.09481v1 [cs.CV])http://arxiv.org/abs/1905.09481
<p>Despite the promise of recent deep neural networks in the iris recognition
setting, there are vital properties of the classic IrisCode which are almost
unable to be achieved with current deep iris networks: the compactness of model
and the small number of computing operations (FLOPs). This paper re-models the
iris network design process as a constrained optimization problem which takes
model size and computation into account as learning criteria. On one hand, this
allows us to fully automate the network design process to search for the best
iris network confined to the computation and model compactness constraints. On
the other hand, it allows us to investigate the optimality of the classic
IrisCode and recent iris networks. It also allows us to learn an optimal iris
network and demonstrate state-of-the-art performance with less computation and
memory requirements.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Nguyen_K/0/1/0/all/0/1">Kien Nguyen</a>, <a href="http://arxiv.org/find/cs/1/au:+Fookes_C/0/1/0/all/0/1">Clinton Fookes</a>, <a href="http://arxiv.org/find/cs/1/au:+Sridharan_S/0/1/0/all/0/1">Sridha Sridharan</a>Towards Generation and Evaluation of Comprehensive Mapping Robot Datasets. (arXiv:1905.09483v1 [cs.RO])http://arxiv.org/abs/1905.09483
<p>This paper presents a fully hardware synchronized mapping robot with support
for a hardware synchronized external tracking system, for super-precise timing
and localization. We also employ a professional, static 3D scanner for ground
truth map collection. Three datasets are generated to evaluate the performance
of mapping algorithms within a room and between rooms. Based on these datasets
we generate maps and trajectory data, which is then fed into evaluation
algorithms. The mapping and evaluation procedures are designed to be easily
reproducible for maximum comparability. Finally, we draw several conclusions
about the tested SLAM algorithms.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chen_H/0/1/0/all/0/1">Hongyu Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_X/0/1/0/all/0/1">Xiting Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Luo_J/0/1/0/all/0/1">Jianwen Luo</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_Z/0/1/0/all/0/1">Zhijie Yang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_Z/0/1/0/all/0/1">Zehao Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Wan_H/0/1/0/all/0/1">Haochuan Wan</a>, <a href="http://arxiv.org/find/cs/1/au:+Ye_X/0/1/0/all/0/1">Xiaoya Ye</a>, <a href="http://arxiv.org/find/cs/1/au:+Weng_G/0/1/0/all/0/1">Guangyuan Weng</a>, <a href="http://arxiv.org/find/cs/1/au:+He_Z/0/1/0/all/0/1">Zhenpeng He</a>, <a href="http://arxiv.org/find/cs/1/au:+Dong_T/0/1/0/all/0/1">Tian Dong</a>, <a href="http://arxiv.org/find/cs/1/au:+Schwertfeger_S/0/1/0/all/0/1">S&#xf6;ren Schwertfeger</a>Simple Bounds for the Symmetric Capacity of the Rayleigh Fading Multiple Access Channel. (arXiv:1905.09486v1 [cs.IT])http://arxiv.org/abs/1905.09486
<p>Communication over the i.i.d. Rayleigh slow-fading MAC is considered, where
all terminals are equipped with a single antenna. Further, a communication
protocol is considered where all users transmit at (just below) the symmetric
capacity (per user) of the channel, a rate which is fed back (dictated) to the
users by the base station. Tight bounds are established on the distribution of
the rate attained by the protocol. In particular, these bounds characterize the
probability that the dominant face of the MAC capacity region contains a
symmetric rate point, i.e., that the considered protocol strictly attains the
sum capacity of the channel. The analysis provides a non-asymptotic counterpart
to the diversity-multiplexing tradeoff of the multiple access channel. Finally,
a practical scheme based on integer-forcing and space-time precoding is shown
to be an effective coding architecture for this communication scenario.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Domanovitz_E/0/1/0/all/0/1">Elad Domanovitz</a>, <a href="http://arxiv.org/find/cs/1/au:+Erez_U/0/1/0/all/0/1">Uri Erez</a>Combine PPO with NES to Improve Exploration. (arXiv:1905.09492v1 [cs.LG])http://arxiv.org/abs/1905.09492
<p>We introduce two approaches for combining neural evolution strategy (NES) and
proximal policy optimization (PPO): parameter transfer and parameter space
noise. Parameter transfer is a PPO agent with parameters transferred from a NES
agent. Parameter space noise is to directly add noise to the PPO agent`s
parameters. We demonstrate that PPO could benefit from both methods through
experimental comparison on discrete action environments as well as continuous
control tasks
</p>
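<p>The two mechanisms can be sketched roughly as below. This is a toy numpy policy, not the authors' setup: the network, noise scale, and dimensions are placeholders.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

class TinyPolicy:
    # Minimal stand-in for a PPO policy network: one linear layer + softmax.
    def __init__(self, obs_dim=4, n_actions=2):
        self.W = rng.normal(scale=0.1, size=(obs_dim, n_actions))

    def act_probs(self, obs):
        logits = obs @ self.W
        e = np.exp(logits - logits.max())
        return e / e.sum()

def transfer_parameters(src, dst):
    # Parameter transfer: initialize the PPO agent with weights taken
    # from a separately trained NES agent.
    dst.W = src.W.copy()

def perturb_parameters(policy, sigma=0.05):
    # Parameter space noise: add Gaussian noise directly to the weights,
    # giving temporally consistent exploration across a rollout.
    noisy = TinyPolicy()
    noisy.W = policy.W + rng.normal(scale=sigma, size=policy.W.shape)
    return noisy

base = TinyPolicy()
noisy = perturb_parameters(base)
obs = np.ones(4)
print(base.act_probs(obs), noisy.act_probs(obs))
```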
<a href="http://arxiv.org/find/cs/1/au:+Li_L/0/1/0/all/0/1">Lianjiang Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_Y/0/1/0/all/0/1">Yunrong Yang</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_B/0/1/0/all/0/1">Bingna Li</a>Teleoperator Imitation with Continuous-time Safety. (arXiv:1905.09499v1 [cs.RO])http://arxiv.org/abs/1905.09499
<p>Learning to effectively imitate human teleoperators, with generalization to
unseen and dynamic environments, is a promising path to greater autonomy
enabling robots to steadily acquire complex skills from supervision. We propose
a new motion learning technique rooted in contraction theory and sum-of-squares
programming for estimating a control law in the form of a polynomial vector
field from a given set of demonstrations. Notably, this vector field is
provably optimal for the problem of minimizing imitation loss while providing
continuous-time guarantees on the induced imitation behavior. Our method
generalizes to new initial and goal poses of the robot and can adapt in
real-time to dynamic obstacles during execution, with convergence to
teleoperator behavior within a well-defined safety tube. We present an
application of our framework for pick-and-place tasks in the presence of moving
obstacles on a 7-DOF KUKA IIWA arm. The method compares favorably to other
learning-from-demonstration approaches on benchmark handwriting imitation
tasks.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Khadir_B/0/1/0/all/0/1">Bachir El Khadir</a>, <a href="http://arxiv.org/find/cs/1/au:+Varley_J/0/1/0/all/0/1">Jake Varley</a>, <a href="http://arxiv.org/find/cs/1/au:+Sindhwani_V/0/1/0/all/0/1">Vikas Sindhwani</a>Pose estimator and tracker using temporal flow maps for limbs. (arXiv:1905.09500v1 [cs.CV])http://arxiv.org/abs/1905.09500
<p>For human pose estimation in videos, it is important how temporal
information between frames is used. In this paper, we propose temporal flow maps for
limbs (TML) and a multi-stride method to estimate and track human poses. The
proposed temporal flow maps are unit vectors describing the limbs' movements.
We constructed a network to learn both spatial information and temporal
information end-to-end. Spatial information such as joint heatmaps and part
affinity fields is regressed in the spatial network part, and the TML is
regressed in the temporal network part. We also propose a data augmentation
method to learn various types of TML better. The proposed multi-stride method
expands the data by randomly selecting two frames within a defined range. We
demonstrate that the proposed method efficiently estimates and tracks human
poses on the PoseTrack 2017 and 2018 datasets.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Hwang_J/0/1/0/all/0/1">Jihye Hwang</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_J/0/1/0/all/0/1">Jieun Lee</a>, <a href="http://arxiv.org/find/cs/1/au:+Park_S/0/1/0/all/0/1">Sungheon Park</a>, <a href="http://arxiv.org/find/cs/1/au:+Kwak_N/0/1/0/all/0/1">Nojun Kwak</a>Flexible Computational Pipelines for Robust Abstraction-Based Control Synthesis. (arXiv:1905.09503v1 [cs.SY])http://arxiv.org/abs/1905.09503
<p>Successfully synthesizing controllers for complex dynamical systems and
specifications often requires leveraging domain knowledge as well as making
difficult computational or mathematical tradeoffs. This paper presents a
flexible and extensible framework for constructing robust control synthesis
algorithms and applies this to the traditional abstraction-based control
synthesis pipeline. It is grounded in the theory of relational interfaces and
provides a principled methodology to seamlessly combine different techniques
(such as dynamic precision grids, refining abstractions while synthesizing, or
decomposed control predecessors) or create custom procedures to exploit an
application's intrinsic structural properties. A Dubins vehicle is used as a
motivating example to showcase memory and runtime improvements.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kim_E/0/1/0/all/0/1">Eric S. Kim</a>, <a href="http://arxiv.org/find/cs/1/au:+Arcak_M/0/1/0/all/0/1">Murat Arcak</a>, <a href="http://arxiv.org/find/cs/1/au:+Seshia_S/0/1/0/all/0/1">Sanjit A. Seshia</a>Graph Searches and Their End Vertices. (arXiv:1905.09505v1 [cs.DS])http://arxiv.org/abs/1905.09505
<p>Graph search, the process of visiting vertices in a graph in a specific
order, has demonstrated magical powers in many important algorithms. But a
systematic study was only initiated by Corneil et al. a decade ago, and only
then did we begin to realize how little we understand it. Even the apparently
na\"{i}ve question "which vertex can be the last visited by a graph search
algorithm," known as the end vertex problem, turns out to be quite elusive. We
give a full picture of all maximum cardinality searches on chordal graphs,
which implies a polynomial-time algorithm for the end vertex problem of maximum
cardinality search. It is complemented by a proof of NP-completeness of the
same problem on weakly chordal graphs.
</p>
<p>We also show linear-time algorithms for deciding end vertices of
breadth-first searches on interval graphs, and end vertices of lexicographic
depth-first searches on chordal graphs. Finally, we present $2^n\cdot
n^{O(1)}$-time algorithms for deciding the end vertices of breadth-first
searches, depth-first searches, maximum cardinality searches, and maximum
neighborhood searches on general graphs.
</p>
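<p>The end vertex problem can be explored by exhaustively enumerating every run of a search on a tiny graph, as in the sketch below for maximum cardinality search on the path a-b-c. The enumeration is exponential and purely illustrative; the algorithms in the paper are far more efficient for the cases they settle.</p>

```python
def mcs_end_vertices(adj):
    # Enumerate every run of maximum cardinality search (MCS): repeatedly
    # visit a vertex with the most already-visited neighbors, breaking
    # ties in all possible ways, and record which vertex can come last.
    ends = set()

    def extend(order):
        rest = [v for v in adj if v not in order]
        if not rest:
            ends.add(order[-1])
            return
        weight = {v: sum(u in order for u in adj[v]) for v in rest}
        best = max(weight.values())
        for v in rest:
            if weight[v] == best:
                extend(order + [v])

    extend([])
    return ends

# Path a - b - c: from either endpoint the middle vertex is forced second,
# so b can never be the end vertex of an MCS.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(mcs_end_vertices(path))
```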
<a href="http://arxiv.org/find/cs/1/au:+Cao_Y/0/1/0/all/0/1">Yixin Cao</a>, <a href="http://arxiv.org/find/cs/1/au:+Rong_G/0/1/0/all/0/1">Guozhen Rong</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_J/0/1/0/all/0/1">Jianxin Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_Z/0/1/0/all/0/1">Zhifeng Wang</a>Stabilization under round robin scheduling of control inputs in nonlinear systems. (arXiv:1905.09507v1 [math.OC])http://arxiv.org/abs/1905.09507
<p>We study the qualitative behavior of multivariable control-affine nonlinear
systems under sparsification of feedback controllers. Sparsification in our
context refers to the scheduling of the individual control inputs one at a time
in rapid periodic sweeps over the set of control inputs, which we call the
round robin scheduling. We prove that if a locally asymptotically stabilizing
feedback controller is sparsified via the round robin scheme and each control
action is scaled appropriately, then the corresponding equilibrium of the
resulting system is stabilized when the scheduling is sufficiently fast; under
mild additional conditions, local asymptotic stabilization of the corresponding
equilibrium can also be guaranteed. Our technical tools are derived from
optimal control theory, and our results also contribute to the literature on
the stability of switched systems in the fast switching regime. Illustrative
numerical examples depicting several subtle features of our results are
included.
</p>
<a href="http://arxiv.org/find/math/1/au:+Maheshwari_C/0/1/0/all/0/1">Chinmay Maheshwari</a>, <a href="http://arxiv.org/find/math/1/au:+Srikant_S/0/1/0/all/0/1">Sukumar Srikant</a>, <a href="http://arxiv.org/find/math/1/au:+Chatterjee_D/0/1/0/all/0/1">Debasish Chatterjee</a>Leveraging Uncertainty in Deep Learning for Selective Classification. (arXiv:1905.09509v1 [cs.LG])http://arxiv.org/abs/1905.09509
<p>The wide and rapid adoption of deep learning by practitioners brought
unintended consequences in many situations such as in the infamous case of
Google Photos' racist image recognition algorithm; this has necessitated the
use of quantified uncertainty for each prediction. There have been
recent efforts towards quantifying uncertainty in conventional deep learning
methods (e.g., dropout as Bayesian approximation); however, their optimal use
in decision making is often overlooked and understudied. In this study, we
propose a mixed-integer programming framework for classification with reject
option (also known as selective classification), that investigates and combines
model uncertainty and predictive mean to identify optimal classification and
rejection regions. Our results indicate superior performance of our framework
both in non-rejected accuracy and rejection quality on several publicly
available datasets. Moreover, we extend our framework to cost-sensitive
settings and show that our approach outperforms industry standard methods
significantly for online fraud management in real-world settings.
</p>
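<p>The core reject-option idea can be illustrated with a simple uncertainty threshold, as below. Note the paper formulates the choice of rejection regions as a mixed-integer program; this sketch shows only the thresholding principle on made-up data.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def selective_metrics(probs, uncertainty, labels, u_max):
    # Reject a prediction when its model uncertainty exceeds u_max; report
    # accuracy on the accepted (non-rejected) samples and the rejection rate.
    accept = uncertainty <= u_max
    preds = probs.argmax(axis=1)
    acc = (preds[accept] == labels[accept]).mean() if accept.any() else float("nan")
    return acc, 1.0 - accept.mean()

# Synthetic setup: the model always predicts class 0 with high confidence,
# but samples with high uncertainty are actually wrong half the time.
n = 1000
probs = np.tile([0.9, 0.1], (n, 1))
labels = np.zeros(n, dtype=int)
uncertainty = rng.uniform(0, 1, size=n)
hard = uncertainty > 0.7
labels[hard] = rng.integers(0, 2, size=hard.sum())

full_acc, _ = selective_metrics(probs, uncertainty, labels, u_max=1.0)
sel_acc, rej = selective_metrics(probs, uncertainty, labels, u_max=0.7)
print(full_acc, sel_acc, rej)
```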
<a href="http://arxiv.org/find/cs/1/au:+Yildirim_M/0/1/0/all/0/1">Mehmet Yigit Yildirim</a>, <a href="http://arxiv.org/find/cs/1/au:+Ozer_M/0/1/0/all/0/1">Mert Ozer</a>, <a href="http://arxiv.org/find/cs/1/au:+Davulcu_H/0/1/0/all/0/1">Hasan Davulcu</a>Scale-free networks revealed from finite-size scaling. (arXiv:1905.09512v1 [physics.soc-ph])http://arxiv.org/abs/1905.09512
<p>Networks play a vital role in the development of predictive models of
physical, biological, and social collective phenomena. A quite remarkable
feature of many of these networks is that they are believed to be approximately
scale free: the fraction of nodes with $k$ incident links (the degree) follows
a power law $p(k)\propto k^{-\lambda}$ for sufficiently large degree $k$. The
value of the exponent $\lambda$ as well as deviations from power law scaling
provide invaluable information on the mechanisms underlying the formation of
the network such as small degree saturation, variations in the local fitness to
compete for links, and high degree cutoffs owing to the finite size of the
network. Indeed, real networks are not infinitely large, and the largest degree
of any network cannot be larger than the number of nodes. Finite size scaling
is a useful tool for analyzing deviations from power law behavior in the
vicinity of a critical point in a physical system arising due to a finite
correlation length. Here we show that despite the essential differences between
networks and critical phenomena, finite size scaling provides a powerful
framework for analyzing self-similarity and the scale free nature of empirical
networks. We analyze about two hundred naturally occurring networks with
distinct dynamical origins, and find that a large number of these follow the
finite size scaling hypothesis without any self-tuning. Notably this is the
case of biological protein interaction networks, technological computer and
hyperlink networks and informational citation and lexical networks. Marked
deviations appear in other examples, especially infrastructure and
transportation networks, but also social, affiliation and annotation networks.
Strikingly, the values of the scaling exponents are not independent but satisfy
an approximate exponential relationship.
</p>
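<p>A common first step in such analyses is a maximum-likelihood estimate of the degree exponent above a cutoff $k_{min}$. The sketch below uses the continuous (Hill-type) approximation on synthetic Pareto-distributed degrees; the sample size and exponent are illustrative, not drawn from the networks studied in the paper.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def hill_mle(k, k_min):
    # Continuous maximum-likelihood estimate of a power-law exponent:
    # lambda_hat = 1 + n / sum(log(k_i / k_min)) over the tail k_i >= k_min.
    tail = k[k >= k_min]
    return 1.0 + len(tail) / np.sum(np.log(tail / k_min))

# Synthetic degrees from p(k) ~ k^(-2.5) for k >= 1, via inverse-transform
# sampling of a Pareto distribution.
lam = 2.5
u = rng.uniform(size=20000)
k = (1.0 - u) ** (-1.0 / (lam - 1.0))

print(round(hill_mle(k, k_min=1.0), 2))
```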
<a href="http://arxiv.org/find/physics/1/au:+Serafino_M/0/1/0/all/0/1">Matteo Serafino</a>, <a href="http://arxiv.org/find/physics/1/au:+Cimini_G/0/1/0/all/0/1">Giulio Cimini</a>, <a href="http://arxiv.org/find/physics/1/au:+Maritan_A/0/1/0/all/0/1">Amos Maritan</a>, <a href="http://arxiv.org/find/physics/1/au:+Suweis_S/0/1/0/all/0/1">Samir Suweis</a>, <a href="http://arxiv.org/find/physics/1/au:+Banavar_J/0/1/0/all/0/1">Jayanth R. Banavar</a>, <a href="http://arxiv.org/find/physics/1/au:+Caldarelli_G/0/1/0/all/0/1">Guido Caldarelli</a>Downlink Non-Orthogonal Multiple Access without SIC for Block Fading Channels: An Algebraic Rotation Approach. (arXiv:1905.09514v1 [cs.IT])http://arxiv.org/abs/1905.09514
<p>In this paper, we investigate the problem of downlink non-orthogonal multiple
access (NOMA) over block fading channels. For the single antenna case, we
propose a class of NOMA schemes where all the users' signals are mapped into
$n$-dimensional constellations corresponding to the same algebraic lattices
from a number field, allowing every user to attain full diversity gain with
single-user decoding, i.e., no successive interference cancellation (SIC). The
minimum product distances of the proposed scheme with arbitrary power
allocation factor are analyzed and their upper bounds are derived. Within the
proposed class of schemes, we also identify a special family of NOMA schemes
based on lattice partitions of the underlying ideal lattices, whose minimum
product distances can be easily controlled. Our analysis shows that among the
proposed schemes, the lattice-partition-based schemes achieve the largest
minimum product distances of the superimposed constellations, which are closely
related to the symbol error rates for receivers with single-user decoding.
Simulation results are presented to verify our analysis and to show the
effectiveness of the proposed schemes as compared to benchmark NOMA schemes.
Extensions of our design to the multi-antenna case are also considered where
similar analysis and results are presented.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Qiu_M/0/1/0/all/0/1">Min Qiu</a>, <a href="http://arxiv.org/find/cs/1/au:+Huang_Y/0/1/0/all/0/1">Yu-Chih Huang</a>, <a href="http://arxiv.org/find/cs/1/au:+Yuan_J/0/1/0/all/0/1">Jinhong Yuan</a>IN2LAAMA: INertial Lidar Localisation Autocalibration And MApping. (arXiv:1905.09517v1 [cs.RO])http://arxiv.org/abs/1905.09517
<p>In this paper, we present INertial Lidar Localisation Autocalibration And
MApping (IN2LAAMA): a probabilistic framework for localisation, mapping, and
extrinsic calibration based on a 3D-lidar and a 6-DoF-IMU. Most of today's
lidars collect geometric information about the surrounding environment by
sweeping lasers across their field of view. Consequently, 3D-points in one
lidar scan are acquired at different timestamps. If the sensor trajectory is
not accurately known, the scans are affected by the phenomenon known as motion
distortion. The proposed method leverages preintegration with a continuous
representation of the inertial measurements to characterise the system's motion
at any point in time. It enables precise correction of the motion distortion
without relying on any explicit motion model. The system's pose, velocity,
biases, and time-shift are estimated via a full batch optimisation that
includes automatically generated loop-closure constraints. The autocalibration
and the registration of lidar data rely on planar and edge features matched
across pairs of scans. The performance of the framework is validated through
simulated and real-data experiments.
</p>
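The motion-distortion problem described above can be illustrated with a deliberately simplified sketch. This is not IN2LAAMA's method (which uses IMU preintegration over a full 6-DoF trajectory); here the `deskew` helper and a 2D translation that is linear in time are our own assumptions, purely to show why per-point timestamps matter.

```python
def deskew(points, t0, t1, pose0, pose1):
    """Express each (t, x, y) point in the sensor frame at time t0.

    pose0 and pose1 are the sensor's (x, y) positions at t0 and t1;
    the sweep's motion is assumed linear in time for this toy example.
    """
    corrected = []
    for t, x, y in points:
        alpha = (t - t0) / (t1 - t0)        # fraction of the sweep elapsed
        dx = alpha * (pose1[0] - pose0[0])  # sensor translation since t0
        dy = alpha * (pose1[1] - pose0[1])
        corrected.append((x + dx, y + dy))  # undo the motion distortion
    return corrected

# Two identical sensor-relative returns measured at the start and the end
# of a 0.1 s sweep during which the sensor moved 0.5 m along x.
pts = [(0.0, 1.0, 0.0), (0.1, 1.0, 0.0)]
out = deskew(pts, 0.0, 0.1, (0.0, 0.0), (0.5, 0.0))
```

Without the correction both returns would land at the same point; with it, the later return is shifted by the full sensor motion.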
<a href="http://arxiv.org/find/cs/1/au:+Gentil_C/0/1/0/all/0/1">Cedric Le Gentil</a>, <a href="http://arxiv.org/find/cs/1/au:+Vidal_Calleja_T/0/1/0/all/0/1">Teresa Vidal-Calleja</a>, <a href="http://arxiv.org/find/cs/1/au:+Huang_S/0/1/0/all/0/1">Shoudong Huang</a>The African Wildlife Ontology tutorial ontologies: requirements, design, and content. (arXiv:1905.09519v1 [cs.AI])http://arxiv.org/abs/1905.09519
<p>Background. Most tutorial ontologies focus on illustrating one aspect of
ontology development, notably language features and automated reasoners, but
ignore ontology development factors, such as emergent modelling guidelines and
ontological principles. Yet, novices replicate examples from the exercises they
carry out. Not providing good examples holistically causes the propagation of
sub-optimal ontology development, which may negatively affect the quality of a
real domain ontology. Results. We identified 22 requirements that a good
tutorial ontology should satisfy regarding subject domain, logics and
reasoning, and engineering aspects. We developed a set of ontologies about
African Wildlife to serve as tutorial ontologies. A majority of the
requirements have been met with the set of African Wildlife Ontology tutorial
ontologies, which are introduced in this paper. The African Wildlife Ontology
is mature and has been used yearly in an ontology engineering course or
tutorial since 2010 and is included in a recent ontology engineering textbook
with relevant examples and exercises. Conclusion. The African Wildlife Ontology
provides a wide range of options concerning examples and exercises for ontology
engineering well beyond illustrating only language features and automated
reasoning. It assists in demonstrating tasks about ontology quality, such as
alignment to a foundational ontology and satisfying competency questions,
versioning, and multilingual ontologies.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Keet_C/0/1/0/all/0/1">C Maria Keet</a>Towards Physical Hybrid Systems. (arXiv:1905.09520v1 [cs.LO])http://arxiv.org/abs/1905.09520
<p>Some hybrid systems models are unsafe for mathematically correct but
physically unrealistic reasons. For example, mathematical models can classify a
system as being unsafe on a set that is too small to have physical importance.
In particular, differences in measure zero sets in models of cyber-physical
systems (CPS) have significant mathematical impact on the mathematical safety
of these models even though differences on measure zero sets have no tangible
physical effect in a real system. We develop the concept of "physical hybrid
systems" (PHS) to help reunite mathematical models with physical reality. We
modify a hybrid systems logic (differential temporal dynamic logic) by adding a
first-class operator to elide distinctions on measure zero sets of time within
CPS models. This approach facilitates modeling since it admits the verification
of a wider class of models, including some physically realistic models that
would otherwise be classified as mathematically unsafe. We also develop a proof
calculus to help with the verification of PHS.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Cordwell_K/0/1/0/all/0/1">Katherine Cordwell</a>, <a href="http://arxiv.org/find/cs/1/au:+Platzer_A/0/1/0/all/0/1">Andr&#xe9; Platzer</a>Combination of linear classifiers using score function -- analysis of possible combination strategies. (arXiv:1905.09522v1 [cs.LG])http://arxiv.org/abs/1905.09522
<p>In this work, we addressed the issue of combining linear classifiers using
their score functions. The value of the scoring function depends on the
distance from the decision boundary. Two score functions were tested and four
different combination strategies were investigated. During the
experimental study, the proposed approach was applied to the heterogeneous
ensemble and it was compared to two reference methods -- majority voting and
model averaging respectively. The comparison was made in terms of seven
different quality criteria. The results show that combination strategies based
on the simple average and the trimmed average are the best strategies for
geometrical combination.
</p>
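A minimal sketch (our own code, not the paper's) of the idea: each base classifier's score reflects the signed distance of a sample from its decision boundary, and the ensemble combines scores by a simple average or a trimmed average before thresholding at zero.

```python
def score(w, b, x):
    # Signed distance of x from the hyperplane w . x + b = 0.
    norm = sum(wi * wi for wi in w) ** 0.5
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def combine(scores, strategy="mean", trim=1):
    if strategy == "mean":
        return sum(scores) / len(scores)
    # Trimmed average: drop the `trim` smallest and largest scores.
    kept = sorted(scores)[trim:-trim]
    return sum(kept) / len(kept)

x = [1.0, 2.0]
ensemble = [([1.0, 0.0], -0.5), ([0.0, 1.0], -1.0), ([1.0, 1.0], -4.0)]
scores = [score(w, b, x) for w, b in ensemble]
label = 1 if combine(scores, "mean") > 0 else 0
```

Unlike majority voting, an outlying classifier far on the wrong side of its boundary can be outvoted by two confident classifiers, or dropped entirely by the trimmed average.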
<a href="http://arxiv.org/find/cs/1/au:+Trajdos_P/0/1/0/all/0/1">Pawel Trajdos</a>, <a href="http://arxiv.org/find/cs/1/au:+Burduk_R/0/1/0/all/0/1">Robert Burduk</a>Hierarchical Annotation of Images with Two-Alternative-Forced-Choice Metric Learning. (arXiv:1905.09523v1 [cs.LG])http://arxiv.org/abs/1905.09523
<p>Many tasks such as retrieval and recommendations can significantly benefit
from structuring the data, commonly in a hierarchical way. Achieving this
through annotations of high-dimensional data such as images or natural text can
be significantly labor intensive. We propose an approach for uncovering the
hierarchical structure of data based on efficient discriminative testing rather
than annotations of individual datapoints. Using two-alternative-forced-choice
(2AFC) testing and deep metric learning we achieve embedding of the data in
semantic space where we are able to perform hierarchical clustering. We
actively select triplets for the 2AFC test such that the modeling process is
highly efficient with respect to the number of tests presented to the
annotator. We empirically demonstrate the feasibility of the method by
confirming the shape bias on synthetic data and extract hierarchical structure
on the Fashion-MNIST dataset to a finer granularity than the original labels.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Hellinga_N/0/1/0/all/0/1">Niels Hellinga</a>, <a href="http://arxiv.org/find/cs/1/au:+Menkovski_V/0/1/0/all/0/1">Vlado Menkovski</a>MCScript2.0: A Machine Comprehension Corpus Focused on Script Events and Participants. (arXiv:1905.09531v1 [cs.CL])http://arxiv.org/abs/1905.09531
<p>We introduce MCScript2.0, a machine comprehension corpus for the end-to-end
evaluation of script knowledge. MCScript2.0 contains approx. 20,000 questions
on approx. 3,500 texts, crowdsourced based on a new collection process that
results in challenging questions. Half of the questions cannot be answered from
the reading texts, but require the use of commonsense and, in particular,
script knowledge. We give a thorough analysis of our corpus and show that while
the task is not challenging to humans, existing machine comprehension models
fail to perform well on the data, even if they make use of a commonsense
knowledge base. The dataset is available at
<a href="http://www.sfb1102.uni-saarland.de/?page_id=2582">this http URL</a>
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ostermann_S/0/1/0/all/0/1">Simon Ostermann</a>, <a href="http://arxiv.org/find/cs/1/au:+Roth_M/0/1/0/all/0/1">Michael Roth</a>, <a href="http://arxiv.org/find/cs/1/au:+Pinkal_M/0/1/0/all/0/1">Manfred Pinkal</a>SynFuzz: Efficient Concolic Execution via Branch Condition Synthesis. (arXiv:1905.09532v1 [cs.CR])http://arxiv.org/abs/1905.09532
<p>Concolic execution is a powerful program analysis technique for exploring
execution paths in a systematic manner. Compared to random-mutation-based
fuzzing, concolic execution is especially good at exploring paths that are
guarded by complex and tight branch predicates (e.g., (a*b) == 0xdeadbeef). The
drawback, however, is that concolic execution engines are much slower than
native execution. One major source of the slowness is that concolic execution
engines have to interpret instructions to maintain the symbolic expression
of program variables. In this work, we propose SynFuzz, a novel approach to
perform scalable concolic execution. SynFuzz achieves this goal by replacing
interpretation with dynamic taint analysis and program synthesis. In
particular, to flip a conditional branch, SynFuzz first uses operation-aware
taint analysis to record a partial expression (i.e., a sketch) of its branch
predicate. Then it uses oracle-guided program synthesis to reconstruct the
symbolic expression based on input-output pairs. The last step is the same as
traditional concolic execution - SynFuzz consults an SMT solver to generate an
input that can flip the target branch. By doing so, SynFuzz can achieve an
execution speed that is close to fuzzing while retaining concolic execution's
capability of flipping complex branch predicates. We have implemented a
prototype of SynFuzz and evaluated it with three sets of programs: real-world
applications, the LAVA-M benchmark, and the Google Fuzzer Test Suite (FTS). The
evaluation results showed that SynFuzz was much more scalable than traditional
concolic execution engines, was able to find more bugs in LAVA-M than the
state-of-the-art concolic execution engine QSYM, and achieved better code
coverage on real-world applications and FTS.
</p>
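The oracle-guided synthesis step can be sketched very roughly as follows. This is not SynFuzz's implementation: a tiny hand-written candidate grammar stands in for the sketch recovered by operation-aware taint analysis, and a real engine would hand the recovered expression to an SMT solver rather than stop here. All names below are ours.

```python
CANDIDATES = [
    ("a + b", lambda a, b: (a + b) & 0xFFFFFFFF),
    ("a * b", lambda a, b: (a * b) & 0xFFFFFFFF),
    ("a ^ b", lambda a, b: a ^ b),
]

def synthesize(io_pairs):
    # Return the first candidate consistent with every observed pair.
    for name, fn in CANDIDATES:
        if all(fn(a, b) == out for (a, b), out in io_pairs):
            return name, fn
    return None, None

# Input/output pairs observed (e.g. via taint tracking) for the value
# feeding a branch predicate such as (a * b) == 0xdeadbeef.
pairs = [((2, 3), 6), ((5, 7), 35), ((10, 10), 100)]
name, fn = synthesize(pairs)
```

Once the expression is reconstructed, flipping the branch reduces to solving `a * b == 0xdeadbeef` for concrete inputs, which is exactly the query handed to the solver.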
<a href="http://arxiv.org/find/cs/1/au:+Han_W/0/1/0/all/0/1">Wookhyun Han</a>, <a href="http://arxiv.org/find/cs/1/au:+Rahman_M/0/1/0/all/0/1">Md Lutfor Rahman</a>, <a href="http://arxiv.org/find/cs/1/au:+Chen_Y/0/1/0/all/0/1">Yuxuan Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Song_C/0/1/0/all/0/1">Chengyu Song</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_B/0/1/0/all/0/1">Byoungyoung Lee</a>, <a href="http://arxiv.org/find/cs/1/au:+Shin_I/0/1/0/all/0/1">Insik Shin</a>Incorporating Human Domain Knowledge in 3D LiDAR-based Semantic Segmentation. (arXiv:1905.09533v1 [cs.RO])http://arxiv.org/abs/1905.09533
<p>This work studies semantic segmentation using 3D LiDAR data. Popular deep
learning methods applied for this task require a large number of manual
annotations to train the parameters. We propose a new method that makes full
use of the advantages of traditional methods and deep learning methods via
incorporating human domain knowledge into the neural network model to reduce
the demand for large numbers of manual annotations and improve the training
efficiency. We first pretrain a model with autogenerated samples from a
rule-based classifier so that human knowledge can be propagated into the
network. Based on the pretrained model, only a small set of annotations is
required for further fine-tuning. Quantitative experiments show that the
pretrained model achieves better performance than random initialization in
almost all cases; furthermore, our method can achieve similar performance with
fewer manual annotations.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Mei_J/0/1/0/all/0/1">Jilin Mei</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_H/0/1/0/all/0/1">Huijing Zhao</a>Non-finitely axiomatisable modal product logics with infinite canonical axiomatisations. (arXiv:1905.09536v1 [cs.LO])http://arxiv.org/abs/1905.09536
<p>Our concern is the axiomatisation problem for modal and algebraic logics that
correspond to various fragments of two-variable first-order logic with counting
quantifiers. In particular, we consider modal products with Diff, the
propositional unimodal logic of the difference operator. We show that the 2D
product logic Diff x Diff is non-finitely axiomatisable, but can be axiomatised
by infinitely many Sahlqvist axioms. We also show that its `square' version
(the modal counterpart of the substitution and equality free fragment of
two-variable first-order logic with counting to two) is non-finitely
axiomatisable over Diff x Diff, but can be axiomatised by adding infinitely
many Sahlqvist axioms. These are the first examples of products of finitely
axiomatisable modal logics that are not finitely axiomatisable, but
axiomatisable by explicit infinite sets of canonical axioms.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Hampson_C/0/1/0/all/0/1">Christopher Hampson</a>, <a href="http://arxiv.org/find/cs/1/au:+Kikot_S/0/1/0/all/0/1">Stanislav Kikot</a>, <a href="http://arxiv.org/find/cs/1/au:+Kurucz_A/0/1/0/all/0/1">Agi Kurucz</a>, <a href="http://arxiv.org/find/cs/1/au:+Marcelino_S/0/1/0/all/0/1">Sergio Marcelino</a>Detecting Malicious PowerShell Scripts Using Contextual Embeddings. (arXiv:1905.09538v1 [cs.CR])http://arxiv.org/abs/1905.09538
<p>PowerShell is a command-line shell that is widely used in organizations for
configuration management and task automation. Unfortunately, PowerShell is also
increasingly used by cybercriminals for launching cyber attacks against
organizations, mainly because it is pre-installed on Windows machines and it
exposes strong functionality that may be leveraged by attackers. This makes the
problem of detecting malicious PowerShell scripts both urgent and challenging.
</p>
<p>We address this important problem by presenting several novel deep learning
based detectors of malicious PowerShell scripts. Our best model obtains a true
positive rate of nearly 90% while maintaining a low false positive rate of less
than 0.1%, indicating that it can be of practical value.
</p>
<p>Our models employ pre-trained contextual embeddings of words from the
PowerShell "language". A contextual word embedding is able to project
semantically similar words to proximate vectors in the embedding space. A known
problem in the cybersecurity domain is that labeled data is relatively scarce
in comparison with unlabeled data, making it difficult to devise effective
supervised detection of malicious activity of many types. This is also the case
with PowerShell scripts. Our work shows that this problem can be largely
mitigated by learning a pre-trained contextual embedding based on unlabeled
data.
</p>
<p>We trained our models' embedding layer using a scripts dataset that was
enriched by a large corpus of unlabeled PowerShell scripts collected from
public repositories. As established by our performance analysis, the use of
unlabeled data for the embedding significantly improved the performance of our
detectors. We estimate that the usage of pre-trained contextual embeddings
based on unlabeled data for improved classification accuracy will find
additional applications in the cybersecurity domain.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Rubin_A/0/1/0/all/0/1">Amir Rubin</a>, <a href="http://arxiv.org/find/cs/1/au:+Kels_S/0/1/0/all/0/1">Shay Kels</a>, <a href="http://arxiv.org/find/cs/1/au:+Hendler_D/0/1/0/all/0/1">Danny Hendler</a>Recursive blocked algorithms for linear systems with Kronecker product structure. (arXiv:1905.09539v1 [math.NA])http://arxiv.org/abs/1905.09539
<p>Recursive blocked algorithms have proven to be highly efficient at the
numerical solution of the Sylvester matrix equation and its generalizations. In
this work, we show that these algorithms extend in a seamless fashion to
higher-dimensional variants of generalized Sylvester matrix equations, as they
arise from the discretization of PDEs with separable coefficients or the
approximation of certain models in macroeconomics. By combining recursions with
a mechanism for merging dimensions, an efficient algorithm is derived that
outperforms existing approaches based on Sylvester solvers.
</p>
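The Kronecker structure in question can be shown on a tiny dense example (ours, not the paper's recursive algorithm). The Sylvester equation AX + XB = C is equivalent to the linear system (I ⊗ A + Bᵀ ⊗ I) vec(X) = vec(C); recursive blocked solvers avoid ever forming this Kronecker matrix, which is formed here only to exhibit the structure.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))
C = rng.standard_normal((n, m))

# AX + XB = C  <=>  (I_m kron A + B^T kron I_n) vec(X) = vec(C),
# with vec(.) stacking columns (Fortran order).
K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
x = np.linalg.solve(K, C.reshape(-1, order="F"))
X = x.reshape(n, m, order="F")

residual = np.linalg.norm(A @ X + X @ B - C)
```

The naive Kronecker system has size nm x nm, which is exactly why structured recursive solvers matter for the higher-dimensional variants discussed above.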
<a href="http://arxiv.org/find/math/1/au:+Chen_M/0/1/0/all/0/1">Minhong Chen</a>, <a href="http://arxiv.org/find/math/1/au:+Kressner_D/0/1/0/all/0/1">Daniel Kressner</a>MemoryRanger Prevents Hijacking FILE_OBJECT Structures in Windows Kernel. (arXiv:1905.09543v1 [cs.CR])http://arxiv.org/abs/1905.09543
<p>Windows OS kernel memory is one of the main targets of cyber-attacks. By
launching such attacks, hackers are succeeding in process privilege escalation
and tampering with users' data by accessing kernel-mode memory. This paper
considers a new example of such an attack, which results in access to the files
opened in an exclusive mode. Windows built-in security features prevent such
illegal access, but attackers can circumvent them by patching dynamically
allocated objects. The research shows that Windows 10 version 1809 x64 is
vulnerable to this attack. The paper provides an example of using MemoryRanger,
a hypervisor-based solution to prevent such attacks by running kernel-mode
drivers in isolated kernel memory enclaves.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Korkin_I/0/1/0/all/0/1">Igor Korkin</a>Computing Expected Runtimes for Constant Probability Programs. (arXiv:1905.09544v1 [cs.LO])http://arxiv.org/abs/1905.09544
<p>We introduce the class of constant probability (CP) programs and show that
classical results from probability theory directly yield a simple decision
procedure for (positive) almost sure termination of programs in this class.
Moreover, asymptotically tight bounds on their expected runtime can always be
computed easily. Based on this, we present an algorithm to infer the exact
expected runtime of any CP program.
</p>
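As a worked example of the kind of program this class covers (the specific program and closed form below are our illustration, not taken from the paper): the biased random walk "while x > 0: x := x - 1 with probability p, else x := x + 1" is positively almost surely terminating for p > 1/2, with expected runtime x0 / (2p - 1), which a Monte Carlo estimate confirms.

```python
import random

def runtime(x0, p, rng):
    # "while x > 0: x := x - 1 [with prob. p] else x := x + 1"
    x, steps = x0, 0
    while x > 0:
        x += -1 if rng.random() < p else 1
        steps += 1
    return steps

p, x0 = 0.75, 4
expected = x0 / (2 * p - 1)   # closed-form expected runtime: 8 steps
rng = random.Random(42)
estimate = sum(runtime(x0, p, rng) for _ in range(20000)) / 20000
```

For p <= 1/2 the same program is not positively almost surely terminating, which is the kind of distinction the decision procedure settles.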
<a href="http://arxiv.org/find/cs/1/au:+Giesl_J/0/1/0/all/0/1">J&#xfc;rgen Giesl</a>, <a href="http://arxiv.org/find/cs/1/au:+Giesl_P/0/1/0/all/0/1">Peter Giesl</a>, <a href="http://arxiv.org/find/cs/1/au:+Hark_M/0/1/0/all/0/1">Marcel Hark</a>Replicated Vector Approximate Message Passing For Resampling Problem. (arXiv:1905.09545v1 [stat.ML])http://arxiv.org/abs/1905.09545
<p>Resampling techniques are widely used in statistical inference and ensemble
learning, in which estimators' statistical properties are essential. However,
existing methods are computationally demanding, because repetitions of
estimation/learning via numerical optimization/integral for each resampled data
are required. In this study, we introduce a computationally efficient method to
resolve this problem: replicated vector approximate message passing. This is
based on a combination of the replica method of statistical physics and an
accurate approximate inference algorithm, namely the vector approximate message
passing of information theory. The method provides tractable densities without
repeating estimation/learning, and the densities approximately provide the
estimators' moments of arbitrary degree in practical time. In the
experiment, we apply the proposed method to the stability selection method,
which is commonly used in variable selection problems. The numerical results
show its fast convergence and high approximation accuracy for problems
involving both synthetic and real-world datasets.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Takahashi_T/0/1/0/all/0/1">Takashi Takahashi</a>, <a href="http://arxiv.org/find/stat/1/au:+Kabashima_Y/0/1/0/all/0/1">Yoshiyuki Kabashima</a>Revisiting Graph Neural Networks: All We Have is Low-Pass Filters. (arXiv:1905.09550v1 [stat.ML])http://arxiv.org/abs/1905.09550
<p>Graph neural networks have become one of the most important techniques to
solve machine learning problems on graph-structured data. Recent work on vertex
classification proposed deep and distributed learning models to achieve high
performance and scalability. However, we find that the feature vectors of
benchmark datasets are already quite informative for the classification task,
and the graph structure only provides a means to denoise the data. In this
paper, we develop a theoretical framework based on graph signal processing for
analyzing graph neural networks. Our results indicate that graph neural
networks only perform low-pass filtering on feature vectors and do not have the
non-linear manifold learning property. We further investigate their resilience
to feature noise and propose some insights on GCN-based graph neural network
design.
</p>
<a href="http://arxiv.org/find/stat/1/au:+NT_H/0/1/0/all/0/1">Hoang NT</a>, <a href="http://arxiv.org/find/stat/1/au:+Maehara_T/0/1/0/all/0/1">Takanori Maehara</a>Security of 5G-V2X: Technologies, Standardization and Research Directions. (arXiv:1905.09555v1 [cs.NI])http://arxiv.org/abs/1905.09555
<p>Cellular-Vehicle to Everything (C-V2X) aims at resolving issues pertaining to
the traditional usability of Vehicle to Infrastructure (V2I) and Vehicle to
Vehicle (V2V) networking. Especially, C-V2X lowers the number of entities
involved in vehicular communications and allows the inclusion of
cellular-security solutions to be applied to V2X. For this, the evolution of
LTE-V2X is revolutionary, but it fails to handle the demands of high
throughput, ultra-high reliability, and ultra-low latency alongside its
security mechanisms. To counter this, 5G-V2X is considered as an integral
solution, which not only resolves the issues related to LTE-V2X but also
provides a function-based network setup. Several reports have been given for
the security of 5G, but none of them primarily focuses on the security of
5G-V2X. This article provides a detailed overview of 5G-V2X with a
security-based comparison with the LTE-V2X. A novel Security Reflex Function
(SRF)-based architecture is also proposed and several research challenges are
presented to be resolved in upcoming solutions related to the security of
5G-V2X. Alongside this, the article lays out the requirements of
Ultra-Dense and Ultra-Secure (UD-US) transmissions in 5G-V2X.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Sharma_V/0/1/0/all/0/1">Vishal Sharma</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_Y/0/1/0/all/0/1">Yousik Lee</a>, <a href="http://arxiv.org/find/cs/1/au:+You_I/0/1/0/all/0/1">Ilsun You</a>Knowledge Graph Embedding Bi-Vector Models for Symmetric Relation. (arXiv:1905.09557v1 [cs.AI])http://arxiv.org/abs/1905.09557
<p>Knowledge graph embedding (KGE) models have been proposed to improve the
performance of knowledge graph reasoning. However, most KGEs exhibit a general
phenomenon: as training progresses, the embeddings of symmetric relations tend
to the zero vector if the ratio of symmetric triples in the dataset is high
enough. This causes subsequent tasks on symmetric relations, e.g. link
prediction, to fail. The root cause of the problem is that KGEs do not
utilize the semantic information of symmetric relations. We propose KGE
bi-vector models, which represent each symmetric relation as a vector pair,
significantly increasing the processing capability for symmetric relations.
We generate the benchmark datasets based on FB15k and WN18 by completing the
symmetric relation triples to verify the models. The experimental results
clearly affirm the effectiveness and superiority of our models over the
baselines.
</p>
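The failure mode can be demonstrated in one dimension with a TransE-style score (TransE is our choice of representative KGE here): for a symmetric relation both (h, r, t) and (t, r, h) appear as training triples, and minimising |h + r - t|^2 + |t + r - h|^2 over the relation embedding drives r to zero, since the gradient is exactly 4r.

```python
# One symmetric relation in 1-D: both (h, r, t) and (t, r, h) are trained.
h, t, r = 0.3, 0.9, 1.0
lr = 0.1
for _ in range(200):
    # d/dr of (h + r - t)**2 + (t + r - h)**2, entities held fixed.
    grad = 2 * (h + r - t) + 2 * (t + r - h)   # simplifies to 4 * r
    r -= lr * grad
```

A single vector cannot simultaneously translate h to t and t to h except at r = 0, which is why a vector pair per symmetric relation removes the collapse.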
<a href="http://arxiv.org/find/cs/1/au:+Yao_J/0/1/0/all/0/1">Jinkui Yao</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_L/0/1/0/all/0/1">Lianghua Xu</a>MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions. (arXiv:1905.09558v1 [cs.LG])http://arxiv.org/abs/1905.09558
<p>Predicting interactions between structured entities lies at the core of
numerous tasks such as drug regimen and new material design. In recent years,
graph neural networks have become attractive. They represent structured
entities as graphs and then extract features from each individual graph using
graph convolution operations. However, these methods have some limitations: i)
their networks only extract features from a fixed-size subgraph structure
(i.e., a fixed-size receptive field) of each node, and ignore features in
substructures
of different sizes, and ii) features are extracted by considering each entity
independently, which may not effectively reflect the interaction between two
entities. To resolve these problems, we present MR-GNN, an end-to-end graph
neural network with the following features: i) it uses a multi-resolution based
architecture to extract node features from different neighborhoods of each
node, and, ii) it uses dual graph-state long short-term memory networks
(L-STMs) to summarize local features of each graph and extract the interaction
features between pairwise graphs. Experiments conducted on real-world datasets
show that MR-GNN improves the prediction over state-of-the-art methods.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Xu_N/0/1/0/all/0/1">Nuo Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_P/0/1/0/all/0/1">Pinghui Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Chen_L/0/1/0/all/0/1">Long Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Tao_J/0/1/0/all/0/1">Jing Tao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_J/0/1/0/all/0/1">Junzhou Zhao</a>Binary Classification with Bounded Abstention Rate. (arXiv:1905.09561v1 [cs.LG])http://arxiv.org/abs/1905.09561
<p>We consider the problem of binary classification with abstention in the
relatively less studied \emph{bounded-rate} setting. We begin by obtaining a
characterization of the Bayes optimal classifier for an arbitrary input-label
distribution $P_{XY}$. Our result generalizes and provides an alternative proof
for the result first obtained by \cite{chow1957optimum}, and then re-derived by
\citet{denis2015consistency}, under a continuity assumption on $P_{XY}$. We
then propose a plug-in classifier that employs unlabeled samples to decide the
region of abstention and derive an upper-bound on the excess risk of our
classifier under standard \emph{H\"older smoothness} and \emph{margin}
assumptions. Unlike the plug-in rule of \citet{denis2015consistency}, our
constructed classifier satisfies the abstention constraint with high
probability and can also deal with discontinuities in the empirical cdf. We
also derive lower-bounds that demonstrate the minimax near-optimality of our
proposed algorithm. To address the excessive complexity of the plug-in
classifier in high dimensions, we propose a computationally efficient algorithm
that builds upon prior work on convex loss surrogates, and obtain bounds on its
excess risk in the \emph{realizable} case. We empirically compare the
performance of the proposed algorithm with a baseline on a number of UCI
benchmark datasets.
</p>
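A heavily simplified sketch of the bounded-rate plug-in idea (our code; the paper's estimator and guarantees are more involved): abstain where the estimated posterior eta(x) is closest to 1/2, choosing the abstention band from unlabeled samples so that roughly a delta-fraction of points fall inside it.

```python
def abstention_threshold(eta_unlabeled, delta):
    # Band half-width tau so that roughly a delta-fraction of the unlabeled
    # points (those with the smallest margin |eta - 1/2|) fall in the band.
    margins = sorted(abs(e - 0.5) for e in eta_unlabeled)
    k = int(delta * len(margins))
    return margins[k] if k < len(margins) else float("inf")

def classify(eta_x, tau):
    if abs(eta_x - 0.5) < tau:
        return "abstain"               # too close to the decision boundary
    return 1 if eta_x > 0.5 else 0

etas = [0.1, 0.2, 0.45, 0.48, 0.52, 0.8, 0.9, 0.95, 0.55, 0.3]
tau = abstention_threshold(etas, 0.2)  # abstention budget delta = 0.2
```

Choosing tau from an empirical quantile of unlabeled margins is what lets the rule respect the abstention budget without labels.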
<a href="http://arxiv.org/find/cs/1/au:+Shekhar_S/0/1/0/all/0/1">Shubhanshu Shekhar</a>, <a href="http://arxiv.org/find/cs/1/au:+Ghavamzadeh_M/0/1/0/all/0/1">Mohammad Ghavamzadeh</a>, <a href="http://arxiv.org/find/cs/1/au:+Javidi_T/0/1/0/all/0/1">Tara Javidi</a>Recurrent Value Functions. (arXiv:1905.09562v1 [cs.LG])http://arxiv.org/abs/1905.09562
<p>Despite recent successes in Reinforcement Learning, value-based methods often
suffer from high variance hindering performance. In this paper, we illustrate
this in a continuous control setting where state of the art methods perform
poorly whenever sensor noise is introduced. To overcome this issue, we
introduce Recurrent Value Functions (RVFs) as an alternative to estimate the
value function of a state. We propose to estimate the value function of the
current state using the value function of past states visited along the
trajectory. Due to the nature of their formulation, RVFs have a natural way of
learning an emphasis function that selectively emphasizes important states.
First, we establish RVF's asymptotic convergence properties in tabular
settings. We then demonstrate their robustness on a partially observable domain
and continuous control tasks. Finally, we provide a qualitative interpretation
of the learned emphasis function.
</p>
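The recurrence behind RVFs can be sketched as follows (our simplification: the abstract describes estimating the current value from past values along the trajectory, which we write as a convex combination; in RVFs the weight beta is a learned, state-dependent emphasis function, whereas here it is a constant).

```python
def recurrent_values(values, beta):
    # V_rec(s_t) = beta * V(s_t) + (1 - beta) * V_rec(s_{t-1})
    v_rec = values[0]
    out = [v_rec]
    for v in values[1:]:
        v_rec = beta * v + (1 - beta) * v_rec
        out.append(v_rec)
    return out

noisy = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]   # value estimates under sensor noise
smooth = recurrent_values(noisy, beta=0.3)
```

The recurrent estimate has a smaller range than the raw estimates, illustrating how the recursion suppresses sensor-noise-induced variance.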
<a href="http://arxiv.org/find/cs/1/au:+Thodoroff_P/0/1/0/all/0/1">Pierre Thodoroff</a>, <a href="http://arxiv.org/find/cs/1/au:+Anand_N/0/1/0/all/0/1">Nishanth Anand</a>, <a href="http://arxiv.org/find/cs/1/au:+Caccia_L/0/1/0/all/0/1">Lucas Caccia</a>, <a href="http://arxiv.org/find/cs/1/au:+Precup_D/0/1/0/all/0/1">Doina Precup</a>, <a href="http://arxiv.org/find/cs/1/au:+Pineau_J/0/1/0/all/0/1">Joelle Pineau</a>ENIGMAWatch: ProofWatch Meets ENIGMA. (arXiv:1905.09565v1 [cs.AI])http://arxiv.org/abs/1905.09565
<p>In this work we describe a new learning-based proof guidance -- ENIGMAWatch
-- for saturation-style first-order theorem provers. ENIGMAWatch combines two
guiding approaches for the given-clause selection implemented for the E ATP
system: ProofWatch and ENIGMA. ProofWatch is motivated by the watchlist (hints)
method and based on symbolic matching of multiple related proofs, while ENIGMA
is based on statistical machine learning. The two methods are combined by using
the evolving information about symbolic proof matching as an additional
information that characterizes the saturation-style proof search for the
statistical learning methods. The new system is experimentally evaluated on a
large set of problems from the Mizar Library. We show that the added
proof-matching information is considered important by the statistical machine
learners, and that it leads to improvements in E's performance over ProofWatch
and ENIGMA.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Goertzel_Z/0/1/0/all/0/1">Zarathustra Goertzel</a>, <a href="http://arxiv.org/find/cs/1/au:+Jakub%5Cr%7Bu%7Dv_J/0/1/0/all/0/1">Jan Jakub&#x16f;v</a>, <a href="http://arxiv.org/find/cs/1/au:+Urban_J/0/1/0/all/0/1">Josef Urban</a>Fire Now, Fire Later: Alarm-Based Systems for Prescriptive Process Monitoring. (arXiv:1905.09568v1 [cs.LG])http://arxiv.org/abs/1905.09568
<p>Predictive process monitoring is a family of techniques to analyze events
produced during the execution of a business process in order to predict the
future state or the final outcome of running process instances. Existing
techniques in this field are able to predict, at each step of a process
instance, the likelihood that it will lead to an undesired outcome. These
techniques, however, focus on generating predictions and do not prescribe when
and how process workers should intervene to decrease the cost of undesired
outcomes. This paper proposes a framework for prescriptive process monitoring,
which extends predictive monitoring with the ability to generate alarms that
trigger interventions to prevent an undesired outcome or mitigate its effect.
The framework incorporates a parameterized cost model to assess the
cost-benefit trade-off of generating alarms. We show how to optimize the
generation of alarms given an event log of past process executions and a set of
cost model parameters. The proposed approaches are empirically evaluated using
a range of real-life event logs. The experimental results show that the net
cost of undesired outcomes can be minimized by changing the threshold for
generating alarms, as the process instance progresses. Moreover, introducing
delays for triggering alarms, instead of triggering them as soon as the
probability of an undesired outcome exceeds a threshold, leads to lower net
costs.
</p>
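An illustrative net-cost computation (ours, far simpler than the paper's parameterized cost model): raise an alarm when the predicted probability of an undesired outcome exceeds a threshold, pay an intervention cost whenever an alarm fires, and pay a larger outcome cost when an undesired outcome occurs without intervention; the best threshold is the one minimizing total cost over the log.

```python
def net_cost(cases, threshold, c_intervene=10.0, c_outcome=100.0):
    total = 0.0
    for prob, undesired in cases:
        if prob >= threshold:
            total += c_intervene   # alarm fired, intervention cost paid
        elif undesired:
            total += c_outcome     # missed undesired outcome
    return total

# (predicted probability of an undesired outcome, actual outcome)
log = [(0.9, True), (0.8, True), (0.4, False), (0.3, True), (0.1, False)]
best = min((net_cost(log, t), t) for t in [0.2, 0.5, 0.7])
```

Here the permissive threshold wins because a missed undesired outcome costs ten times an intervention; with cheaper outcomes the optimum shifts, which is exactly the trade-off the framework tunes.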
<a href="http://arxiv.org/find/cs/1/au:+Fahrenkrog_Petersen_S/0/1/0/all/0/1">Stephan A. Fahrenkrog-Petersen</a>, <a href="http://arxiv.org/find/cs/1/au:+Tax_N/0/1/0/all/0/1">Niek Tax</a>, <a href="http://arxiv.org/find/cs/1/au:+Teinemaa_I/0/1/0/all/0/1">Irene Teinemaa</a>, <a href="http://arxiv.org/find/cs/1/au:+Dumas_M/0/1/0/all/0/1">Marlon Dumas</a>, <a href="http://arxiv.org/find/cs/1/au:+Leoni_M/0/1/0/all/0/1">Massimiliano de Leoni</a>, <a href="http://arxiv.org/find/cs/1/au:+Maggi_F/0/1/0/all/0/1">Fabrizio Maria Maggi</a>, <a href="http://arxiv.org/find/cs/1/au:+Weidlich_M/0/1/0/all/0/1">Matthias Weidlich</a>Gravity-Inspired Graph Autoencoders for Directed Link Prediction. (arXiv:1905.09570v1 [cs.LG])http://arxiv.org/abs/1905.09570
<p>Graph autoencoders (AE) and variational autoencoders (VAE) recently emerged
as powerful node embedding methods. In particular, graph AE and VAE were
successfully leveraged to tackle the challenging link prediction problem,
aiming at figuring out whether some pairs of nodes from a graph are connected
by unobserved edges. However, these models focus on undirected graphs and
therefore ignore the potential direction of the link, which is limiting for
numerous real-life applications. In this paper, we extend the graph AE and VAE
frameworks to address link prediction in directed graphs. We present a new
gravity-inspired decoder scheme that can effectively reconstruct directed
graphs from a node embedding. We empirically evaluate our method on three
different directed link prediction tasks, for which standard graph AE and VAE
perform poorly. We achieve competitive results on three real-world graphs,
outperforming several popular baselines.
</p>
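The gravity-inspired decoder can be sketched roughly as follows (a hedged reconstruction; see the paper for the exact parameterisation): each node gets a scalar "mass" in addition to its embedding position, and the probability of a directed edge i -> j grows with the target's mass and decays with the log squared distance. Because masses differ across nodes, the score of i -> j need not equal that of j -> i.

```python
import math

def edge_score(z_i, z_j, m_j, lam=1.0):
    # Directed score i -> j: target "mass" minus scaled log squared distance.
    d2 = sum((a - b) ** 2 for a, b in zip(z_i, z_j))
    logit = m_j - lam * math.log(d2)
    return 1.0 / (1.0 + math.exp(-logit))   # sigmoid

z = {"a": (0.0, 0.0), "b": (1.0, 1.0)}
mass = {"a": 0.5, "b": 2.0}
s_ab = edge_score(z["a"], z["b"], mass["b"])   # a -> b
s_ba = edge_score(z["b"], z["a"], mass["a"])   # b -> a: different score
```

This asymmetry is what standard inner-product or distance decoders lack, and why they perform poorly on directed link prediction.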
<a href="http://arxiv.org/find/cs/1/au:+Salha_G/0/1/0/all/0/1">Guillaume Salha</a>, <a href="http://arxiv.org/find/cs/1/au:+Limnios_S/0/1/0/all/0/1">Stratis Limnios</a>, <a href="http://arxiv.org/find/cs/1/au:+Hennequin_R/0/1/0/all/0/1">Romain Hennequin</a>, <a href="http://arxiv.org/find/cs/1/au:+Tran_V/0/1/0/all/0/1">Viet Anh Tran</a>, <a href="http://arxiv.org/find/cs/1/au:+Vazirgiannis_M/0/1/0/all/0/1">Michalis Vazirgiannis</a>Kaleido: An Efficient Out-of-core Graph Mining System on A Single Machine. (arXiv:1905.09572v1 [cs.DC])http://arxiv.org/abs/1905.09572
<p>Graph mining is one of the most important categories of graph algorithms.
</p>
<p>However, exploring the subgraphs of an input graph produces a huge amount of
intermediate data.
</p>
<p>The 'think like a vertex' programming paradigm, pioneered by Pregel, was
designed for graph computation problems like PageRank and cannot readily
express mining problems.
</p>
<p>Existing mining systems like Arabesque and RStream need large amounts of
computing and memory resources.
</p>
<p>In this paper, we present Kaleido, an efficient single machine, out-of-core
graph mining system which treats disks as an extension of memory.
</p>
<p>Kaleido treats intermediate data in graph mining tasks as a tensor and adopts
a succinct data structure for the intermediate data.
</p>
<p>Kaleido utilizes the eigenvalue of the adjacency matrix of a subgraph to
efficiently solve the subgraph isomorphism problems with an acceptable
constraint that the vertex number of a subgraph is less than 9.
</p>
<p>Kaleido implements half-memory-half-disk storage for storing large
intermediate data, which treats the disk as an extension of the memory.
</p>
<p>Compared with two state-of-the-art mining systems, Arabesque and RStream,
Kaleido outperforms them by a geometric-mean speedup of 12.3$\times$ and
40.0$\times$, respectively.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhao_C/0/1/0/all/0/1">Cheng Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Z/0/1/0/all/0/1">Zhibin Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_P/0/1/0/all/0/1">Peng Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Zheng_T/0/1/0/all/0/1">Tianqi Zheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Cheng_X/0/1/0/all/0/1">Xueqi Cheng</a>Improving Neural Networks by Adopting Amplifying and Attenuating Neurons. (arXiv:1905.09574v1 [cs.NE])http://arxiv.org/abs/1905.09574
<p>In the present study, an amplifying neuron and an attenuating neuron, which
can be easily implemented into neural networks without significant additional
computational effort, are proposed. The activated output value is squared for
the amplifying neuron, while it is replaced by its reciprocal for the
attenuating one. Theoretically, the order of a neural network increases when an
amplifying neuron is placed in the hidden layer. Performance assessments were
conducted to verify that the amplifying and attenuating neurons enhance the
performance of neural networks. The numerical experiments reveal that neural
networks containing amplifying and attenuating neurons yield more accurate
results than those without them.
</p>
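The two neuron types are simple enough to drop into a toy forward pass. A minimal sketch (the zero-division guard on the attenuating neuron is our addition, not from the paper):

```python
import math

def tanh_layer(x, w, b):
    """Ordinary dense layer with tanh activation."""
    return [math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(w, b)]

def apply_special_neurons(h, kinds):
    """kinds[i] in {'plain', 'amp', 'att'}: amplifying neurons square the
    activated value, attenuating neurons take its reciprocal (guarded
    against division by zero; the guard is our addition)."""
    out = []
    for v, k in zip(h, kinds):
        if k == 'amp':
            out.append(v * v)
        elif k == 'att':
            out.append(1.0 / v if abs(v) > 1e-6 else 0.0)
        else:
            out.append(v)
    return out

h = tanh_layer([0.5, -0.2],
               [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
               [0.0, 0.0, 0.0])
y = apply_special_neurons(h, ['amp', 'att', 'plain'])
```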
<a href="http://arxiv.org/find/cs/1/au:+Jung_S/0/1/0/all/0/1">Seongmun Jung</a>, <a href="http://arxiv.org/find/cs/1/au:+Kwon_O/0/1/0/all/0/1">Oh Joon Kwon</a>Network Slicing for Vehicular Communication. (arXiv:1905.09578v1 [cs.NI])http://arxiv.org/abs/1905.09578
<p>Ultra-reliable vehicle-to-everything (V2X) communication is essential for
enabling the next generation of intelligent vehicles. V2X communication is a
growing area of communication that connects vehicles to neighboring vehicles
(V2V), infrastructure (V2I) and pedestrians (V2P). Network slicing is one of
the promising technologies for connecting next-generation devices, creating
several logical networks on a common and programmable physical infrastructure.
It offers an efficient way to satisfy diverse use-case requirements by
exploiting the benefits of shared physical infrastructure. In this regard, we
propose a network-slicing-based communication solution for vehicular networks.
We model a highway scenario with vehicles having heterogeneous traffic demands.
The autonomous driving slice (safety messages) and the infotainment slice
(video stream) are the two logical slices created on a common infrastructure.
We formulate a network clustering and slicing algorithm that partitions the
vehicles into clusters and allocates a slice leader to each cluster. Slice
leaders serve their clustered vehicles with high-quality V2V links and forward
safety information with low latency, while roadside units provide the
infotainment service over high-quality V2I links. An extensive Long Term
Evolution Advanced (LTE-A) system-level simulator enhanced with the cellular
V2X (C-V2X) standard is used to evaluate the proposed method, showing that the
proposed network slicing technique achieves low-latency and high-reliability
communication.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Khan_H/0/1/0/all/0/1">Hamza Khan</a>, <a href="http://arxiv.org/find/cs/1/au:+Luoto_P/0/1/0/all/0/1">Petri Luoto</a>, <a href="http://arxiv.org/find/cs/1/au:+Samarakoon_S/0/1/0/all/0/1">Sumudu Samarakoon</a>, <a href="http://arxiv.org/find/cs/1/au:+Bennis_M/0/1/0/all/0/1">Mehdi Bennis</a>, <a href="http://arxiv.org/find/cs/1/au:+Latva_Aho_M/0/1/0/all/0/1">Matti Latva-Aho</a>Beyond Cookie Monster Amnesia: Real World Persistent Online Tracking. (arXiv:1905.09581v1 [cs.CR])http://arxiv.org/abs/1905.09581
<p>Browser fingerprinting is a relatively new method of uniquely identifying
browsers that can be used to track web users. In some ways it is more
privacy-threatening than tracking via cookies, as users have no direct control
over it. A number of authors have considered the wide variety of techniques
that can be used to fingerprint browsers; however, relatively little
information is available on how widespread browser fingerprinting is, and what
information is collected to create these fingerprints in the real world. To
help address this gap, we crawled the 10,000 most popular websites; this gave
insights into the number of websites that are using the technique, which
websites are collecting fingerprinting information, and exactly what
information is being retrieved. We found that approximately 69% of websites
are, potentially, involved in first-party or third-party browser
fingerprinting. We further found that third-party browser fingerprinting, which
is potentially more privacy-damaging, appears to be predominant in practice. We
also describe FingerprintAlert, a freely available browser extension
we developed that detects and, optionally, blocks fingerprinting attempts by
visited websites.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Al_Fannah_N/0/1/0/all/0/1">Nasser Mohammed Al-Fannah</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_W/0/1/0/all/0/1">Wanpeng Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Mitchell_C/0/1/0/all/0/1">Chris J Mitchell</a>Underwater Stereo using Refraction-free Image Synthesized from Light Field Camera. (arXiv:1905.09588v1 [eess.IV])http://arxiv.org/abs/1905.09588
<p>There is a strong demand for capturing underwater scenes without the
distortions caused by refraction. Since a light field camera captures several
light rays at each point of the image plane from various directions, a
refraction-free image can be synthesized if geometrically correct rays are
chosen. In this paper, we propose a novel technique to efficiently select such
rays and synthesize a refraction-free image from an underwater image captured
by a light field camera. In addition, we propose a stereo technique to
reconstruct 3D shapes using a pair of our refraction-free images, which are
central projections. In the experiments, we captured several underwater scenes
with two light field cameras, synthesized refraction-free images, and applied
the stereo technique to reconstruct 3D shapes. The results are compared with
previous techniques based on approximation, showing the strength of our
method.
</p>
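For a flat interface, the geometric core of selecting refraction-free rays reduces to Snell's law. A minimal sketch, assuming a flat air/water interface and a hypothetical discrete sampling of ray directions (the paper's camera model and housing geometry are more involved):

```python
import math

N_WATER, N_AIR = 1.33, 1.0

def refract_angle(theta_water):
    """Snell's law at a flat water->air interface: the in-air angle (rad)
    of a ray that travels at theta_water inside the water."""
    return math.asin(math.sin(theta_water) * N_WATER / N_AIR)

def pick_refraction_free_ray(candidate_angles_air, theta_target_water):
    """From the bundle of rays recorded at one image-plane point (candidate
    in-air angles; the sampling here is hypothetical), pick the ray closest
    to where the desired in-water ray actually refracts to."""
    want = refract_angle(theta_target_water)
    return min(candidate_angles_air, key=lambda a: abs(a - want))

# rays sampled every 5 degrees; the 30-degree in-water ray bends to ~41.7
# degrees in air, so the 40-degree candidate is selected
candidates = [math.radians(d) for d in range(0, 60, 5)]
best = pick_refraction_free_ray(candidates, math.radians(30.0))
```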
<a href="http://arxiv.org/find/eess/1/au:+Ichimaru_K/0/1/0/all/0/1">Kazuto Ichimaru</a>, <a href="http://arxiv.org/find/eess/1/au:+Kawasaki_H/0/1/0/all/0/1">Hiroshi Kawasaki</a>Glioma Grade Predictions using Scattering Wavelet Transform-Based Radiomics. (arXiv:1905.09589v1 [cs.LG])http://arxiv.org/abs/1905.09589
<p>Glioma grading before surgery is critical for prognosis prediction and
treatment planning. In this paper, we present a novel scattering wavelet-based
radiomics method to predict glioma grades noninvasively and accurately. The
multimodal magnetic resonance images of 285 patients were used, with the
intratumoral and peritumoral regions well labeled. Wavelet scattering-based
features and traditional radiomics features were first extracted from both the
intratumoral and peritumoral regions. A support vector machine (SVM), logistic
regression (LR) and random forest (RF) were then trained with 5-fold
cross-validation to predict the glioma grades. The predictions obtained with
the different features were finally evaluated in terms of quantitative metrics.
The area under the receiver operating characteristic curve (AUC) of glioma
grade prediction based on scattering wavelet features was up to 0.99 when
considering both intratumoral and peritumoral features in multimodal images,
an increase of about 17% over traditional radiomics. These results show that
the local invariant features extracted by the scattering wavelet transform
improve the prediction accuracy for glioma grading. In addition, features
extracted from the peritumoral regions further increase the accuracy of glioma
grading.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chen_Q/0/1/0/all/0/1">Qijian Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_L/0/1/0/all/0/1">Lihui Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_L/0/1/0/all/0/1">Li Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Deng_Z/0/1/0/all/0/1">Zeyu Deng</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_J/0/1/0/all/0/1">Jian Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhu_Y/0/1/0/all/0/1">Yuemin Zhu</a>Topological Characterization of Consensus under General Message Adversaries. (arXiv:1905.09590v1 [cs.DC])http://arxiv.org/abs/1905.09590
<p>In this paper, we provide a rigorous characterization of consensus
solvability in synchronous directed dynamic networks controlled by an arbitrary
message adversary using point-set topology: We extend the approach introduced
by Alpern and Schneider in 1985 by introducing two novel topologies on the
space of infinite executions: the process-view topology, induced by a distance
function that relies on the local view of a given process in an execution, and
the minimum topology, which is induced by a distance function that focuses on
the local view of the process that is the last to distinguish two executions.
We establish some simple but powerful topological results, which not only lead
to a topological explanation of bivalence arguments, but also provide necessary
and sufficient topological conditions on the admissible graph sequences of a
message adversary for solving consensus. In particular, we characterize
consensus solvability in terms of connectivity of the set of admissible graph
sequences. For non-compact message adversaries, which are not limit-closed in
the sense that there is a convergent sequence of graph sequences whose limit is
not permitted, this requires the exclusion of all "fair" and "unfair" limit
sequences that coincide with the forever bivalent runs constructed in bivalence
proofs. For both compact and non-compact message adversaries, we also provide
tailored characterizations of consensus solvability, i.e., tight conditions for
impossibility and existence of algorithms, based on the broadcastability of the
connected components of the set of admissible graph sequences.
</p>
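The flavour of these topologies can be illustrated with a toy distance on finite execution prefixes: if one process's local views agree up to round k, the executions are 2^-k apart, and the minimum topology takes the minimum over processes. This is only a simplified finite-prefix sketch of the idea, not the paper's exact construction:

```python
def process_view_distance(views1, views2):
    """2^-k where k is the first round at which one process's local views of
    two executions differ (toy finite-prefix version of the paper's metric;
    views are round-indexed lists of hashable local states)."""
    k = 0
    for a, b in zip(views1, views2):
        if a != b:
            break
        k += 1
    else:
        if len(views1) == len(views2):
            return 0.0  # the process never distinguishes the executions
    return 2.0 ** -k

def minimum_distance(all_views1, all_views2):
    """Minimum-topology flavour: the distance is driven by the process that
    is last to distinguish the executions, i.e. the minimum over processes."""
    return min(process_view_distance(v1, v2)
               for v1, v2 in zip(all_views1, all_views2))
```

For instance, two executions whose process-0 views diverge only at round 2 while process-1 views diverge immediately are at minimum distance 2^-2.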
<a href="http://arxiv.org/find/cs/1/au:+Nowak_T/0/1/0/all/0/1">Thomas Nowak</a>, <a href="http://arxiv.org/find/cs/1/au:+Schmid_U/0/1/0/all/0/1">Ulrich Schmid</a>, <a href="http://arxiv.org/find/cs/1/au:+Winkler_K/0/1/0/all/0/1">Kyrill Winkler</a>A Direct Approach to Robust Deep Learning Using Adversarial Networks. (arXiv:1905.09591v1 [cs.CV])http://arxiv.org/abs/1905.09591
<p>Deep neural networks have been shown to perform well in many classical
machine learning problems, especially in image classification tasks. However,
researchers have found that neural networks can be easily fooled, and they are
surprisingly sensitive to small perturbations imperceptible to humans.
Carefully crafted input images (adversarial examples) can force a well-trained
neural network to provide arbitrary outputs. Including adversarial examples
during training is a popular defense mechanism against adversarial attacks. In
this paper we propose a new defensive mechanism under the generative
adversarial network (GAN) framework. We model the adversarial noise using a
generative network, trained jointly with a classification discriminative
network as a minimax game. We show empirically that our adversarial network
approach works well against black box attacks, with performance on par with
state-of-the-art methods such as ensemble adversarial training and adversarial
training with projected gradient descent.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Wang_H/0/1/0/all/0/1">Huaxia Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Yu_C/0/1/0/all/0/1">Chun-Nam Yu</a>Non-monotone DR-submodular Maximization: Approximation and Regret Guarantees. (arXiv:1905.09595v1 [cs.LG])http://arxiv.org/abs/1905.09595
<p>Diminishing-returns (DR) submodular optimization is an important field with
many real-world applications in machine learning, economics and communication
systems. It captures a subclass of non-convex optimization that provides both
practical and theoretical guarantees. In this paper, we study the fundamental
problem of maximizing non-monotone DR-submodular functions over down-closed and
general convex sets in both offline and online settings. First, we show that
for offline maximizing non-monotone DR-submodular functions over a general
convex set, the Frank-Wolfe algorithm achieves an approximation guarantee which
depends on the convex set. Next, we show that the Stochastic Gradient Ascent
algorithm achieves a 1/4-approximation ratio with the regret of $O(1/\sqrt{T})$
for the problem of maximizing non-monotone DR-submodular functions over
down-closed convex sets. These are the first approximation guarantees in the
corresponding settings. Finally, we benchmark these algorithms on machine
learning problems with real-world datasets.
</p>
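A Frank-Wolfe ascent step can be sketched on a toy continuous DR-submodular objective: f(x) = a·x - 0.5 x^T B x with entrywise nonnegative B has all mixed second derivatives at most zero, hence is DR-submodular (and non-monotone). The box constraint set and fixed 1/T step below are simplifications of the paper's algorithms and carry none of their approximation guarantees:

```python
a = [1.0, 0.6]
B = [[1.0, 0.5], [0.5, 1.0]]   # entrywise nonnegative => f is DR-submodular

def f(x):
    n = len(x)
    lin = sum(a[i] * x[i] for i in range(n))
    quad = sum(B[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
    return lin - 0.5 * quad

def grad_f(x):
    n = len(x)
    return [a[i] - sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]

def frank_wolfe_box(T=100):
    """Frank-Wolfe ascent over the box [0,1]^n: a linear maximization
    oracle, then a 1/T step toward its answer. A simplified sketch only."""
    x = [0.0, 0.0]
    for _ in range(T):
        v = [1.0 if g > 0 else 0.0 for g in grad_f(x)]  # LMO over the box
        x = [xi + (vi - xi) / T for xi, vi in zip(x, v)]
    return x

x = frank_wolfe_box()
```

The iterate stays feasible by construction (each step is a convex combination with a box vertex) and strictly improves on the all-zeros start.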
<a href="http://arxiv.org/find/cs/1/au:+Durr_C/0/1/0/all/0/1">Christoph D&#xfc;rr</a>, <a href="http://arxiv.org/find/cs/1/au:+Thang_N/0/1/0/all/0/1">Nguyen Kim Thang</a>, <a href="http://arxiv.org/find/cs/1/au:+Srivastav_A/0/1/0/all/0/1">Abhinav Srivastav</a>, <a href="http://arxiv.org/find/cs/1/au:+Tible_L/0/1/0/all/0/1">L&#xe9;o Tible</a>Variational Inference with Mixture Model Approximation: Robotic Applications. (arXiv:1905.09597v1 [cs.RO])http://arxiv.org/abs/1905.09597
<p>We propose a method to approximate the distribution of robot configurations
satisfying multiple objectives. Our approach uses Variational Inference, a
popular method in Bayesian computation, which has several advantages over
sampling-based techniques. To be able to represent the complex and multimodal
distribution of configurations, we propose to use a mixture model as
approximate distribution, an approach that has gained popularity recently. In
this work, we show the interesting properties of this approach and how it can
be applied to a range of problems.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Pignat_E/0/1/0/all/0/1">Emmanuel Pignat</a>, <a href="http://arxiv.org/find/cs/1/au:+Lembono_T/0/1/0/all/0/1">Teguh Lembono</a>, <a href="http://arxiv.org/find/cs/1/au:+Calinon_S/0/1/0/all/0/1">Sylvain Calinon</a>CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM. (arXiv:1905.09598v1 [cs.NE])http://arxiv.org/abs/1905.09598
<p>With the widespread use of social media, companies now have access to a
wealth of customer feedback data, which has valuable applications to Customer
Relationship Management (CRM). Analyzing customer grievance data is paramount,
as slow redressal leads to customer churn and lower profitability. In this
paper, we propose a descriptive analytics framework using the self-organizing
feature map (SOM) for visual sentiment analysis of customer complaints. The
network automatically learns the inherent grouping of the complaints, which can
then be visualized using various techniques. Analytical Customer Relationship
Management (ACRM) executives can draw useful business insights from the maps
and take timely remedial action. We also propose a high-performance version of
the algorithm, CUDASOM (CUDA-based Self-Organizing feature Map), implemented on
the NVIDIA parallel computing platform CUDA, which speeds up the processing of
high-dimensional text data and generates fast results. The efficacy of the
proposed model is demonstrated on customer complaint data regarding the
products and services of four leading Indian banks. CUDASOM achieved an average
speed-up of 44 times. Our approach can expand research into intelligent
grievance redressal systems that provide rapid solutions to complaining
customers.
</p>
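A plain CPU version of the SOM that CUDASOM parallelizes can be sketched in a few lines; the toy 2-D "complaint vectors", grid size, and learning schedule below are illustrative stand-ins for the high-dimensional text features used in the paper:

```python
import math
import random

def best_unit(w, x):
    """Best matching unit: the grid coordinate whose weights are closest to x."""
    return min(w, key=lambda u: sum((wi - xi) ** 2 for wi, xi in zip(w[u], x)))

def train_som(data, grid_w=4, grid_h=4, epochs=20, seed=0):
    """Minimal 2-D self-organizing feature map: each grid unit's weight
    vector is pulled toward inputs, with a learning rate and neighbourhood
    radius that shrink over time (schedule is illustrative)."""
    rng = random.Random(seed)
    dim = len(data[0])
    w = {(i, j): [rng.random() for _ in range(dim)]
         for i in range(grid_w) for j in range(grid_h)}
    for t in range(epochs):
        lr = 0.5 * (1.0 - t / epochs)
        radius = max(1.0, (grid_w / 2.0) * (1.0 - t / epochs))
        for x in data:
            bmu = best_unit(w, x)
            for u in w:
                d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * radius ** 2))
                w[u] = [wi + lr * h * (xi - wi) for wi, xi in zip(w[u], x)]
    return w

# two well-separated groups of toy 'complaint vectors'
data = [[0.1, 0.1], [0.15, 0.05], [0.9, 0.9], [0.85, 0.95]]
som = train_som(data)
```

After training, inputs from the two groups land on different map units, which is the inherent grouping the abstract refers to.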
<a href="http://arxiv.org/find/cs/1/au:+Gavval_R/0/1/0/all/0/1">Rohit Gavval</a>, <a href="http://arxiv.org/find/cs/1/au:+Ravi_V/0/1/0/all/0/1">Vadlamani Ravi</a>, <a href="http://arxiv.org/find/cs/1/au:+Harshal_K/0/1/0/all/0/1">Kalavala Revanth Harshal</a>, <a href="http://arxiv.org/find/cs/1/au:+Gangwar_A/0/1/0/all/0/1">Akhilesh Gangwar</a>, <a href="http://arxiv.org/find/cs/1/au:+Ravi_K/0/1/0/all/0/1">Kumar Ravi</a>Diffusion and Auction on Graphs. (arXiv:1905.09604v1 [cs.GT])http://arxiv.org/abs/1905.09604
<p>Auction is the common paradigm for resource allocation which is a fundamental
problem in human society. Existing research indicates that the two primary
objectives, the seller's revenue and the allocation efficiency, are generally
conflicting in auction design. For the first time, we expand the domain of the
classic auction to a social graph and formally identify a new class of auction
mechanisms on graphs. All mechanisms in this class are incentive-compatible and
also promote all buyers to diffuse the auction information to others, whereby
both the seller's revenue and the allocation efficiency are significantly
improved compared with the Vickrey auction. We find that the recently proposed
information diffusion mechanism is an extreme case, with the lowest revenue, in
this new class. Our work could inspire a new perspective on efficient and
optimal auction design and could be applied to prevalent online social and
economic networks.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Li_B/0/1/0/all/0/1">Bin Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Hao_D/0/1/0/all/0/1">Dong Hao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_D/0/1/0/all/0/1">Dengji Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Yokoo_M/0/1/0/all/0/1">Makoto Yokoo</a>Johnson-Lindenstrauss Property Implies Subspace Restricted Isometry Property. (arXiv:1905.09608v1 [cs.IT])http://arxiv.org/abs/1905.09608
<p>Dimensionality reduction is a popular approach to tackle high-dimensional
data with low-dimensional nature. Subspace Restricted Isometry Property, a
newly-proposed concept, has proved to be a useful tool in analyzing the effect
of dimensionality reduction algorithms on subspaces. In this paper, we
establish the subspace Restricted Isometry Property for random projections
satisfying a specific concentration inequality known as the
Johnson-Lindenstrauss property. The Johnson-Lindenstrauss property is a very
mild condition satisfied by numerous types of random matrices encountered in
practice. Thus our result could extend the applicability of random projections
in subspace-based machine learning algorithms including subspace clustering and
allow for the usage of, for instance, Bernoulli matrices, partial Fourier
matrices, and partial Hadamard matrices for random projections, which are
easier to implement on hardware or are more efficient to compute.
</p>
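A quick sketch of the Johnson-Lindenstrauss property in action: a Gaussian random matrix scaled by 1/sqrt(k) approximately preserves pairwise distances. The dimensions and tolerance below are illustrative (Bernoulli or partial Fourier matrices, as mentioned in the abstract, could be swapped in for the Gaussian entries):

```python
import math
import random

def jl_project(points, k, seed=0):
    """Gaussian random projection to k dimensions, scaled by 1/sqrt(k);
    Gaussian matrices are a standard example satisfying the JL
    concentration property."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.gauss(0.0, 1.0) / math.sqrt(k) for _ in range(d)]
         for _ in range(k)]
    return [[sum(row[j] * p[j] for j in range(d)) for row in R]
            for p in points]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(42)
pts = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(3)]
proj = jl_project(pts, k=400)   # 100 -> 400 dims here only for concentration
```

With k = 400 the typical distortion of each pairwise distance is a few percent, well inside the loose 30% check below.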
<a href="http://arxiv.org/find/cs/1/au:+Xv_X/0/1/0/all/0/1">Xingyu Xv</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_G/0/1/0/all/0/1">Gen Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Gu_Y/0/1/0/all/0/1">Yuantao Gu</a>Hypothetical answers to continuous queries over data streams. (arXiv:1905.09610v1 [cs.PL])http://arxiv.org/abs/1905.09610
<p>Continuous queries over data streams may suffer from blocking operations
and/or unbounded waits, which may delay answers until some relevant input
arrives through the data stream. These delays may render answers obsolete by
the time they arrive, leaving users to make decisions with no help whatsoever.
Therefore, it can be useful to provide hypothetical answers - "given the
current information, it is possible that X will become true at time t" -
instead of no information at all.
</p>
<p>In this paper we present a semantics for queries and corresponding answers
that covers such hypothetical answers, together with an online algorithm for
updating the set of facts that are consistent with the currently available
information.
</p>
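The idea can be illustrated with a toy rule "Alert(t) holds if High(t-1) and High(t)": once the first premise has been seen, the answer is reported as hypothetical until the second arrives or becomes impossible. This is a sketch of the idea only, not the paper's semantics or algorithm:

```python
def update_hypotheses(facts, horizon=10):
    """Toy continuous query: 'Alert(t) holds if High(t-1) and High(t)'.
    Classify each Alert(t): 'confirmed' when both premises have been seen,
    'hypothetical' when the first premise is in and the second can still
    arrive (t is beyond the last observed timestamp)."""
    now = max((t for _, t in facts), default=0)
    answers = {}
    for t in range(1, horizon):
        p1 = ('High', t - 1) in facts
        p2 = ('High', t) in facts
        if p1 and p2:
            answers[t] = 'confirmed'
        elif p1 and t > now:
            answers[t] = 'hypothetical'
    return answers

facts = {('High', 4), ('High', 5)}   # the stream seen so far
ans = update_hypotheses(facts)
```

Here Alert(5) is already confirmed, while Alert(6) is offered as a hypothetical answer pending High(6).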
<a href="http://arxiv.org/find/cs/1/au:+Cruz_Filipe_L/0/1/0/all/0/1">Lu&#xed;s Cruz-Filipe</a>, <a href="http://arxiv.org/find/cs/1/au:+Gaspar_G/0/1/0/all/0/1">Gra&#xe7;a Gaspar</a>, <a href="http://arxiv.org/find/cs/1/au:+Nunes_I/0/1/0/all/0/1">Isabel Nunes</a>Learning the Representations of Moist Convection with Convolutional Neural Networks. (arXiv:1905.09614v1 [physics.ao-ph])http://arxiv.org/abs/1905.09614
<p>The representation of atmospheric moist convection in general circulation
models has been one of the most challenging tasks, due to the complexity of
the physical processes involved and the interactions between processes at
different temporal and spatial scales. This study proposes a new method to
predict the effects of moist convection on the environment using convolutional
neural networks. By considering the gradients of physical fields between
adjacent grid cells at grey-zone resolution, the effects of moist convection
predicted by the convolutional neural networks are more realistic than those
predicted by other machine learning models. The results also suggest that the
proposed method has the potential to replace conventional cumulus
parameterization in general circulation models.
</p>
<a href="http://arxiv.org/find/physics/1/au:+Tsou_S/0/1/0/all/0/1">Shih-Wen Tsou</a>, <a href="http://arxiv.org/find/physics/1/au:+Su_C/0/1/0/all/0/1">Chun-Yian Su</a>, <a href="http://arxiv.org/find/physics/1/au:+Wu_C/0/1/0/all/0/1">Chien-Ming Wu</a>A Comparative Study of Analog/Digital Self-Interference Cancellation for Full Duplex Radios. (arXiv:1905.09616v1 [eess.SP])http://arxiv.org/abs/1905.09616
<p>Self-interference (SI) is the main obstacle to full-duplex radios. To
overcome the SI, researchers have proposed several analog and digital domain
self-interference cancellation (SIC) techniques. How well the digital
cancellation works depends on the results of analog cancellation. Therefore, to
analyze overall SIC performance, one should do so in an integrated manner. In
this paper, we build a simulator that can analyze the performance of analog and
digital SIC techniques. Through this simulator, we can analyze the overall SIC
performance under various system parameters such as the resolution of an
analog-to-digital converter (ADC) and/or nonlinearity of a power amplifier
(PA). With our simulator, we expect that configurations and tuning algorithms
of an active analog canceller can be optimized before real hardware
implementation.
</p>
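The interplay the abstract describes, where digital SIC is limited by what survives analog cancellation and by ADC resolution, can be seen in a back-of-the-envelope budget. The 6.02*bits + 1.76 dB term is the standard ideal-ADC quantization-noise formula; all other numbers are illustrative, and this is not the paper's simulator:

```python
def residual_si_dbm(tx_dbm, analog_db, digital_db, adc_bits):
    """Back-of-the-envelope SIC budget: digital cancellation cannot push
    the residual self-interference below the quantization floor of an
    ideal ADC, roughly 6.02*bits + 1.76 dB below the signal at its input."""
    after_analog = tx_dbm - analog_db          # SI power entering the ADC
    quant_floor = after_analog - (6.02 * adc_bits + 1.76)
    return max(after_analog - digital_db, quant_floor)

good_adc = residual_si_dbm(20.0, 60.0, 50.0, adc_bits=12)
poor_adc = residual_si_dbm(20.0, 60.0, 50.0, adc_bits=4)
```

With a 12-bit ADC the full 50 dB of digital cancellation is usable; with 4 bits the quantization floor dominates, which is why analog cancellation and ADC resolution must be analyzed jointly.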
<a href="http://arxiv.org/find/eess/1/au:+Kwak_J/0/1/0/all/0/1">Jong Woo Kwak</a>, <a href="http://arxiv.org/find/eess/1/au:+Sim_M/0/1/0/all/0/1">Min Soo Sim</a>, <a href="http://arxiv.org/find/eess/1/au:+Kang_I/0/1/0/all/0/1">In-Woong Kang</a>, <a href="http://arxiv.org/find/eess/1/au:+Park_J/0/1/0/all/0/1">Jong Sung Park</a>, <a href="http://arxiv.org/find/eess/1/au:+Park_J/0/1/0/all/0/1">Jaedon Park</a>, <a href="http://arxiv.org/find/eess/1/au:+Chae_C/0/1/0/all/0/1">Chan-Byoung Chae</a>Automatic Generation of Level Maps with the Do What's Possible Representation. (arXiv:1905.09618v1 [cs.AI])http://arxiv.org/abs/1905.09618
<p>Automatic generation of level maps is a popular form of automatic content
generation. In this study, a recently developed technique employing the "do
what's possible" representation is used to create open-ended level maps.
Generation of the map can continue indefinitely, yielding a highly scalable
representation. A parameter study is performed to find good parameters for the
evolutionary algorithm used to locate high-quality map generators. Variations
on the technique are presented, demonstrating its versatility, and an
algorithmic variant is given that both improves performance and changes the
character of the maps located. The ability of the map to adapt to different
regions in which it is permitted to occupy space is also tested.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ashlock_D/0/1/0/all/0/1">Daniel Ashlock</a>, <a href="http://arxiv.org/find/cs/1/au:+Salge_C/0/1/0/all/0/1">Christoph Salge</a>COBS: a Compact Bit-Sliced Signature Index. (arXiv:1905.09624v1 [cs.DB])http://arxiv.org/abs/1905.09624
<p>We present COBS, a compact bit-sliced signature index, which is a cross-over
between an inverted index and Bloom filters. Our target application is to index
$k$-mers of DNA samples or $q$-grams from text documents and process
approximate pattern matching queries on the corpus with a user-chosen coverage
threshold. Query results may contain a number of false positives which
decreases exponentially with the query length and the false positive rate of
the index determined at construction time. We compare COBS to seven other index
software packages on 100 000 microbial DNA samples. COBS' compact but simple
data structure outperforms the other indexes in construction time and query
performance, with Mantis by Pandey et al. in second place. However, unlike
Mantis and other previous work, COBS does not need the complete index in
RAM and is thus designed to scale to larger document sets.
</p>
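The query model, a Bloom filter of q-grams per document matched against a user-chosen coverage threshold, can be sketched compactly. This is a toy per-document index with illustrative sizes, not COBS's bit-sliced layout:

```python
import hashlib

class BloomSignatureIndex:
    """Toy signature index: one Bloom filter of q-grams per document; a
    query matches a document when at least `threshold` of its q-grams hit.
    Bloom filters admit false positives but no false negatives."""

    def __init__(self, m=1024, q=3):
        self.m, self.q, self.docs = m, q, []

    def _bits(self, gram):
        # three hash positions per gram, derived from one SHA-256 digest
        h = hashlib.sha256(gram.encode()).digest()
        return [int.from_bytes(h[i:i + 4], 'big') % self.m for i in (0, 4, 8)]

    def _grams(self, text):
        return {text[i:i + self.q] for i in range(len(text) - self.q + 1)}

    def add(self, name, text):
        bits = set()
        for g in self._grams(text):
            bits.update(self._bits(g))
        self.docs.append((name, bits))

    def query(self, pattern, threshold=0.8):
        grams = self._grams(pattern)
        return [name for name, bits in self.docs
                if sum(all(b in bits for b in self._bits(g)) for g in grams)
                >= threshold * len(grams)]

idx = BloomSignatureIndex()
idx.add('docA', 'ACGTACGTTT')
idx.add('docB', 'GGGGCCCCAA')
hits = idx.query('ACGTAC')
```

An exact substring of a document always matches (no false negatives), while the threshold lets the user trade false-positive rate against sensitivity, mirroring the coverage parameter in the abstract.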
<a href="http://arxiv.org/find/cs/1/au:+Bingmann_T/0/1/0/all/0/1">Timo Bingmann</a>, <a href="http://arxiv.org/find/cs/1/au:+Bradley_P/0/1/0/all/0/1">Phelim Bradley</a>, <a href="http://arxiv.org/find/cs/1/au:+Gauger_F/0/1/0/all/0/1">Florian Gauger</a>, <a href="http://arxiv.org/find/cs/1/au:+Iqbal_Z/0/1/0/all/0/1">Zamin Iqbal</a>An Improved Reversible Data Hiding in Encrypted Images using Parametric Binary Tree Labeling. (arXiv:1905.09625v1 [cs.MM])http://arxiv.org/abs/1905.09625
<p>This work proposes an improved reversible data hiding scheme for encrypted
images using parametric binary tree labeling (IPBTL-RDHEI). The scheme
exploits the spatial correlation of the entire original image, not just small
image blocks, to reserve room for embedding data before image encryption. The
original image is then encrypted with a secret key, and parametric binary tree
labeling is used to label image pixels in two different categories. According
to the experimental results, the proposed IPBTL-RDHEI method achieves a higher
embedding rate than several state-of-the-art methods and outperforms the
competitors. Due to the reversibility of IPBTL-RDHEI, the original content of
the image and the secret information can be restored and extracted losslessly
and separately.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Wu_Y/0/1/0/all/0/1">Youqing Wu</a>, <a href="http://arxiv.org/find/cs/1/au:+Xiang_Y/0/1/0/all/0/1">Youzhi Xiang</a>, <a href="http://arxiv.org/find/cs/1/au:+Guo_Y/0/1/0/all/0/1">Yutang Guo</a>, <a href="http://arxiv.org/find/cs/1/au:+Tang_J/0/1/0/all/0/1">Jin Tang</a>, <a href="http://arxiv.org/find/cs/1/au:+Yin_Z/0/1/0/all/0/1">Zhaoxia Yin</a>Robust Point Cloud Based Reconstruction of Large-Scale Outdoor Scenes. (arXiv:1905.09634v1 [cs.CV])http://arxiv.org/abs/1905.09634
<p>Outlier feature matches and loop-closures that survived front-end data
association can lead to catastrophic failures in the back-end optimization of
large-scale point cloud based 3D reconstruction. To alleviate this problem, we
propose a probabilistic approach for robust back-end optimization in the
presence of outliers. More specifically, we model the problem as a Bayesian
network and solve it using the Expectation-Maximization algorithm. Our approach
leverages a long-tailed Cauchy distribution to suppress outlier feature
matches in the odometry constraints, and a Cauchy-Uniform mixture model with a
set of binary latent variables to simultaneously suppress outlier loop-closure
constraints and outlier feature matches in the inlier loop-closure constraints.
Furthermore, we show that by using a Gaussian-Uniform mixture model, our
approach degenerates to the formulation of a state-of-the-art approach for
robust indoor reconstruction. Experimental results demonstrate that our
approach has comparable performance with the state-of-the-art on a benchmark
indoor dataset, and outperforms it on a large-scale outdoor dataset. Our source
code can be found on the project website.
</p>
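The gating effect of such a mixture can be sketched through the E-step responsibility of a single residual: a heavy-tailed Cauchy inlier component weighed against a uniform outlier component. Parameter values below are illustrative, not the paper's:

```python
import math

def inlier_responsibility(r, gamma=1.0, width=100.0, prior_inlier=0.5):
    """E-step weight of one residual r under a Cauchy(0, gamma) inlier
    component vs. a Uniform(width) outlier component: the posterior
    probability that the residual came from the inlier component."""
    p_cauchy = 1.0 / (math.pi * gamma * (1.0 + (r / gamma) ** 2))
    p_unif = 1.0 / width
    num = prior_inlier * p_cauchy
    return num / (num + (1.0 - prior_inlier) * p_unif)

w_small = inlier_responsibility(0.1)   # plausible residual -> high weight
w_big = inlier_responsibility(50.0)    # gross outlier -> down-weighted
```

Small residuals keep nearly full influence on the optimization while gross outliers are softly switched off, which is the suppression mechanism the abstract describes.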
<a href="http://arxiv.org/find/cs/1/au:+Lan_Z/0/1/0/all/0/1">Ziquan Lan</a>, <a href="http://arxiv.org/find/cs/1/au:+Yew_Z/0/1/0/all/0/1">Zi Jian Yew</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_G/0/1/0/all/0/1">Gim Hee Lee</a>Tucker Decomposition Network: Expressive Power and Comparison. (arXiv:1905.09635v1 [cs.LG])http://arxiv.org/abs/1905.09635
<p>Deep neural networks have achieved great success in solving many machine
learning and computer vision problems. The main contribution of this paper is
to develop a deep network based on the Tucker tensor decomposition and to
analyze its expressive power. We show that a Tucker network is more expressive
than a shallow network: in general, a shallow network requires an exponential
number of nodes to represent a Tucker network. Experimental results compare
the performance of the proposed Tucker network with hierarchical tensor
networks and shallow networks, and demonstrate the usefulness of the Tucker
network in image classification problems.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Liu_Y/0/1/0/all/0/1">Ye Liu</a>, <a href="http://arxiv.org/find/cs/1/au:+Pan_J/0/1/0/all/0/1">Junjun Pan</a>, <a href="http://arxiv.org/find/cs/1/au:+Ng_M/0/1/0/all/0/1">Michael Ng</a>Estimating Risk and Uncertainty in Deep Reinforcement Learning. (arXiv:1905.09638v1 [cs.LG])http://arxiv.org/abs/1905.09638
<p>This paper demonstrates a novel method for separately estimating aleatoric
risk and epistemic uncertainty in deep reinforcement learning. Aleatoric risk,
which arises from inherently stochastic environments or agents, must be
accounted for in the design of risk-sensitive algorithms. Epistemic
uncertainty, which stems from limited data, is important both for
risk-sensitivity and to efficiently explore an environment. We first present a
Bayesian framework for learning the return distribution in reinforcement
learning, which provides theoretical foundations for quantifying both types of
uncertainty. Based on this framework, we show that the disagreement between
only two neural networks is sufficient to produce a low-variance estimate of
the epistemic uncertainty on the return distribution, thus providing a simple
and computationally cheap uncertainty metric. We demonstrate experiments that
illustrate our method and some applications.
</p>
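The two-network disagreement estimate is simple to sketch for plain regression (the paper derives it for return distributions in reinforcement learning; the predictions below are toy numbers):

```python
def epistemic_uncertainty(pred_a, pred_b):
    """Per-input epistemic-uncertainty proxy: squared disagreement between
    two independently trained predictors. Where both models have seen
    enough data they agree and the estimate is near zero; where data was
    scarce their predictions diverge and the estimate grows."""
    return [(a - b) ** 2 for a, b in zip(pred_a, pred_b)]

# toy predictions: the nets agree near the training data, diverge far away
net1 = [1.00, 0.98, 3.5]
net2 = [1.01, 0.97, 1.2]
u = epistemic_uncertainty(net1, net2)
```

The metric costs only one extra forward pass per input, which is the "computationally cheap" property the abstract emphasizes.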
<a href="http://arxiv.org/find/cs/1/au:+Clements_W/0/1/0/all/0/1">William R. Clements</a>, <a href="http://arxiv.org/find/cs/1/au:+Robaglia_B/0/1/0/all/0/1">Beno&#xee;t-Marie Robaglia</a>, <a href="http://arxiv.org/find/cs/1/au:+Delft_B/0/1/0/all/0/1">Bastien Van Delft</a>, <a href="http://arxiv.org/find/cs/1/au:+Slaoui_R/0/1/0/all/0/1">Reda Bahi Slaoui</a>, <a href="http://arxiv.org/find/cs/1/au:+Toth_S/0/1/0/all/0/1">S&#xe9;bastien Toth</a>An Investigation of Transfer Learning-Based Sentiment Analysis in Japanese. (arXiv:1905.09642v1 [cs.CL])http://arxiv.org/abs/1905.09642
<p>Text classification approaches have usually required task-specific model
architectures and huge labeled datasets. Recently, thanks to the rise of
text-based transfer learning techniques, it is possible to pre-train a language
model in an unsupervised manner and leverage it to perform effectively on
downstream tasks. In this work we focus on Japanese and show the potential use
of transfer learning techniques in text classification. Specifically, we
perform binary and multi-class sentiment classification on the Rakuten product
review and Yahoo movie review datasets. We show that transfer learning-based
approaches perform better than task-specific models trained on 3 times as much
data. Furthermore, these approaches perform just as well when the language
model is pre-trained on only 1/30 of the data. We release our pre-trained models and
code as open source.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Bataa_E/0/1/0/all/0/1">Enkhbold Bataa</a>, <a href="http://arxiv.org/find/cs/1/au:+Wu_J/0/1/0/all/0/1">Joshua Wu</a>Scientific Programs Imply Uncertainty. Results Expected and Unexpected. (arXiv:1905.09644v1 [cs.HC])http://arxiv.org/abs/1905.09644
<p>Science and engineering call for a wide variety of programs, but I
think that all of them can be divided into two groups. Programs of the first
group deal with well-known situations and, using well-known equations,
give results for any combination of input parameters; such programs are
specialized, very powerful calculators. The second group of programs is needed
to analyse situations with different levels of uncertainty. Programs are
developed at the best level of their authors' knowledge, but scientists need
to look at situations beyond the area of current knowledge, and they need
programs for analysis in areas of uncertainty. Is it possible to design
programs that allow one to analyse situations beyond the knowledge of their developers?
</p>
<a href="http://arxiv.org/find/cs/1/au:+Andreyev_S/0/1/0/all/0/1">Sergey Andreyev</a>Image Fusion via Sparse Regularization with Non-Convex Penalties. (arXiv:1905.09645v1 [cs.CV])http://arxiv.org/abs/1905.09645
<p>The L1 norm regularized least squares method is often used for finding sparse
approximate solutions and is widely used in 1-D signal restoration. Basis
pursuit denoising (BPD) performs noise reduction in this way. However, the
shortcoming of using L1 norm regularization is the underestimation of the true
solution. Recently, a class of non-convex penalties has been proposed to
improve this situation. Penalty functions of this kind are non-convex
themselves, but preserve the convexity of the whole cost function. This
approach has been confirmed to offer good performance in 1-D signal denoising.
This paper extends the aforementioned method to 2-D signals (images) and applies it
to multisensor image fusion. The problem is posed as an inverse one and a
corresponding cost function is judiciously designed to include two data
attachment terms. The whole cost function is proved to be convex upon suitably
choosing the non-convex penalty, so that the cost function minimization can be
tackled by convex optimization approaches, which comprise simple computations.
The performance of the proposed method is benchmarked against a number of
state-of-the-art image fusion techniques and superior performance is
demonstrated both visually and in terms of various assessment measures.
</p>
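<p>A minimal sketch of why such penalties help, contrasting the classic soft threshold (the L1 proximal operator) with the firm threshold associated with one well-known non-convex penalty; this is generic 1-D shrinkage, not the paper's fusion algorithm:</p>

```python
# Soft thresholding shrinks every surviving coefficient by the threshold,
# underestimating large values; the firm threshold of a minimax-concave
# style penalty is the identity beyond a second knot, leaving large
# coefficients unbiased while still zeroing small ones.

def soft_threshold(x, lam):
    # prox of lam*|x|: biased by lam for |x| > lam
    if x > lam:  return x - lam
    if x < -lam: return x + lam
    return 0.0

def firm_threshold(x, lam, mu):
    # requires mu > lam; identity for |x| >= mu, so no shrinkage there
    if abs(x) <= lam:
        return 0.0
    if abs(x) >= mu:
        return x
    return (mu * (abs(x) - lam) / (mu - lam)) * (1 if x > 0 else -1)

big = 10.0
assert soft_threshold(big, 1.0) == 9.0        # underestimates the true value
assert firm_threshold(big, 1.0, 3.0) == 10.0  # unbiased for large inputs
```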
<a href="http://arxiv.org/find/cs/1/au:+Anantrasirichai_N/0/1/0/all/0/1">Nantheera Anantrasirichai</a>, <a href="http://arxiv.org/find/cs/1/au:+Zheng_R/0/1/0/all/0/1">Rencheng Zheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Selesnick_I/0/1/0/all/0/1">Ivan Selesnick</a>, <a href="http://arxiv.org/find/cs/1/au:+Achim_A/0/1/0/all/0/1">Alin Achim</a>Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks. (arXiv:1905.09646v1 [cs.CV])http://arxiv.org/abs/1905.09646
<p>Convolutional Neural Networks (CNNs) generate feature representations
of complex objects by collecting hierarchically organized semantic
sub-features. These sub-features are usually distributed in grouped form
within the feature vector of each layer, representing various semantic entities.
However, the activation of these sub-features is often spatially affected by
similar patterns and noisy backgrounds, resulting in erroneous localization and
identification. We propose a Spatial Group-wise Enhance (SGE) module that can
adjust the importance of each sub-feature by generating an attention factor for
each spatial location in each semantic group, so that every individual group
can autonomously enhance its learnt expression and suppress possible noise. The
attention factors are only guided by the similarities between the global and
local feature descriptors inside each group, thus the design of SGE module is
extremely lightweight with \emph{almost no extra parameters and calculations}.
Despite being trained with only category supervisions, the SGE component is
extremely effective in highlighting multiple active areas with various
high-order semantics (such as the dog's eyes, nose, etc.). When integrated with
popular CNN backbones, SGE can significantly boost the performance of image
recognition tasks. Specifically, based on ResNet50 backbones, SGE achieves
1.2\% Top-1 accuracy improvement on the ImageNet benchmark and 1.0$\sim$2.0\%
AP gain on the COCO benchmark across a wide range of detectors
(Faster/Mask/Cascade RCNN and RetinaNet). Codes and pretrained models are
available at https://github.com/implus/PytorchInsight.
</p>
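<p>A stripped-down, parameter-free rendering of the SGE mechanism (shapes and normalization simplified relative to the paper):</p>

```python
import math

# Within one semantic group: similarity between each spatial feature and
# the group's global average yields an attention gate used to rescale that
# location; locations aligned with the group's semantics are enhanced,
# outliers (noise) are suppressed.

def sge_group(features):
    # features: list of per-location vectors belonging to one group
    dim, n = len(features[0]), len(features)
    g = [sum(f[d] for f in features) / n for d in range(dim)]  # global descriptor
    scores = [sum(f[d] * g[d] for d in range(dim)) for f in features]
    mean = sum(scores) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in scores) / n) or 1.0  # avoid /0
    gates = [1.0 / (1.0 + math.exp(-(s - mean) / std)) for s in scores]  # sigmoid
    return [[gate * f[d] for d in range(dim)] for gate, f in zip(gates, features)]

# A location aligned with the group mean is gated up relative to an outlier.
out = sge_group([[1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
assert out[0][0] > abs(out[2][0])
```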
<a href="http://arxiv.org/find/cs/1/au:+Li_X/0/1/0/all/0/1">Xiang Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Hu_X/0/1/0/all/0/1">Xiaolin Hu</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_J/0/1/0/all/0/1">Jian Yang</a>New methods for SVM feature selection. (arXiv:1905.09653v1 [cs.LG])http://arxiv.org/abs/1905.09653
<p>Support Vector Machines have been a popular topic for quite some time now,
and as they develop, a need for new methods of feature selection arises. This
work presents various approaches to SVM feature selection developed using new
tools such as entropy measurement and K-medoid clustering. The work focuses on
the use of one-class SVMs for wafer testing, with a numerical implementation
in R.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Aladjidi_T/0/1/0/all/0/1">Tangui Aladjidi</a>, <a href="http://arxiv.org/find/cs/1/au:+Pasqualini_F/0/1/0/all/0/1">Fran&#xe7;ois Pasqualini</a>A ROS2 based communication architecture for control in collaborative and intelligent automation systems. (arXiv:1905.09654v1 [cs.RO])http://arxiv.org/abs/1905.09654
<p>Collaborative robots are becoming part of intelligent automation systems in
modern industry. Development and control of such systems differs from
traditional automation methods and consequently leads to new challenges.
Thankfully, Robot Operating System (ROS) provides a communication platform and
a vast variety of tools and utilities that can aid that development. However,
it is hard to use ROS in large-scale automation systems due to communication
issues in a distributed setup, hence the development of ROS2. In this paper, a
ROS2 based communication architecture is presented together with an industrial
use-case of a collaborative and intelligent automation system.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Eros_E/0/1/0/all/0/1">Endre Er&#x151;s</a>, <a href="http://arxiv.org/find/cs/1/au:+Dahl_M/0/1/0/all/0/1">Martin Dahl</a>, <a href="http://arxiv.org/find/cs/1/au:+Bengtsson_K/0/1/0/all/0/1">Kristofer Bengtsson</a>, <a href="http://arxiv.org/find/cs/1/au:+Hanna_A/0/1/0/all/0/1">Atieh Hanna</a>, <a href="http://arxiv.org/find/cs/1/au:+Falkman_P/0/1/0/all/0/1">Petter Falkman</a>StrongChain: Transparent and Collaborative Proof-of-Work Consensus. (arXiv:1905.09655v1 [cs.CR])http://arxiv.org/abs/1905.09655
<p>Bitcoin is the most successful cryptocurrency so far. This is mainly due to
its novel consensus algorithm, which is based on proof-of-work combined with a
cryptographically-protected data structure and a rewarding scheme that
incentivizes nodes to participate. However, despite its unprecedented success
Bitcoin suffers from many inefficiencies. For instance, Bitcoin's consensus
mechanism has been proved to be incentive-incompatible, its high reward
variance causes centralization, and its hardcoded deflation raises questions
about its long-term sustainability.
</p>
<p>In this work, we revise the Bitcoin consensus mechanism by proposing
StrongChain, a scheme that introduces transparency and incentivizes
participants to collaborate rather than to compete. The core design of our
protocol is to reflect and utilize the computing power aggregated on the
blockchain which is invisible and "wasted" in Bitcoin today. Introducing
relatively easy, although important, changes to Bitcoin's design enables us to
improve many crucial aspects of Bitcoin-like cryptocurrencies, making them more
secure, efficient, and profitable for participants. We thoroughly analyze our
approach and we present an implementation of StrongChain. The obtained results
confirm its efficiency, security, and deployability.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Szalachowski_P/0/1/0/all/0/1">Pawel Szalachowski</a>, <a href="http://arxiv.org/find/cs/1/au:+Reijsbergen_D/0/1/0/all/0/1">Daniel Reijsbergen</a>, <a href="http://arxiv.org/find/cs/1/au:+Homoliak_I/0/1/0/all/0/1">Ivan Homoliak</a>, <a href="http://arxiv.org/find/cs/1/au:+Sun_S/0/1/0/all/0/1">Siwei Sun</a>On the Average Case of MergeInsertion. (arXiv:1905.09656v1 [cs.DS])http://arxiv.org/abs/1905.09656
<p>MergeInsertion, also known as the Ford-Johnson algorithm, is a sorting
algorithm which, up to today, for many input sizes achieves the best known
upper bound on the number of comparisons. Indeed, it gets extremely close to
the information-theoretic lower bound. While the worst-case behavior is well
understood, only little is known about the average case.
</p>
<p>This work takes a closer look at the average case behavior. In particular, we
establish an upper bound of $n \log n - 1.4005n + o(n)$ comparisons. We also
give an exact description of the probability distribution of the length of the
chain a given element is inserted into and use it to approximate the average
number of comparisons numerically. Moreover, we compute the exact average
number of comparisons for $n$ up to 148.
</p>
<p>Furthermore, we experimentally explore the impact of different decision trees
for binary insertion. To conclude, we conduct experiments showing that a
slightly different insertion order leads to a better average case and we
compare the algorithm to the recent combination with (1,2)-Insertionsort by
Iwama and Teruyama.
</p>
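<p>A simplified sketch of comparison counting against the information-theoretic lower bound, using plain binary insertion sort as a stand-in for MergeInsertion itself:</p>

```python
import math

def binary_insertion_comparisons(arr):
    # Counts comparisons of plain binary insertion sort -- a simplified
    # stand-in for MergeInsertion, which orders insertions so the binary
    # searches hit near-power-of-two ranges.
    sorted_part, comparisons = [], 0
    for x in arr:
        lo, hi = 0, len(sorted_part)
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if x < sorted_part[mid]:
                hi = mid
            else:
                lo = mid + 1
        sorted_part.insert(lo, x)
    return sorted_part, comparisons

def information_lower_bound(n):
    # Any comparison sort needs at least ceil(log2(n!)) comparisons.
    return math.ceil(math.log2(math.factorial(n)))

data = list(range(10, 0, -1))
result, c = binary_insertion_comparisons(data)
assert result == sorted(data)
assert c >= information_lower_bound(len(data))  # 25 vs. a bound of 22 here
```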
<a href="http://arxiv.org/find/cs/1/au:+Stober_F/0/1/0/all/0/1">Florian Stober</a>, <a href="http://arxiv.org/find/cs/1/au:+Weiss_A/0/1/0/all/0/1">Armin Wei&#xdf;</a>Towards Artificial Learning Companions for Mental Imagery-based Brain-Computer Interfaces. (arXiv:1905.09658v1 [cs.HC])http://arxiv.org/abs/1905.09658
<p>Mental Imagery based Brain-Computer Interfaces (MI-BCI) enable their users to
control an interface, e.g., a prosthesis, by performing mental imagery tasks
only, such as imagining a right arm movement while their brain activity is
measured and processed by the system. Designing and using a BCI requires users
to learn how to produce different and stable patterns of brain activity for
each of the mental imagery tasks. However, current training protocols do not
enable every user to acquire the skills required to use BCIs. These training
protocols are most likely one of the main reasons why BCIs are not yet reliable
enough for wider applications outside research laboratories. Learning
companions have been shown to improve training in different disciplines, but
they have barely been explored for BCIs so far. This article aims at
investigating the potential benefits learning companions could bring to BCI
training by improving the feedback, i.e., the information provided to the user,
which is essential to the learning process and yet has proven both
theoretically and practically inadequate in BCI. This paper first presents the
potentials of BCI and the limitations of current training approaches. Then, it
reviews both the BCI and learning companion literature regarding three main
characteristics of feedback: its appearance, its social and emotional
components and its cognitive component. From these considerations, this paper
draws some guidelines, identifies open challenges, and suggests potential
solutions to design and use learning companions for BCIs.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Pillette_L/0/1/0/all/0/1">L&#xe9;a Pillette</a> (Potioc, LaBRI, Bordeaux INP), <a href="http://arxiv.org/find/cs/1/au:+Jeunet_C/0/1/0/all/0/1">Camille Jeunet</a> (CNBI, Hybrid), <a href="http://arxiv.org/find/cs/1/au:+NKambou_R/0/1/0/all/0/1">Roger N&#x27;Kambou</a> (Laboratoire GDAC), <a href="http://arxiv.org/find/cs/1/au:+NKaoua_B/0/1/0/all/0/1">Bernard N&#x27;Kaoua</a> (PHOENIX-POST, HACS, CNRS), <a href="http://arxiv.org/find/cs/1/au:+Lotte_F/0/1/0/all/0/1">Fabien Lotte</a> (Potioc, LaBRI, Bordeaux INP)Data-Driven Crowd Simulation with Generative Adversarial Networks. (arXiv:1905.09661v1 [cs.GR])http://arxiv.org/abs/1905.09661
<p>This paper presents a novel data-driven crowd simulation method that can
mimic the observed traffic of pedestrians in a given environment. Given a set
of observed trajectories, we use a recent form of neural networks, Generative
Adversarial Networks (GANs), to learn the properties of this set and generate
new trajectories with similar properties. We define a way for simulated
pedestrians (agents) to follow such a trajectory while handling local collision
avoidance. As such, the system can generate a crowd that behaves similarly to
observations, while still enabling real-time interactions between agents. Via
experiments with real-world data, we show that our simulated trajectories
preserve the statistical properties of their input. Our method simulates crowds
in real time that resemble existing crowds, while also allowing insertion of
extra agents, combination with other simulation methods, and user interaction.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Amirian_J/0/1/0/all/0/1">Javad Amirian</a>, <a href="http://arxiv.org/find/cs/1/au:+Toll_W/0/1/0/all/0/1">Wouter van Toll</a>, <a href="http://arxiv.org/find/cs/1/au:+Hayet_J/0/1/0/all/0/1">Jean-Bernard Hayet</a>, <a href="http://arxiv.org/find/cs/1/au:+Pettre_J/0/1/0/all/0/1">Julien Pettr&#xe9;</a>Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies. (arXiv:1905.09668v1 [cs.RO])http://arxiv.org/abs/1905.09668
<p>A common strategy to deal with the expensive reinforcement learning (RL) of
complex tasks is to decompose them into a collection of subtasks that are
usually simpler to learn as well as reusable for new problems. However, when a
robot learns the policies for these subtasks, common approaches treat every
policy learning process separately. Therefore, all these individual
(composable) policies need to be learned before tackling the learning process
of the complex task through policies composition. Such composition of
individual policies is usually performed sequentially, which is not suitable
for tasks that require performing the subtasks concurrently. In this paper, we
propose to combine a set of composable Gaussian policies corresponding to these
subtasks using a set of activation vectors, resulting in a complex Gaussian
policy that is a function of the means and covariance matrices of the
composable policies. Moreover, we propose an algorithm for learning both
compound and composable policies within the same learning process by exploiting
the off-policy data generated from the compound policy. The algorithm is built
on a maximum entropy RL approach to favor exploration during the learning
process. The results of the experiments show that the experience collected with
the compound policy permits not only to solve the complex task but also to
obtain useful composable policies that successfully perform in their respective
tasks. Supplementary videos and code are available at
https://sites.google.com/view/hrl-concurrent-discovery .
</p>
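<p>A minimal 1-D sketch of such a composition (the paper works with full covariance matrices and learned activation vectors; the numbers here are made up):</p>

```python
# Each subtask policy k is a Gaussian N(mu_k, var_k); an activation a_k
# scales its precision in a product-of-Gaussians style combination, giving
# a compound Gaussian policy whose mean and variance depend on all of them.

def compose_gaussians(policies, activations):
    # policies: list of (mean, variance); activations: non-negative weights
    precision = sum(a / var for (mu, var), a in zip(policies, activations))
    mean = sum(a * mu / var
               for (mu, var), a in zip(policies, activations)) / precision
    return mean, 1.0 / precision

# Two subtask policies pulling in opposite directions, equally activated:
mu, var = compose_gaussians([(1.0, 0.5), (-1.0, 0.5)], [1.0, 1.0])
assert abs(mu) < 1e-9   # balanced activations cancel the means
assert var == 0.25      # the compound policy is more certain than either part
```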
<a href="http://arxiv.org/find/cs/1/au:+Esteban_D/0/1/0/all/0/1">Domingo Esteban</a>, <a href="http://arxiv.org/find/cs/1/au:+Rozo_L/0/1/0/all/0/1">Leonel Rozo</a>, <a href="http://arxiv.org/find/cs/1/au:+Caldwell_D/0/1/0/all/0/1">Darwin G. Caldwell</a>Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation. (arXiv:1905.09670v1 [stat.ML])http://arxiv.org/abs/1905.09670
<p>We propose a new scalable multi-class Gaussian process classification
approach building on a novel modified softmax likelihood function. The new
likelihood has two benefits: it leads to well-calibrated uncertainty estimates
and allows for an efficient latent variable augmentation. The augmented model
has the advantage that it is conditionally conjugate leading to a fast
variational inference method via block coordinate ascent updates. Previous
approaches suffered from a trade-off between uncertainty calibration and speed.
Our experiments show that our method leads to well-calibrated uncertainty
estimates and competitive predictive performance while being up to two orders
of magnitude faster than the state of the art.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Galy_Fajou_T/0/1/0/all/0/1">Th&#xe9;o Galy-Fajou</a>, <a href="http://arxiv.org/find/stat/1/au:+Wenzel_F/0/1/0/all/0/1">Florian Wenzel</a>, <a href="http://arxiv.org/find/stat/1/au:+Donner_C/0/1/0/all/0/1">Christian Donner</a>, <a href="http://arxiv.org/find/stat/1/au:+Opper_M/0/1/0/all/0/1">Manfred Opper</a>Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment. (arXiv:1905.09673v1 [cs.AI])http://arxiv.org/abs/1905.09673
<p>We focus on the important problem of emergency evacuation, which could
clearly benefit from reinforcement learning but has so far been largely
unaddressed by it. Emergency evacuation is a complex task that is difficult to
solve with reinforcement learning, since an emergency situation is highly
dynamic, with many changing variables and complex constraints that make it difficult to
train on. In this paper, we propose the first fire evacuation environment to
train reinforcement learning agents for evacuation planning. The environment is
modelled as a graph capturing the building structure. It consists of realistic
features like fire spread, uncertainty and bottlenecks. We have implemented the
environment in the OpenAI gym format, to facilitate future research. We also
propose a new reinforcement learning approach that entails pretraining the
network weights of a DQN-based agent to incorporate information on the
shortest path to the exit. We achieved this by using tabular Q-learning to
learn the shortest path on the building model's graph. This information is
transferred to the network by deliberately overfitting it on the Q-matrix.
Then, the pretrained DQN model is trained on the fire evacuation environment to
generate the optimal evacuation path under time varying conditions. We perform
comparisons of the proposed approach with state-of-the-art reinforcement
learning algorithms like PPO, VPG, SARSA, A2C and ACKTR. The results show that
our method is able to outperform state-of-the-art models by a huge margin
including the original DQN based models. Finally, we test our model on a large
and complex real building consisting of 91 rooms, with the possibility to move
to any other room, hence giving 8281 actions. We use an attention based
mechanism to deal with large action spaces. Our model achieves near optimal
performance on the real world emergency environment.
</p>
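<p>A toy version of the pretraining stage (a hypothetical four-room corridor graph; fire spread, bottlenecks, and the DQN itself are omitted):</p>

```python
import random

# Tabular Q-learning on a small building graph learns shortest-path-to-exit
# values; in the paper these Q-values then become regression targets the
# DQN is deliberately overfit on before training in the full environment.
random.seed(1)
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # rooms and passable doors
EXIT = 3
Q = {(s, a): 0.0 for s in graph for a in graph[s]}
alpha, gamma = 0.5, 0.9

for _ in range(2000):
    s = random.choice([0, 1, 2])
    while s != EXIT:
        a = random.choice(graph[s])        # fully exploratory behaviour
        r = 0.0 if a == EXIT else -1.0     # step cost until the exit
        nxt = 0.0 if a == EXIT else max(Q[(a, n)] for n in graph[a])
        Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
        s = a

policy = {s: max(graph[s], key=lambda a: Q[(s, a)]) for s in graph if s != EXIT}
assert policy == {0: 1, 1: 2, 2: 3}  # greedy policy heads toward the exit
```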
<a href="http://arxiv.org/find/cs/1/au:+Sharma_J/0/1/0/all/0/1">Jivitesh Sharma</a>, <a href="http://arxiv.org/find/cs/1/au:+Andersen_P/0/1/0/all/0/1">Per-Arne Andersen</a>, <a href="http://arxiv.org/find/cs/1/au:+Granmo_O/0/1/0/all/0/1">Ole-Chrisoffer Granmo</a>, <a href="http://arxiv.org/find/cs/1/au:+Goodwin_M/0/1/0/all/0/1">Morten Goodwin</a>Disentangling Redundancy for Multi-Task Pruning. (arXiv:1905.09676v1 [cs.LG])http://arxiv.org/abs/1905.09676
<p>Can prior network pruning strategies eliminate redundancy in multiple
correlated pre-trained deep neural networks? It seems a positive answer if
multiple networks are first combined and then pruned. However, we argue that an
arbitrarily combined network may lead to sub-optimal pruning performance
because their intra- and inter-redundancy may not be minimised at the same time
while retaining the inference accuracy in each task. In this paper, we define
and analyse the redundancy in multi-task networks from an information theoretic
perspective, and identify challenges for existing pruning methods to function
effectively for multi-task pruning. We propose Redundancy-Disentangled Networks
(RDNets), which decouples intra- and inter-redundancy such that all redundancy
can be suppressed via previous network pruning schemes. A pruned RDNet also
ensures minimal computation in any subset of tasks, a desirable feature for
selective task execution. Moreover, a heuristic is devised to construct an
RDNet from multiple pre-trained networks. Experiments on CelebA show that the
same pruning method on an RDNet achieves at least 1.8x lower memory usage and
1.4x lower computation cost than on a multi-task network constructed by the
state-of-the-art network merging scheme.
</p>
<a href="http://arxiv.org/find/cs/1/au:+He_X/0/1/0/all/0/1">Xiaoxi He</a>, <a href="http://arxiv.org/find/cs/1/au:+Gao_D/0/1/0/all/0/1">Dawei Gao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhou_Z/0/1/0/all/0/1">Zimu Zhou</a>, <a href="http://arxiv.org/find/cs/1/au:+Tong_Y/0/1/0/all/0/1">Yongxin Tong</a>, <a href="http://arxiv.org/find/cs/1/au:+Thiele_L/0/1/0/all/0/1">Lothar Thiele</a>Some limitations of norm based generalization bounds in deep neural networks. (arXiv:1905.09677v1 [cs.LG])http://arxiv.org/abs/1905.09677
<p>Deep convolutional neural networks have been shown to be able to fit a
labeling over random data while still being able to generalize well on normal
datasets. Describing deep convolutional neural network capacity through the
measure of spectral complexity has been recently proposed to tackle this
apparent paradox. Spectral complexity correlates with generalization error
(GE) and can distinguish networks trained on normal and random labels. We
propose the first GE bound
based on spectral complexity for deep convolutional neural networks and provide
tighter bounds by orders of magnitude from the previous estimate. We then
investigate theoretically and empirically the insensitivity of spectral
complexity to invariances of modern deep convolutional neural networks, and
show several limitations of spectral complexity that occur as a result.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Pitas_K/0/1/0/all/0/1">Konstantinos Pitas</a>, <a href="http://arxiv.org/find/cs/1/au:+Loukas_A/0/1/0/all/0/1">Andreas Loukas</a>, <a href="http://arxiv.org/find/cs/1/au:+Davies_M/0/1/0/all/0/1">Mike Davies</a>, <a href="http://arxiv.org/find/cs/1/au:+Vandergheynst_P/0/1/0/all/0/1">Pierre Vandergheynst</a>Nullspace Structure in Model Predictive Control. (arXiv:1905.09679v1 [cs.RO])http://arxiv.org/abs/1905.09679
<p>Robotic tasks can be accomplished by exploiting different forms of
redundancies. This work focuses on planning redundancy within Model Predictive
Control (MPC) in which several paths can be considered within the MPC time
horizon. We present the nullspace structure in MPC with a quadratic
approximation of the cost and a linearization of the dynamics. We exploit the
low rank structure of the precision matrices used in MPC (encapsulating
spatiotemporal information) to perform hierarchical task planning, and show how
nullspace computation can be treated as a fusion problem (computed with a
product of Gaussian experts). We illustrate the approach using proof-of-concept
examples with point mass objects and simulated robotics applications.
</p>
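<p>A tiny sketch of the underlying nullspace structure, with a hypothetical one-row task Jacobian rather than the paper's MPC precision matrices:</p>

```python
# Secondary-task commands are passed through N = I - J^+ J, so they cannot
# disturb the primary task encoded by J; for a single row J, the
# pseudoinverse is simply J^T / (J J^T).

def nullspace_projector(J):
    # J: single row (1 x n)
    jj = sum(v * v for v in J)
    return [[(1.0 if i == j else 0.0) - J[i] * J[j] / jj
             for j in range(len(J))] for i in range(len(J))]

def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]

J = [1.0, 0.0]              # primary task only constrains the first DOF
N = nullspace_projector(J)
secondary = [0.7, 0.3]
projected = matvec(N, secondary)

assert abs(sum(j * p for j, p in zip(J, projected))) < 1e-12  # J @ p == 0
assert projected[1] == 0.3  # motion in the free DOF is preserved
```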
<a href="http://arxiv.org/find/cs/1/au:+Girgin_H/0/1/0/all/0/1">Hakan Girgin</a>, <a href="http://arxiv.org/find/cs/1/au:+Calinon_S/0/1/0/all/0/1">Sylvain Calinon</a>DEEP-BO for Hyperparameter Optimization of Deep Networks. (arXiv:1905.09680v1 [cs.LG])http://arxiv.org/abs/1905.09680
<p>The performance of deep neural networks (DNN) is very sensitive to the
particular choice of hyper-parameters. To make matters worse, the shape of the
learning curve can be significantly affected when a technique like batchnorm is
used. As a result, hyperparameter optimization of deep networks can be much
more challenging than traditional machine learning models. In this work, we
start from well known Bayesian Optimization solutions and provide enhancement
strategies specifically designed for hyperparameter optimization of deep
networks. The resulting algorithm is named DEEP-BO (Diversified,
Early-termination-Enabled, and Parallel Bayesian Optimization). When evaluated
over six DNN benchmarks, DEEP-BO easily outperforms or shows comparable
performance with some of the well-known solutions including GP-Hedge,
Hyperband, BOHB, Median Stopping Rule, and Learning Curve Extrapolation. The
code used is made publicly available at https://github.com/snu-adsl/DEEP-BO.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Cho_H/0/1/0/all/0/1">Hyunghun Cho</a>, <a href="http://arxiv.org/find/cs/1/au:+Kim_Y/0/1/0/all/0/1">Yongjin Kim</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_E/0/1/0/all/0/1">Eunjung Lee</a>, <a href="http://arxiv.org/find/cs/1/au:+Choi_D/0/1/0/all/0/1">Daeyoung Choi</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_Y/0/1/0/all/0/1">Yongjae Lee</a>, <a href="http://arxiv.org/find/cs/1/au:+Rhee_W/0/1/0/all/0/1">Wonjong Rhee</a>From semantics to execution: Integrating action planning with reinforcement learning for robotic tool use. (arXiv:1905.09683v1 [cs.LG])http://arxiv.org/abs/1905.09683
<p>Reinforcement learning is an appropriate and successful method to robustly
perform low-level robot control under noisy conditions. Symbolic action
planning is useful to resolve causal dependencies and to break a causally
complex problem down into a sequence of simpler high-level actions. A problem
with the integration of both approaches is that action planning is based on
discrete high-level action- and state spaces, whereas reinforcement learning is
usually driven by a continuous reward function. However, recent advances in
reinforcement learning, specifically, universal value function approximators
and hindsight experience replay, have focused on goal-independent methods based
on sparse rewards. In this article, we build on these novel methods to
facilitate the integration of action planning with reinforcement learning by
exploiting the reward-sparsity as a bridge between the high-level and low-level
state- and control spaces. As a result, we demonstrate that the integrated
neuro-symbolic method is able to solve object manipulation problems that
involve tool use and non-trivial causal dependencies under noisy conditions,
exploiting both data and knowledge.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Eppe_M/0/1/0/all/0/1">Manfred Eppe</a>, <a href="http://arxiv.org/find/cs/1/au:+Nguyen_P/0/1/0/all/0/1">Phuong D.H. Nguyen</a>, <a href="http://arxiv.org/find/cs/1/au:+Wermter_S/0/1/0/all/0/1">Stefan Wermter</a>Decentralized Learning of Generative Adversarial Networks from Multi-Client Non-iid Data. (arXiv:1905.09684v1 [cs.LG])http://arxiv.org/abs/1905.09684
<p>This work addresses a new problem of learning generative adversarial networks
(GANs) from multiple data collections that are each i) owned separately and
privately by different clients and ii) drawn from a non-identical distribution
that comprises different classes. Given such multi-client and non-iid data as
input, we aim to learn a distribution covering all the classes the input data
can belong to, while keeping the data decentralized and private in each client
storage. Our key contribution to this end is a new decentralized approach for
learning GANs from non-iid data called Forgiver-First Update (F2U), which a)
asks clients to train an individual discriminator with their own data and b)
updates a generator to fool the most `forgiving' discriminators who deem
generated samples as the most real. Our theoretical analysis proves that this
updating strategy indeed allows the decentralized GAN to learn a generator's
distribution with all the input classes as its global optimum based on
f-divergence minimization. Moreover, we propose a relaxed version of F2U called
Forgiver-First Aggregation (F2A), which adaptively aggregates the
discriminators while emphasizing forgiving ones to perform well in practice.
Our empirical evaluations with image generation tasks demonstrated the
effectiveness of our approach over state-of-the-art decentralized learning
methods.
</p>
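<p>A toy rendering of the forgiver-first rule (real F2U trains neural discriminators; the scoring functions here are hypothetical):</p>

```python
# The generator's update signal comes from whichever client discriminator
# assigns generated samples the highest realness -- the most "forgiving"
# one -- so the generator is not penalized for covering classes that only
# some clients own.

def forgiver_first_score(sample, discriminators):
    # F2U: fool the most forgiving discriminator
    return max(d(sample) for d in discriminators)

# Client discriminators trained on different classes judge a sample x by
# closeness to the data they own (scores in (0, 1]).
d_cats = lambda x: 1.0 / (1.0 + abs(x - 0.0))  # owns a class centered at 0
d_dogs = lambda x: 1.0 / (1.0 + abs(x - 4.0))  # owns a class centered at 4

# A sample near the second class is rejected by d_cats but accepted by
# d_dogs, so it still scores as realistic under the forgiver-first rule.
assert forgiver_first_score(4.0, [d_cats, d_dogs]) == 1.0
assert forgiver_first_score(4.0, [d_cats, d_dogs]) > d_cats(4.0)
```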
<a href="http://arxiv.org/find/cs/1/au:+Yonetani_R/0/1/0/all/0/1">Ryo Yonetani</a>, <a href="http://arxiv.org/find/cs/1/au:+Takahashi_T/0/1/0/all/0/1">Tomohiro Takahashi</a>, <a href="http://arxiv.org/find/cs/1/au:+Hashimoto_A/0/1/0/all/0/1">Atsushi Hashimoto</a>, <a href="http://arxiv.org/find/cs/1/au:+Ushiku_Y/0/1/0/all/0/1">Yoshitaka Ushiku</a>The Convolutional Tsetlin Machine. (arXiv:1905.09688v1 [cs.LG])http://arxiv.org/abs/1905.09688
<p>Deep neural networks have obtained astounding successes for important pattern
recognition tasks, but they suffer from high computational complexity and the
lack of interpretability. The recent Tsetlin Machine (TM) attempts to address
this lack by using easy-to-interpret conjunctive clauses in propositional logic
to solve complex pattern recognition problems. The TM provides competitive
accuracy in several benchmarks, while keeping the important property of
interpretability. It further facilitates hardware-near implementation since
inputs, patterns, and outputs are expressed as bits, while recognition and
learning rely on straightforward bit manipulation. In this paper, we exploit
the TM paradigm by introducing the Convolutional Tsetlin Machine (CTM), as an
interpretable alternative to convolutional neural networks (CNNs). Whereas the
TM categorizes an image by applying each clause once to the whole image, the
CTM uses each clause as a convolution filter. That is, a clause is evaluated
multiple times, once per image patch taking part in the convolution. To make
the clauses location-aware, each patch is further augmented with its
coordinates within the image. The output of a convolution clause is obtained
simply by ORing the outcome of evaluating the clause on each patch. In the
learning phase of the TM, clauses that evaluate to 1 are contrasted against the
input. For the CTM, we instead contrast against one of the patches, randomly
selected among the patches that made the clause evaluate to 1. Accordingly, the
standard Type I and Type II feedback of the classic TM can be employed
directly, without further modification. The CTM obtains a peak test accuracy of
99.51% on MNIST, 96.21% on Kuzushiji-MNIST, 89.56% on Fashion-MNIST, and 100.0%
on the 2D Noisy XOR Problem, which is competitive with results reported for
simple 4-layer CNNs, BinaryConnect, and a recent FPGA-accelerated Binary CNN.
</p>
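<p>A stripped-down illustration of convolutional clause evaluation (learning, feedback, and the coordinate augmentation are omitted):</p>

```python
# A clause is a conjunction of literals over patch bits; it is evaluated
# on every patch like a convolution filter, and the clause output is the
# OR over all patch evaluations.

def eval_clause(patch, include, include_negated):
    # include / include_negated: bit indices the clause conditions on
    return (all(patch[i] == 1 for i in include) and
            all(patch[i] == 0 for i in include_negated))

def conv_clause_output(image_patches, include, include_negated):
    return int(any(eval_clause(p, include, include_negated)
                   for p in image_patches))

patches = [[0, 0, 1], [1, 0, 1], [1, 1, 1]]  # 3-bit patches of one image
# Clause "bit0 AND NOT bit1" is matched by the second patch only, so the
# ORed output fires; the reversed clause matches no patch.
assert conv_clause_output(patches, include=[0], include_negated=[1]) == 1
assert conv_clause_output(patches, include=[1], include_negated=[0]) == 0
```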
<a href="http://arxiv.org/find/cs/1/au:+Granmo_O/0/1/0/all/0/1">Ole-Christoffer Granmo</a>, <a href="http://arxiv.org/find/cs/1/au:+Glimsdal_S/0/1/0/all/0/1">Sondre Glimsdal</a>, <a href="http://arxiv.org/find/cs/1/au:+Jiao_L/0/1/0/all/0/1">Lei Jiao</a>, <a href="http://arxiv.org/find/cs/1/au:+Goodwin_M/0/1/0/all/0/1">Morten Goodwin</a>, <a href="http://arxiv.org/find/cs/1/au:+Omlin_C/0/1/0/all/0/1">Christian W. Omlin</a>, <a href="http://arxiv.org/find/cs/1/au:+Berge_G/0/1/0/all/0/1">Geir Thore Berge</a>Fully Neural Network based Model for General Temporal Point Processes. (arXiv:1905.09690v1 [cs.LG])http://arxiv.org/abs/1905.09690
<p>A temporal point process is a mathematical model for a time series of
discrete events, which covers various applications. Recently, recurrent neural
network (RNN) based models have been developed for point processes and have
been found effective. RNN based models usually assume a specific functional
form for the time course of the intensity function of a point process (e.g.,
exponentially decreasing or increasing with the time since the most recent
event). However, such an assumption can restrict the expressive power of the
model. We herein propose a novel RNN based model in which the time course of
the intensity function is represented in a general manner. In our approach, we
first model the integral of the intensity function using a feedforward neural
network and then obtain the intensity function as its derivative. This approach
enables us to both obtain a flexible model of the intensity function and
exactly evaluate the log-likelihood function, which contains the integral of
the intensity function, without any numerical approximations. Our model
achieves competitive or superior performances compared to the previous
state-of-the-art methods for both synthetic and real datasets.
</p>
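As a rough illustration of the integral-then-differentiate idea, the toy network below keeps its weights positive so the modeled cumulative intensity is non-decreasing, and computes the intensity as its exact derivative. A real implementation would use automatic differentiation; the class name, shapes, and weights here are assumptions for illustration:

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MonotoneMLP:
    """One-hidden-layer net for the cumulative intensity Lambda(tau).
    Positive weights make Lambda non-decreasing in tau, so its derivative
    (the intensity) is non-negative."""
    def __init__(self, w1, b1, w2):
        self.w1, self.b1, self.w2 = w1, b1, w2  # w1, w2 > 0 elementwise

    def Lambda(self, tau):
        return float(self.w2 @ softplus(self.w1 * tau + self.b1))

    def intensity(self, tau):
        # Exact derivative d Lambda / d tau via the chain rule
        return float(self.w2 @ (sigmoid(self.w1 * tau + self.b1) * self.w1))

def interval_loglik(net, tau):
    """Exact log-likelihood contribution of one inter-event interval tau:
    log lambda(tau) - (Lambda(tau) - Lambda(0)); the integral term is the
    network output itself, so no numerical quadrature is needed."""
    return np.log(net.intensity(tau)) - (net.Lambda(tau) - net.Lambda(0.0))
```

The key point is that the integral of the intensity appearing in the log-likelihood is available in closed form as the network output itself.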
<a href="http://arxiv.org/find/cs/1/au:+Omi_T/0/1/0/all/0/1">Takahiro Omi</a>, <a href="http://arxiv.org/find/cs/1/au:+Ueda_N/0/1/0/all/0/1">Naonori Ueda</a>, <a href="http://arxiv.org/find/cs/1/au:+Aihara_K/0/1/0/all/0/1">Kazuyuki Aihara</a>Population-based Global Optimisation Methods for Learning Long-term Dependencies with RNNs. (arXiv:1905.09691v1 [stat.ML])http://arxiv.org/abs/1905.09691
<p>Despite recent innovations in network architectures and loss functions,
training RNNs to learn long-term dependencies remains difficult due to
challenges with gradient-based optimisation methods. Inspired by the success of
Deep Neuroevolution in reinforcement learning (Such et al. 2017), we explore
the use of gradient-free population-based global optimisation (PBO) techniques
for training RNNs to capture long-term dependencies in time-series data. Testing
evolution strategies (ES) and particle swarm optimisation (PSO) on an
application in volatility forecasting, we demonstrate that PBO methods lead to
performance improvements in general, with ES exhibiting the most consistent
results across a variety of architectures.
</p>
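The gradient-free training loop can be sketched with a simple Gaussian-perturbation evolution strategy. In the paper this would wrap full RNN training on volatility data; here `loss` is any black-box function of a parameter vector, and all hyperparameters are illustrative assumptions:

```python
import numpy as np

def evolution_strategies(loss, theta0, sigma=0.1, lr=0.05, pop=50, iters=200, seed=0):
    """Minimal ES sketch: perturb the parameter vector with Gaussian noise,
    score each perturbation with the black-box loss (no gradients), and
    step the mean toward the better-scoring directions."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        eps = rng.standard_normal((pop, theta.size))
        # Fitness = negative loss of each perturbed candidate
        rewards = -np.array([loss(theta + sigma * e) for e in eps])
        # Standardize rewards for a scale-free update
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        theta = theta + lr / (pop * sigma) * eps.T @ rewards
    return theta
```

Because only loss evaluations are needed, the same loop applies to any RNN architecture by flattening its weights into `theta`.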
<a href="http://arxiv.org/find/stat/1/au:+Lim_B/0/1/0/all/0/1">Bryan Lim</a>, <a href="http://arxiv.org/find/stat/1/au:+Zohren_S/0/1/0/all/0/1">Stefan Zohren</a>, <a href="http://arxiv.org/find/stat/1/au:+Roberts_S/0/1/0/all/0/1">Stephen Roberts</a>Fusion of heterogeneous bands and kernels in hyperspectral image processing. (arXiv:1905.09698v1 [eess.IV])http://arxiv.org/abs/1905.09698
<p>Hyperspectral imaging is a powerful technology that is plagued by large
dimensionality. Herein, we explore a way to combat that hindrance via
non-contiguous and contiguous (simpler to realize in a sensor) band grouping for
dimensionality reduction. Our approach is different in the respect that it is
flexible and it follows a well-studied process of visual clustering in
high-dimensional spaces. Specifically, we extend the improved visual assessment
of cluster tendency and clustering in ordered dissimilarity data unsupervised
clustering algorithms for supervised hyperspectral learning. In addition, we
propose a way to extract diverse features via the use of different proximity
metrics (ways to measure the similarity between bands) and kernel functions.
The discovered features are fused with $l_{\infty}$-norm multiple kernel
learning. Experiments are conducted on two benchmark datasets and our results
are compared to related work. The results indicate that whether to use
contiguous or non-contiguous grouping is application specific, but
heterogeneous features and kernels usually lead to performance gains.
</p>
<a href="http://arxiv.org/find/eess/1/au:+Islam_M/0/1/0/all/0/1">Muhammad Aminul Islam</a>, <a href="http://arxiv.org/find/eess/1/au:+Anderson_D/0/1/0/all/0/1">Derek T. Anderson</a>, <a href="http://arxiv.org/find/eess/1/au:+Ball_J/0/1/0/all/0/1">John E. Ball</a>, <a href="http://arxiv.org/find/eess/1/au:+Younan_N/0/1/0/all/0/1">Nicolas H. Younan</a>Action Assembly: Sparse Imitation Learning for Text Based Games with Combinatorial Action Spaces. (arXiv:1905.09700v1 [cs.LG])http://arxiv.org/abs/1905.09700
<p>We propose a computationally efficient algorithm that combines compressed
sensing with imitation learning to solve sequential decision making text-based
games with combinatorial action spaces. We propose a variation of the
compressed sensing algorithm Orthogonal Matching Pursuit (OMP), that we call
IK-OMP, and show that it can recover a bag-of-words from a sum of the
individual word embeddings, even in the presence of noise. We incorporate
IK-OMP into a supervised imitation learning setting and show that this
algorithm, called Sparse Imitation Learning (Sparse-IL), solves the entire
text-based game of Zork1 with an action space of approximately 10 million
actions using imperfect, noisy demonstrations.
</p>
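The core recovery step (pulling a bag-of-words out of a summed embedding) can be sketched with plain Orthogonal Matching Pursuit; the paper's IK-OMP adds further modifications, so the version below is a simplified illustration with assumed names:

```python
import numpy as np

def omp_recover(D, y, k):
    """OMP sketch: find which k columns of the embedding matrix D
    (one column per vocabulary word) sum to the observation y."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        # Greedily pick the embedding most correlated with the residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit on the current support and update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    return sorted(support)
```

In the imitation-learning setting, the recovered word indices would then be assembled into the action string fed to the game.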
<a href="http://arxiv.org/find/cs/1/au:+Tessler_C/0/1/0/all/0/1">Chen Tessler</a>, <a href="http://arxiv.org/find/cs/1/au:+Zahavy_T/0/1/0/all/0/1">Tom Zahavy</a>, <a href="http://arxiv.org/find/cs/1/au:+Cohen_D/0/1/0/all/0/1">Deborah Cohen</a>, <a href="http://arxiv.org/find/cs/1/au:+Mankowitz_D/0/1/0/all/0/1">Daniel J. Mankowitz</a>, <a href="http://arxiv.org/find/cs/1/au:+Mannor_S/0/1/0/all/0/1">Shie Mannor</a>Reconfigurable radiofrequency electronic functions designed with 3D Smith Charts in Metal-Insulator-Transition Materials. (arXiv:1905.09701v1 [physics.app-ph])http://arxiv.org/abs/1905.09701
<p>Recently, the field of Metal-Insulator-Transition (MIT) materials has emerged
as an unconventional solution for novel energy efficient electronic functions,
such as steep slope subthermionic switches, neuromorphic hardware,
reconfigurable radiofrequency functions, new types of sensors, terahertz and
optoelectronic devices. Designing radiofrequency (RF) electronic circuits with
a MIT material like vanadium dioxide, VO2, requires the understanding of its
physics and appropriate models and tools, with predictive capability over a
large frequency range (1-100 GHz). Here, we develop 3D Smith charts for devices and
circuits having complex frequency dependences, like those resulting from the
use of MIT materials. The novel foundation of the 3D Smith chart involves the
fundamental geometrical notions of oriented curvature and variable homothety,
which clarify, for the first time, theoretical inconsistencies in Foster and
non-Foster circuits, where the driving point impedances exhibit mixed clockwise
and counter-clockwise frequency dependent paths on the Smith chart as frequency
increases. We show here the unique visualization capability of a 3D Smith
chart, which allows one to quantify orientation over variable frequency. The new 3D
Smith chart is applied as a 3D multi-parameter modelling and design environment
for the complex case of MIT materials, whose permittivity depends on
frequency. In this work, we apply 3D Smith
charts to vanadium dioxide (VO2) reconfigurable Peano inductors. We report
fabricated inductors with record quality factors using VO2 phase transition to
program multiple tuning states, operating in the range 4 GHz to 10 GHz.
Finally, we fabricate new Peano curves filters used to extract the
frequency-dependent dielectric constant of VO2 within 1 GHz-50 GHz for the
accurate design of RF electronic applications with phase change materials.
</p>
<a href="http://arxiv.org/find/physics/1/au:+Muller_A/0/1/0/all/0/1">Andrei Muller</a>, <a href="http://arxiv.org/find/physics/1/au:+Moldoveanu_A/0/1/0/all/0/1">Alin Moldoveanu</a>, <a href="http://arxiv.org/find/physics/1/au:+Asavei_V/0/1/0/all/0/1">Victor Asavei</a>, <a href="http://arxiv.org/find/physics/1/au:+Khadar_R/0/1/0/all/0/1">Riyaz Khadar</a>, <a href="http://arxiv.org/find/physics/1/au:+Codesal_E/0/1/0/all/0/1">Esther Sanabria Codesal</a>, <a href="http://arxiv.org/find/physics/1/au:+Krammer_A/0/1/0/all/0/1">Anna Krammer</a>, <a href="http://arxiv.org/find/physics/1/au:+Fernandez_Bolanos_M/0/1/0/all/0/1">Montserrat Fernandez-Bola&#xf1;os</a>, <a href="http://arxiv.org/find/physics/1/au:+Cavalleri_M/0/1/0/all/0/1">Matteo Cavalleri</a>, <a href="http://arxiv.org/find/physics/1/au:+Zhang_J/0/1/0/all/0/1">Junrui Zhang</a>, <a href="http://arxiv.org/find/physics/1/au:+Casu_E/0/1/0/all/0/1">Emanuele Casu</a>, <a href="http://arxiv.org/find/physics/1/au:+Schuler_A/0/1/0/all/0/1">Andreas Schuler</a>, <a href="http://arxiv.org/find/physics/1/au:+Ionescu_A/0/1/0/all/0/1">Adrian Mihai Ionescu</a>Average reward reinforcement learning with unknown mixing times. (arXiv:1905.09704v1 [cs.LG])http://arxiv.org/abs/1905.09704
<p>We derive and analyze learning algorithms for policy evaluation,
apprenticeship learning, and policy gradient for average reward criteria.
Existing algorithms explicitly require an upper bound on the mixing time. In
contrast, we build on ideas from Markov chain theory and derive sampling
algorithms that do not require such an upper bound. For these algorithms, we
provide theoretical bounds on their sample-complexity and running time.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zahavy_T/0/1/0/all/0/1">Tom Zahavy</a>, <a href="http://arxiv.org/find/cs/1/au:+Cohen_A/0/1/0/all/0/1">Alon Cohen</a>, <a href="http://arxiv.org/find/cs/1/au:+Kaplan_H/0/1/0/all/0/1">Haim Kaplan</a>, <a href="http://arxiv.org/find/cs/1/au:+Mansour_Y/0/1/0/all/0/1">Yishay Mansour</a>Watermark retrieval from 3D printed objects via synthetic data training. (arXiv:1905.09706v1 [cs.CV])http://arxiv.org/abs/1905.09706
<p>We present a deep neural network based method for the retrieval of watermarks
from images of 3D printed objects. To deal with the variability of all possible
3D printing and image acquisition settings we train the network with synthetic
data. The main simulator parameters such as texture, illumination and camera
position are dynamically randomized in non-realistic ways, forcing the neural
network to learn the intrinsic features of the 3D printed watermarks. At the
end of the pipeline, the watermark, in the form of a two-dimensional bit array,
is retrieved through a series of simple image processing and statistical
operations applied on the confidence map generated by the neural network. The
results demonstrate that the inclusion of synthetic domain-randomized (DR)
data in the training set
increases the generalization power of the network, which performs better on
images from previously unseen 3D printed objects. We conclude that in our
application domain of information retrieval from 3D printed objects, where
access to the exact CAD files of the printed objects can be assumed, one can
use inexpensive synthetic data to enhance neural network training, reducing the
need for the labour intensive process of creating large amounts of hand
labelled real data or the need to generate photorealistic synthetic data.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhang_X/0/1/0/all/0/1">Xin Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Jia_N/0/1/0/all/0/1">Ning Jia</a>, <a href="http://arxiv.org/find/cs/1/au:+Ivrissimtzis_I/0/1/0/all/0/1">Ioannis Ivrissimtzis</a>Inverse Reinforcement Learning in Contextual MDPs. (arXiv:1905.09710v1 [cs.LG])http://arxiv.org/abs/1905.09710
<p>We consider the Inverse Reinforcement Learning (IRL) problem in Contextual
Markov Decision Processes (CMDPs). Here, the reward of the environment, which
is not available to the agent, depends on a static parameter referred to as the
context. Each context defines an MDP (with a different reward signal), and the
agent is provided demonstrations by an expert, for different contexts. The goal
is to learn a mapping from contexts to rewards, such that planning with respect
to the induced reward will perform similarly to the expert, even for unseen
contexts. We suggest two learning algorithms for this scenario. (1) For rewards
that are a linear function of the context, we provide a method that is
guaranteed to return an $\epsilon$-optimal solution after a polynomial number
of demonstrations. (2) For general reward functions, we propose black-box
descent methods based on evolutionary strategies capable of working with
nonlinear estimators (e.g., neural networks). We evaluate our algorithms in
autonomous driving and medical treatment simulations and demonstrate their
ability to learn and generalize to unseen contexts.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Korsunsky_P/0/1/0/all/0/1">Philip Korsunsky</a>, <a href="http://arxiv.org/find/cs/1/au:+Belo_S/0/1/0/all/0/1">Stav Belo</a>, <a href="http://arxiv.org/find/cs/1/au:+Zahavy_T/0/1/0/all/0/1">Tom Zahavy</a>, <a href="http://arxiv.org/find/cs/1/au:+Tessler_C/0/1/0/all/0/1">Chen Tessler</a>, <a href="http://arxiv.org/find/cs/1/au:+Mannor_S/0/1/0/all/0/1">Shie Mannor</a>Accelerating DNN Training in Wireless Federated Edge Learning System. (arXiv:1905.09712v1 [cs.LG])http://arxiv.org/abs/1905.09712
<p>Training task in classical machine learning models, such as deep neural
networks (DNN), is generally implemented at the remote computationally-adequate
cloud center for centralized learning, which is typically time-consuming and
resource-hungry. It also incurs serious privacy issues and long communication
latency since massive data are transmitted to the centralized node. To overcome
these shortcomings, we consider a newly-emerged framework, namely federated
edge learning (FEEL), to aggregate the local learning updates at the edge
server instead of users' raw data. Aiming at accelerating the training process
while guaranteeing the learning accuracy, we first define a novel performance
evaluation criterion, called learning efficiency, and formulate a training
acceleration optimization problem in the CPU scenario, where each user device
is equipped with a CPU. The closed-form expressions for joint batchsize selection
and communication resource allocation are developed and some insightful results
are also highlighted. Further, we extend our learning framework into the GPU
scenario and propose a novel training function to characterize the learning
property of general GPU modules. The optimal solution in this case is shown
to have a similar structure to that of the CPU scenario, suggesting that our
proposed algorithm is applicable in more general systems.
Finally, extensive experiments validate our theoretical analysis and
demonstrate that our proposal can reduce the training time and improve the
learning accuracy simultaneously.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ren_J/0/1/0/all/0/1">Jinke Ren</a>, <a href="http://arxiv.org/find/cs/1/au:+Yu_G/0/1/0/all/0/1">Guanding Yu</a>, <a href="http://arxiv.org/find/cs/1/au:+Ding_G/0/1/0/all/0/1">Guangyao Ding</a>A Convolutional Cost-Sensitive Crack Localization Algorithm for Automated and Reliable RC Bridge Inspection. (arXiv:1905.09716v1 [cs.CV])http://arxiv.org/abs/1905.09716
<p>Bridges are an essential part of the transportation infrastructure and need
to be monitored periodically. Visual inspections by dedicated teams have been
one of the primary tools in structural health monitoring (SHM) of bridge
structures. However, such conventional methods have certain shortcomings.
Manual inspections may be challenging in harsh environments and are commonly
biased in nature. In the last decade, camera-equipped unmanned aerial vehicles
(UAVs) have been widely used for visual inspections; however, the task of
automatically extracting useful information from raw images is still
challenging. In this paper, a deep learning semantic segmentation framework is
proposed to automatically localize surface cracks. Due to the high imbalance of
crack and background classes in images, different strategies are investigated
to improve performance and reliability. The trained models are tested on
real-world crack images showing impressive robustness in terms of the metrics
defined by the concepts of precision and recall. These techniques can be used
in SHM of bridges to extract useful information from the unprocessed images
taken from UAVs.
</p>
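The precision and recall metrics mentioned above are standard pixel-wise measures for imbalanced binary masks; a generic sketch (not the paper's code, masks assumed to be 0/1 arrays) is:

```python
import numpy as np

def precision_recall(pred, truth):
    """Pixel-wise precision and recall for binary crack masks:
    precision = TP / (TP + FP), recall = TP / (TP + FN)."""
    tp = np.logical_and(pred == 1, truth == 1).sum()
    fp = np.logical_and(pred == 1, truth == 0).sum()
    fn = np.logical_and(pred == 0, truth == 1).sum()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With crack pixels vastly outnumbered by background, these metrics are far more informative than plain accuracy, which motivates the cost-sensitive strategies in the paper.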
<a href="http://arxiv.org/find/cs/1/au:+Sajedi_S/0/1/0/all/0/1">Seyed Omid Sajedi</a>, <a href="http://arxiv.org/find/cs/1/au:+Liang_X/0/1/0/all/0/1">Xiao Liang</a>Network Pruning via Transformable Architecture Search. (arXiv:1905.09717v1 [cs.CV])http://arxiv.org/abs/1905.09717
<p>Network pruning reduces the computation costs of an over-parameterized
network without degrading performance. Prevailing pruning algorithms pre-define
the width and depth of the pruned networks, and then transfer parameters from
the unpruned network to pruned networks. To break the structure limitation of
the pruned networks, we propose to apply neural architecture search to search
directly for a network with flexible channel and layer sizes. The number of the
channels/layers is learned by minimizing the loss of the pruned networks. The
feature map of the pruned network is an aggregation of K feature map fragments
(generated by K networks of different sizes), which are sampled based on the
probability distribution. The loss can be back-propagated not only to the
network weights, but also to the parameterized distribution to explicitly tune
the size of the channels/layers. Specifically, we apply channel-wise
interpolation to keep the feature map with different channel sizes aligned in
the aggregation procedure. The maximum probability for the size in each
distribution serves as the width and depth of the pruned network, whose
parameters are learned by knowledge transfer, e.g., knowledge distillation,
from the original networks. Experiments on CIFAR-10, CIFAR-100 and ImageNet
demonstrate the effectiveness of our new perspective of network pruning
compared to traditional network pruning algorithms. Various searching and
knowledge transfer approaches are conducted to show the effectiveness of the
two components.
</p>
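The channel-wise interpolation used to align feature-map fragments of different channel counts can be sketched as a linear resampling along the channel axis; the exact scheme in the paper may differ, so the function below is an assumed illustration for `(C, H, W)` arrays:

```python
import numpy as np

def channelwise_interpolate(feat, target_c):
    """Resample a feature map's channel dimension via linear interpolation
    so fragments with different channel counts can be aggregated."""
    c = feat.shape[0]
    # Fractional source positions for each target channel
    src = np.linspace(0, c - 1, target_c)
    lo = np.floor(src).astype(int)
    hi = np.minimum(lo + 1, c - 1)
    w = src - lo  # interpolation weight toward the upper neighbor
    return (1 - w)[:, None, None] * feat[lo] + w[:, None, None] * feat[hi]
```

After alignment, the K fragments can simply be summed (weighted by their sampling probabilities) to form the aggregated feature map.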
<a href="http://arxiv.org/find/cs/1/au:+Dong_X/0/1/0/all/0/1">Xuanyi Dong</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_Y/0/1/0/all/0/1">Yi Yang</a>Meta-GNN: On Few-shot Node Classification in Graph Meta-learning. (arXiv:1905.09718v1 [cs.LG])http://arxiv.org/abs/1905.09718
<p>Meta-learning has recently received tremendous attention as a possible
approach for mimicking human intelligence, i.e., acquiring new knowledge and
skills with little or even no demonstration. Most of the existing meta-learning
methods are proposed to tackle few-shot learning problems on data such as
images and text, which lie in the Euclidean domain. However, there are very
few works applying
meta-learning to non-Euclidean domains, and the recently proposed graph neural
networks (GNNs) models do not perform effectively on graph few-shot learning
problems. Towards this, we propose a novel graph meta-learning framework --
Meta-GNN -- to tackle the few-shot node classification problem in graph
meta-learning settings. It obtains the prior knowledge of classifiers by
training on many similar few-shot learning tasks and then classifies the nodes
from new classes with only a few labeled samples. Additionally, Meta-GNN is a
general model that can be straightforwardly incorporated into any existing
state-of-the-art GNN. Our experiments conducted on three benchmark datasets
demonstrate that our proposed approach not only improves the node
classification performance by a large margin on few-shot learning problems in
meta-learning paradigm, but also learns a more general and flexible model for
task adaption.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhou_F/0/1/0/all/0/1">Fan Zhou</a>, <a href="http://arxiv.org/find/cs/1/au:+Cao_C/0/1/0/all/0/1">Chengtai Cao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_K/0/1/0/all/0/1">Kunpeng Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Trajcevski_G/0/1/0/all/0/1">Goce Trajcevski</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhong_T/0/1/0/all/0/1">Ting Zhong</a>, <a href="http://arxiv.org/find/cs/1/au:+Geng_J/0/1/0/all/0/1">Ji Geng</a>Price of Dependence: Stochastic Submodular Maximization with Dependent Items. (arXiv:1905.09719v1 [cs.SI])http://arxiv.org/abs/1905.09719
<p>In this paper, we study the stochastic submodular maximization problem with
dependent items subject to a variety of packing constraints such as matroid and
knapsack constraints. The input of our problem is a finite set of items, and
each item is in a particular state from a set of possible states. After picking
an item, we are able to observe its state. We assume a monotone and submodular
utility function over items and states, and our objective is to select a group
of items adaptively so as to maximize the expected utility. Previous studies on
stochastic submodular maximization often assume that items' states are
independent; however, this assumption may not hold in general. This motivates
us to study the stochastic submodular maximization problem with dependent
items. We first introduce the concept of \emph{degree of independence} to
capture the degree to which one item's state is dependent on others'. Then we
propose a non-adaptive policy based on a modified continuous greedy algorithm
and show that its approximation ratio is $\alpha(1 - e^{-\frac{\kappa}{2} +
\frac{\kappa}{18m^2}} - \frac{\kappa + 2}{3m\kappa})$ where the value of
$\alpha$ depends on the type of constraints, e.g., $\alpha=1$ for matroid
constraint, $\kappa$ is the degree of independence, e.g., $\kappa=1$ for
independent items, and $m$ is the number of items.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Tang_S/0/1/0/all/0/1">Shaojie Tang</a>Statistical Assertions for Validating Patterns and Finding Bugs in Quantum Programs. (arXiv:1905.09721v1 [quant-ph])http://arxiv.org/abs/1905.09721
<p>In support of the growing interest in quantum computing experimentation,
programmers need new tools to write quantum algorithms as program code.
Compared to debugging classical programs, debugging quantum programs is
difficult because programmers have limited ability to probe the internal states
of quantum programs; those states are difficult to interpret even when
observations exist; and programmers do not yet have guidelines for what to
check for when building quantum programs. In this work, we present quantum
program assertions based on statistical tests on classical observations. These
allow programmers to decide if a quantum program state matches its expected
value in one of classical, superposition, or entangled types of states. We
extend an existing quantum programming language with the ability to specify
quantum assertions, which our tool then checks in a quantum program simulator.
We use these assertions to debug three benchmark quantum programs in factoring,
search, and chemistry. We share what types of bugs are possible, and lay out a
strategy for using quantum programming patterns to place assertions and prevent
bugs.
</p>
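The statistical flavor of such assertions can be illustrated with a Pearson chi-squared test on classical measurement counts: the same check covers a point-mass expectation (classical state) and a uniform expectation (equal superposition). The function name and threshold convention are assumptions, not the tool's actual API:

```python
from collections import Counter

def assert_distribution(measurements, expected_probs, threshold):
    """Statistical assertion sketch: Pearson chi-squared statistic comparing
    observed outcome counts against an expected distribution. The assertion
    passes when the statistic stays below the programmer-chosen critical
    value (e.g. 3.84 for 1 degree of freedom at the 5% level)."""
    n = len(measurements)
    counts = Counter(measurements)
    stat = sum((counts.get(k, 0) - n * p) ** 2 / (n * p)
               for k, p in expected_probs.items() if p > 0)
    return stat <= threshold
```

An entangled-state assertion would extend the same idea to joint outcome distributions over multiple qubits.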
<a href="http://arxiv.org/find/quant-ph/1/au:+Huang_Y/0/1/0/all/0/1">Yipeng Huang</a>, <a href="http://arxiv.org/find/quant-ph/1/au:+Martonosi_M/0/1/0/all/0/1">Margaret Martonosi</a>Verification and Synthesis of Symmetric Uni-Rings for Leads-To Properties. (arXiv:1905.09726v1 [cs.DC])http://arxiv.org/abs/1905.09726
<p>This paper investigates the verification and synthesis of parameterized
protocols that satisfy leads-to properties $R \leadsto Q$ on symmetric
unidirectional rings (a.k.a. uni-rings) of deterministic and constant-space
processes under no fairness and interleaving semantics, where $R$ and $Q$ are
global state predicates. First, we show that verifying $R \leadsto Q$ for
parameterized protocols on symmetric uni-rings is undecidable, even for
deterministic and constant-space processes, and conjunctive state predicates.
Then, we show that, surprisingly, synthesizing symmetric uni-ring protocols that
satisfy $R \leadsto Q$ is actually decidable. We identify necessary and
sufficient conditions for the decidability of synthesis based on which we
devise a sound and complete polynomial-time algorithm that takes the predicates
$R$ and $Q$, and automatically generates a parameterized protocol that
satisfies $R \leadsto Q$ for unbounded (but finite) ring sizes. Moreover, we
present some decidability results for cases where leads-to is required from
multiple distinct $R$ predicates to different $Q$ predicates. To demonstrate
the practicality of our synthesis method, we synthesize some parameterized
protocols, including agreement and parity protocols.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ebnenasir_A/0/1/0/all/0/1">Ali Ebnenasir</a>Deep Drone Racing: From Simulation to Reality with Domain Randomization. (arXiv:1905.09727v1 [cs.RO])http://arxiv.org/abs/1905.09727
<p>Dynamically changing environments, unreliable state estimation, and operation
under severe resource constraints are fundamental challenges for robotics,
which still limit the deployment of small autonomous drones. We address these
challenges in the context of autonomous, vision-based drone racing in dynamic
environments. A racing drone must traverse a track with possibly moving gates
at high speed. We enable this functionality by combining the performance of a
state-of-the-art path-planning and control system with the perceptual awareness
of a convolutional neural network (CNN). The CNN directly maps raw images to a
desired waypoint and speed. Given the CNN output, the planner generates a short
minimum-jerk trajectory segment that is tracked by a model-based controller to
actuate the drone towards the waypoint. The resulting modular system has
several desirable features: (i) it can run fully on-board, (ii) it does not
require globally consistent state estimation, and (iii) it is both platform and
domain independent. We extensively test the precision and robustness of our
system, both in simulation and on a physical platform. In both domains, our
method significantly outperforms the prior state of the art. In order to
understand the limits of our approach, we additionally compare against
professional human drone pilots with different skill levels.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Loquercio_A/0/1/0/all/0/1">Antonio Loquercio</a>, <a href="http://arxiv.org/find/cs/1/au:+Kaufmann_E/0/1/0/all/0/1">Elia Kaufmann</a>, <a href="http://arxiv.org/find/cs/1/au:+Ranftl_R/0/1/0/all/0/1">Ren&#xe9; Ranftl</a>, <a href="http://arxiv.org/find/cs/1/au:+Dosovitskiy_A/0/1/0/all/0/1">Alexey Dosovitskiy</a>, <a href="http://arxiv.org/find/cs/1/au:+Koltun_V/0/1/0/all/0/1">Vladlen Koltun</a>, <a href="http://arxiv.org/find/cs/1/au:+Scaramuzza_D/0/1/0/all/0/1">Davide Scaramuzza</a>On modelling the emergence of logical thinking. (arXiv:1905.09730v1 [cs.AI])http://arxiv.org/abs/1905.09730
<p>Recent progress in machine learning techniques have revived interest in
building artificial general intelligence using these particular tools. There
has been a tremendous success in applying them for narrow intellectual tasks
such as pattern recognition, natural language processing and playing Go. The
latter application has vastly outperformed the strongest human players in recent
years. However, these tasks are formalized by people in such ways that it has
become "easy" for automated recipes to find better solutions than humans do. In
the sense of John Searle's Chinese Room Argument, the computer playing Go does
not actually understand anything from the game. Thinking like a human mind
requires going beyond the curve-fitting paradigm of current systems. There is a
fundamental limit to what they can currently achieve, as only very specific
problem formalizations can increase their performance in particular tasks. In
this paper, we argue that one of the most important aspects of the human mind
is its capacity for logical thinking, which gives rise to many intellectual
expressions that differentiate us from animal brains. We propose to model the
emergence of logical thinking based on Piaget's theory of cognitive
development.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ivan_C/0/1/0/all/0/1">Cristian Ivan</a>, <a href="http://arxiv.org/find/cs/1/au:+Indurkhya_B/0/1/0/all/0/1">Bipin Indurkhya</a>Digital Normativity: A challenge for human subjectivization and free will. (arXiv:1905.09735v1 [cs.CY])http://arxiv.org/abs/1905.09735
<p>Over the past decade, artificial intelligence has demonstrated its efficiency
in many different applications and a huge number of algorithms have become
central and ubiquitous in our life. The growing interest in them is essentially
based on their capability to synthesize and process large amounts of data, and
to help humans make decisions in a world of increasing complexity. Yet, the
effectiveness of algorithms in bringing more and more relevant recommendations
to humans may start to compete with human-alone decisions based on values other
than pure efficacy. Here, we examine this tension in light of the emergence of
several forms of digital normativity, and analyze how this normative role of AI
may influence the ability of humans to remain subjects of their own lives. The advent
of AI technology imposes a need to achieve a balance between concrete material
progress and progress of the mind to avoid any form of servitude. It has become
essential that an ethical reflection accompany the current developments of
intelligent algorithms beyond the sole question of their social acceptability.
Such reflection should be anchored where AI technologies are being developed as
well as in educational programs where their implications can be explained.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Fourneret_E/0/1/0/all/0/1">&#xc9;ric Fourneret</a>, <a href="http://arxiv.org/find/cs/1/au:+Yvert_B/0/1/0/all/0/1">Blaise Yvert</a>Cross-chain Deals and Adversarial Commerce. (arXiv:1905.09743v1 [cs.DC])http://arxiv.org/abs/1905.09743
<p>Modern distributed data management systems face a new challenge: how can
autonomous, mutually-distrusting parties cooperate safely and effectively?
Addressing this challenge brings up questions familiar from classical
distributed systems: how to combine multiple steps into a single atomic action,
how to recover from failures, and how to synchronize concurrent access to data.
Nevertheless, each of these issues requires rethinking when participants are
autonomous and potentially adversarial.
</p>
<p>We propose the notion of a \emph{cross-chain deal}, a new way to structure
complex distributed computations in an adversarial setting. Deals are inspired
by classical atomic transactions, but are necessarily different, in important
ways, to accommodate the decentralized and untrusting nature of the exchange.
We describe novel safety and liveness properties, along with two alternative
protocols for implementing cross-chain deals in a system of independent
blockchain ledgers. One protocol, based on synchronous communication, is fully
decentralized, while the other, based on eventually-synchronous communication,
necessarily requires stronger trust assumptions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Herlihy_M/0/1/0/all/0/1">Maurice Herlihy</a>, <a href="http://arxiv.org/find/cs/1/au:+Liskov_B/0/1/0/all/0/1">Barbara Liskov</a>, <a href="http://arxiv.org/find/cs/1/au:+Shrira_L/0/1/0/all/0/1">Liuba Shrira</a>A consistent and comprehensive computational approach for general Fluid-Structure-Contact Interaction problems. (arXiv:1905.09744v1 [cs.CE])http://arxiv.org/abs/1905.09744
<p>We present a consistent approach for solving challenging general
nonlinear fluid-structure-contact interaction (FSCI) problems. The underlying
continuous formulation includes both "no-slip" fluid-structure interaction as
well as frictionless contact between multiple elastic bodies. The respective
interface conditions in normal and tangential orientation and especially the
role of the fluid stress within the region of closed contact are discussed for
the general problem of FSCI. To ensure continuity of the tangential constraints
from no-slip to frictionless contact, a transition is enabled by using the
general Navier condition with varying slip length. Moreover, the fluid stress
in the contact zone is obtained by an extension approach as it plays a crucial
role for the lift-off behavior of contacting bodies. Given the continuity
of the spatially continuous formulation, continuity of the discrete problem
(which is essential for the convergence of Newton's method) follows
naturally. As topological changes of the fluid domain are an inherent challenge
in FSCI configurations, a non-interface-fitted Cut Finite Element Method
(CutFEM) is applied to discretize the fluid domain. All interface conditions,
that is, the "no-slip" FSI condition, the general Navier condition, and
frictionless contact, are incorporated using Nitsche-based methods, thus retaining the
continuity and consistency of the model. To account for the strong interaction
between the fluid and solid discretization, the overall coupled discrete system
is solved monolithically. Numerical examples of varying complexity are
presented to corroborate the developments.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ager_C/0/1/0/all/0/1">Christoph Ager</a>, <a href="http://arxiv.org/find/cs/1/au:+Seitz_A/0/1/0/all/0/1">Alexander Seitz</a>, <a href="http://arxiv.org/find/cs/1/au:+Wall_W/0/1/0/all/0/1">Wolfgang A. Wall</a>A Predictive Model for Steady-State Multiphase Pipe Flow: Machine Learning on Lab Data. (arXiv:1905.09746v1 [physics.data-an])http://arxiv.org/abs/1905.09746
<p>Engineering simulators for steady-state multiphase pipe flows are commonly
used to predict pressure drop. Such simulators are typically based
on either empirical correlations or first-principles mechanistic models. The
simulators allow evaluating the pressure drop in multiphase pipe flow with
acceptable accuracy. However, the main shortcoming of these correlations and
mechanistic models is their limited range of applicability. To extend both the
applicability and the accuracy of the existing methods, a method of pressure drop
calculation in the pipeline is proposed. The method is based on well
segmentation and calculation of the pressure gradient in each segment using
three surrogate models based on Machine Learning algorithms trained on a
representative lab data set from the open literature. The first model predicts
the value of a liquid holdup in the segment, the second one determines the flow
pattern, and the third one is used to estimate the pressure gradient. To build
these models, several ML algorithms are trained such as Random Forest, Gradient
Boosting Decision Trees, Support Vector Machine, and Artificial Neural Network,
and their predictive abilities are cross-compared. The proposed method for
pressure gradient calculation yields $R^2 = 0.95$ by using the Gradient
Boosting algorithm as compared with $R^2 = 0.92$ in case of Mukherjee and Brill
correlation and $R^2 = 0.91$ when a combination of Ansari and Xiao mechanistic
models is utilized. The method for pressure drop prediction is also validated
on three real field cases. Validation indicates that the proposed model yields
the following coefficients of determination: $R^2 = 0.806$, $0.815$ and $0.99$ as
compared with the highest values obtained by commonly used techniques: $R^2 =
0.82$ (Beggs and Brill correlation), $R^2 = 0.823$ (Mukherjee and Brill
correlation) and $R^2 = 0.98$ (Beggs and Brill correlation).
</p>
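The segment-wise calculation described above can be sketched as follows; the three `predict_*` stubs are placeholders for the trained surrogate models (holdup, flow pattern, pressure gradient), and the constant values they return are invented for illustration.

```python
# Sketch of the segment-wise pressure-drop pipeline. The predict_* stubs
# stand in for the paper's trained ML surrogates; their outputs are dummies.

def predict_holdup(segment):
    return 0.5                       # stub: liquid holdup fraction

def predict_flow_pattern(segment):
    return "slug"                    # stub: one of the flow-pattern classes

def predict_gradient(segment, holdup, pattern):
    return 100.0                     # stub: pressure gradient, Pa/m

def pressure_drop(segments):
    """Total drop = sum over segments of (predicted gradient * length)."""
    total = 0.0
    for seg in segments:
        holdup = predict_holdup(seg)
        pattern = predict_flow_pattern(seg)
        grad = predict_gradient(seg, holdup, pattern)
        total += grad * seg["length_m"]
    return total
```

In the actual method each stub would be a fitted model (e.g., Gradient Boosting) consuming segment features such as inclination, rates, and fluid properties.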
<a href="http://arxiv.org/find/physics/1/au:+Kanin_E/0/1/0/all/0/1">Evgenii Kanin</a>, <a href="http://arxiv.org/find/physics/1/au:+Osiptsov_A/0/1/0/all/0/1">Andrei Osiptsov</a>, <a href="http://arxiv.org/find/physics/1/au:+Vainshtein_A/0/1/0/all/0/1">Albert Vainshtein</a>, <a href="http://arxiv.org/find/physics/1/au:+Burnaev_E/0/1/0/all/0/1">Evgeny Burnaev</a>Adversarially Robust Distillation. (arXiv:1905.09747v1 [cs.LG])http://arxiv.org/abs/1905.09747
<p>Knowledge distillation is effective for producing small high-performance
neural networks for classification, but these small networks are vulnerable to
adversarial attacks. We first study how robustness transfers from a robust
teacher to a student network during knowledge distillation. We find that a large
amount of robustness may be inherited by the student even when distilled on
only clean images. Second, we introduce Adversarially Robust Distillation (ARD)
for distilling robustness onto small student networks. ARD is an analogue of
adversarial training but for distillation. In addition to producing small
models with high test accuracy like conventional distillation, ARD also passes
the superior robustness of large networks onto the student. In our experiments,
we find that ARD student models decisively outperform adversarially trained
networks of identical architecture on robust accuracy. Finally, we adapt recent
fast adversarial training methods to ARD for accelerated robust distillation.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Goldblum_M/0/1/0/all/0/1">Micah Goldblum</a>, <a href="http://arxiv.org/find/cs/1/au:+Fowl_L/0/1/0/all/0/1">Liam Fowl</a>, <a href="http://arxiv.org/find/cs/1/au:+Feizi_S/0/1/0/all/0/1">Soheil Feizi</a>, <a href="http://arxiv.org/find/cs/1/au:+Goldstein_T/0/1/0/all/0/1">Tom Goldstein</a>Approximation schemes for the generalized extensible bin packing problem. (arXiv:1905.09750v1 [cs.DS])http://arxiv.org/abs/1905.09750
<p>We present a new generalization of the extensible bin packing with unequal
bin sizes problem. In our generalization the cost of exceeding the bin size
depends on the index of the bin and not only on the amount by which the size of
the bin is exceeded. This generalization does not satisfy the assumptions on
the cost function that were used to present the existing polynomial time
approximation scheme (PTAS) for the extensible bin packing with unequal bin
sizes problem. In this work, we show the existence of an efficient PTAS (EPTAS)
for this new generalization and thus in particular we improve the earlier PTAS
for the extensible bin packing with unequal bin sizes problem into an EPTAS.
Our new scheme is based on using the shifting technique, followed by solving a
polynomial number of $n$-fold programming instances. In addition, we present
an asymptotic fully polynomial time approximation scheme (AFPTAS) for the
related bin packing type variant of the problem.
</p>
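One plausible reading of the generalized cost (assumed here, since the abstract does not give its exact form) is that bin $i$ pays its size plus a bin-indexed penalty on its overflow. Evaluating such a cost for a fixed assignment is straightforward; the paper's EPTAS concerns optimizing the assignment, which this sketch does not attempt.

```python
# Hypothetical formalization of the generalized extensible bin packing cost:
# cost of bin i = size_i + f_i(overflow), where f_i depends on the bin index.
# The overflow functions below are illustrative, not from the paper.

def bin_cost(size, load, overflow_fn):
    """Cost of one bin: its size plus a bin-specific penalty on the excess."""
    return size + overflow_fn(max(0.0, load - size))

def total_cost(sizes, overflow_fns, assignment, items):
    """Sum bin costs under a given item-to-bin assignment."""
    loads = [0.0] * len(sizes)
    for item, b in zip(items, assignment):
        loads[b] += item
    return sum(bin_cost(s, l, f) for s, l, f in zip(sizes, loads, overflow_fns))
```

With identity penalties `f_i(e) = e` this reduces to the classical extensible cost `max(size, load)` per bin.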
<a href="http://arxiv.org/find/cs/1/au:+Levin_A/0/1/0/all/0/1">Asaf Levin</a>An embedded--hybridized discontinuous Galerkin method for the coupled Stokes--Darcy system. (arXiv:1905.09753v1 [math.NA])http://arxiv.org/abs/1905.09753
<p>We introduce an embedded--hybridized discontinuous Galerkin (EDG--HDG) method
for the coupled Stokes--Darcy system. This EDG--HDG method is a pointwise
mass-conserving discretization resulting in a divergence-conforming velocity
field on the whole domain. In the proposed scheme, coupling between the Stokes
and Darcy domains is achieved naturally through the EDG--HDG facet variables.
\emph{A priori} error analysis shows optimal convergence rates, and that the
velocity error does not depend on the pressure. The error analysis is verified
through numerical examples on unstructured grids for different orders of
polynomial approximation.
</p>
<a href="http://arxiv.org/find/math/1/au:+Cesmelioglu_A/0/1/0/all/0/1">Aycil Cesmelioglu</a>, <a href="http://arxiv.org/find/math/1/au:+Rhebergen_S/0/1/0/all/0/1">Sander Rhebergen</a>, <a href="http://arxiv.org/find/math/1/au:+Wells_G/0/1/0/all/0/1">Garth N. Wells</a>A Perceptual Weighting Filter Loss for DNN Training in Speech Enhancement. (arXiv:1905.09754v1 [eess.AS])http://arxiv.org/abs/1905.09754
<p>Single-channel speech enhancement with deep neural networks (DNNs) has shown
promising performance and is thus intensively being studied. In this paper,
instead of applying the mean squared error (MSE) as the loss function during
DNN training for speech enhancement, we design a perceptual weighting filter
loss motivated by the weighting filter as it is employed in
analysis-by-synthesis speech coding, e.g., in code-excited linear prediction
(CELP). The experimental results show that the proposed simple loss function
improves the speech enhancement performance compared to a reference DNN with
MSE loss in terms of perceptual quality and noise attenuation. The proposed
loss function can be advantageously applied to an existing DNN-based speech
enhancement system, without modification of the DNN topology for speech
enhancement. The source code for the proposed approach is made available.
</p>
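The idea can be sketched as a weighted MSE: the enhancement error is passed through a weighting filter before squaring. The FIR taps below are arbitrary placeholders; in the paper the weighting filter is derived per frame from CELP-style LPC analysis of the speech.

```python
# Sketch of a perceptually weighted loss: filter the error signal, then
# take the mean squared value. The taps are placeholders, not the
# LPC-derived CELP weighting filter used in the paper.

def fir_filter(signal, taps):
    """Causal FIR filtering: out[n] = sum_k taps[k] * signal[n-k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def weighted_filter_loss(enhanced, clean, taps):
    error = [e - c for e, c in zip(enhanced, clean)]
    weighted = fir_filter(error, taps)
    return sum(w * w for w in weighted) / len(weighted)
```

With a single unit tap the loss reduces to plain MSE, which makes the weighting's role easy to isolate in experiments.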
<a href="http://arxiv.org/find/eess/1/au:+Zhao_Z/0/1/0/all/0/1">Ziyue Zhao</a>, <a href="http://arxiv.org/find/eess/1/au:+Elshamy_S/0/1/0/all/0/1">Samy Elshamy</a>, <a href="http://arxiv.org/find/eess/1/au:+Fingscheidt_T/0/1/0/all/0/1">Tim Fingscheidt</a>Misspelling Oblivious Word Embeddings. (arXiv:1905.09755v1 [cs.CL])http://arxiv.org/abs/1905.09755
<p>In this paper we present a method to learn word embeddings that are resilient
to misspellings. Existing word embeddings have limited applicability to
malformed texts, which contain a non-negligible amount of out-of-vocabulary
words. We propose a method combining FastText with subwords and a supervised
task of learning misspelling patterns. In our method, misspellings of each word
are embedded close to their correct variants. We train these embeddings on a
new dataset we are releasing publicly. Finally, we experimentally show the
advantages of this approach on both intrinsic and extrinsic NLP tasks using
public test sets.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Edizel_B/0/1/0/all/0/1">Bora Edizel</a>, <a href="http://arxiv.org/find/cs/1/au:+Piktus_A/0/1/0/all/0/1">Aleksandra Piktus</a>, <a href="http://arxiv.org/find/cs/1/au:+Bojanowski_P/0/1/0/all/0/1">Piotr Bojanowski</a>, <a href="http://arxiv.org/find/cs/1/au:+Ferreira_R/0/1/0/all/0/1">Rui Ferreira</a>, <a href="http://arxiv.org/find/cs/1/au:+Grave_E/0/1/0/all/0/1">Edouard Grave</a>, <a href="http://arxiv.org/find/cs/1/au:+Silvestri_F/0/1/0/all/0/1">Fabrizio Silvestri</a>Network Density of States. (arXiv:1905.09758v1 [cs.SI])http://arxiv.org/abs/1905.09758
<p>Spectral analysis connects graph structure to the eigenvalues and
eigenvectors of associated matrices. Much of spectral graph theory descends
directly from spectral geometry, the study of differentiable manifolds through
the spectra of associated differential operators. But the translation from
spectral geometry to spectral graph theory has largely focused on results
involving only a few extreme eigenvalues and their associated eigenvectors.
Unlike in geometry, the study of graphs through the overall distribution of
eigenvalues - the spectral density - is largely limited to simple random graph
models. The interior of the spectrum of real-world graphs remains largely
unexplored, difficult to compute and to interpret.
</p>
<p>In this paper, we delve into the heart of spectral densities of real-world
graphs. We borrow tools developed in condensed matter physics, and add novel
adaptations to handle the spectral signatures of common graph motifs. The
resulting methods are highly efficient, as we illustrate by computing spectral
densities for graphs with over a billion edges on a single compute node. Beyond
providing visually compelling fingerprints of graphs, we show how the
estimation of spectral densities facilitates the computation of many common
centrality measures, and use spectral densities to estimate meaningful
information about graph structure that cannot be inferred from the extremal
eigenpairs alone.
</p>
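One standard condensed-matter tool of the kind the authors borrow is moment-based density estimation in the spirit of the kernel polynomial method (the specific tool is our assumption). A minimal stochastic Chebyshev-moment estimator, for a matrix pre-scaled so its eigenvalues lie in [-1, 1], looks like:

```python
import random

# Estimate mu_k = tr(T_k(A)) / n with Rademacher probe vectors and the
# Chebyshev recurrence T_k = 2 A T_{k-1} - T_{k-2}. A dense matrix is used
# for clarity; billion-edge graphs would use sparse matvecs instead.

def matvec(rows, v):
    return [sum(a * x for a, x in zip(row, v)) for row in rows]

def chebyshev_moments(rows, num_moments, num_probes, rng):
    n = len(rows)
    moments = [0.0] * num_moments
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]   # random probe
        vecs = [z, matvec(rows, z)]                       # T_0 z, T_1 z
        for _ in range(2, num_moments):
            av = matvec(rows, vecs[-1])
            vecs.append([2.0 * a - b for a, b in zip(av, vecs[-2])])
        for k in range(num_moments):
            moments[k] += sum(zi * ti for zi, ti in zip(z, vecs[k])) / (n * num_probes)
    return moments
```

The spectral density itself is then reconstructed from these moments, with damping kernels to suppress Gibbs oscillations; that reconstruction step is omitted here.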
<a href="http://arxiv.org/find/cs/1/au:+Dong_K/0/1/0/all/0/1">Kun Dong</a>, <a href="http://arxiv.org/find/cs/1/au:+Benson_A/0/1/0/all/0/1">Austin R. Benson</a>, <a href="http://arxiv.org/find/cs/1/au:+Bindel_D/0/1/0/all/0/1">David Bindel</a>Design Dimensions for Software Certification: A Grounded Analysis. (arXiv:1905.09760v1 [cs.SE])http://arxiv.org/abs/1905.09760
<p>In many domains, software systems cannot be deployed until authorities judge
them fit for use in an intended operating environment. Certification standards
and processes have been devised and deployed to regulate operations of software
systems and prevent their failures. However, practitioners are often
unsatisfied with the efficiency and value proposition of certification efforts.
In this study, we compare two certification standards, Common Criteria and
DO-178C, and collect insights from literature and from interviews with
subject-matter experts to identify design options relevant to the design of
standards. The results of the comparison of certification efforts---leading to
the identification of design dimensions that affect their quality---serve as a
framework to guide the comparison, creation, and revision of certification
standards and processes. This paper puts software engineering research in
context and discusses key issues around process and quality assurance and
includes observations from industry about relevant topics such as
recertification and timely evaluations, as well as technical discussions around
model-driven approaches and formal methods. Our initial characterization of the
design space of certification efforts can be used to inform technical
discussions and to influence the directions of new or existing certification
efforts. Practitioners, technical commissions, and government can directly
benefit from our analytical framework.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Ferreira_G/0/1/0/all/0/1">Gabriel Ferreira</a>, <a href="http://arxiv.org/find/cs/1/au:+Kastner_C/0/1/0/all/0/1">Christian K&#xe4;stner</a>, <a href="http://arxiv.org/find/cs/1/au:+Sunshine_J/0/1/0/all/0/1">Joshua Sunshine</a>, <a href="http://arxiv.org/find/cs/1/au:+Apel_S/0/1/0/all/0/1">Sven Apel</a>, <a href="http://arxiv.org/find/cs/1/au:+Scherlis_W/0/1/0/all/0/1">William Scherlis</a>An Efficient Approach for Super and Nested Term Indexing and Retrieval. (arXiv:1905.09761v1 [cs.DS])http://arxiv.org/abs/1905.09761
<p>This paper describes a new approach, called Terminological Bucket Indexing
(TBI), for efficient indexing and retrieval of both nested and super terms
using a single method. We propose a hybrid data structure for facilitating
faster indexing building. An evaluation of our approach with respect to widely
used existing approaches on several publicly available datasets is provided.
Compared to Trie based approaches, TBI provides comparable performance on
nested term retrieval and far superior performance on super term retrieval.
Compared to traditional hash table, TBI needs 80\% less time for indexing.
</p>
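The abstract does not detail TBI's bucket layout; purely to fix the terminology, here is a generic token-keyed index (our own construction, not the paper's data structure) that answers both query types the paper targets: super terms that contain the query, and nested terms that appear inside it.

```python
from collections import defaultdict

# Illustrative token-keyed index for nested/super term retrieval.
# This mirrors the task (one structure serving both query types), not the
# actual TBI bucket design, which the abstract does not spell out.

class TermIndex:
    def __init__(self):
        self.by_token = defaultdict(set)   # token -> terms containing it
        self.terms = set()

    def add(self, term):
        self.terms.add(term)
        for token in term.split():
            self.by_token[token].add(term)

    def super_terms(self, query):
        """Indexed terms that contain the query as a sub-phrase."""
        candidates = set.intersection(*(self.by_token[t] for t in query.split()))
        return {t for t in candidates if query in t and t != query}

    def nested_terms(self, query):
        """Indexed terms that appear inside the query string."""
        return {t for t in self.terms if t in query and t != query}
```

Note the asymmetry the paper exploits: super-term lookup benefits from the inverted token map, while a naive hash table would have to scan all keys.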
<a href="http://arxiv.org/find/cs/1/au:+Chowdhury_M/0/1/0/all/0/1">Md Faisal Mahbub Chowdhury</a>, <a href="http://arxiv.org/find/cs/1/au:+Farrell_R/0/1/0/all/0/1">Robert Farrell</a>Geometric Laplacian Eigenmap Embedding. (arXiv:1905.09763v1 [cs.LG])http://arxiv.org/abs/1905.09763
<p>Graph embedding seeks to build a low-dimensional representation of a graph G.
This low-dimensional representation is then used for various downstream tasks.
One popular approach is Laplacian Eigenmaps, which constructs a graph embedding
based on the spectral properties of the Laplacian matrix of G. The intuition
behind it, and many other embedding techniques, is that the embedding of a
graph must respect node similarity: similar nodes must have embeddings that are
close to one another. Here, we dispose of this distance-minimization
assumption. Instead, we use the Laplacian matrix to find an embedding with
geometric properties instead of spectral ones, by leveraging the so-called
simplex geometry of G. We introduce a new approach, Geometric Laplacian
Eigenmap Embedding (or GLEE for short), and demonstrate that it outperforms
various other techniques (including Laplacian Eigenmaps) in the tasks of graph
reconstruction and link prediction.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Torres_L/0/1/0/all/0/1">Leo Torres</a>, <a href="http://arxiv.org/find/cs/1/au:+Chan_K/0/1/0/all/0/1">Kevin S Chan</a>, <a href="http://arxiv.org/find/cs/1/au:+Eliassi_Rad_T/0/1/0/all/0/1">Tina Eliassi-Rad</a>Workflow Design Analysis for High Resolution Satellite Image Analysis. (arXiv:1905.09766v1 [cs.DC])http://arxiv.org/abs/1905.09766
<p>Ecological sciences are using imagery from a variety of sources to monitor
and survey populations and ecosystems. Very High Resolution (VHR) satellite
imagery provide an effective dataset for large scale surveys. Convolutional
Neural Networks have successfully been employed to analyze such imagery and
detect large animals. As the datasets increase in volume, O(TB), and number of
images, O(1k), utilizing High Performance Computing (HPC) resources becomes
necessary. In this paper, we investigate task-parallel, data-driven workflow
designs to support imagery analysis pipelines with heterogeneous tasks on HPC.
We analyze the capabilities of each design when processing a dataset of 3,000
VHR satellite images for a total of 4~TB. We experimentally model the execution
time of the tasks of the image processing pipeline. We perform experiments to
characterize the resource utilization, total time to completion, and overheads
of each design. Based on the model, overhead and utilization analysis, we show
which design approach is best suited to scientific pipelines with similar
characteristics.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Paraskevakos_I/0/1/0/all/0/1">Ioannis Paraskevakos</a>, <a href="http://arxiv.org/find/cs/1/au:+Turrili_M/0/1/0/all/0/1">Matteo Turrili</a>, <a href="http://arxiv.org/find/cs/1/au:+Goncalves_B/0/1/0/all/0/1">Bento Collares Gon&#xe7;alves</a>, <a href="http://arxiv.org/find/cs/1/au:+Lynch_H/0/1/0/all/0/1">Heather J. Lynch</a>, <a href="http://arxiv.org/find/cs/1/au:+Jha_S/0/1/0/all/0/1">Shantenu Jha</a>Zero-shot Knowledge Transfer via Adversarial Belief Matching. (arXiv:1905.09768v1 [cs.LG])http://arxiv.org/abs/1905.09768
<p>Performing knowledge transfer from a large teacher network to a smaller
student is a popular task in modern deep learning applications. However, due to
growing dataset sizes and stricter privacy regulations, it is increasingly
common not to have access to the data that was used to train the teacher. We
propose a novel method which trains a student to match the predictions of its
teacher without using any data or metadata. We achieve this by training an
adversarial generator to search for images on which the student poorly matches
the teacher, and then using them to train the student. Our resulting student
closely approximates its teacher for simple datasets like SVHN, and on CIFAR10
we improve on the state-of-the-art for few-shot distillation (with 100 images
per class), despite using no data. Finally, we also propose a metric to
quantify the degree of belief matching between teacher and student in the
vicinity of decision boundaries, and observe a significantly higher match
between our zero-shot student and the teacher, than between a student distilled
with real data and the teacher. Code available at:
https://github.com/polo5/ZeroShotKnowledgeTransfer
</p>
<a href="http://arxiv.org/find/cs/1/au:+Micaelli_P/0/1/0/all/0/1">Paul Micaelli</a>, <a href="http://arxiv.org/find/cs/1/au:+Storkey_A/0/1/0/all/0/1">Amos Storkey</a>Multi-Service Mobile Traffic Forecasting via Convolutional Long Short-Term Memories. (arXiv:1905.09771v1 [cs.LG])http://arxiv.org/abs/1905.09771
<p>Network slicing is increasingly used to partition network infrastructure
between different mobile services. Precise service-wise mobile traffic
forecasting becomes essential in this context, as mobile operators seek to
pre-allocate resources to each slice in advance, to meet the distinct
requirements of individual services. This paper attacks the problem of
multi-service mobile traffic forecasting using a sequence-to-sequence (S2S)
learning paradigm and convolutional long short-term memories (ConvLSTMs). The
proposed architecture is designed so as to effectively extract complex
spatiotemporal features of mobile network traffic and predict with high
accuracy the future demands for individual services at city scale. We conduct
experiments on a mobile traffic dataset collected in a large European
metropolis, demonstrating that the proposed S2S-ConvLSTM can forecast the
mobile traffic volume produced by tens of different services in advance of up
to one hour, by just using measurements taken during the past hour. In
particular, our solution achieves mean absolute errors (MAE) at antenna level
that are below 13KBps, outperforming other deep learning approaches by up to
31.2%.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhang_C/0/1/0/all/0/1">Chaoyun Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Fiore_M/0/1/0/all/0/1">Marco Fiore</a>, <a href="http://arxiv.org/find/cs/1/au:+Patras_P/0/1/0/all/0/1">Paul Patras</a>What is the Entropy of a Social Organization?. (arXiv:1905.09772v1 [physics.soc-ph])http://arxiv.org/abs/1905.09772
<p>We quantify a social organization's potentiality, that is, its ability to
attain different configurations. The organization is represented as a network
in which nodes correspond to individuals and (multi-)edges to their multiple
interactions. Attainable configurations are treated as realizations from a
network ensemble. To encode interaction preferences between individuals, we
choose the generalized hypergeometric ensemble of random graphs, which is
described by a closed-form probability distribution. From this distribution we
calculate Shannon entropy as a measure of potentiality. This allows us to
compare different organizations as well as different stages in the development of
a given organization. The feasibility of the approach is demonstrated using
data from 3 empirical and 2 synthetic systems.
</p>
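The potentiality measure reduces to the Shannon entropy of the ensemble's configuration distribution. The probabilities in the example below are made up; in the paper they come from the generalized hypergeometric ensemble fitted to the observed interaction network.

```python
import math

# Potentiality as Shannon entropy (in bits) of the distribution over
# attainable configurations. Example probabilities are invented.

def shannon_entropy(probs):
    """H = -sum p * log2(p) over configurations with p > 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A peaked distribution (one configuration nearly certain) gives entropy near zero, i.e., low potentiality; a uniform distribution over many attainable configurations maximizes it.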
<a href="http://arxiv.org/find/physics/1/au:+Zingg_C/0/1/0/all/0/1">Christian Zingg</a>, <a href="http://arxiv.org/find/physics/1/au:+Casiraghi_G/0/1/0/all/0/1">Giona Casiraghi</a>, <a href="http://arxiv.org/find/physics/1/au:+Vaccario_G/0/1/0/all/0/1">Giacomo Vaccario</a>, <a href="http://arxiv.org/find/physics/1/au:+Schweitzer_F/0/1/0/all/0/1">Frank Schweitzer</a>Speech2Face: Learning the Face Behind a Voice. (arXiv:1905.09773v1 [cs.CV])http://arxiv.org/abs/1905.09773
<p>How much can we infer about a person's looks from the way they speak? In this
paper, we study the task of reconstructing a facial image of a person from a
short audio recording of that person speaking. We design and train a deep
neural network to perform this task using millions of natural Internet/YouTube
videos of people speaking. During training, our model learns voice-face
correlations that allow it to produce images that capture various physical
attributes of the speakers such as age, gender and ethnicity. This is done in a
self-supervised manner, by utilizing the natural co-occurrence of faces and
speech in Internet videos, without the need to model attributes explicitly. We
evaluate and numerically quantify how--and in what manner--our Speech2Face
reconstructions, obtained directly from audio, resemble the true face images of
the speakers.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Oh_T/0/1/0/all/0/1">Tae-Hyun Oh</a>, <a href="http://arxiv.org/find/cs/1/au:+Dekel_T/0/1/0/all/0/1">Tali Dekel</a>, <a href="http://arxiv.org/find/cs/1/au:+Kim_C/0/1/0/all/0/1">Changil Kim</a>, <a href="http://arxiv.org/find/cs/1/au:+Mosseri_I/0/1/0/all/0/1">Inbar Mosseri</a>, <a href="http://arxiv.org/find/cs/1/au:+Freeman_W/0/1/0/all/0/1">William T. Freeman</a>, <a href="http://arxiv.org/find/cs/1/au:+Rubinstein_M/0/1/0/all/0/1">Michael Rubinstein</a>, <a href="http://arxiv.org/find/cs/1/au:+Matusik_W/0/1/0/all/0/1">Wojciech Matusik</a>A Smoothness Energy without Boundary Distortion for Curved Surfaces. (arXiv:1905.09777v1 [cs.GR])http://arxiv.org/abs/1905.09777
<p>Current quadratic smoothness energies for curved surfaces either exhibit
distortions near the boundary due to zero Neumann boundary conditions, or they
do not correctly account for intrinsic curvature, which leads to
unnatural-looking behavior away from the boundary. This leads to an unfortunate
trade-off: one can either have natural behavior in the interior, or a
distortion-free result at the boundary, but not both. We introduce a
generalized Hessian energy for curved surfaces. This energy features the curved
Hessian of functions on manifolds as well as an additional curvature term which
results from applying the Weitzenb\"ock identity. Its minimizers solve the
Laplace-Beltrami biharmonic equation, correctly accounting for intrinsic
curvature, leading to natural-looking isolines. On the boundary, minimizers are
as-linear-as-possible, which reduces the distortion of isolines at the
boundary. We also provide an implementation that enables the use of the Hessian
energy for applications on curved surfaces for which current quadratic
smoothness energies do not produce satisfying results, and observe convergence
in our experiments.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Stein_O/0/1/0/all/0/1">Oded Stein</a>, <a href="http://arxiv.org/find/cs/1/au:+Jacobson_A/0/1/0/all/0/1">Alec Jacobson</a>, <a href="http://arxiv.org/find/cs/1/au:+Wardetzky_M/0/1/0/all/0/1">Max Wardetzky</a>, <a href="http://arxiv.org/find/cs/1/au:+Grinspun_E/0/1/0/all/0/1">Eitan Grinspun</a>Privacy-Preserving Obfuscation of Critical Infrastructure Networks. (arXiv:1905.09778v1 [cs.CR])http://arxiv.org/abs/1905.09778
<p>The paper studies how to release data about a critical infrastructure network
(e.g., the power network or a transportation network) without disclosing
sensitive information that can be exploited by malevolent agents, while
preserving the realism of the network. It proposes a novel obfuscation
mechanism that combines several privacy-preserving building blocks with a
bi-level optimization model to significantly improve accuracy. The obfuscation
is evaluated for both realism and privacy properties on real energy and
transportation networks. Experimental results show the obfuscation mechanism
substantially reduces the potential damage of an attack exploiting the released
data to harm the real network.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Fioretto_F/0/1/0/all/0/1">Ferdinando Fioretto</a>, <a href="http://arxiv.org/find/cs/1/au:+Mak_T/0/1/0/all/0/1">Terrence W.K. Mak</a>, <a href="http://arxiv.org/find/cs/1/au:+Hentenryck_P/0/1/0/all/0/1">Pascal Van Hentenryck</a>Bayesian Optimization over Sets. (arXiv:1905.09780v1 [stat.ML])http://arxiv.org/abs/1905.09780
<p>We propose a Bayesian optimization method over sets, to minimize a black-box
function that can take a set as a single input. Because set inputs are
permutation-invariant and variable-length, traditional Gaussian process-based
Bayesian optimization strategies which assume vector inputs can fall short. To
address this, we develop a Bayesian optimization method with \emph{set kernel}
that is used to build surrogate functions. This kernel accumulates similarity
over set elements to enforce permutation-invariance and permit sets of variable
size, but this comes at a greater computational cost. To reduce this burden, we
propose a more efficient probabilistic approximation which we prove is still
positive definite and is an unbiased estimator of the true set kernel. Finally,
we present several numerical experiments which demonstrate that our method
outperforms other methods in various applications.
</p>
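A common way to build such a set kernel, and a plausible reading of "accumulates similarity over set elements" (the exact form here is our assumption), is to average a base kernel over all element pairs. This yields permutation invariance and tolerance of differing set sizes, at quadratic cost in the set sizes, which motivates the paper's approximation.

```python
import math

# Pairwise-averaged set kernel sketch. The RBF base kernel and unit
# lengthscale are illustrative choices, not the paper's exact setup.

def rbf(x, y, lengthscale=1.0):
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * lengthscale ** 2))

def set_kernel(A, B):
    """k_set(A, B) = (1 / (|A| |B|)) * sum over a in A, b in B of k(a, b)."""
    return sum(rbf(a, b) for a in A for b in B) / (len(A) * len(B))
```

Reordering a set's elements leaves the double sum unchanged, which is exactly the invariance a vector-input kernel lacks.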
<a href="http://arxiv.org/find/stat/1/au:+Kim_J/0/1/0/all/0/1">Jungtaek Kim</a>, <a href="http://arxiv.org/find/stat/1/au:+McCourt_M/0/1/0/all/0/1">Michael McCourt</a>, <a href="http://arxiv.org/find/stat/1/au:+You_T/0/1/0/all/0/1">Tackgeun You</a>, <a href="http://arxiv.org/find/stat/1/au:+Kim_S/0/1/0/all/0/1">Saehoon Kim</a>, <a href="http://arxiv.org/find/stat/1/au:+Choi_S/0/1/0/all/0/1">Seungjin Choi</a>Positional Encoding by Robots with Non-Rigid Movements. (arXiv:1905.09786v1 [cs.DC])http://arxiv.org/abs/1905.09786
<p>Consider a set of autonomous computational entities, called \emph{robots},
operating inside a polygonal enclosure (possibly with holes), that have to
perform some collaborative tasks. The boundary of the polygon obstructs both
visibility and mobility of a robot. Since the polygon is initially unknown to
the robots, the natural approach is to first explore and construct a map of the
polygon. For this, the robots need an unlimited amount of persistent memory to
store the snapshots taken from different points inside the polygon. However, it
has been shown by Di Luna et al. [DISC 2017] that map construction can be done
even by oblivious robots by employing a positional encoding strategy where a
robot carefully positions itself inside the polygon to encode information in
the binary representation of its distance from the closest polygon vertex. Of
course, to execute this strategy, it is crucial for the robots to make accurate
movements. In this paper, we address the question of whether this technique can be
implemented even when the movements of the robots are unpredictable in the
sense that the robot can be stopped by the adversary during its movement before
reaching its destination. However, there exists a constant $\delta &gt; 0$,
unknown to the robot, such that the robot can always reach its destination if
it has to move by no more than $\delta$. This model is known in the
literature as \emph{non-rigid} movement. We give a partial answer to the
question in the affirmative by presenting a map construction algorithm for
robots with non-rigid movement, but having $O(1)$ bits of persistent memory and
the ability to make circular moves.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Bose_K/0/1/0/all/0/1">Kaustav Bose</a>, <a href="http://arxiv.org/find/cs/1/au:+Adhikary_R/0/1/0/all/0/1">Ranendu Adhikary</a>, <a href="http://arxiv.org/find/cs/1/au:+Kundu_M/0/1/0/all/0/1">Manash Kumar Kundu</a>, <a href="http://arxiv.org/find/cs/1/au:+Sau_B/0/1/0/all/0/1">Buddhadeb Sau</a>Multi-Sample Dropout for Accelerated Training and Better Generalization. (arXiv:1905.09788v1 [cs.NE])http://arxiv.org/abs/1905.09788
<p>Dropout is a simple but efficient regularization technique for achieving
better generalization of deep neural networks (DNNs); hence it is widely used
in tasks based on DNNs. During training, dropout randomly discards a portion of
the neurons to avoid overfitting. This paper presents an enhanced dropout
technique, which we call multi-sample dropout, for both accelerating training
and improving generalization over the original dropout. The original dropout
creates a randomly selected subset (called a dropout sample) from the input in
each training iteration, while multi-sample dropout creates multiple dropout
samples. The loss is calculated for each sample, and then the sample losses are
averaged to obtain the final loss. This technique can be easily implemented
without implementing a new operator by duplicating a part of the network after
the dropout layer while sharing the weights among the duplicated fully
connected layers. Experimental results showed that multi-sample dropout
significantly accelerates training by reducing the number of iterations until
convergence for image classification tasks using the ImageNet, CIFAR-10,
CIFAR-100, and SVHN datasets. Multi-sample dropout does not significantly
increase computation cost per iteration because most of the computation time is
consumed in the convolution layers before the dropout layer, which are not
duplicated. Experiments also showed that networks trained using multi-sample
dropout achieved lower error rates and losses for both the training set and
validation set.
</p>
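The procedure is easy to state in a toy setting: draw several dropout masks from the same features, evaluate the shared-weight head and the loss on each masked copy, and average the losses. The linear head and squared error below are stand-ins for a real network and task loss, not the paper's architecture.

```python
import random

# Toy multi-sample dropout: several masks per input, one averaged loss.
# Inverted dropout scaling (1/(1-p)) keeps activations unbiased.

def dropout(features, p, rng):
    scale = 1.0 / (1.0 - p)
    return [x * scale if rng.random() >= p else 0.0 for x in features]

def head_loss(features, weights, target):
    """Shared-weight linear head with squared-error loss (a stand-in)."""
    pred = sum(w * x for w, x in zip(weights, features))
    return (pred - target) ** 2

def multi_sample_dropout_loss(features, weights, target, p, num_samples, rng):
    losses = [head_loss(dropout(features, p, rng), weights, target)
              for _ in range(num_samples)]
    return sum(losses) / num_samples
```

Because the dropout layer sits late in the network, the expensive layers below it run once per input while only the cheap duplicated head runs once per sample, matching the paper's observation about per-iteration cost.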
<a href="http://arxiv.org/find/cs/1/au:+Inoue_H/0/1/0/all/0/1">Hiroshi Inoue</a>Multi-relational Poincar\'e Graph Embeddings. (arXiv:1905.09791v1 [cs.LG])http://arxiv.org/abs/1905.09791
<p>Hyperbolic embeddings have recently gained attention in machine learning due
to their ability to represent hierarchical data more accurately and succinctly
than their Euclidean analogues. However, multi-relational knowledge graphs
often exhibit multiple simultaneous hierarchies, which current hyperbolic
models do not capture. To address this, we propose a model that embeds
multi-relational graph data in the Poincar\'e ball model of hyperbolic space.
Our Multi-Relational Poincar\'e model (MuRP) learns relation-specific
parameters to transform entity embeddings by M\"obius matrix-vector
multiplication and M\"obius addition. Experiments on the hierarchical WN18RR
knowledge graph show that our multi-relational Poincar\'e embeddings outperform
their Euclidean counterpart and existing embedding methods on the link
prediction task, particularly at lower dimensionality.
</p>
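The two Möbius operations named above have closed forms on the Poincaré ball; the sketch below (curvature fixed to -1, function names ours, following the standard definitions rather than MuRP's exact code) shows both.

```python
import numpy as np

def mobius_add(x, y):
    """Mobius addition of two points in the Poincare ball (curvature -1)."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1 + 2 * xy + y2) * x + (1 - x2) * y
    return num / (1 + 2 * xy + x2 * y2)

def mobius_matvec(M, x):
    """Mobius matrix-vector multiplication: log-map x to the tangent space
    at the origin, apply M, and exp-map the result back into the ball."""
    nx = np.linalg.norm(x)
    Mx = M @ x
    nMx = np.linalg.norm(Mx)
    if nx < 1e-15 or nMx < 1e-15:
        return np.zeros_like(x)
    return np.tanh(nMx / nx * np.arctanh(nx)) * Mx / nMx
```

A relation in this style transforms a subject embedding by `mobius_matvec` and translates it by `mobius_add` before scoring against the object embedding.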
<a href="http://arxiv.org/find/cs/1/au:+Balazevic_I/0/1/0/all/0/1">Ivana Bala&#x17e;evi&#x107;</a>, <a href="http://arxiv.org/find/cs/1/au:+Allen_C/0/1/0/all/0/1">Carl Allen</a>, <a href="http://arxiv.org/find/cs/1/au:+Hospedales_T/0/1/0/all/0/1">Timothy Hospedales</a>Evaluating the Effects of Control Surfaces Failure on the GTM. (arXiv:1905.09794v1 [cs.SY])http://arxiv.org/abs/1905.09794
<p>Despite the advances in aircraft guidance and control systems technology,
Loss of Control remains the main cause of fatal accidents of large
transport aircraft. Loss of Control is defined as an excursion beyond the
allowable flight envelope and is often the consequence of an upset condition
handled with an improper maneuver by the pilot. Hence, extensive research in
recent years has focused on improving the current fault tolerant control
systems and developing new strategies for loss of control prevention and
recovery systems. However, success of such systems requires the perception of
the damaged aircraft's dynamic behavior and performance, and understanding of
its new flight envelope. This paper provides a comprehensive understanding of
lateral control surfaces' failure effect on the NASA Generic Transport Model's
maneuvering flight envelope, which is the set of attainable steady-state
maneuvers herein referred to as trim points. The study utilizes a massive
database of the Generic Transport Model's high-fidelity maneuvering flight
envelopes computed for the unimpaired case and wide ranges of aileron and
rudder failure cases at different flight conditions. Flight envelope boundary
is rigorously investigated and the key parameters confining the trim points at
different boundary sections are identified. Trend analysis of the impaired
flight envelopes and the corresponding limiting factors is performed, which
demonstrates the effect of various failure degrees on the remaining feasible
trim points. Results of the post-failure analysis can be employed in emergency
path planning and have potential uses in the development of aircraft resilient
control and upset recovery systems.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Norouzi_R/0/1/0/all/0/1">Ramin Norouzi</a>, <a href="http://arxiv.org/find/cs/1/au:+Kosari_A/0/1/0/all/0/1">Amirreza Kosari</a>, <a href="http://arxiv.org/find/cs/1/au:+Sabour_M/0/1/0/all/0/1">Mohammad Hossein Sabour</a>Nature-Inspired Computational Model of Population Desegregation under Group Leaders Influence. (arXiv:1905.09795v1 [cs.MA])http://arxiv.org/abs/1905.09795
<p>This paper presents an agent-based model of population desegregation and
provides a thorough analysis of the social behavior leading to it, namely, the
contact hypothesis. Based on the parameters of frequency and intensity of
influence of group leaders on the population, the proposed model consists of
two layers: 1) a physical layer of the population, which is influenced by 2) a
virtual layer of group leaders. The negotiation and survival of group leaders
are governed by the nature-inspired evolutionary process of queen ants, also
known as the Foundress Dilemma. The motivation for using a virtual grouping
concept (instead of taking a subset of the population as the group leaders) is
to stay focused on finding the conditions that lead individuals in a society to
tolerate a significantly diversified (desegregated) neighborhood, rather than
indulging in complex details, which would be more relevant to studies targeting
the evolution of societal groups and leaders. A geographic
information system-driven simulation is performed, which reveals that: 1)
desegregation is directly proportional to the frequency of group leaders'
contact with the population and 2) mostly, it remains ineffective with an
increase in the intensity of group leaders' contact with the population. The
mechanism of group selection (the conflict resolution model resolving the
Foundress Dilemma) reveals an exciting result concerning the negative influence
of cooperative group leaders. Most of the time, desegregation decreases with an
increase in cooperative leaders (the leaders enforcing desegregation) when
compared with fierce leaders (the leaders enforcing segregation).
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zia_K/0/1/0/all/0/1">Kashif Zia</a>, <a href="http://arxiv.org/find/cs/1/au:+Saini_D/0/1/0/all/0/1">Dinesh Kumar Saini</a>, <a href="http://arxiv.org/find/cs/1/au:+Muhammad_A/0/1/0/all/0/1">Arshad Muhammad</a>, <a href="http://arxiv.org/find/cs/1/au:+Ferscha_A/0/1/0/all/0/1">Alois Ferscha</a>Augmenting correlation structures in spatial data using deep generative models. (arXiv:1905.09796v1 [cs.LG])http://arxiv.org/abs/1905.09796
<p>State-of-the-art deep learning methods have shown a remarkable capacity to
model complex data domains, but struggle with geospatial data. In this paper,
we introduce SpaceGAN, a novel generative model for geospatial domains that
learns neighbourhood structures through spatial conditioning. We propose to
enhance spatial representation beyond mere spatial coordinates, by conditioning
each data point on feature vectors of its spatial neighbours, thus allowing for
a more flexible representation of the spatial structure. To overcome issues of
training convergence, we employ a metric capturing the loss in local spatial
autocorrelation between real and generated data as stopping criterion for
SpaceGAN parametrization. This way, we ensure that the generator produces
synthetic samples faithful to the spatial patterns observed in the input.
SpaceGAN is successfully applied for data augmentation and outperforms
other methods of synthetic spatial data generation. Finally, we propose an
ensemble learning framework for the geospatial domain, taking augmented
SpaceGAN samples as training data for a set of ensemble learners. We
empirically show the superiority of this approach over conventional ensemble
learning approaches and rivaling spatial data augmentation methods, using
synthetic and real-world prediction tasks. Our findings suggest that SpaceGAN
can be used as a tool for (1) artificially inflating sparse geospatial data and
(2) improving generalization of geospatial models.
</p>
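The stopping criterion compares spatial autocorrelation between real and generated data; the classic global statistic for this is Moran's I, sketched below in NumPy (an illustration of the underlying quantity, not necessarily SpaceGAN's exact metric).

```python
import numpy as np

def morans_i(values, W):
    """Global Moran's I for values observed at n sites.

    W is an n x n spatial weight matrix (e.g. adjacency of neighbours);
    I near +1 means similar values cluster, near -1 means they alternate.
    """
    x = np.asarray(values, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    num = z @ W @ z                  # weighted cross-products of neighbours
    den = z @ z
    return len(x) / W.sum() * num / den
```

Training could then stop once `abs(morans_i(real, W) - morans_i(fake, W))` falls below a tolerance, which ties the generator to the observed spatial pattern.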
<a href="http://arxiv.org/find/cs/1/au:+Klemmer_K/0/1/0/all/0/1">Konstantin Klemmer</a>, <a href="http://arxiv.org/find/cs/1/au:+Koshiyama_A/0/1/0/all/0/1">Adriano Koshiyama</a>, <a href="http://arxiv.org/find/cs/1/au:+Flennerhag_S/0/1/0/all/0/1">Sebastian Flennerhag</a>Interpreting Adversarially Trained Convolutional Neural Networks. (arXiv:1905.09797v1 [cs.LG])http://arxiv.org/abs/1905.09797
<p>We attempt to interpret how adversarially trained convolutional neural
networks (AT-CNNs) recognize objects. We design systematic approaches to
interpret AT-CNNs in both qualitative and quantitative ways and compare them
with normally trained models. Surprisingly, we find that adversarial training
alleviates the texture bias of standard CNNs when trained on object recognition
tasks, and helps CNNs learn a more shape-biased representation. We validate our
hypothesis from two aspects. First, we compare the salience maps of AT-CNNs and
standard CNNs on clean images and images under different transformations. The
comparison visually shows that the predictions of the two types of CNNs are
sensitive to dramatically different types of features. Second, to achieve
quantitative verification, we construct additional test datasets that destroy
either textures or shapes, such as style-transferred versions of clean data,
saturated images and patch-shuffled ones, and then evaluate the classification
accuracy of AT-CNNs and normal CNNs on these datasets. Our findings shed some
light on why AT-CNNs are more robust than normally trained ones and
contribute to a better understanding of adversarial training over CNNs from an
interpretation perspective.
</p>
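One of the shape-destroying transformations mentioned, patch shuffling, is easy to sketch: split the image into a grid of patches and permute them, which destroys global shape while keeping local texture statistics (a minimal version, assuming the image dimensions are divisible by the grid size `k`).

```python
import numpy as np

def patch_shuffle(img, k, rng=None):
    """Split an HxW(xC) image into a k x k grid of patches and permute them."""
    if rng is None:
        rng = np.random.default_rng()
    h, w = img.shape[:2]
    ph, pw = h // k, w // k
    patches = [img[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].copy()
               for i in range(k) for j in range(k)]
    order = rng.permutation(len(patches))
    out = img.copy()
    for idx, src in enumerate(order):
        i, j = divmod(idx, k)
        out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = patches[src]
    return out
```

A texture-biased classifier tends to keep its prediction on such images, while a shape-biased one degrades, which is the contrast the quantitative test exploits.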
<a href="http://arxiv.org/find/cs/1/au:+Zhang_T/0/1/0/all/0/1">Tianyuan Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhu_Z/0/1/0/all/0/1">Zhanxing Zhu</a>How degenerate is the parametrization of neural networks with the ReLU activation function?. (arXiv:1905.09803v1 [cs.LG])http://arxiv.org/abs/1905.09803
<p>Neural network training is usually accomplished by solving a non-convex
optimization problem using stochastic gradient descent. Although one optimizes
over the network's parameters, the loss function generally only depends on the
realization of a neural network, i.e. the function it computes. Studying the
functional optimization problem over the space of realizations can open up
completely new ways to understand neural network training. In particular, usual
loss functions like the mean squared error are convex on sets of neural network
realizations, which themselves are non-convex. Note, however, that each
realization has many different, possibly degenerate, parametrizations. In
particular, a local minimum in the parametrization space need not correspond
to a local minimum in the realization space. To establish such a connection,
inverse stability of the realization map is required, meaning that proximity of
realizations must imply proximity of corresponding parametrizations. In this
paper we present pathologies which prevent inverse stability in general, and
proceed to establish a restricted set of parametrizations on which we have
inverse stability w.r.t. a Sobolev norm. Furthermore, we show that by
optimizing over such restricted sets, it is still possible to learn any
function that can be learned by optimizing over unrestricted sets. While
most of this paper focuses on shallow networks, none of the methods used is, in
principle, limited to shallow networks, and it should be possible to extend
them to deep neural networks.
</p>
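A concrete instance of the degeneracy: by positive homogeneity of the ReLU, rescaling the first layer by any c > 0 and the second by 1/c gives a parametrization far away in parameter space with exactly the same realization (a minimal sketch; the names are ours).

```python
import numpy as np

def realize(x, W1, b1, W2):
    """Realization of a shallow ReLU network: x -> W2 @ relu(W1 @ x + b1)."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 3))
b1 = rng.normal(size=5)
W2 = rng.normal(size=(2, 5))
c = 7.0  # any c > 0 works, since relu(c * z) = c * relu(z)
```

Because such rescalings move the parameters arbitrarily far while fixing the realization, proximity of realizations cannot in general force proximity of parametrizations, which is exactly the inverse-stability failure the paper studies.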
<a href="http://arxiv.org/find/cs/1/au:+Berner_J/0/1/0/all/0/1">Julius Berner</a>, <a href="http://arxiv.org/find/cs/1/au:+Elbrachter_D/0/1/0/all/0/1">Dennis Elbr&#xe4;chter</a>, <a href="http://arxiv.org/find/cs/1/au:+Grohs_P/0/1/0/all/0/1">Philipp Grohs</a>MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies. (arXiv:1905.09808v1 [cs.LG])http://arxiv.org/abs/1905.09808
<p>Humans are able to perform a myriad of sophisticated tasks by drawing upon
skills acquired through prior experience. For autonomous agents to have this
capability, they must be able to extract reusable skills from past experience
that can be recombined in new ways for subsequent tasks. Furthermore, when
controlling complex high-dimensional morphologies, such as humanoid bodies,
tasks often require coordination of multiple skills simultaneously. Learning
discrete primitives for every combination of skills quickly becomes
prohibitive. Composable primitives that can be recombined to create a large
variety of behaviors can be more suitable for modeling this combinatorial
explosion. In this work, we propose multiplicative compositional policies
(MCP), a method for learning reusable motor skills that can be composed to
produce a range of complex behaviors. Our method factorizes an agent's skills
into a collection of primitives, where multiple primitives can be activated
simultaneously via multiplicative composition. This flexibility allows the
primitives to be transferred and recombined to elicit new behaviors as
necessary for novel tasks. We demonstrate that MCP is able to extract
composable skills for highly complex simulated characters from pre-training
tasks, such as motion imitation, and then reuse these skills to solve
challenging continuous control tasks, such as dribbling a soccer ball to a
goal, and picking up an object and transporting it to a target location.
</p>
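For Gaussian primitives, multiplicative composition has a convenient closed form: a weighted product of Gaussians is again Gaussian, with precision-weighted mean. The sketch below illustrates that composition rule in one dimension (a simplification of MCP's gated composition; the function name and interface are ours).

```python
import numpy as np

def compose_gaussians(mus, sigmas, weights):
    """Compose primitives multiplicatively: pi(a) propto
    prod_i N(a; mu_i, sigma_i^2)^{w_i}, which is Gaussian with
    precision sum_i w_i / sigma_i^2 and precision-weighted mean."""
    mus, sigmas, w = map(np.asarray, (mus, sigmas, weights))
    prec = (w / sigmas**2).sum(axis=0)
    mu = (w * mus / sigmas**2).sum(axis=0) / prec
    return mu, 1.0 / np.sqrt(prec)
```

Activating several primitives at once simply shifts the composed mean toward the more confident (higher-precision, higher-weight) primitives, which is what lets multiple skills be blended simultaneously.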
<a href="http://arxiv.org/find/cs/1/au:+Peng_X/0/1/0/all/0/1">Xue Bin Peng</a>, <a href="http://arxiv.org/find/cs/1/au:+Chang_M/0/1/0/all/0/1">Michael Chang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_G/0/1/0/all/0/1">Grace Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Abbeel_P/0/1/0/all/0/1">Pieter Abbeel</a>, <a href="http://arxiv.org/find/cs/1/au:+Levine_S/0/1/0/all/0/1">Sergey Levine</a>Efficient Reduction in Shape Parameter Space Dimension for Ship Propeller Blade Design. (arXiv:1905.09815v1 [cs.CE])http://arxiv.org/abs/1905.09815
<p>In this work, we present the results of a ship propeller design optimization
campaign carried out in the framework of the research project PRELICA, funded
by the Friuli Venezia Giulia regional government. The main idea of this work is
to operate on a multidisciplinary level to identify propeller shapes that lead
to reduced tip vortex-induced pressure and increased efficiency without
altering the thrust. First, a specific tool for the bottom-up construction of
parameterized propeller blade geometries has been developed. The proposed
algorithm operates with a user-defined number of arbitrarily shaped or NACA
airfoil sections, and employs arbitrary-degree NURBS to represent the chord,
pitch, skew and rake distribution as a function of the blade radial coordinate.
The control points of such curves have been modified to generate, in a fully
automated way, a family of blade geometries depending on as many as 20 shape
parameters. Such geometries have then been used to carry out potential flow
simulations with the Boundary Element Method based software PROCAL. Given the
high number of parameters considered, such a preliminary stage allowed for a
fast evaluation of the performance of several hundreds of shapes. In addition,
the data obtained from the potential flow simulation allowed for the
application of a parameter space reduction methodology based on the active
subspaces (AS) property, which suggested that the main propeller performance
indices depend, to a first but rather accurate approximation, only on a
single parameter that is a linear combination of all the original geometric
ones. AS analysis has also been used to carry out a constrained optimization
exploiting the response surface method in the reduced parameter space, and a
sensitivity analysis based on such a surrogate model. The few selected shapes
were finally used to set up high fidelity RANS simulations and select an
optimal shape.
</p>
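The active-subspace machinery referred to above eigendecomposes the average outer product of sampled gradients and keeps the dominant eigenvectors; a minimal NumPy sketch of that general recipe (illustrative only, not the PRELICA pipeline):

```python
import numpy as np

def active_subspace(grads, k=1):
    """Estimate the active subspace from sampled gradients of f.

    Builds C = E[grad f grad f^T] from the sample matrix and returns the
    top-k eigenvalues and eigenvectors; directions with large eigenvalues
    are the ones along which f varies most.
    """
    G = np.asarray(grads)            # shape (n_samples, n_params)
    C = G.T @ G / len(G)             # Monte Carlo estimate of E[g g^T]
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order][:k], eigvecs[:, order][:, :k]
```

A single dominant eigenvalue, as found here for the propeller indices, means the response surface can be built over one linear combination of the 20 shape parameters.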
<a href="http://arxiv.org/find/cs/1/au:+Mola_A/0/1/0/all/0/1">Andrea Mola</a>, <a href="http://arxiv.org/find/cs/1/au:+Tezzele_M/0/1/0/all/0/1">Marco Tezzele</a>, <a href="http://arxiv.org/find/cs/1/au:+Gadalla_M/0/1/0/all/0/1">Mahmoud Gadalla</a>, <a href="http://arxiv.org/find/cs/1/au:+Valdenazzi_F/0/1/0/all/0/1">Federica Valdenazzi</a>, <a href="http://arxiv.org/find/cs/1/au:+Grassi_D/0/1/0/all/0/1">Davide Grassi</a>, <a href="http://arxiv.org/find/cs/1/au:+Padovan_R/0/1/0/all/0/1">Roberta Padovan</a>, <a href="http://arxiv.org/find/cs/1/au:+Rozza_G/0/1/0/all/0/1">Gianluigi Rozza</a>Coinduction: an elementary approach. (arXiv:1501.04354v8 [cs.LO] UPDATED)http://arxiv.org/abs/1501.04354
<p>The main aim of this paper is to promote a certain style of doing coinductive
proofs, similar to inductive proofs as commonly done by mathematicians. For
this purpose, we provide a reasonably direct justification for coinductive
proofs written in this style, i.e., converting a coinductive proof into a
non-coinductive argument is purely a matter of routine. In this way, we provide
an elementary explanation of how to interpret coinduction in set theory.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Czajka_L/0/1/0/all/0/1">&#x141;ukasz Czajka</a>New Lower Bounds for van der Waerden Numbers Using Distributed Computing. (arXiv:1603.03301v6 [math.CO] UPDATED)http://arxiv.org/abs/1603.03301
<p>This paper provides new lower bounds for van der Waerden numbers. The number
$W(k,r)$ is defined to be the smallest integer $n$ for which any $r$-coloring
of the integers $0, \ldots, n-1$ admits a monochromatic arithmetic progression of
length $k$; its existence is implied by van der Waerden's Theorem. We exhibit
$r$-colorings of $0\ldots n-1$ that do not contain monochromatic arithmetic
progressions of length $k$ to prove that $W(k, r)&gt;n$. These colorings are
constructed using existing techniques. Rabung's method, given a prime $p$ and a
primitive root $\rho$, applies a color given by the discrete logarithm base
$\rho$ mod $r$ and concatenates $k-1$ copies. We also used Herwig et al.'s
Cyclic Zipper Method, which doubles or quadruples the length of a coloring,
with the faster check of Rabung and Lotts. We were able to check larger primes
than previous results, employing around 2 teraflops of computing power for 12
months through distributed computing by over 500 volunteers. This allowed us to
check all primes through 950 million, compared to 10 million by Rabung and
Lotts. Our lower bounds appear to grow roughly exponentially in $k$. Given that
these constructions produce tight lower bounds for known van der Waerden
numbers, this data suggests that exact van der Waerden numbers grow
exponentially in $k$ with ratio $r$ asymptotically, which is a new conjecture,
according to Graham.
</p>
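The certificate behind each bound is simply an $r$-coloring with no monochromatic $k$-term progression, and Rabung's discrete-log construction is short to state in code (illustrative Python, not the project's distributed checker).

```python
def has_mono_ap(coloring, k):
    """True iff the coloring contains a monochromatic k-term arithmetic
    progression; returning False certifies W(k, r) > len(coloring)."""
    n = len(coloring)
    for start in range(n):
        for step in range(1, (n - 1 - start) // (k - 1) + 1):
            if len({coloring[start + i * step] for i in range(k)}) == 1:
                return True
    return False

def rabung_coloring(p, rho, r, k):
    """Rabung's method: color 1..p-1 by the discrete log base rho mod r
    (0 gets color 0), then concatenate k - 1 copies of the block."""
    dlog, x = {}, 1
    for e in range(p - 1):          # rho must be a primitive root mod p
        dlog[x] = e
        x = x * rho % p
    base = [0] + [dlog[i] % r for i in range(1, p)]
    return base * (k - 1)
```

The distributed search then amounts to running `has_mono_ap` over colorings produced from ever-larger primes.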
<a href="http://arxiv.org/find/math/1/au:+Monroe_D/0/1/0/all/0/1">Daniel Monroe</a>Search-and-Fetch with 2 Robots on a Disk: Wireless and Face-to-Face Communication Models. (arXiv:1611.10208v4 [cs.DS] UPDATED)http://arxiv.org/abs/1611.10208
<p>We initiate the study of a new problem on searching and fetching in a
distributed environment concerning treasure-evacuation from a unit disk. A
treasure and an exit are located at unknown positions on the perimeter of a
disk and at known arc distance. A team of two robots start from the center of
the disk, and their goal is to fetch the treasure to the exit. At any time the
robots can move anywhere they choose on the disk, independently of each other,
with the same speed. A robot detects an interesting point (treasure or exit)
only if it passes over the exact location of that point. We are interested in
designing distributed algorithms that minimize the worst-case
treasure-evacuation time, i.e. the time it takes for the treasure to be
discovered and brought (fetched) to the exit by any of the robots.
</p>
<p>The communication protocol between the robots is either wireless, where
information is shared at any time, or face-to-face (i.e. non-wireless), where
information can be shared only if the robots meet. For both models we obtain
upper bounds for fetching the treasure to the exit. Our main technical
contribution pertains to the face-to-face model. More specifically, we
demonstrate how robots can exchange information without meeting, effectively
achieving a highly efficient treasure-evacuation protocol which is minimally
affected by the lack of distant communication. Finally, we complement our
positive results above by providing a lower bound in the face-to-face model.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Georgiou_K/0/1/0/all/0/1">Konstantinos Georgiou</a>, <a href="http://arxiv.org/find/cs/1/au:+Karakostas_G/0/1/0/all/0/1">George Karakostas</a>, <a href="http://arxiv.org/find/cs/1/au:+Kranakis_E/0/1/0/all/0/1">Evangelos Kranakis</a>An $\omega$-Algebra for Real-Time Energy Problems. (arXiv:1701.08524v4 [cs.LO] UPDATED)http://arxiv.org/abs/1701.08524
<p>We develop a $^*$-continuous Kleene $\omega$-algebra of real-time energy
functions. Together with corresponding automata, these can be used to model
systems which can consume and regain energy (or other types of resources)
depending on available time. Using recent results on $^*$-continuous Kleene
$\omega$-algebras and computability of certain manipulations on real-time
energy functions, it follows that reachability and B\"uchi acceptance in
real-time energy automata can be decided in a static way which only involves
manipulations of real-time energy functions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Cachera_D/0/1/0/all/0/1">David Cachera</a>, <a href="http://arxiv.org/find/cs/1/au:+Fahrenberg_U/0/1/0/all/0/1">Uli Fahrenberg</a>, <a href="http://arxiv.org/find/cs/1/au:+Legay_A/0/1/0/all/0/1">Axel Legay</a>Quantifiers on languages and codensity monads. (arXiv:1702.08841v3 [cs.LO] UPDATED)http://arxiv.org/abs/1702.08841
<p>This paper contributes to the techniques of topo-algebraic recognition for
languages beyond the regular setting as they relate to logic on words. In
particular, we provide a general construction on recognisers corresponding to
adding one layer of various kinds of quantifiers and prove a corresponding
Reutenauer-type theorem. Our main tools are codensity monads and duality
theory. Our construction hinges on a measure-theoretic characterisation of the
profinite monad of the free S-semimodule monad for finite and commutative
semirings S, which generalises our earlier insight that the Vietoris monad on
Boolean spaces is the codensity monad of the finite powerset functor.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Gehrke_M/0/1/0/all/0/1">Mai Gehrke</a>, <a href="http://arxiv.org/find/cs/1/au:+Petrisan_D/0/1/0/all/0/1">Daniela Petrisan</a>, <a href="http://arxiv.org/find/cs/1/au:+Reggio_L/0/1/0/all/0/1">Luca Reggio</a>Generalized Degrees Freedom of Noncoherent MIMO with Asymmetric Links. (arXiv:1705.07355v4 [cs.IT] UPDATED)http://arxiv.org/abs/1705.07355
<p>We study the generalized degrees of freedom (gDoF) of the block-fading
noncoherent multiple input multiple output (MIMO) channel with asymmetric
distributions of link strengths, and a coherence time of T symbol durations. We
derive the optimal signaling structure for communication for asymmetric MIMO,
which is distinct from that for MIMO with independent and identically
distributed (i.i.d.) links. We extend the existing gDoF results for single
input multiple output (SIMO) with i.i.d. links to the asymmetric case, proving
that selecting the statistically best antenna is gDoF-optimal. Using the gDoF
result for SIMO, we prove that for T=1, the gDoF is zero for MIMO channels with
arbitrary link strength distributions, extending the result for MIMO with
i.i.d. links. We show that selecting the statistically best antenna is
gDoF-optimal for multiple input single output (MISO) channel. We also derive
the gDoF for the 2x2 MIMO channel with different exponents in the direct and
cross links. In this setting, we show that it is always necessary to use both
the antennas to achieve the optimal gDoF, in contrast to the results for 2x2
MIMO with i.i.d. links. We show that having weaker cross links gives a gDoF gain
compared to the case with i.i.d. links. For noncoherent MIMO with i.i.d. links,
the traditional method of training each transmit antenna independently is DoF
optimal, whereas we observe that for the asymmetric 2x2 MIMO, the traditional
training is not gDoF-optimal. We extend this observation to a larger MxM MIMO
by demonstrating a strategy that can achieve larger gDoF than a traditional
training-based method.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Sebastian_J/0/1/0/all/0/1">Joyson Sebastian</a>, <a href="http://arxiv.org/find/cs/1/au:+Diggavi_S/0/1/0/all/0/1">Suhas. N. Diggavi</a>Optimal Secure Multi-Layer IoT Network Design. (arXiv:1707.07046v3 [cs.GT] UPDATED)http://arxiv.org/abs/1707.07046
<p>With the remarkable growth of the Internet and communication technologies
over the past few decades, Internet of Things (IoTs) is enabling the ubiquitous
connectivity of heterogeneous physical devices with software, sensors, and
actuators. IoT networks are naturally two-layer with the cloud and cellular
networks coexisting with the underlaid device-to-device (D2D) communications.
The connectivity of IoTs plays an important role in information dissemination
for mission-critical and civilian applications. However, IoT communication
networks are vulnerable to cyber attacks including the denial-of-service (DoS)
and jamming attacks, resulting in link removals in the IoT network. In this work,
we develop a heterogeneous IoT network design framework in which a network
designer can add links to provide additional communication paths between two
nodes or secure links against attacks by investing resources. By anticipating
the strategic cyber attacks, we characterize the optimal design of secure IoT
network by first providing a lower bound on the number of links a secure
network requires for a given budget of protected links, and then developing a
method to construct networks that satisfy the heterogeneous network design
specifications. Therefore, each layer of the designed heterogeneous IoT network
is resistant to a predefined level of malicious attacks with minimum resources.
Finally, we provide case studies on the Internet of Battlefield Things (IoBT)
to corroborate and illustrate our obtained results.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chen_J/0/1/0/all/0/1">Juntao Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Touati_C/0/1/0/all/0/1">Corinne Touati</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhu_Q/0/1/0/all/0/1">Quanyan Zhu</a>Learning With Errors and Extrapolated Dihedral Cosets. (arXiv:1710.08223v2 [cs.CR] UPDATED)http://arxiv.org/abs/1710.08223
<p>The hardness of the learning with errors (LWE) problem is one of the most
fruitful resources of modern cryptography. In particular, it is one of the most
prominent candidates for secure post-quantum cryptography. Understanding its
quantum complexity is therefore an important goal. We show that under quantum
polynomial time reductions, LWE is equivalent to a relaxed version of the
dihedral coset problem (DCP), which we call extrapolated DCP (eDCP). The extent
of extrapolation varies with the LWE noise rate. By considering different
extents of extrapolation, our result generalizes Regev's famous proof that if
DCP is in BQP (quantum poly-time) then so is LWE (FOCS'02). We also discuss a
connection between eDCP and Childs and Van Dam's algorithm for generalized
hidden shift problems (SODA'07). Our result implies that a BQP solution for LWE
might not require the full power of solving DCP, but rather only a solution for
its relaxed version, eDCP, which could be easier.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Brakerski_Z/0/1/0/all/0/1">Zvika Brakerski</a>, <a href="http://arxiv.org/find/cs/1/au:+Kirshanova_E/0/1/0/all/0/1">Elena Kirshanova</a>, <a href="http://arxiv.org/find/cs/1/au:+Stehle_D/0/1/0/all/0/1">Damien Stehl&#xe9;</a>, <a href="http://arxiv.org/find/cs/1/au:+Wen_W/0/1/0/all/0/1">Weiqiang Wen</a>KRISM --- Krylov Subspace-based Optical Computing of Hyperspectral Images. (arXiv:1801.09343v3 [eess.IV] UPDATED)http://arxiv.org/abs/1801.09343
<p>We present an adaptive imaging technique that optically computes a low-rank
representation of a scene's hyperspectral image. The proposed imager, KRISM,
provides optical implementation of two operators on the scene's hyperspectral
image: a spectrally-coded spatial measurement and a spatially-coded spectral
measurement. By iterating between the two operators, using the output of one as
the input to the other, we show that the top singular vectors and singular
values of a hyperspectral image can be computed in the optical domain with only
a few measurements. We present an optical design that uses pupil plane coding
for implementing the two operations and show several compelling results using a
lab prototype to demonstrate the effectiveness of the proposed hyperspectral
imager.
</p>
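The optical iteration corresponds to power iteration on the hyperspectral matrix: alternately applying the coded measurement operator and its adjoint, feeding each output back as the next input, converges to the top singular pair. A NumPy analogue of that numerical core (assuming noiseless operators and a spectral gap):

```python
import numpy as np

def top_singular_pair(A, iters=200, rng=None):
    """Recover the top singular value and vectors of A by alternating
    A (the 'spatial' operator) and A.T (the 'spectral' operator)."""
    if rng is None:
        rng = np.random.default_rng(0)
    v = rng.normal(size=A.shape[1])
    for _ in range(iters):
        u = A @ v
        u /= np.linalg.norm(u)      # normalized left singular vector estimate
        v = A.T @ u
        sigma = np.linalg.norm(v)   # converges to the top singular value
        v /= sigma                  # normalized right singular vector estimate
    return sigma, u, v
```

Deflating against already-found vectors extends the same loop to the next singular pairs, which is how a low-rank representation is accumulated from few measurements.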
<a href="http://arxiv.org/find/eess/1/au:+Saragadam_V/0/1/0/all/0/1">Vishwanath Saragadam</a>, <a href="http://arxiv.org/find/eess/1/au:+Sankaranarayanan_A/0/1/0/all/0/1">Aswin C. Sankaranarayanan</a>Writability and reachability for alpha-tape infinite time Turing machines. (arXiv:1802.05734v4 [math.LO] UPDATED)http://arxiv.org/abs/1802.05734
<p>Infinite time Turing machine models with tape length $\alpha$ (denoted
$T_\alpha$) strengthen the $\omega$-tape machines of Hamkins and Kidder from
[HL00] and led to some new phenomena that were studied in [Rin14]. For
instance, for some countable ordinals $\alpha$ there are cells that cannot be
halting positions of $T_\alpha$ given trivial input (i.e. no computation halts
with its head in this cell). We provide various characterizations of the least
such ordinal $\delta$, thereby answering the main open question in [Rin14].
Notably, the following properties of an ordinal $\alpha$ hold for the first
time at $\alpha=\delta$. (i) For some $\xi&lt;\alpha$, there is a
$T_\xi$-writable but not $T_\alpha$-writable subset of $\omega$. (ii) There is
a gap in the $T_\alpha$-writable ordinals. (iii) $\alpha$ is uncountable in
$L_{\lambda_\alpha}$, where $\lambda_\alpha$ denotes the supremum of ordinals
with a $T_\alpha$-writable code of length $\alpha$. We further show that
$\delta$ is a closure point of the function $\alpha \mapsto \Sigma_\alpha$,
where $\Sigma_\alpha$ denotes the supremum of the ordinals with a
$T_\alpha$-accidentally writable code of length $\alpha$. The proof of this
result relies on the above characterizations and an analogue to Welch's
submodel characterization of the ordinals $\lambda$, $\zeta$ and $\Sigma$.
</p>
<a href="http://arxiv.org/find/math/1/au:+Carl_M/0/1/0/all/0/1">Merlin Carl</a>, <a href="http://arxiv.org/find/math/1/au:+Rin_B/0/1/0/all/0/1">Benjamin Rin</a>, <a href="http://arxiv.org/find/math/1/au:+Schlicht_P/0/1/0/all/0/1">Philipp Schlicht</a>Learning Decorrelated Hashing Codes for Multimodal Retrieval. (arXiv:1803.00682v2 [cs.IR] UPDATED)http://arxiv.org/abs/1803.00682
<p>In social networks, heterogeneous multimedia data correlate to each other,
such as videos and their corresponding tags in YouTube and image-text pairs in
Facebook. Nearest neighbor retrieval across multiple modalities on large data
sets becomes a hot yet challenging problem. Hashing is expected to be an
efficient solution, since it represents data as binary codes. As bit-wise
XOR operations can be handled quickly, the retrieval time is greatly reduced. Few
existing multimodal hashing methods consider the correlation among hashing
bits. This correlation has a negative impact on hashing codes: when the hashing
code length becomes longer, the retrieval performance improvement becomes
slower. In this paper, we propose a minimum correlation regularization (MCR)
for multimodal hashing. First, the sigmoid function is used to embed the data
matrices. Then, the MCR is applied on the output of sigmoid function. As the
output of sigmoid function approximates a binary code matrix, the proposed MCR
can efficiently decorrelate the hashing codes. Experiments show the superiority
of the proposed method becomes greater as the code length increases.
</p>
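The regularizer can be sketched directly: relax the codes with a sigmoid and penalize the off-diagonal entries of the bit covariance (one minimal reading of MCR; the paper's exact objective may differ).

```python
import numpy as np

def mcr_penalty(H):
    """Minimum-correlation-style penalty on real-valued code activations H
    of shape (n_items, n_bits): sigmoid-relax the codes, then penalize
    off-diagonal entries of the bit covariance matrix."""
    B = 1.0 / (1.0 + np.exp(-H))      # sigmoid output approximates binary codes
    Z = B - B.mean(axis=0)
    C = Z.T @ Z / len(B)              # covariance between bits
    off = C - np.diag(np.diag(C))
    return float((off ** 2).sum())
```

Driving this penalty toward zero makes the bits carry non-redundant information, which is why the gain over uncorrelated baselines grows with code length.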
<a href="http://arxiv.org/find/cs/1/au:+Tian_D/0/1/0/all/0/1">Dayong Tian</a>OIL: Observational Imitation Learning. (arXiv:1803.01129v3 [cs.CV] UPDATED)http://arxiv.org/abs/1803.01129
<p>Recent work has explored the problem of autonomous navigation by imitating a
teacher and learning an end-to-end policy, which directly predicts controls
from raw images. However, these approaches tend to be sensitive to mistakes by
the teacher and do not scale well to other environments or vehicles. To this
end, we propose Observational Imitation Learning (OIL), a novel imitation
learning variant that supports online training and automatic selection of
optimal behavior by observing multiple imperfect teachers. We apply our
proposed methodology to the challenging problems of autonomous driving and UAV
racing. For both tasks, we utilize the Sim4CV simulator that enables the
generation of large amounts of synthetic training data and also allows for
online learning and evaluation. We train a perception network to predict
waypoints from raw image data and use OIL to train another network to predict
controls from these waypoints. Extensive experiments demonstrate that our
trained network outperforms its teachers, conventional imitation learning (IL)
and reinforcement learning (RL) baselines and even humans in simulation. The
project website is available at https://sites.google.com/kaust.edu.sa/oil/ and
a video at https://youtu.be/_rhq8a0qgeg
</p>
<a href="http://arxiv.org/find/cs/1/au:+Li_G/0/1/0/all/0/1">Guohao Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Muller_M/0/1/0/all/0/1">Matthias M&#xfc;ller</a>, <a href="http://arxiv.org/find/cs/1/au:+Casser_V/0/1/0/all/0/1">Vincent Casser</a>, <a href="http://arxiv.org/find/cs/1/au:+Smith_N/0/1/0/all/0/1">Neil Smith</a>, <a href="http://arxiv.org/find/cs/1/au:+Michels_D/0/1/0/all/0/1">Dominik L. Michels</a>, <a href="http://arxiv.org/find/cs/1/au:+Ghanem_B/0/1/0/all/0/1">Bernard Ghanem</a>Distributed Simulation and Distributed Inference. (arXiv:1804.06952v3 [cs.DS] UPDATED)http://arxiv.org/abs/1804.06952
<p>Independent samples from an unknown probability distribution $\bf p$ on a
domain of size $k$ are distributed across $n$ players, with each player holding
one sample. Each player can communicate $\ell$ bits to a central referee in a
simultaneous message passing model of communication to help the referee infer a
property of the unknown $\bf p$. What is the least number of players required
for inference in the communication-starved setting of $\ell&lt;\log k$? We
begin by exploring a general "simulate-and-infer" strategy for such inference
problems where the center simulates the desired number of samples from the
unknown distribution and applies standard inference algorithms for the
collocated setting. Our first result shows that for $\ell&lt;\log k$ perfect
simulation of even a single sample is not possible. Nonetheless, we present a
Las Vegas algorithm that simulates a single sample from the unknown
distribution using $O(k/2^\ell)$ samples in expectation. As an immediate
corollary, we get that simulate-and-infer attains the optimal sample complexity
of $\Theta(k^2/2^\ell\epsilon^2)$ for learning the unknown distribution to
total variation distance $\epsilon$. For the prototypical testing problem of
identity testing, simulate-and-infer works with $O(k^{3/2}/2^\ell\epsilon^2)$
samples, a requirement that seems to be inherent for all communication
protocols not using any additional resources. Interestingly, we can break this
barrier using public coins. Specifically, we exhibit a public-coin
communication protocol that performs identity testing using
$O(k/\sqrt{2^\ell}\epsilon^2)$ samples. Furthermore, we show that this is
optimal up to constant factors. Our theoretically sample-optimal protocol is
easy to implement in practice. Our proof of the lower bound entails showing a
contraction in $\chi^2$ distance of product distributions under communication
constraints, which may be of independent interest.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Acharya_J/0/1/0/all/0/1">Jayadev Acharya</a>, <a href="http://arxiv.org/find/cs/1/au:+Canonne_C/0/1/0/all/0/1">Cl&#xe9;ment L. Canonne</a>, <a href="http://arxiv.org/find/cs/1/au:+Tyagi_H/0/1/0/all/0/1">Himanshu Tyagi</a>Bisimulations for Delimited-Control Operators. (arXiv:1804.08373v4 [cs.LO] UPDATED)http://arxiv.org/abs/1804.08373
<p>We present a comprehensive study of the behavioral theory of an untyped
$\lambda$-calculus extended with the delimited-control operators shift and
reset. To that end, we define a contextual equivalence for this calculus, which
we then aim to characterize with coinductively defined relations, called
bisimilarities. We consider different styles of bisimilarities (namely
applicative, normal-form, and environmental) within a unifying framework, and
we give several examples to illustrate their respective strengths and
weaknesses. We also discuss how to extend this work to other delimited-control
operators.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Biernacki_D/0/1/0/all/0/1">Dariusz Biernacki</a>, <a href="http://arxiv.org/find/cs/1/au:+Lenglet_S/0/1/0/all/0/1">Sergue&#xef; Lenglet</a>, <a href="http://arxiv.org/find/cs/1/au:+Polesiuk_P/0/1/0/all/0/1">Piotr Polesiuk</a>Old and New Nearly Optimal Polynomial Root-finders. (arXiv:1805.12042v6 [cs.NA] UPDATED)http://arxiv.org/abs/1805.12042
<p>Univariate polynomial root-finding has been studied for four millennia and
still remains the subject of intensive research. Hundreds if not thousands of
efficient algorithms for this task have been proposed and analyzed. Two nearly
optimal solution algorithms were devised, in 1995 and 2016, based on
recursive factorization of a polynomial and subdivision iterations,
respectively, but both of them are superseded in practice by Ehrlich's
functional iterations. By combining factorization techniques with Ehrlich's and
subdivision iterations we devise a variety of new root-finders. They match or
supersede the known algorithms in terms of their estimated complexity for
root-finding on the complex plane, in a disc, and on a line segment, and
promise to be practically competitive.
</p>
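Ehrlich's functional iteration mentioned above can be sketched compactly. The starting points and stopping rule below are simplistic assumptions for illustration, not the authors' root-finder.

```python
import cmath

def ehrlich_roots(coeffs, iters=200, tol=1e-12):
    """Approximate all roots of a polynomial simultaneously with
    Ehrlich's (Aberth-Ehrlich) iteration.  `coeffs` lists coefficients
    from the leading term down, e.g. [1, 0, -1] for z^2 - 1."""
    n = len(coeffs) - 1

    def p(z):
        v = 0j
        for c in coeffs:
            v = v * z + c
        return v

    def dp(z):
        v = 0j
        for i, c in enumerate(coeffs[:-1]):
            v = v * z + c * (n - i)
        return v

    # Initial guesses spread on a circle, offset to break symmetry
    # (a common, simplistic choice).
    roots = [0.9 * cmath.exp(2j * cmath.pi * (k + 0.25) / n) for k in range(n)]
    for _ in range(iters):
        done, new = True, []
        for k, z in enumerate(roots):
            w = p(z) / dp(z)                          # Newton correction
            s = sum(1 / (z - zj) for j, zj in enumerate(roots) if j != k)
            corr = w / (1 - w * s)                    # Ehrlich correction
            if abs(corr) > tol:
                done = False
            new.append(z - corr)
        roots = new
        if done:
            break
    return roots
```

Each root estimate repels the others through the sum `s`, which is what makes the simultaneous iteration converge to distinct roots in practice.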
<a href="http://arxiv.org/find/cs/1/au:+Pan_V/0/1/0/all/0/1">Victor Y. Pan</a>COUNTDOWN: a Run-time Library for Performance-Neutral Energy Saving in MPI Applications. (arXiv:1806.07258v2 [cs.DC] UPDATED)http://arxiv.org/abs/1806.07258
<p>Power and energy consumption is becoming key challenges to deploy the first
exascale supercomputer successfully. Large-scale HPC applications waste a
significant amount of power in communication and synchronization-related idle
times. However, due to the time scale at which communication happens,
transitioning into low-power states during communication idle times may
introduce unacceptable overhead in application execution time. In this paper,
we present COUNTDOWN, a runtime library, supported by a methodology and
analysis tool for identifying and automatically reducing the power consumption
of the computing elements during communication and synchronization. COUNTDOWN
saves energy without imposing significant time-to-completion increase by
lowering the CPUs' power consumption only during idle times for which the
power-state transition overhead is negligible. This is done transparently to the user,
without requiring labor-intensive and error-prone application code
modifications, nor requiring recompilation of the application. We test our
methodology in a production Tier-0 system. For the NAS benchmarks, COUNTDOWN
saves between 6% and 50% energy, with a time-to-solution penalty lower than 5%.
In a complete production application --- Quantum ESPRESSO --- for a 3.5K-core
run, COUNTDOWN saves 22.36% energy, with a performance penalty below 3%. Energy
saving increases to 37% with a performance penalty of 6.38%, if the application
is executed without communication tuning.
</p>
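The trade-off COUNTDOWN navigates can be caricatured with a toy energy model: lower the power state only once a wait has already lasted longer than a timeout, so short waits never pay the transition overhead. The numbers and the timer rule here are illustrative assumptions, not the library's actual mechanism.

```python
def energy_with_countdown(wait_times_us, timeout_us=500,
                          p_high=100.0, p_low=30.0, switch_cost_us=50):
    """Toy estimate of energy spent in MPI wait phases when the CPU drops
    to a low-power state only after a wait has lasted `timeout_us`
    microseconds.  Returns (energy, total transition overhead time)."""
    energy, lost_time = 0.0, 0.0
    for w in wait_times_us:
        if w <= timeout_us:
            energy += p_high * w                 # too short: stay high-power
        else:
            energy += p_high * timeout_us        # timer still running
            energy += p_low * (w - timeout_us)   # low-power remainder
            lost_time += switch_cost_us          # one transition overhead
    return energy, lost_time
```

Only long waits, where the savings dominate the switch cost, trigger a transition; short waits run at full power, which is why the time-to-solution penalty stays small.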
<a href="http://arxiv.org/find/cs/1/au:+Cesarini_D/0/1/0/all/0/1">Daniele Cesarini</a>, <a href="http://arxiv.org/find/cs/1/au:+Bartolini_A/0/1/0/all/0/1">Andrea Bartolini</a>, <a href="http://arxiv.org/find/cs/1/au:+Bonfa_P/0/1/0/all/0/1">Pietro Bonf&#xe0;</a>, <a href="http://arxiv.org/find/cs/1/au:+Cavazzoni_C/0/1/0/all/0/1">Carlo Cavazzoni</a>, <a href="http://arxiv.org/find/cs/1/au:+Benini_L/0/1/0/all/0/1">Luca Benini</a>Rearrangement and Prekopa-Leindler type inequalities. (arXiv:1806.08837v2 [math.PR] UPDATED)http://arxiv.org/abs/1806.08837
<p>We investigate the interactions of functional rearrangements with
Prekopa-Leindler type inequalities. It is shown that a general class of
integral inequalities tighten on rearrangement to "isoperimetric" sets with
respect to a relevant measure. Applications to the Borell-Brascamp-Lieb,
Borell-Ehrhart, and the recent polar Prekopa-Leindler inequalities are
demonstrated. It is also proven that an integrated form of the Gaussian
log-Sobolev inequality decreases on half-space rearrangement.
</p>
<a href="http://arxiv.org/find/math/1/au:+Melbourne_J/0/1/0/all/0/1">James Melbourne</a>Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations. (arXiv:1807.00251v3 [math.NA] UPDATED)http://arxiv.org/abs/1807.00251
<p>Machine learning (ML) problems are often posed as highly nonlinear and
nonconvex unconstrained optimization problems. Methods for solving ML problems
based on stochastic gradient descent are easily scaled for very large problems
but may involve fine-tuning many hyper-parameters. Quasi-Newton approaches
based on the limited-memory Broyden-Fletcher-Goldfarb-Shanno (BFGS) update
typically do not require manually tuning hyper-parameters but suffer from
approximating a potentially indefinite Hessian with a positive-definite matrix.
Hessian-free methods leverage the ability to perform Hessian-vector
multiplication without needing the entire Hessian matrix, but each iteration's
complexity is significantly greater than quasi-Newton methods. In this paper we
propose an alternative approach for solving ML problems based on a quasi-Newton
trust-region framework for large-scale optimization that allows
for indefinite Hessian approximations. Numerical experiments on a standard
testing data set show that with a fixed computational time budget, the proposed
methods achieve better results than the traditional limited-memory BFGS and the
Hessian-free methods.
</p>
<a href="http://arxiv.org/find/math/1/au:+Erway_J/0/1/0/all/0/1">Jennifer B. Erway</a>, <a href="http://arxiv.org/find/math/1/au:+Griffin_J/0/1/0/all/0/1">Joshua Griffin</a>, <a href="http://arxiv.org/find/math/1/au:+Marcia_R/0/1/0/all/0/1">Roummel F. Marcia</a>, <a href="http://arxiv.org/find/math/1/au:+Omheni_R/0/1/0/all/0/1">Riadh Omheni</a>On Algorithms for and Computing with the Tensor Ring Decomposition. (arXiv:1807.02513v2 [math.NA] UPDATED)http://arxiv.org/abs/1807.02513
<p>Tensor decompositions such as the canonical format and the tensor train
format have been widely utilized to reduce storage costs and operational
complexities for high-dimensional data, achieving linear scaling with the input
dimension instead of exponential scaling. In this paper, we investigate even
lower storage-cost representations in the tensor ring format, which is an
extension of the tensor train format with variable end-ranks. Firstly, we
introduce two algorithms for converting a tensor in full format to tensor ring
format with low storage cost. Secondly, we detail a rounding operation for
tensor rings and show how this requires new definitions of common linear
algebra operations in the format to obtain storage-cost savings. Lastly, we
introduce algorithms for transforming the graph structure of graph-based tensor
formats, with orders of magnitude lower complexity than existing literature.
The efficiency of all algorithms is demonstrated on a number of numerical
examples, and in certain cases, we demonstrate significantly higher compression
ratios when compared to previous approaches to using the tensor ring format.
</p>
<a href="http://arxiv.org/find/math/1/au:+Mickelin_O/0/1/0/all/0/1">Oscar Mickelin</a>, <a href="http://arxiv.org/find/math/1/au:+Karaman_S/0/1/0/all/0/1">Sertac Karaman</a>Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods. (arXiv:1807.07540v3 [stat.ML] UPDATED)http://arxiv.org/abs/1807.07540
<p>Neural network optimization methods fall into two broad classes: adaptive
methods such as Adam and non-adaptive methods such as vanilla stochastic
gradient descent (SGD). Here, we formulate the problem of neural network
optimization as Bayesian filtering. We find that state-of-the-art adaptive
(AdamW) and non-adaptive (SGD) methods can be recovered by taking limits as the
amount of information about the parameter gets large or small, respectively. As
such, we develop a new neural network optimization algorithm, AdaBayes, that
adaptively transitions between SGD-like and Adam(W)-like behaviour. This
algorithm converges more rapidly than Adam in the early part of learning, and
has generalisation performance competitive with SGD.
</p>
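The two regimes the paper interpolates between are visible in the update rules themselves. Below is a minimal pure-Python sketch of one SGD step and one Adam step (hyper-parameter values are illustrative defaults, not the paper's AdaBayes algorithm): the non-adaptive step scales with the raw gradient, while the adaptive step is normalized by a second-moment estimate.

```python
def sgd_step(w, grad, lr=0.1):
    """Non-adaptive: the step is proportional to the raw gradient."""
    return w - lr * grad

def adam_step(w, grad, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adaptive: the step is normalized by a running estimate of the
    gradient's second moment, so its size is roughly gradient-scale
    invariant.  `state` is (first moment, second moment, step count)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    mhat = m / (1 - b1 ** t)            # bias correction
    vhat = v / (1 - b2 ** t)
    return w - lr * mhat / (vhat ** 0.5 + eps), (m, v, t)
```

On the first step Adam moves by roughly `lr` regardless of whether the gradient is tiny or huge, whereas SGD's step scales linearly with it; interpolating between these behaviours as information accumulates is the paper's framing.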
<a href="http://arxiv.org/find/stat/1/au:+Aitchison_L/0/1/0/all/0/1">Laurence Aitchison</a>Cutting Down Training Memory by Re-fowarding. (arXiv:1808.00079v3 [cs.LG] UPDATED)http://arxiv.org/abs/1808.00079
<p>Deep Neural Networks(DNNs) require huge GPU memory when training on modern
image/video databases. Unfortunately, the GPU memory in off-the-shelf devices
is always finite, which limits the image resolutions and batch sizes that could
be used for better DNN performance. In this paper, we propose a novel training
approach, called Re-forwarding, that substantially reduces memory usage in
training. Our approach automatically finds a subset of vertices in a DNN
computation graph, and stores tensors only at these vertices during the first
forward. During backward, extra local forwards (called the Re-forwarding
operations) are conducted to compute the missing tensors. The total training
memory cost becomes the sum of (1) the memory cost of the subset of vertices
and (2) the maximum memory cost of local forwards. Re-forwarding trades time
overheads for memory costs and does not compromise any performance in testing.
We present theories and algorithms that achieve optimal memory solutions on
DNNs with both linear and arbitrary computation graphs. Experiments show that
Re-forwarding cuts down up to 80% of training memory with a moderate time
overhead (around 40%) on popular DNNs such as AlexNet, VGG, ResNet, DenseNet
and Inception net.
</p>
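The space/recompute trade-off described above can be sketched with a toy chain of layers. This sketch uses uniformly spaced checkpoints rather than the paper's optimal vertex subset, and plain functions instead of tensors; it only illustrates the mechanism.

```python
def make_checkpointed_forward(layers, every=3):
    """Run a chain of `layers` (unary functions), storing activations
    only at every `every`-th vertex.  Missing activations are recovered
    later by a local re-forward from the nearest stored checkpoint."""
    stored = {}

    def forward(x):
        stored.clear()
        stored[0] = x
        for i, f in enumerate(layers, start=1):
            x = f(x)
            if i % every == 0:
                stored[i] = x           # keep only a subset of vertices
        return x

    def activation(i):
        """Activation after layer i, recomputed from a checkpoint."""
        j = max(k for k in stored if k <= i)
        x = stored[j]
        for f in layers[j:i]:           # the extra local forward
            x = f(x)
        return x

    return forward, activation, stored
```

Memory cost is the checkpoint subset plus one local segment at a time, at the price of re-running at most `every - 1` layers per lookup, mirroring the trade-off the paper optimizes.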
<a href="http://arxiv.org/find/cs/1/au:+Feng_J/0/1/0/all/0/1">Jianwei Feng</a>, <a href="http://arxiv.org/find/cs/1/au:+Huang_D/0/1/0/all/0/1">Dong Huang</a>Bisplit graphs satisfy the Chen-Chv\'atal conjecture. (arXiv:1808.08710v4 [cs.DM] UPDATED)http://arxiv.org/abs/1808.08710
<p>In this paper, we give a lengthy proof of a small result! A graph is bisplit
if its vertex set can be partitioned into three stable sets with two of them
inducing a complete bipartite graph. We prove that these graphs satisfy the
Chen-Chv\'atal conjecture: their metric space (in the usual sense) has a
universal line (in an unusual sense) or at least as many lines as the number of
vertices.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Beaudou_L/0/1/0/all/0/1">Laurent Beaudou</a>, <a href="http://arxiv.org/find/cs/1/au:+Kahn_G/0/1/0/all/0/1">Giacomo Kahn</a>, <a href="http://arxiv.org/find/cs/1/au:+Rosenfeld_M/0/1/0/all/0/1">Matthieu Rosenfeld</a>Certified Adversarial Robustness with Additive Gaussian Noise. (arXiv:1809.03113v2 [cs.LG] UPDATED)http://arxiv.org/abs/1809.03113
<p>The existence of adversarial data examples has drawn significant attention in
the deep-learning community; such data are seemingly minimally perturbed
relative to the original data, but lead to very different outputs from a
deep-learning algorithm. Although a significant body of defense models has
been developed, most such models are heuristic and often
vulnerable to adaptive attacks. Defensive methods that provide theoretical
robustness guarantees have been studied intensively, yet most fail to obtain
non-trivial robustness when a large-scale model and data are present. To
address these limitations, we introduce a framework that is scalable and
provides certified bounds on the norm of the input manipulation for
constructing adversarial examples. We establish a connection between robustness
against adversarial perturbation and additive random noise, and propose a
training strategy that can significantly improve the certified bounds. Our
evaluation on MNIST, CIFAR-10 and ImageNet suggests that our method is scalable
to complicated models and large data sets, while providing competitive
robustness to state-of-the-art provable defense methods.
</p>
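The connection between additive Gaussian noise and a certified bound can be illustrated with the standard randomized-smoothing radius R = sigma * Phi^-1(p_A) from closely related work; this is a generic sketch of that family of certificates, not the paper's exact bound, and the toy classifier is an invented example.

```python
import random
from statistics import NormalDist

def certified_radius(p_a, sigma):
    """L2 radius within which the smoothed classifier's top class cannot
    change, given a lower bound `p_a` on the top-class probability under
    N(0, sigma^2 I) input noise (standard Gaussian quantile bound)."""
    if p_a <= 0.5:
        return 0.0                      # no non-trivial certificate
    return sigma * NormalDist().inv_cdf(p_a)

def smoothed_vote(classify, x, sigma, n, rng):
    """Monte-Carlo estimate of the smoothed classifier
    g(x) = argmax_c P(classify(x + noise) = c)."""
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        c = classify(noisy)
        counts[c] = counts.get(c, 0) + 1
    top = max(counts, key=counts.get)
    return top, counts[top] / n
```

Training with the same additive noise pushes `p_a` up, which directly widens the certified radius; that is the lever the proposed training strategy pulls.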
<a href="http://arxiv.org/find/cs/1/au:+Li_B/0/1/0/all/0/1">Bai Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Chen_C/0/1/0/all/0/1">Changyou Chen</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_W/0/1/0/all/0/1">Wenlin Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Carin_L/0/1/0/all/0/1">Lawrence Carin</a>Retrieval-Enhanced Adversarial Training for Neural Response Generation. (arXiv:1809.04276v2 [cs.CL] UPDATED)http://arxiv.org/abs/1809.04276
<p>Dialogue systems are usually built on either generation-based or
retrieval-based approaches, yet they do not benefit from the advantages of
different models. In this paper, we propose a Retrieval-Enhanced Adversarial
Training (REAT) method for neural response generation. Distinct from existing
approaches, the REAT method leverages an encoder-decoder framework in terms of
an adversarial training paradigm, while taking advantage of N-best response
candidates from a retrieval-based system to construct the discriminator. An
empirical study on a large-scale, publicly available benchmark dataset shows that
the REAT method significantly outperforms the vanilla Seq2Seq model as well as
the conventional adversarial training approach.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhu_Q/0/1/0/all/0/1">Qingfu Zhu</a>, <a href="http://arxiv.org/find/cs/1/au:+Cui_L/0/1/0/all/0/1">Lei Cui</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_W/0/1/0/all/0/1">Weinan Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Wei_F/0/1/0/all/0/1">Furu Wei</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_T/0/1/0/all/0/1">Ting Liu</a>Rethinking Location Privacy for Unknown Mobility Behaviors. (arXiv:1809.04415v2 [cs.CR] UPDATED)http://arxiv.org/abs/1809.04415
<p>Location Privacy-Preserving Mechanisms (LPPMs) in the literature largely
consider that users' data available for training wholly characterizes their
mobility patterns. Thus, they hardwire this information in their designs and
evaluate their privacy properties with these same data. In this paper, we aim
to understand the impact of this decision on the level of privacy these LPPMs
may offer in real life when the users' mobility data may be different from the
data used in the design phase. Our results show that, in many cases, training
data does not capture users' behavior accurately and, thus, the level of
privacy provided by the LPPM is often overestimated. To address this gap
between theory and practice, we propose to use blank-slate models for LPPM
design. Contrary to the hardwired approach, which assumes the users' behavior is known,
blank-slate models learn the users' behavior from the queries to the service
provider. We leverage this blank-slate approach to develop a new family of
LPPMs, that we call Profile Estimation-Based LPPMs. Using real data, we
empirically show that our proposal outperforms optimal state-of-the-art
mechanisms designed on sporadic hardwired models. On non-sporadic location
privacy scenarios, our method is only better if the usage of the location
privacy service is not continuous. It is our hope that eliminating the need to
bootstrap the mechanisms with training data, while keeping the mechanisms
lightweight and easy to compute, will help foster the integration of location
privacy protections in deployed systems.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Oya_S/0/1/0/all/0/1">Simon Oya</a>, <a href="http://arxiv.org/find/cs/1/au:+Troncoso_C/0/1/0/all/0/1">Carmela Troncoso</a>, <a href="http://arxiv.org/find/cs/1/au:+Perez_Gonzalez_F/0/1/0/all/0/1">Fernando P&#xe9;rez-Gonz&#xe1;lez</a>BrainNet: A Multi-Person Brain-to-Brain Interface for Direct Collaboration Between Brains. (arXiv:1809.08632v3 [cs.HC] UPDATED)http://arxiv.org/abs/1809.08632
<p>We present BrainNet which, to our knowledge, is the first multi-person
non-invasive direct brain-to-brain interface for collaborative problem solving.
The interface combines electroencephalography (EEG) to record brain signals and
transcranial magnetic stimulation (TMS) to deliver information noninvasively to
the brain. The interface allows three human subjects to collaborate and solve a
task using direct brain-to-brain communication. Two of the three subjects are
"Senders" whose brain signals are decoded using real-time EEG data analysis to
extract decisions about whether to rotate a block in a Tetris-like game before
it is dropped to fill a line. The Senders' decisions are transmitted via the
Internet to the brain of a third subject, the "Receiver," who cannot see the
game screen. The decisions are delivered to the Receiver's brain via magnetic
stimulation of the occipital cortex. The Receiver integrates the information
received and makes a decision using an EEG interface about either turning the
block or keeping it in the same position. A second round of the game gives the
Senders one more chance to validate and provide feedback to the Receiver's
action. We evaluated the performance of BrainNet in terms of (1) Group-level
performance during the game; (2) True/False positive rates of subjects'
decisions; (3) Mutual information between subjects. Five groups of three
subjects successfully used BrainNet to perform the Tetris task, with an average
accuracy of 0.813. Furthermore, by varying the information reliability of the
Senders by artificially injecting noise into one Sender's signal, we found that
Receivers are able to learn which Sender is more reliable based solely on the
information transmitted to their brains. Our results raise the possibility of
future brain-to-brain interfaces that enable cooperative problem solving by
humans using a "social network" of connected brains.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Jiang_L/0/1/0/all/0/1">Linxing Preston Jiang</a>, <a href="http://arxiv.org/find/cs/1/au:+Stocco_A/0/1/0/all/0/1">Andrea Stocco</a>, <a href="http://arxiv.org/find/cs/1/au:+Losey_D/0/1/0/all/0/1">Darby M. Losey</a>, <a href="http://arxiv.org/find/cs/1/au:+Abernethy_J/0/1/0/all/0/1">Justin A. Abernethy</a>, <a href="http://arxiv.org/find/cs/1/au:+Prat_C/0/1/0/all/0/1">Chantel S. Prat</a>, <a href="http://arxiv.org/find/cs/1/au:+Rao_R/0/1/0/all/0/1">Rajesh P. N. Rao</a>Probabilistic assessment of the impact of flexible loads under network tariffs in low voltage distribution networks. (arXiv:1810.02013v2 [cs.SY] UPDATED)http://arxiv.org/abs/1810.02013
<p>In many jurisdictions, the recent wave of rooftop PV investment compromised
the equity of network cost allocations across retail electricity customer
classes, in part due to poorly designed network, retail and feed-in tariffs.
Currently, a new wave of investment in distributed energy resource (DER), such
as residential batteries and home energy management systems, is unfurling, and
with it comes the risk of repeating the same tariff design mistakes. As such,
distribution network service providers need improved tools for crafting
DER-specific tariffs. These tools will guide the design of tariffs that
minimize DER impacts on network performance, stabilize network company revenue,
and improve the equity of network costs across the customer base. Within this
context, this paper proposes a probabilistic framework to assess the impacts of
different network tariffs on the consumption pattern of electricity consumers
with flexible DER, such as thermostatically controlled loads and battery
storage. The assessment tool comprises randomly-generated synthetic load and PV
generation traces, which are fed into a mixed integer linear programming-based
home energy management system to schedule residential customers' controllable
devices connected to a low voltage network. Customer net loads are then used in
low voltage power flow studies to assess the network effects of various tariff
designs. In this work, assessments are made of energy- and demand-based
tariffs. Simulation results show that flat tariffs with a peak demand component
perform best in terms of electricity cost reduction for the customer, as well
as in mitigating the level of binding network constraints. This demonstrates
how the assessment tool can be used by distribution network service providers
and regulators to develop tariffs that are beneficial for networks that play
host to growing numbers of PV-battery systems and other DER.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Azuatalam_D/0/1/0/all/0/1">Donald Azuatalam</a>, <a href="http://arxiv.org/find/cs/1/au:+Chapman_A/0/1/0/all/0/1">Archie C. Chapman</a>, <a href="http://arxiv.org/find/cs/1/au:+Verbic_G/0/1/0/all/0/1">Gregor Verbi&#x10d;</a>Long ties accelerate noisy threshold-based contagions. (arXiv:1810.03579v3 [cs.SI] UPDATED)http://arxiv.org/abs/1810.03579
<p>Network structure can affect when and how widely new ideas, products, and
behaviors are adopted. In widely-used models of biological contagion,
interventions that randomly rewire edges (generally making them "longer")
accelerate spread. However, there are other models relevant to social
contagion, such as those motivated by myopic best-response in games with
strategic complements, in which an individual's behavior is described by a
threshold number of adopting neighbors above which adoption occurs (i.e.,
complex contagions). Recent work has argued that highly clustered, rather than
random, networks facilitate spread of these complex contagions. Here we show
that minor modifications to this model, which make it more realistic, reverse
this result: we allow very rare below-threshold adoption, i.e., adoption
occasionally occurs even when there is only one adopting neighbor. To model the
trade-off between
long and short edges we consider networks that are the union of cycle-power-$k$
graphs and random graphs on $n$ nodes. Allowing adoptions below threshold to
occur with order $1/\sqrt{n}$ probability along some "short" cycle edges is
enough to ensure that random rewiring accelerates spread. Simulations
illustrate the robustness of these results to other commonly-posited models for
noisy best-response behavior. Hypothetical interventions that randomly rewire
existing edges or add random edges (versus adding "short", triad-closing edges)
in hundreds of empirical social networks reduce time to spread. This revised
conclusion suggests that those wanting to increase spread should induce
formation of long ties, rather than triad-closing ties. More generally, this
highlights the importance of noise in game-theoretic analyses of behavior.
</p>
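A toy simulation makes the noisy-threshold dynamics concrete. The graph below is a cycle-power-k graph as in the paper's construction, but the parameter values, seeding, and the specific noise rule (below-threshold adoption with a fixed per-step probability) are illustrative assumptions.

```python
import random

def cycle_power_k_neighbors(n, k):
    """Cycle-power-k graph: each node links to the k nearest nodes on
    each side of the ring."""
    return {v: [(v + d) % n for d in range(-k, k + 1) if d != 0]
            for v in range(n)}

def spread_time(n, k, threshold=2, noise=0.05, seed=0, max_steps=10000):
    """Steps until everyone adopts: a node adopts if it has >= threshold
    adopting neighbors, or, with probability `noise`, if it has at least
    one (the rare below-threshold adoptions discussed above)."""
    rng = random.Random(seed)
    nbrs = cycle_power_k_neighbors(n, k)
    adopted = {0, 1}                    # seed an adjacent cluster
    for step in range(1, max_steps + 1):
        new = set()
        for v in range(n):
            if v in adopted:
                continue
            a = sum(1 for u in nbrs[v] if u in adopted)
            if a >= threshold or (a >= 1 and rng.random() < noise):
                new.add(v)
        adopted |= new
        if len(adopted) == n:
            return step
    return None
```

Since the noisy rule only ever adds adoptions, a noisy run can never finish later than the noiseless one on the same graph; adding long rewired edges then lets those rare single-neighbor adoptions seed distant regions, which is the paper's mechanism for acceleration.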
<a href="http://arxiv.org/find/cs/1/au:+Eckles_D/0/1/0/all/0/1">Dean Eckles</a>, <a href="http://arxiv.org/find/cs/1/au:+Mossel_E/0/1/0/all/0/1">Elchanan Mossel</a>, <a href="http://arxiv.org/find/cs/1/au:+Rahimian_M/0/1/0/all/0/1">M. Amin Rahimian</a>, <a href="http://arxiv.org/find/cs/1/au:+Sen_S/0/1/0/all/0/1">Subhabrata Sen</a>MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization. (arXiv:1810.05739v4 [cs.CL] UPDATED)http://arxiv.org/abs/1810.05739
<p>Abstractive summarization has been studied using neural sequence transduction
methods with datasets of large, paired document-summary examples. However, such
datasets are rare and the models trained from them do not generalize to other
domains. Recently, some progress has been made in learning sequence-to-sequence
mappings with only unpaired examples. In our work, we consider the setting
where there are only documents (product or business reviews) with no summaries
provided, and propose an end-to-end, neural model architecture to perform
unsupervised abstractive summarization. Our proposed model consists of an
auto-encoder where the mean of the representations of the input reviews decodes
to a reasonable summary-review while not relying on any review-specific
features. We consider variants of the proposed architecture and perform an
ablation study to show the importance of specific components. We show through
automated metrics and human evaluation that the generated summaries are highly
abstractive, fluent, relevant, and representative of the average sentiment of
the input reviews. Finally, we collect a reference evaluation dataset and show
that our model outperforms a strong extractive baseline.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Chu_E/0/1/0/all/0/1">Eric Chu</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_P/0/1/0/all/0/1">Peter J. Liu</a>Computation Scheduling for Distributed Machine Learning with Straggling Workers. (arXiv:1810.09992v3 [cs.DC] UPDATED)http://arxiv.org/abs/1810.09992
<p>We study scheduling of computation tasks across n workers in a large scale
distributed learning problem with the help of a master. Computation and
communication delays are assumed to be random, and redundant computations are
assigned to workers in order to tolerate stragglers. We consider sequential
computation of tasks assigned to a worker, while the result of each computation
is sent to the master right after its completion. Each computation round, which
can model an iteration of the stochastic gradient descent (SGD) algorithm, is
completed once the master receives k distinct computations, referred to as the
computation target. Our goal is to characterize the average completion time as
a function of the computation load, which denotes the portion of the dataset
available at each worker, and the computation target. We propose two
computation scheduling schemes that specify the tasks assigned to each worker,
as well as their computation schedule, i.e., the order of execution. Assuming a
general statistical model for computation and communication delays, we derive
the average completion time of the proposed schemes. We also establish a lower
bound on the minimum average completion time by assuming prior knowledge of the
random delays. Experimental results carried out on an Amazon EC2 cluster show a
significant reduction in the average completion time over existing coded and
uncoded computing schemes. It is also shown numerically that the gap between
the proposed scheme and the lower bound is relatively small, confirming the
efficiency of the proposed scheduling design.
</p>
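The setting can be captured with a toy Monte-Carlo model: workers compute tasks sequentially, send each result on completion, and a round ends when the master holds k distinct computations. The circular redundant assignment and the exponential delay model below are illustrative assumptions, not the paper's proposed schemes.

```python
import random

def completion_time(n, k, redundancy, seed, rounds=300):
    """Toy model of a master and n workers.  The dataset has n parts;
    worker w sequentially computes parts w, w+1, ..., w+redundancy
    (mod n), sending each result as soon as it finishes.  A round
    completes when the master has received k distinct parts.  Per-task
    compute times are Exp(1), modelling stragglers.  Returns the
    average completion time over `rounds` rounds."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        arrivals = {}                     # part id -> earliest arrival
        for w in range(n):
            t = 0.0
            for s in range(redundancy + 1):
                part = (w + s) % n
                t += rng.expovariate(1.0)   # sequential compute delay
                if part not in arrivals or t < arrivals[part]:
                    arrivals[part] = t
        total += sorted(arrivals.values())[k - 1]
    return total / rounds
```

With no redundancy the round waits on the slowest worker; replicating each part across a few workers lets the fastest copy win, cutting the average completion time, which is the effect the proposed scheduling schemes exploit and optimize.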
<a href="http://arxiv.org/find/cs/1/au:+Amiri_M/0/1/0/all/0/1">Mohammad Mohammadi Amiri</a>, <a href="http://arxiv.org/find/cs/1/au:+Gunduz_D/0/1/0/all/0/1">Deniz Gunduz</a>Cycle-consistency training for end-to-end speech recognition. (arXiv:1811.01690v2 [cs.CL] UPDATED)http://arxiv.org/abs/1811.01690
<p>This paper presents a method to train end-to-end automatic speech recognition
(ASR) models using unpaired data. Although the end-to-end approach can
eliminate the need for expert knowledge such as pronunciation dictionaries to
build ASR systems, it still requires a large amount of paired data, i.e.,
speech utterances and their transcriptions. Cycle-consistency losses have been
recently proposed as a way to mitigate the problem of limited paired data.
These approaches compose a reverse operation with a given transformation, e.g.,
text-to-speech (TTS) with ASR, to build a loss that only requires unsupervised
data, speech in this example. Applying cycle consistency to ASR models is not
trivial since fundamental information, such as speaker traits, is lost in the
intermediate text bottleneck. To solve this problem, this work presents a loss
that is based on the speech encoder state sequence instead of the raw speech
signal. This is achieved by training a Text-To-Encoder model and defining a
loss based on the encoder reconstruction error. Experimental results on the
LibriSpeech corpus show that the proposed cycle-consistency training reduced
the word error rate by 14.7% from an initial model trained with 100-hour paired
data, using an additional 360 hours of audio data without transcriptions. We
also investigate the use of text-only data mainly for language modeling to
further improve the performance in the unpaired data training scenario.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Hori_T/0/1/0/all/0/1">Takaaki Hori</a>, <a href="http://arxiv.org/find/cs/1/au:+Astudillo_R/0/1/0/all/0/1">Ramon Astudillo</a>, <a href="http://arxiv.org/find/cs/1/au:+Hayashi_T/0/1/0/all/0/1">Tomoki Hayashi</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Y/0/1/0/all/0/1">Yu Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Watanabe_S/0/1/0/all/0/1">Shinji Watanabe</a>, <a href="http://arxiv.org/find/cs/1/au:+Roux_J/0/1/0/all/0/1">Jonathan Le Roux</a>Symbolic Register Automata. (arXiv:1811.06968v2 [cs.FL] UPDATED)http://arxiv.org/abs/1811.06968
<p>Symbolic Finite Automata and Register Automata are two orthogonal extensions
of finite automata motivated by real-world problems where data may have
unbounded domains. These automata models address the demand for automata over
large and over infinite alphabets, respectively. Both models have interesting
applications and have been successful in their own right. In this paper, we
introduce Symbolic Register Automata, a new model that combines features from
both symbolic and register automata, with a view toward applications that were
previously out of reach. We study their properties and provide algorithms for
emptiness, inclusion and equivalence checking, together with experimental
results.
</p>
<a href="http://arxiv.org/find/cs/1/au:+DAntoni_L/0/1/0/all/0/1">Loris D&#x27;Antoni</a>, <a href="http://arxiv.org/find/cs/1/au:+Ferreira_T/0/1/0/all/0/1">Tiago Ferreira</a>, <a href="http://arxiv.org/find/cs/1/au:+Sammartino_M/0/1/0/all/0/1">Matteo Sammartino</a>, <a href="http://arxiv.org/find/cs/1/au:+Silva_A/0/1/0/all/0/1">Alexandra Silva</a>A right-to-left type system for mutually-recursive value definitions. (arXiv:1811.08134v2 [cs.PL] UPDATED)http://arxiv.org/abs/1811.08134
<p>In call-by-value languages, some mutually-recursive value definitions can be
safely evaluated to build recursive functions or cyclic data structures, but
some definitions (let rec x = x + 1) contain vicious circles and their
evaluation fails at runtime. We propose a new static analysis to check the
absence of such runtime failures.
</p>
<p>We present a set of declarative inference rules, prove its soundness with
respect to the reference source-level semantics of Nordlander, Carlsson, and
Gill (2008), and show that it can be (right-to-left) directed into an
algorithmic check in a surprisingly simple way.
</p>
<p>Our implementation of this new check replaced the existing check used by the
OCaml programming language, a fragile syntactic/grammatical criterion which let
several subtle bugs slip through as the language kept evolving. We document
some issues that arise when advanced features of a real-world functional
language (exceptions in first-class modules, GADTs, etc.) interact with safety
checking for recursive definitions.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Reynaud_A/0/1/0/all/0/1">Alban Reynaud</a>, <a href="http://arxiv.org/find/cs/1/au:+Scherer_G/0/1/0/all/0/1">Gabriel Scherer</a>, <a href="http://arxiv.org/find/cs/1/au:+Yallop_J/0/1/0/all/0/1">Jeremy Yallop</a>InstaNAS: Instance-aware Neural Architecture Search. (arXiv:1811.10201v3 [cs.LG] UPDATED)http://arxiv.org/abs/1811.10201
<p>Conventional Neural Architecture Search (NAS) aims at finding a single
architecture that achieves the best performance, which usually optimizes task
related learning objectives such as accuracy. However, a single architecture
may not be representative enough for the whole dataset with high diversity and
variety. Intuitively, selecting domain-expert architectures that are proficient
in domain-specific features can further benefit architecture related objectives
such as latency. In this paper, we propose InstaNAS---an instance-aware NAS
framework---that employs a controller trained to search for a "distribution of
architectures" instead of a single architecture. This allows the model to use
sophisticated architectures for difficult samples, which usually come with a
large architecture-related cost, and shallow architectures for easy samples.
During the inference phase, the controller assigns each of the unseen
input samples with a domain expert architecture that can achieve high accuracy
with customized inference costs. Experiments within a search space inspired by
MobileNetV2 show that InstaNAS can achieve up to a 48.8% latency reduction
relative to MobileNetV2 without compromising accuracy on a series of datasets.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Cheng_A/0/1/0/all/0/1">An-Chieh Cheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Lin_C/0/1/0/all/0/1">Chieh Hubert Lin</a>, <a href="http://arxiv.org/find/cs/1/au:+Juan_D/0/1/0/all/0/1">Da-Cheng Juan</a>, <a href="http://arxiv.org/find/cs/1/au:+Wei_W/0/1/0/all/0/1">Wei Wei</a>, <a href="http://arxiv.org/find/cs/1/au:+Sun_M/0/1/0/all/0/1">Min Sun</a>Restricted Boltzmann Machine with Multivalued Hidden Variables: a model suppressing over-fitting. (arXiv:1811.12587v2 [stat.ML] UPDATED)http://arxiv.org/abs/1811.12587
<p>Generalization is one of the most important issues in machine learning
problems. In this study, we consider generalization in restricted Boltzmann
machines (RBMs). We propose an RBM with multivalued hidden variables, which is
a simple extension of conventional RBMs. We demonstrate that the proposed model
is better than the conventional model via numerical experiments for contrastive
divergence learning with artificial data and a classification problem with
MNIST.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Yokoyama_Y/0/1/0/all/0/1">Yuuki Yokoyama</a>, <a href="http://arxiv.org/find/stat/1/au:+Katsumata_T/0/1/0/all/0/1">Tomu Katsumata</a>, <a href="http://arxiv.org/find/stat/1/au:+Yasuda_M/0/1/0/all/0/1">Muneki Yasuda</a>Channel Shortening by Large Multiantenna Precoding in OFDM. (arXiv:1812.01947v2 [cs.IT] UPDATED)http://arxiv.org/abs/1812.01947
<p>A channel delay spread larger than the cyclic prefix (CP) creates
self-interference (ISI/ICI) in orthogonal frequency-division multiplexing
(OFDM). Recent interest in low-latency applications has motivated the use of
shorter OFDM symbols. In turn, one can either downscale the CP at the cost of
interference, or maintain the CP but with increased overhead. To simultaneously
maintain low overhead and interference, this paper studies channel shortening
methods exploiting the properties of large multi-antenna precoding in OFDM. It
is shown that ISI/ICI can be asymptotically canceled out by subcarrier-level
precoding with infinite number of antennas. The method, coined time-frequency
(TF) precoding, is based on introducing time-delay selectivity inside
conventional frequency-selective precoders in order to remove undesired delayed
signals. This leads to an optimization trade-off in the precoder between
interference mitigation and multi-path combining gain. Time-reversal (TR)
filtering, where an OFDM signal without CP is filtered according to the
multi-antenna channels, is considered as a benchmark since it provides
asymptotically the optimal rate, having no CP overhead, and both full
interference-cancellation and maximum multi-path combining gain. Meanwhile,
finite-size analysis shows that TF-precoding converges faster to its asymptotic
rate than TR-filtering, so that TF-precoding can outperform TR-filtering in the
high-SNR regime with not-so-large number of antennas.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Pitaval_R/0/1/0/all/0/1">Renaud-Alexandre Pitaval</a>Arbitrary Style Transfer with Style-Attentional Networks. (arXiv:1812.02342v5 [cs.CV] UPDATED)http://arxiv.org/abs/1812.02342
<p>Arbitrary style transfer aims to synthesize a content image with the style of
an image to create a third image that has never been seen before. Recent
arbitrary style transfer algorithms find it challenging to balance the content
structure and the style patterns. Moreover, simultaneously maintaining the
global and local style patterns is difficult due to the patch-based mechanism.
In this paper, we introduce a novel style-attentional network (SANet) that
efficiently and flexibly integrates the local style patterns according to the
semantic spatial distribution of the content image. A new identity loss
function and multi-level feature embeddings enable our SANet and decoder to
preserve the content structure as much as possible while enriching the style
patterns. Experimental results demonstrate that our algorithm synthesizes, in
real time, stylized images that are higher in quality than those produced by
state-of-the-art algorithms.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Park_D/0/1/0/all/0/1">Dae Young Park</a>, <a href="http://arxiv.org/find/cs/1/au:+Lee_K/0/1/0/all/0/1">Kwang Hee Lee</a>Design of a Networked Controller for a Two-Wheeled Inverted Pendulum Robot. (arXiv:1812.03071v2 [cs.SY] UPDATED)http://arxiv.org/abs/1812.03071
<p>The topic of this paper is to use an intuitive model-based approach to design
a networked controller for a recent benchmark scenario. The benchmark problem
is to remotely control a two-wheeled inverted pendulum robot via W-LAN
communication. The robot has to keep a vertical upright position. Incorporating
wireless communication in the control loop introduces multiple uncertainties
and affects system performance and stability. The proposed networked control
scheme employs model predictive techniques and deliberately extends delays in
order to make them constant and deterministic. The performance of the resulting
networked control system is evaluated experimentally with a predefined
benchmarking experiment and is compared to local control involving no delays.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Music_Z/0/1/0/all/0/1">Zenit Music</a>, <a href="http://arxiv.org/find/cs/1/au:+Molinari_F/0/1/0/all/0/1">Fabio Molinari</a>, <a href="http://arxiv.org/find/cs/1/au:+Gallenmuller_S/0/1/0/all/0/1">Sebastian Gallenm&#xfc;ller</a>, <a href="http://arxiv.org/find/cs/1/au:+Ayan_O/0/1/0/all/0/1">Onur Ayan</a>, <a href="http://arxiv.org/find/cs/1/au:+Zoppi_S/0/1/0/all/0/1">Samuele Zoppi</a>, <a href="http://arxiv.org/find/cs/1/au:+Kellerer_W/0/1/0/all/0/1">Wolfgang Kellerer</a>, <a href="http://arxiv.org/find/cs/1/au:+Carle_G/0/1/0/all/0/1">Georg Carle</a>, <a href="http://arxiv.org/find/cs/1/au:+Seel_T/0/1/0/all/0/1">Thomas Seel</a>, <a href="http://arxiv.org/find/cs/1/au:+Raisch_J/0/1/0/all/0/1">J&#xf6;rg Raisch</a>Fast Approximate Geodesics for Deep Generative Models. (arXiv:1812.08284v2 [stat.ML] UPDATED)http://arxiv.org/abs/1812.08284
<p>The length of the geodesic between two data points along a Riemannian
manifold, induced by a deep generative model, yields a principled measure of
similarity. Current approaches are limited to low-dimensional latent spaces,
due to the computational complexity of solving a non-convex optimisation
problem. We propose finding shortest paths in a finite graph of samples from
the aggregate approximate posterior, a problem that can be solved exactly, at
greatly reduced runtime, and without a notable loss in quality. Our approach is
hence applicable to high-dimensional problems, e.g., in the
visual domain. We validate our approach empirically on a series of experiments
using variational autoencoders applied to image data, including the Chair,
FashionMNIST, and human movement data sets.
</p>
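<p>A minimal sketch of the graph-based approximation described above: latent samples become graph nodes, edges connect nearby samples with Euclidean weight, and the geodesic between two points is approximated by a shortest path, which is solvable exactly with Dijkstra's algorithm. The sample set and connection radius below are illustrative, not from the paper.</p>

```python
# Sketch: approximate a manifold geodesic by a shortest path in a graph built
# over samples. Samples, radius, and endpoints are illustrative choices.
import heapq
import math

def build_graph(samples, radius):
    """Connect samples closer than `radius`; edge weight = Euclidean distance."""
    graph = {i: [] for i in range(len(samples))}
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            d = math.dist(samples[i], samples[j])
            if d <= radius:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph

def shortest_path_length(graph, start, goal):
    """Dijkstra's algorithm over the sample graph."""
    dist = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, math.inf):
            continue
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(queue, (nd, nbr))
    return math.inf

samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
graph = build_graph(samples, radius=1.1)
length = shortest_path_length(graph, 0, 3)
```

<p>In the high-dimensional setting the nodes would be decoded samples from the aggregate approximate posterior and the edge weights would come from the model-induced metric, but the discrete shortest-path machinery is the same.</p>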
<a href="http://arxiv.org/find/stat/1/au:+Chen_N/0/1/0/all/0/1">Nutan Chen</a>, <a href="http://arxiv.org/find/stat/1/au:+Ferroni_F/0/1/0/all/0/1">Francesco Ferroni</a>, <a href="http://arxiv.org/find/stat/1/au:+Klushyn_A/0/1/0/all/0/1">Alexej Klushyn</a>, <a href="http://arxiv.org/find/stat/1/au:+Paraschos_A/0/1/0/all/0/1">Alexandros Paraschos</a>, <a href="http://arxiv.org/find/stat/1/au:+Bayer_J/0/1/0/all/0/1">Justin Bayer</a>, <a href="http://arxiv.org/find/stat/1/au:+Smagt_P/0/1/0/all/0/1">Patrick van der Smagt</a>Semi-parametric dynamic contextual pricing. (arXiv:1901.02045v2 [cs.LG] UPDATED)http://arxiv.org/abs/1901.02045
<p>Motivated by the application of real-time pricing in e-commerce platforms, we
consider the problem of revenue-maximization in a setting where the seller can
leverage contextual information describing the customer's history and the
product's type to predict her valuation of the product. However, her true
valuation is unobservable to the seller; only a binary outcome, the success or
failure of a transaction, is observed. Unlike in usual contextual bandit
settings, the optimal price/arm given a covariate in our setting is sensitive
to the detailed characteristics of the residual uncertainty distribution. We
develop a semi-parametric model in which the residual distribution is
non-parametric and provide the first algorithm which learns both regression
parameters and residual distribution with $\tilde O(\sqrt{n})$ regret. We
empirically test a scalable implementation of our algorithm and observe good
performance.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Shah_V/0/1/0/all/0/1">Virag Shah</a>, <a href="http://arxiv.org/find/cs/1/au:+Blanchet_J/0/1/0/all/0/1">Jose Blanchet</a>, <a href="http://arxiv.org/find/cs/1/au:+Johari_R/0/1/0/all/0/1">Ramesh Johari</a>Foothill: A Quasiconvex Regularization for Edge Computing of Deep Neural Networks. (arXiv:1901.06414v2 [stat.ML] UPDATED)http://arxiv.org/abs/1901.06414
<p>Deep neural networks (DNNs) have demonstrated success for many supervised
learning tasks, ranging from voice recognition, object detection, to image
classification. However, their increasing complexity might yield poor
generalization error that makes them hard to deploy on edge devices.
Quantization is an effective approach to compress DNNs in order to meet these
constraints. Using a quasiconvex base function to construct a binary quantizer
helps in training binary neural networks (BNNs), while adding noise to the
input data or using a concrete regularization function helps to improve
generalization error. Here we introduce the foothill function, an infinitely
differentiable quasiconvex function. This regularizer is flexible enough to
deform towards $L_1$ and $L_2$ penalties. Foothill can be used as a binary
quantizer, as a regularizer, or as a loss. In particular, we show this
regularizer reduces the accuracy gap between BNNs and their full-precision
counterpart for image classification on ImageNet.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Belbahri_M/0/1/0/all/0/1">Mouloud Belbahri</a>, <a href="http://arxiv.org/find/stat/1/au:+Sari_E/0/1/0/all/0/1">Eyy&#xfc;b Sari</a>, <a href="http://arxiv.org/find/stat/1/au:+Darabi_S/0/1/0/all/0/1">Sajad Darabi</a>, <a href="http://arxiv.org/find/stat/1/au:+Nia_V/0/1/0/all/0/1">Vahid Partovi Nia</a>Rank-consistent Ordinal Regression for Neural Networks. (arXiv:1901.07884v3 [cs.LG] UPDATED)http://arxiv.org/abs/1901.07884
<p>Extraordinary progress has been made towards developing neural network
architectures for classification tasks. However, commonly used loss functions
such as the multi-category cross entropy loss are inadequate for ranking and
ordinal regression problems. Hence, approaches that utilize neural networks for
ordinal regression tasks transform ordinal target variables into a series of
binary
classification tasks but suffer from inconsistencies among the different binary
classifiers. Thus, we propose a new framework (Consistent Rank Logits, CORAL)
with theoretical guarantees for rank-monotonicity and consistent confidence
scores. Through parameter sharing, our framework also benefits from lower
training complexity and can easily be implemented to extend conventional
convolutional neural network classifiers for ordinal regression tasks.
Furthermore, the empirical evaluation of our method on a range of face image
datasets for age prediction shows a substantial improvement compared to the
current state-of-the-art ordinal regression method.
</p>
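<p>The extended-binary-task view underlying CORAL can be sketched as follows (an illustrative toy, with hand-picked probabilities rather than network outputs): an ordinal label y in {0, …, K-1} becomes K-1 binary targets of the form "is y &gt; k?", and a rank-consistent prediction counts how many of those classifiers fire.</p>

```python
# Sketch of the extended binary encoding for ordinal regression and the
# rank-consistent prediction rule. Probabilities below are illustrative.

def ordinal_to_binary(y, num_classes):
    """Encode ordinal label y as K-1 binary targets [y > 0, y > 1, ...]."""
    return [1 if y > k else 0 for k in range(num_classes - 1)]

def predict_rank(probas):
    """Predicted rank = number of binary tasks with P(y > k) > 0.5.

    With shared weights and ordered biases, the CORAL framework guarantees
    these probabilities are monotonically non-increasing in k, so this
    count yields a consistent rank rather than contradictory thresholds."""
    return sum(1 for p in probas if p > 0.5)

labels = ordinal_to_binary(2, num_classes=4)
rank = predict_rank([0.9, 0.7, 0.2])
```

<p>Independent per-threshold classifiers can produce non-monotone probabilities (e.g. P(y &gt; 2) &gt; P(y &gt; 1)), which is exactly the inconsistency the shared-parameter construction rules out.</p>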
<a href="http://arxiv.org/find/cs/1/au:+Cao_W/0/1/0/all/0/1">Wenzhi Cao</a>, <a href="http://arxiv.org/find/cs/1/au:+Mirjalili_V/0/1/0/all/0/1">Vahid Mirjalili</a>, <a href="http://arxiv.org/find/cs/1/au:+Raschka_S/0/1/0/all/0/1">Sebastian Raschka</a>Recurrent Neural Filters: Learning Independent Bayesian Filtering Steps for Time Series Prediction. (arXiv:1901.08096v2 [stat.ML] UPDATED)http://arxiv.org/abs/1901.08096
<p>Despite the recent popularity of deep generative state space models, few
comparisons have been made between network architectures and the inference
steps of the Bayesian filtering framework -- with most models simultaneously
approximating both state transition and update steps with a single recurrent
neural network (RNN). In this paper, we introduce the Recurrent Neural Filter
(RNF), a novel recurrent autoencoder architecture that learns distinct
representations for each Bayesian filtering step, captured by a series of
encoders and decoders. Testing this on three real-world time series datasets,
we demonstrate that the decoupled representations learnt not only improve the
accuracy of one-step-ahead forecasts while providing realistic uncertainty
estimates, but also facilitate multistep prediction through the separation of
encoder stages.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Lim_B/0/1/0/all/0/1">Bryan Lim</a>, <a href="http://arxiv.org/find/stat/1/au:+Zohren_S/0/1/0/all/0/1">Stefan Zohren</a>, <a href="http://arxiv.org/find/stat/1/au:+Roberts_S/0/1/0/all/0/1">Stephen Roberts</a>Learning Pairwise Interactions with Bayesian Neural Networks. (arXiv:1901.08361v2 [cs.LG] UPDATED)http://arxiv.org/abs/1901.08361
<p>Estimating pairwise interaction effects, i.e., the difference between the
joint effect and the sum of marginal effects of two input features, with
uncertainty properly quantified, is centrally important in science
applications. We propose a non-parametric probabilistic method for detecting
interaction effects of unknown form. First, the relationship between the
features and the output is modelled using a Bayesian neural network, leveraging
on the representation capability of deep neural networks. Second, interaction
effects and their uncertainty are estimated from the trained model. For the
second step we propose a simple and intuitive global interaction measure:
Expected Integrated Hessian (EIH), whose uncertainty can be estimated using the
predictive uncertainty. Two important properties of the Bayesian EIH are: 1.
interaction estimation error is upper bounded by the prediction error of the
neural network, which ensures interaction detection can be improved by training
a more accurate model; 2. uncertainty of the Bayesian EIH is well-calibrated
provided the prediction uncertainty is calibrated, which is easier to test and
guarantee. The method outperforms the available alternatives on simulated and
real-world data, and we demonstrate its ability to detect interpretable
interactions also between higher-level features (at deeper layers of the neural
network).
</p>
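<p>The notion of pairwise interaction used above, the difference between the joint effect and the sum of marginal effects, corresponds locally to the mixed partial derivative of the learned function. As a toy illustration (not the paper's Bayesian estimator), it can be approximated by finite differences on a hand-picked function:</p>

```python
# Toy illustration of the interaction quantity behind EIH: the local
# interaction between features i and j is d^2 f / dx_i dx_j, estimated here
# by central finite differences. The function f is a hand-picked example.

def mixed_partial(f, x, i, j, eps=1e-4):
    """Central finite-difference estimate of d^2 f / dx_i dx_j at x."""
    def shifted(di, dj):
        y = list(x)
        y[i] += di
        y[j] += dj
        return f(y)
    return (shifted(eps, eps) - shifted(eps, -eps)
            - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)

# f has a pairwise interaction x0 * x1 but no interaction between x0 and x2.
f = lambda x: x[0] * x[1] + x[0] + 3.0 * x[2]
interaction_01 = mixed_partial(f, [0.5, -1.0, 2.0], 0, 1)   # near 1.0
interaction_02 = mixed_partial(f, [0.5, -1.0, 2.0], 0, 2)   # near 0.0
```

<p>In the method itself, f would be the Bayesian neural network's predictive function, the Hessian would be integrated over the input distribution, and the posterior over network weights would supply the uncertainty on the estimate.</p>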
<a href="http://arxiv.org/find/cs/1/au:+Cui_T/0/1/0/all/0/1">Tianyu Cui</a>, <a href="http://arxiv.org/find/cs/1/au:+Marttinen_P/0/1/0/all/0/1">Pekka Marttinen</a>, <a href="http://arxiv.org/find/cs/1/au:+Kaski_S/0/1/0/all/0/1">Samuel Kaski</a>Self-Supervised Generalisation with Meta Auxiliary Learning. (arXiv:1901.08933v2 [cs.LG] UPDATED)http://arxiv.org/abs/1901.08933
<p>Learning with auxiliary tasks can improve the ability of a primary task to
generalise. However, this comes at the cost of manually labelling auxiliary
data. We propose a new method which automatically learns appropriate labels for
an auxiliary task, such that any supervised learning task can be improved
without requiring access to any further data. The approach is to train two
neural networks: a label-generation network to predict the auxiliary labels,
and a multi-task network to train the primary task alongside the auxiliary
task. The loss for the label-generation network incorporates the loss of the
multi-task network, and so this interaction between the two networks can be
seen as a form of meta learning with a double gradient. We show that our
proposed method, Meta AuXiliary Learning (MAXL), outperforms single-task
learning on 7 image datasets, without requiring any additional data. We also
show that MAXL outperforms several other baselines for generating auxiliary
labels, and is even competitive when compared with human-defined auxiliary
labels. The self-supervised nature of our method leads to a promising new
direction towards automated generalisation. Source code is available at
https://github.com/lorenmt/maxl.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Liu_S/0/1/0/all/0/1">Shikun Liu</a>, <a href="http://arxiv.org/find/cs/1/au:+Davison_A/0/1/0/all/0/1">Andrew J. Davison</a>, <a href="http://arxiv.org/find/cs/1/au:+Johns_E/0/1/0/all/0/1">Edward Johns</a>Improving Neural Network Quantization without Retraining using Outlier Channel Splitting. (arXiv:1901.09504v3 [cs.LG] UPDATED)http://arxiv.org/abs/1901.09504
<p>Quantization can improve the execution latency and energy efficiency of
neural networks on both commodity GPUs and specialized accelerators. The
majority of existing literature focuses on training quantized DNNs, while this
work examines the less-studied topic of quantizing a floating-point model
without (re)training. DNN weights and activations follow a bell-shaped
distribution post-training, while practical hardware uses a linear quantization
grid. This leads to challenges in dealing with outliers in the distribution.
Prior work has addressed this by clipping the outliers or using specialized
hardware. In this work, we propose outlier channel splitting (OCS), which
duplicates channels containing outliers, then halves the channel values. The
network remains functionally identical, but affected outliers are moved toward
the center of the distribution. OCS requires no additional training and works
on commodity hardware. Experimental evaluation on ImageNet classification and
language modeling shows that OCS can outperform state-of-the-art clipping
techniques with only minor overhead.
</p>
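<p>The core OCS transformation can be sketched on a toy linear layer (weights and inputs below are illustrative): the channel whose weight has the largest magnitude is duplicated and both copies are halved, which shrinks the outlier toward the center of the distribution while leaving the layer's output unchanged.</p>

```python
# Sketch of outlier channel splitting (OCS) on a toy dot-product "layer".
# Weights and inputs are illustrative, not from the paper's experiments.

def split_outlier_channel(weights, inputs):
    """Duplicate the largest-magnitude weight channel and halve both copies.

    The corresponding input channel is duplicated as well (in a real network,
    by duplicating the preceding layer's output channel), so the dot product
    is functionally identical while the outlier weight shrinks by half."""
    idx = max(range(len(weights)), key=lambda i: abs(weights[i]))
    half = weights[idx] / 2.0
    new_weights = weights[:idx] + [half, half] + weights[idx + 1:]
    new_inputs = inputs[:idx] + [inputs[idx], inputs[idx]] + inputs[idx + 1:]
    return new_weights, new_inputs

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

weights = [0.1, -0.2, 4.0]   # 4.0 is the outlier channel
inputs = [1.0, 2.0, 3.0]
new_w, new_x = split_outlier_channel(weights, inputs)
```

<p>After the split, the largest weight magnitude drops from 4.0 to 2.0, so a linear quantization grid covers the distribution with less clipping or resolution loss, at the cost of one extra channel.</p>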
<a href="http://arxiv.org/find/cs/1/au:+Zhao_R/0/1/0/all/0/1">Ritchie Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Hu_Y/0/1/0/all/0/1">Yuwei Hu</a>, <a href="http://arxiv.org/find/cs/1/au:+Dotzel_J/0/1/0/all/0/1">Jordan Dotzel</a>, <a href="http://arxiv.org/find/cs/1/au:+Sa_C/0/1/0/all/0/1">Christopher De Sa</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Z/0/1/0/all/0/1">Zhiru Zhang</a>Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent networks. (arXiv:1901.10113v5 [cs.LG] UPDATED)http://arxiv.org/abs/1901.10113
<p>Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown
distinct advantages, e.g., solving memory-dependent tasks and meta-learning.
However, little effort has been spent on improving RNN architectures and on
understanding the underlying neural mechanisms for performance gain. In this
paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical
results show that the network can autonomously learn to abstract sub-goals and
can self-develop an action hierarchy using internal dynamics in a challenging
continuous control task. Furthermore, we show that the self-developed
compositionality of the network enables faster re-learning when adapting to a
new task that is a re-composition of previously learned sub-goals than when
starting from scratch. We also found that improved performance can be achieved
when neural activities are subject to stochastic rather than deterministic
dynamics.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Han_D/0/1/0/all/0/1">Dongqi Han</a>, <a href="http://arxiv.org/find/cs/1/au:+Doya_K/0/1/0/all/0/1">Kenji Doya</a>, <a href="http://arxiv.org/find/cs/1/au:+Tani_J/0/1/0/all/0/1">Jun Tani</a>A Multi-Resolution Word Embedding for Document Retrieval from Large Unstructured Knowledge Bases. (arXiv:1902.00663v7 [cs.IR] UPDATED)http://arxiv.org/abs/1902.00663
<p>Deep language models learning a hierarchical representation proved to be a
powerful tool for natural language processing, text mining and information
retrieval. However, representations that perform well for retrieval must
capture semantic meaning at different levels of abstraction or context-scopes.
In this paper, we propose a new method to generate multi-resolution word
embeddings that represent documents at multiple resolutions in terms of
context-scopes. In order to investigate its performance, we use the Stanford
Question Answering Dataset (SQuAD) and the Question Answering by Search And
Reading (QUASAR) in an open-domain question-answering setting, where the first
task is to find documents useful for answering a given question. To this end,
we first compare the quality of various text-embedding methods for retrieval
performance and give an extensive empirical comparison with the performance of
various non-augmented base embeddings with and without multi-resolution
representation. We argue that multi-resolution word embeddings are consistently
superior to their original counterparts, and that deep residual neural models
specifically trained for retrieval purposes can yield further significant gains
when used to augment those embeddings.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Cakaloglu_T/0/1/0/all/0/1">Tolgahan Cakaloglu</a>, <a href="http://arxiv.org/find/cs/1/au:+Xu_X/0/1/0/all/0/1">Xiaowei Xu</a>Scheduling with Predictions and the Price of Misprediction. (arXiv:1902.00732v2 [cs.DS] UPDATED)http://arxiv.org/abs/1902.00732
<p>In many traditional job scheduling settings, it is assumed that one knows the
time it will take for a job to complete service. In such cases, strategies such
as shortest job first can be used to improve performance in terms of measures
such as the average time a job waits in the system. We consider the setting
where the service time is not known, but is predicted, for example, by a machine
learning algorithm. Our main result is the derivation, under natural
assumptions, of formulae for the performance of several strategies for queueing
systems that use predictions for service times in order to schedule jobs. As
part of our analysis, we suggest the framework of the "price of misprediction,"
which offers a measure of the cost of using predicted information.
</p>
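<p>A minimal sketch of the setting (illustrative, not one of the paper's analyzed queueing models): jobs are served in increasing order of predicted service time, the realized waiting times are computed from the true times, and mispredictions that reorder jobs inflate the average wait relative to true shortest-job-first, which is one concrete face of the "price of misprediction".</p>

```python
# Sketch: shortest-predicted-job-first on a batch of jobs that arrive
# together. Job times below are illustrative.

def average_waiting_time(true_times, predicted_times):
    """Serve jobs in increasing order of predicted service time; each job
    waits for the sum of the true service times of jobs served before it."""
    order = sorted(range(len(true_times)), key=lambda i: predicted_times[i])
    waited, elapsed = 0.0, 0.0
    for i in order:
        waited += elapsed
        elapsed += true_times[i]
    return waited / len(true_times)

true_times = [3.0, 1.0, 2.0]
perfect = average_waiting_time(true_times, true_times)            # true SJF
mispredicted = average_waiting_time(true_times, [1.0, 3.0, 2.0])  # swapped
```

<p>Here the swapped prediction puts the longest job first, raising the average wait from 4/3 to 8/3; the paper's contribution is closed-form performance formulae for such strategies under natural assumptions on the prediction error.</p>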
<a href="http://arxiv.org/find/cs/1/au:+Mitzenmacher_M/0/1/0/all/0/1">Michael Mitzenmacher</a>Crop Yield Prediction Using Deep Neural Networks. (arXiv:1902.02860v2 [cs.LG] UPDATED)http://arxiv.org/abs/1902.02860
<p>Crop yield is a highly complex trait determined by multiple factors such as
genotype, environment, and their interactions. Accurate yield prediction
requires fundamental understanding of the functional relationship between yield
and these interactive factors, and to reveal such relationship requires both
comprehensive datasets and powerful algorithms. In the 2018 Syngenta Crop
Challenge, Syngenta released several large datasets that recorded the genotype
and yield performances of 2,267 maize hybrids planted in 2,247 locations
between 2008 and 2016 and asked participants to predict the yield performance
in 2017. As one of the winning teams, we designed a deep neural network (DNN)
approach that took advantage of state-of-the-art modeling and solution
techniques. Our model was found to have superior prediction accuracy, with a
root-mean-square error (RMSE) of 12% of the average yield and 50% of the
standard deviation for the validation dataset using predicted weather data.
With perfect weather data, the RMSE would be reduced to 11% of the average
yield and 46% of the standard deviation. We also performed feature selection
based on the trained DNN model, which successfully decreased the dimension of
the input space without a significant drop in prediction accuracy. Our
computational results suggested that this model significantly outperformed
other popular methods such as Lasso, shallow neural networks (SNN), and
regression tree (RT). The results also revealed that environmental factors had
a greater effect on the crop yield than genotype.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Khaki_S/0/1/0/all/0/1">Saeed Khaki</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_L/0/1/0/all/0/1">Lizhi Wang</a>Sampling networks by nodal attributes. (arXiv:1902.04707v2 [physics.soc-ph] UPDATED)http://arxiv.org/abs/1902.04707
<p>In a social network individuals or nodes connect to other nodes by choosing
one of the channels of communication at a time to re-establish the existing
social links. Since available data sets are usually restricted to a limited
number of channels or layers, these autonomous decision making processes by the
nodes constitute the sampling of a multiplex network leading to just one
(though very important) example of sampling bias caused by the behavior of the
nodes. We develop a general setting to get insight and understand the class of
network sampling models, where the probability of sampling a link in the
original network depends on the attributes $h$ of its adjacent nodes. Assuming
that the nodal attributes are independently drawn from an arbitrary
distribution $\rho(h)$ and that the sampling probability $r(h_i , h_j)$ for a
link $ij$ of nodal attributes $h_i$ and $h_j$ is also arbitrary, we derive
exact analytic expressions of the sampled network for such network
characteristics as the degree distribution, degree correlation, and clustering
spectrum. The properties of the sampled network turn out to be sums of
quantities for the original network topology weighted by the factors stemming
from the sampling. Based on our analysis, we find that the sampled network may
have sampling-induced network properties that are absent in the original
network, which implies the potential risk of a naive generalization of the
results of the sample to the entire original network. We also consider the
case when neighboring nodes have correlated attributes, showing how to
generalize our formalism to such sampling bias, and we find good agreement
between the analytic results and the numerical simulations.
</p>
<a href="http://arxiv.org/find/physics/1/au:+Murase_Y/0/1/0/all/0/1">Yohsuke Murase</a>, <a href="http://arxiv.org/find/physics/1/au:+Jo_H/0/1/0/all/0/1">Hang-Hyun Jo</a>, <a href="http://arxiv.org/find/physics/1/au:+Torok_J/0/1/0/all/0/1">J&#xe1;nos T&#xf6;r&#xf6;k</a>, <a href="http://arxiv.org/find/physics/1/au:+Kertesz_J/0/1/0/all/0/1">J&#xe1;nos Kert&#xe9;sz</a>, <a href="http://arxiv.org/find/physics/1/au:+Kaski_K/0/1/0/all/0/1">Kimmo Kaski</a>Graph Dynamical Networks for Unsupervised Learning of Atomic Scale Dynamics in Materials. (arXiv:1902.06836v2 [cond-mat.mtrl-sci] UPDATED)http://arxiv.org/abs/1902.06836
<p>Understanding the dynamical processes that govern the performance of
functional materials is essential for the design of next generation materials
to tackle global energy and environmental challenges. Many of these processes
involve the dynamics of individual atoms or small molecules in condensed
phases, e.g. lithium ions in electrolytes, water molecules in membranes, molten
atoms at interfaces, etc., which are difficult to understand due to the
complexity of local environments. In this work, we develop graph dynamical
networks, an unsupervised learning approach for understanding atomic scale
dynamics in arbitrary phases and environments from molecular dynamics
simulations. We show that important dynamical information can be learned for
various multi-component amorphous material systems, which is difficult to
obtain otherwise. With the large amounts of molecular dynamics data generated
every day in nearly every aspect of materials design, this approach provides a
broadly useful, automated tool to understand atomic scale dynamics in material
systems.
</p>
<a href="http://arxiv.org/find/cond-mat/1/au:+Xie_T/0/1/0/all/0/1">Tian Xie</a>, <a href="http://arxiv.org/find/cond-mat/1/au:+France_Lanord_A/0/1/0/all/0/1">Arthur France-Lanord</a>, <a href="http://arxiv.org/find/cond-mat/1/au:+Wang_Y/0/1/0/all/0/1">Yanming Wang</a>, <a href="http://arxiv.org/find/cond-mat/1/au:+Shao_Horn_Y/0/1/0/all/0/1">Yang Shao-Horn</a>, <a href="http://arxiv.org/find/cond-mat/1/au:+Grossman_J/0/1/0/all/0/1">Jeffrey C. Grossman</a>Uniform Substitution At One Fell Swoop. (arXiv:1902.07230v3 [cs.LO] UPDATED)http://arxiv.org/abs/1902.07230
<p>Uniform substitution of function, predicate, program or game symbols is the
core operation in parsimonious provers for hybrid systems and hybrid games. By
postponing soundness-critical admissibility checks, this paper introduces a
uniform substitution mechanism that proceeds in a linear pass homomorphically
along the formula. Soundness is recovered using a simple variable condition at
the replacements performed by the substitution. The setting in this paper is
that of differential hybrid games, in which discrete, continuous, and
adversarial dynamics interact in differential game logic dGL. This paper proves
soundness and completeness of one-pass uniform substitutions for dGL.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Platzer_A/0/1/0/all/0/1">Andr&#xe9; Platzer</a>Topology of Learning in Artificial Neural Networks. (arXiv:1902.08160v2 [cs.LG] UPDATED)http://arxiv.org/abs/1902.08160
<p>Understanding how neural networks learn remains one of the central challenges
in machine learning research. Starting from random values at the beginning of
training, the weights of a neural network evolve so as to perform a variety of
tasks, such as classifying images. Here we study the emergence of structure in the
weights by applying methods from topological data analysis. We train simple
feedforward neural networks on the MNIST dataset and monitor the evolution of
the weights. When initialized to zero, the weights follow trajectories that
branch off recurrently, thus generating trees that describe the growth of the
effective capacity of each layer. When initialized to tiny random values, the
weights evolve smoothly along two-dimensional surfaces. We show that natural
coordinates on these learning surfaces correspond to important factors of
variation.
</p>
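The raw input to such a topological analysis can be sketched as follows: snapshot the flattened weight vector during training and build the pairwise-distance matrix over snapshots. Here "training" is plain gradient descent on a quadratic loss, a hypothetical stand-in for the paper's MNIST networks.

```python
import numpy as np

# Record weight snapshots during a toy training run and compute the
# pairwise-distance matrix -- the kind of point cloud one would feed to
# topological data analysis (e.g. persistent homology or a mapper graph).
rng = np.random.default_rng(1)
w = rng.normal(size=10) * 0.01          # tiny random initialization
target = np.ones(10)

snapshots = []
lr = 0.1
for step in range(50):
    grad = w - target                   # gradient of 0.5 * ||w - target||^2
    w = w - lr * grad
    snapshots.append(w.copy())

S = np.array(snapshots)                 # (n_snapshots, n_weights)
# Euclidean distances between all pairs of weight snapshots.
dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
```

Consecutive-snapshot distances shrink as the weights converge, so the trajectory traces a smooth curve in weight space, consistent with the low-dimensional learning surfaces described above.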
<a href="http://arxiv.org/find/cs/1/au:+Gabella_M/0/1/0/all/0/1">Maxime Gabella</a>, <a href="http://arxiv.org/find/cs/1/au:+Afambo_N/0/1/0/all/0/1">Nitya Afambo</a>, <a href="http://arxiv.org/find/cs/1/au:+Ebli_S/0/1/0/all/0/1">Stefania Ebli</a>, <a href="http://arxiv.org/find/cs/1/au:+Spreemann_G/0/1/0/all/0/1">Gard Spreemann</a>A Degeneracy Framework for Scalable Graph Autoencoders. (arXiv:1902.08813v2 [cs.LG] UPDATED)http://arxiv.org/abs/1902.08813
<p>In this paper, we present a general framework to scale graph autoencoders
(AE) and graph variational autoencoders (VAE). This framework leverages graph
degeneracy concepts to train models only from a dense subset of nodes instead
of using the entire graph. Together with a simple yet effective propagation
mechanism, our approach significantly improves scalability and training speed
while preserving performance. We evaluate and discuss our method on several
variants of existing graph AE and VAE, providing the first application of these
models to large graphs with up to millions of nodes and edges. We achieve
empirically competitive results w.r.t. several popular scalable node embedding
methods, which emphasizes the relevance of pursuing further research towards
more scalable graph AE and VAE.
</p>
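The degeneracy idea can be sketched with a plain k-core peeling routine: repeatedly remove low-degree nodes so that only a dense subset remains for training. The graph and the value of k below are illustrative, and the paper's propagation mechanism back to the full graph is not shown.

```python
from collections import defaultdict

def k_core(edges, k):
    """Nodes remaining after repeatedly peeling vertices of degree < k."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for node in list(adj):
            if node in adj and len(adj[node]) < k:
                for nbr in adj.pop(node):
                    if nbr in adj:
                        adj[nbr].discard(node)
                changed = True
    return set(adj)

# A triangle with a pendant chain: the 2-core keeps only the dense triangle,
# so an autoencoder would be trained on nodes {0, 1, 2} instead of all five.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
dense_nodes = k_core(edges, 2)
```

Training only on the core shrinks the input graph while retaining its densest, most informative part, which is the source of the scalability gains reported above.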
<a href="http://arxiv.org/find/cs/1/au:+Salha_G/0/1/0/all/0/1">Guillaume Salha</a>, <a href="http://arxiv.org/find/cs/1/au:+Hennequin_R/0/1/0/all/0/1">Romain Hennequin</a>, <a href="http://arxiv.org/find/cs/1/au:+Tran_V/0/1/0/all/0/1">Viet Anh Tran</a>, <a href="http://arxiv.org/find/cs/1/au:+Vazirgiannis_M/0/1/0/all/0/1">Michalis Vazirgiannis</a>A Taxonomy of Modeling Approaches for Systems-of-Systems Dynamic Architectures: Overview and Prospects. (arXiv:1902.09090v4 [cs.SE] UPDATED)http://arxiv.org/abs/1902.09090
<p>Systems-of-Systems (SoS) result from the collaboration of independent
Constituent Systems (CSs) to achieve particular missions. CSs are not fully
known at design time and may leave or join the SoS at runtime, which makes the
SoS architecture inherently dynamic, forming new architectural configurations
and impacting the overall system quality attributes (e.g. performance, security,
and reliability). Therefore, it is vital to model and evaluate the impact of
these stochastic architectural changes on SoS properties at an abstract level
and at an early stage, in order to analyze and select an appropriate
architectural design. Architecture description languages (ADLs) have been
proposed and used to deal with SoS dynamic architectures. However, we still
envision gaps to be bridged and challenges to be addressed in the forthcoming
years. This paper presents a broad discussion on the state-of-the-art notations
to model and analyze SoS dynamic architectures. The main contribution of this
paper is threefold: (i) results of a literature review on the support of
available architecture modeling approaches for SoS, (ii) an analysis of their
semantic extensions to support the specification of SoS dynamic architectures,
and (iii) a corresponding taxonomy for modeling SoS obtained as a result of the
literature review. We also discuss future directions and challenges to be
overcome in the forthcoming years.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Mohsin_A/0/1/0/all/0/1">Ahmad Mohsin</a>, <a href="http://arxiv.org/find/cs/1/au:+Janjua_N/0/1/0/all/0/1">Naeem Khalid Janjua</a>, <a href="http://arxiv.org/find/cs/1/au:+Islam_S/0/1/0/all/0/1">Syed MS Islam</a>, <a href="http://arxiv.org/find/cs/1/au:+Neto_V/0/1/0/all/0/1">Valdemar Vicente Graciano Neto</a>Region Deformer Networks for Unsupervised Depth Estimation from Unconstrained Monocular Videos. (arXiv:1902.09907v2 [cs.CV] UPDATED)http://arxiv.org/abs/1902.09907
<p>While learning based depth estimation from images/videos has achieved
substantial progress, there still exist intrinsic limitations. Supervised
methods are limited by a small amount of ground truth or labeled data and
unsupervised methods for monocular videos are mostly based on the static scene
assumption, not performing well on real world scenarios with the presence of
dynamic objects. In this paper, we propose a new learning based method
consisting of DepthNet, PoseNet and Region Deformer Networks (RDN) to estimate
depth from unconstrained monocular videos without ground truth supervision. The
core contribution lies in RDN for proper handling of rigid and non-rigid
motions of various objects such as rigidly moving cars and deformable humans.
In particular, a deformation based motion representation is proposed to model
individual object motion on 2D images. This representation enables our method
to be applicable to diverse unconstrained monocular videos. Our method can not
only achieve the state-of-the-art results on standard benchmarks KITTI and
Cityscapes, but also show promising results on a crowded pedestrian tracking
dataset, which demonstrates the effectiveness of the deformation based motion
representation. Code and trained models are available at
https://github.com/haofeixu/rdn4depth.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Xu_H/0/1/0/all/0/1">Haofei Xu</a>, <a href="http://arxiv.org/find/cs/1/au:+Zheng_J/0/1/0/all/0/1">Jianmin Zheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Cai_J/0/1/0/all/0/1">Jianfei Cai</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_J/0/1/0/all/0/1">Juyong Zhang</a>Deep Learning How to Fit an Intravoxel Incoherent Motion Model to Diffusion-Weighted MRI. (arXiv:1903.00095v2 [q-bio.QM] UPDATED)http://arxiv.org/abs/1903.00095
<p>Purpose: This prospective clinical study assesses the feasibility of training
a deep neural network (DNN) for intravoxel incoherent motion (IVIM) model
fitting to diffusion-weighted magnetic resonance imaging (DW-MRI) data and
evaluates its performance. Methods: In May 2011, ten male volunteers (age
range: 29 to 53 years, mean: 37 years) underwent DW-MRI of the upper abdomen on
1.5T and 3.0T magnetic resonance scanners. Regions of interest in the left and
right liver lobe, pancreas, spleen, renal cortex, and renal medulla were
delineated independently by two readers. DNNs were trained for IVIM model
fitting using these data; results were compared to least-squares and Bayesian
approaches to IVIM fitting. Intraclass Correlation Coefficients (ICC) were used
to assess consistency of measurements between readers. Intersubject variability
was evaluated using Coefficients of Variation (CV). The fitting error was
calculated based on simulated data and the average fitting time of each method
was recorded. Results: DNNs were trained successfully for IVIM parameter
estimation. This approach was associated with high consistency between the two
readers (ICCs between 50 and 97%), low intersubject variability of estimated
parameter values (CVs between 9.2 and 28.4), and the lowest error when compared
with least-squares and Bayesian approaches. Fitting by DNNs was several orders
of magnitude quicker than the other methods but the networks may need to be
re-trained for different acquisition protocols or imaged anatomical regions.
Conclusion: DNNs are recommended for accurate and robust IVIM model fitting to
DW-MRI data. Suitable software is available at (1).
</p>
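For readers unfamiliar with the model being fitted, here is a minimal sketch of the (assumed) biexponential IVIM signal equation, S(b)/S0 = f·exp(-b·D*) + (1-f)·exp(-b·D), recovered by a coarse grid search rather than a DNN, least-squares, or Bayesian fit. The b-values and tissue parameters are illustrative, not from the study.

```python
import numpy as np

# Biexponential IVIM signal model (perfusion fraction f, pseudo-diffusion
# coefficient D*, true diffusion coefficient D).
def ivim_signal(b, f, d_star, d):
    return f * np.exp(-b * d_star) + (1.0 - f) * np.exp(-b * d)

b_values = np.array([0., 10., 20., 50., 100., 200., 400., 800.])
true = (0.2, 0.05, 0.001)                       # f, D*, D (illustrative)
signal = ivim_signal(b_values, *true)

# Coarse grid search over plausible parameter ranges.
fs = np.linspace(0.05, 0.4, 36)
d_stars = np.linspace(0.01, 0.1, 19)
ds = np.linspace(0.0002, 0.003, 29)
best, best_err = None, np.inf
for f in fs:
    for d_star in d_stars:
        for d in ds:
            err = np.sum((ivim_signal(b_values, f, d_star, d) - signal) ** 2)
            if err < best_err:
                best, best_err = (f, d_star, d), err
```

A trained DNN replaces this (slow, per-voxel) search with a single forward pass, which is where the orders-of-magnitude speedup reported above comes from.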
<a href="http://arxiv.org/find/q-bio/1/au:+Barbieri_S/0/1/0/all/0/1">Sebastiano Barbieri</a>, <a href="http://arxiv.org/find/q-bio/1/au:+Gurney_Champion_O/0/1/0/all/0/1">Oliver J. Gurney-Champion</a>, <a href="http://arxiv.org/find/q-bio/1/au:+Klaassen_R/0/1/0/all/0/1">Remy Klaassen</a>, <a href="http://arxiv.org/find/q-bio/1/au:+Thoeny_H/0/1/0/all/0/1">Harriet C. Thoeny</a>Automatic cough detection for portable spirometry system trained on large database. (arXiv:1903.03588v2 [q-bio.QM] UPDATED)http://arxiv.org/abs/1903.03588
<p>In this work, we give a short introduction to cough detection efforts
undertaken during the last decade, and we describe the solution for automatic
cough detection developed for the AioCare portable spirometry system. Since the
system is intended to be used in a large variety of environments and with
different patients, we train the algorithm on a large database of spirometry
curves, the NHANES database of the U.S. National Center for Health Statistics.
We apply a few data preprocessing steps and train several classifiers, namely
logistic regression (LR), a feed-forward artificial neural network (ANN), an
artificial neural network combined with principal component analysis (PCA-ANN),
a support vector machine (SVM), and a random forest (RF), choosing the one with
the best performance. The accuracy, sensitivity, and specificity of the
classifiers were comparable, falling within the ranges 91.1-91.2%, 81.8-83.8%,
and 95.0-95.9% on the test set, respectively. The ANN was selected as the final
classifier. The classification methodology developed in this study is robust
for detecting cough events during spirometry measurements. We also show that it
is universal and transferable between different systems, as the performance on
the NHANES and AioCare test sets is similar. The solution presented in this
work has been implemented in the AioCare mobile spirometry system.
</p>
<a href="http://arxiv.org/find/q-bio/1/au:+Solinski_M/0/1/0/all/0/1">Mateusz Soli&#x144;ski</a>, <a href="http://arxiv.org/find/q-bio/1/au:+Lepek_M/0/1/0/all/0/1">Micha&#x142; &#x141;epek</a>, <a href="http://arxiv.org/find/q-bio/1/au:+Koltowski_L/0/1/0/all/0/1">&#x141;ukasz Ko&#x142;towski</a>Deep learning for molecular design - a review of the state of the art. (arXiv:1903.04388v3 [cs.LG] UPDATED)http://arxiv.org/abs/1903.04388
<p>In the space of only a few years, deep generative modeling has revolutionized
how we think of artificial creativity, yielding autonomous systems which
produce original images, music, and text. Inspired by these successes,
researchers are now applying deep generative modeling techniques to the
generation and optimization of molecules - in our review we found 45 papers on
the subject published in the past two years. These works point to a future
where such systems will be used to generate lead molecules, greatly reducing
resources spent downstream synthesizing and characterizing bad leads in the
lab. In this review we survey the increasingly complex landscape of models and
representation schemes that have been proposed. The four classes of techniques
we describe are recursive neural networks, autoencoders, generative adversarial
networks, and reinforcement learning. After first discussing some of the
mathematical fundamentals of each technique, we draw high level connections and
comparisons with other techniques and expose the pros and cons of each. Several
important high level themes emerge as a result of this work, including the
shift away from the SMILES string representation of molecules towards more
sophisticated representations such as graph grammars and 3D representations,
the importance of reward function design, the need for better standards for
benchmarking and testing, and the benefits of adversarial training and
reinforcement learning over maximum likelihood based training.
</p>
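As a concrete example of the SMILES representation the review discusses, here is the character-level one-hot encoding typically fed to recurrent generative models. The toy vocabulary is an assumption; real models build theirs from the training set.

```python
import numpy as np

# One-hot encode a SMILES string character by character -- the standard
# input/output format for sequence-based molecular generative models.
smiles = "CCO"                          # ethanol
vocab = sorted(set("CNOcno()=#123"))    # toy character vocabulary
char_to_idx = {ch: i for i, ch in enumerate(vocab)}

onehot = np.zeros((len(smiles), len(vocab)))
for t, ch in enumerate(smiles):
    onehot[t, char_to_idx[ch]] = 1.0
```

The shift away from SMILES noted in the review replaces this flat character sequence with graph- or 3D-based encodings that respect molecular structure directly.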
<a href="http://arxiv.org/find/cs/1/au:+Elton_D/0/1/0/all/0/1">Daniel C. Elton</a>, <a href="http://arxiv.org/find/cs/1/au:+Boukouvalas_Z/0/1/0/all/0/1">Zois Boukouvalas</a>, <a href="http://arxiv.org/find/cs/1/au:+Fuge_M/0/1/0/all/0/1">Mark D. Fuge</a>, <a href="http://arxiv.org/find/cs/1/au:+Chung_P/0/1/0/all/0/1">Peter W. Chung</a>Duration-of-Stay Storage Assignment under Uncertainty. (arXiv:1903.05063v2 [cs.LG] UPDATED)http://arxiv.org/abs/1903.05063
<p>Optimizing storage assignment is a central problem in warehousing. Past
literature has shown the superiority of the Duration-of-Stay (DoS) method in
assigning pallets, but the methodology requires perfect prior knowledge of DoS
for each pallet, which is unknown and uncertain under realistic conditions. The
dynamic nature of a warehouse further complicates the validity of synthetic
data testing that is often conducted for algorithms. In this paper, in
collaboration with a large cold storage company, we release the first publicly
available set of warehousing records to facilitate research into this central
problem. We introduce a new framework for storage assignment that accounts for
uncertainty in warehouses. Then, using ParallelNet, a combination of
convolutional and recurrent neural network models, we show that future
shipments can be predicted well: ParallelNet achieves up to a 29% decrease in
MAPE compared to CNN-LSTM on unseen future shipments and suffers less
performance decay over time. The framework is then integrated into a
first-of-its-kind Storage Assignment system, which is being piloted in
warehouses across the country, with initial results showing up to 19% labor
savings.
</p>
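The MAPE comparison above can be sketched as follows; the forecast numbers are invented, not the paper's ParallelNet or CNN-LSTM outputs.

```python
import numpy as np

# Mean Absolute Percentage Error: the metric used to compare shipment
# forecasts in the abstract above.
def mape(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

actual  = [100., 80., 120., 90.]   # hypothetical true shipment volumes
model_a = [ 95., 84., 114., 93.]   # hypothetical stronger forecast
model_b = [ 80., 96., 100., 75.]   # hypothetical weaker baseline forecast
```

A lower MAPE means forecasts deviate by a smaller fraction of the true volume, which is why a 29% MAPE reduction translates directly into better-informed pallet placement.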
<a href="http://arxiv.org/find/cs/1/au:+Li_M/0/1/0/all/0/1">Michael Lingzhi Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Wolf_E/0/1/0/all/0/1">Elliott Wolf</a>, <a href="http://arxiv.org/find/cs/1/au:+Wintz_D/0/1/0/all/0/1">Daniel Wintz</a>Semantic programming: method of $\Delta_0^p$-enrichments and polynomial analogue of the Gandy fixed point theorem. (arXiv:1903.08109v3 [cs.CC] UPDATED)http://arxiv.org/abs/1903.08109
<p>Computer programs have quickly entered our lives, and questions about the
execution of these programs have become highly relevant today. Programs should
work efficiently, i.e. run as quickly as possible and spend as few resources as
possible. Most often, the "measure of efficiency" is an execution time that is
polynomial in the length of the input data. Such programs are of great
importance for smart contracts on blockchains.
</p>
<p>This article introduces the method of $\Delta_0^p$-enrichments, which shows
how to pass from an ordinary polynomial model $\mathfrak{M}^{(0)}$, via
$\Delta_0^p$-enrichments, to a model with new properties and new elements such
that the new model is also polynomial.
</p>
<p>$\Delta_0^p-$enrichments: $\mathfrak{M}^{(0)}\to ... \to \mathfrak{M}^{(i)}
\to ...\to\mathfrak{M}$
</p>
<p>This method is based on the theory of semantic programming, introduced in
the 1970s and 1980s by academicians Ershov and Goncharov and Professor
Sviridenko.
</p>
<p>A new element $w$, in $M^{(i+1)}$ but not in $M^{(i)}$, is generated by some
$\Delta_0^p$-formula $\Phi_k$ from the family $F_j$ for a one-place predicate
$P_j$: $\mathfrak{M}^{(i)}\models\Phi_k(w_1,...,w_{n_k})$, where $w$ is the
finite list $w = \langle w_1,...,w_{n_k}\rangle$, and now
$\mathfrak{M}^{(i+1)}\models P_j(w)$.
</p>
<p>We then construct an operator
$\Gamma_{F_{P_1^+},...,F_{P_N^+}}^{\mathfrak{M}^{(i)}}$ and prove a polynomial
analogue of the Gandy fixed point theorem, which allows us to take a different
look at polynomial computability.
</p>
<p>Let $\Gamma^*$ be a fixed point: $\Gamma_{F_{P_1^+},...,F_{P_N^+}}^{\mathfrak{M}}(\Gamma^*) =
\Gamma^*$.
</p>
<p>Theorem (polynomial analogue of the Gandy fixed point theorem)
</p>
<p>The fixed point $\Gamma^*$ is a $\Delta_0^p$-set, and $P_1,...,P_N$ are
$\Delta_0^p$-predicates on $\mathfrak{M}$.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Nechesov_A/0/1/0/all/0/1">Andrey Nechesov</a>GANs for Semi-Supervised Opinion Spam Detection. (arXiv:1903.08289v2 [cs.LG] UPDATED)http://arxiv.org/abs/1903.08289
<p>Online reviews have become a vital source of information in purchasing a
service (product). Opinion spammers manipulate reviews, affecting the overall
perception of the service. A key challenge in detecting opinion spam is
obtaining ground truth. Though there exists a large set of reviews online, only
a few of them have been labeled spam or non-spam. In this paper, we propose
spamGAN, a generative adversarial network which relies on a limited set of
labeled data as well as unlabeled data for opinion spam detection. spamGAN
improves the state-of-the-art GAN based techniques for text classification.
Experiments on TripAdvisor dataset show that spamGAN outperforms existing spam
detection techniques when limited labeled data is used. Apart from detecting
spam reviews, spamGAN can also generate reviews with reasonable perplexity.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Stanton_G/0/1/0/all/0/1">Gray Stanton</a>, <a href="http://arxiv.org/find/cs/1/au:+Irissappane_A/0/1/0/all/0/1">Athirai A. Irissappane</a>Generalized Off-Policy Actor-Critic. (arXiv:1903.11329v2 [cs.LG] UPDATED)http://arxiv.org/abs/1903.11329
<p>We propose a new objective, the counterfactual objective, unifying existing
objectives for off-policy policy gradient algorithms in the continuing
reinforcement learning (RL) setting. Compared to the commonly used excursion
objective, which can be misleading about the performance of the target policy
when deployed, our new objective better predicts such performance. We prove the
Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient
of the counterfactual objective and use an emphatic approach to get an unbiased
sample from this policy gradient, yielding the Generalized Off-Policy
Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over
existing algorithms in Mujoco robot simulation tasks, the first empirical
success of emphatic algorithms in prevailing deep RL benchmarks.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhang_S/0/1/0/all/0/1">Shangtong Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Boehmer_W/0/1/0/all/0/1">Wendelin Boehmer</a>, <a href="http://arxiv.org/find/cs/1/au:+Whiteson_S/0/1/0/all/0/1">Shimon Whiteson</a>Human Values and Attitudes towards Vaccination in Social Media. (arXiv:1904.00691v2 [cs.CY] UPDATED)http://arxiv.org/abs/1904.00691
<p>Psychological, political, cultural, and even societal factors are entangled
in the reasoning and decision-making process towards vaccination, rendering
vaccine hesitancy a complex issue. Here, administering a series of surveys via
a Facebook-hosted application, we study the worldviews of people who "Liked"
vaccine-supportive or vaccine-resilient Facebook Pages. In particular, we
assess differences in political viewpoints, moral values, personality traits,
and general interests, finding that those sceptical about vaccination appear to
trust the government less, are less agreeable, and place more emphasis on
anti-authoritarian values. Exploring the differences in moral narratives as
expressed in the linguistic descriptions of the Facebook Pages, we see that
pages that defend vaccines prioritise the value of family, while
vaccine-hesitant pages focus on the value of freedom. Finally, creating
embeddings based on health-related likes on Facebook Pages, we explore common,
latent interests of vaccine-hesitant people, showing a strong preference for
natural cures. This exploratory analysis aims to explore the potential of a
social media platform to act as a sensing tool, providing researchers and
policymakers with insights drawn from digital traces that can help design
communication campaigns that build confidence, based on values that also
appeal to people's socio-moral criteria.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kalimeri_K/0/1/0/all/0/1">Kyriaki Kalimeri</a>, <a href="http://arxiv.org/find/cs/1/au:+Beiro_M/0/1/0/all/0/1">Mariano Beiro</a>, <a href="http://arxiv.org/find/cs/1/au:+Urbinati_A/0/1/0/all/0/1">Alessandra Urbinati</a>, <a href="http://arxiv.org/find/cs/1/au:+Bonanomi_A/0/1/0/all/0/1">Andrea Bonanomi</a>, <a href="http://arxiv.org/find/cs/1/au:+Rosino_A/0/1/0/all/0/1">Alessandro Rosino</a>, <a href="http://arxiv.org/find/cs/1/au:+Cattuto_C/0/1/0/all/0/1">Ciro Cattuto</a>Deep Demosaicing for Edge Implementation. (arXiv:1904.00775v3 [cs.CV] UPDATED)http://arxiv.org/abs/1904.00775
<p>Most digital cameras use sensors coated with a Color Filter Array (CFA) to
capture channel components at every pixel location, resulting in a mosaic image
that does not contain pixel values in all channels. Current research on
reconstructing these missing channels, also known as demosaicing, introduces
many artifacts, such as the zipper effect and false color. Many deep learning
demosaicing techniques outperform other classical techniques in reducing the
impact of artifacts. However, most of these models tend to be
over-parametrized. Consequently, edge implementation of the state-of-the-art
deep learning-based demosaicing algorithms on low-end edge devices is a major
challenge. We provide an exhaustive search of deep neural network architectures
and obtain a Pareto front of Color Peak Signal-to-Noise Ratio (CPSNR), as the
performance criterion, versus the number of parameters, as the model
complexity, that beats the state of the art. Architectures on the Pareto front
can then be used to choose the best architecture for a variety of resource
constraints.
Simple architecture search methods such as exhaustive search and grid search
require some conditions of the loss function to converge to the optimum. We
clarify these conditions in a brief theoretical study.
</p>
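For context, the classical baseline these deep models improve on can be sketched as bilinear interpolation of the green channel of an RGGB Bayer mosaic. This is a minimal sketch; the assumed pattern places green samples where row + col is odd.

```python
import numpy as np

# Bilinear demosaicing of the green channel: at pixels without a green
# sample, estimate green as the mean of the 4 neighbors (which are all
# green sites in a Bayer pattern). Edges are handled by replicate padding.
def bilinear_green(mosaic):
    h, w = mosaic.shape
    green = np.zeros_like(mosaic, dtype=float)
    rows, cols = np.indices((h, w))
    g_mask = (rows + cols) % 2 == 1         # assumed green-site layout
    green[g_mask] = mosaic[g_mask]
    padded = np.pad(mosaic, 1, mode="edge")
    for r in range(h):
        for c in range(w):
            if not g_mask[r, c]:
                green[r, c] = (padded[r, c + 1] + padded[r + 2, c + 1] +
                               padded[r + 1, c] + padded[r + 1, c + 2]) / 4.0
    return green

flat = np.full((4, 4), 7.0)
green = bilinear_green(flat)   # constant mosaic -> constant green estimate
```

Simple averaging like this is exactly what produces zipper and false-color artifacts at edges, which is the gap the learned demosaicing models above close.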
<a href="http://arxiv.org/find/cs/1/au:+Ramakrishnan_R/0/1/0/all/0/1">Ramchalam Kinattinkara Ramakrishnan</a>, <a href="http://arxiv.org/find/cs/1/au:+Jui_S/0/1/0/all/0/1">Shangling Jui</a>, <a href="http://arxiv.org/find/cs/1/au:+Nia_V/0/1/0/all/0/1">Vahid Patrovi Nia</a>Reducing BERT Pre-Training Time from 3 Days to 76 Minutes. (arXiv:1904.00962v2 [cs.LG] UPDATED)http://arxiv.org/abs/1904.00962
<p>Training large deep neural networks on massive datasets is very challenging.
One promising approach to tackle this issue is through the use of large batch
stochastic optimization. However, our understanding of this approach in the
context of deep learning is still very limited. Furthermore, the current
approaches in this direction are heavily hand-tuned. To this end, we first
study a general adaptation strategy to accelerate training of deep neural
networks using large minibatches. Using this strategy, we develop a new
layer-wise adaptive large batch optimization technique called LAMB. We also
provide a formal convergence analysis of LAMB as well as of the previously
published layer-wise optimizer LARS, showing convergence to a stationary point
in general nonconvex settings. Our empirical results demonstrate the superior
performance of LAMB for BERT and ResNet-50 training. In particular, for BERT
training, our optimization technique enables the use of very large batch sizes
of 32868;
thereby, requiring just 8599 iterations to train (as opposed to 1 million
iterations in the original paper). By increasing the batch size to the memory
limit of a TPUv3 pod, BERT training time can be reduced from 3 days to 76
minutes. Finally, we also demonstrate that LAMB outperforms previous
large-batch training algorithms for ResNet-50 on ImageNet; obtaining
state-of-the-art performance in just a few minutes.
</p>
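The core layer-wise trust-ratio idea behind LARS/LAMB can be sketched as follows. Adam's moment estimates, bias correction, and weight decay are omitted, so this is an illustrative simplification, not the full LAMB algorithm.

```python
import numpy as np

# Layer-wise adaptive step: rescale each layer's update by ||w|| / ||g|| so
# that large-batch gradients cannot blow up layers with small weights.
def lamb_style_step(weights, grads, lr=0.01, eps=1e-12):
    new_weights = []
    for w, g in zip(weights, grads):
        w_norm = np.linalg.norm(w)
        g_norm = np.linalg.norm(g)
        trust_ratio = w_norm / (g_norm + eps) if w_norm > 0 else 1.0
        new_weights.append(w - lr * trust_ratio * g)
    return new_weights

# Two "layers" with the same gradient but different weight norms get
# proportionally different effective step sizes.
weights = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
grads = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]
updated = lamb_style_step(weights, grads, lr=0.01)
```

Because the effective step scales with each layer's weight norm, a single global learning rate works across heterogeneous layers, which is what makes very large batches trainable without hand-tuning.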
<a href="http://arxiv.org/find/cs/1/au:+You_Y/0/1/0/all/0/1">Yang You</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_J/0/1/0/all/0/1">Jing Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Reddi_S/0/1/0/all/0/1">Sashank Reddi</a>, <a href="http://arxiv.org/find/cs/1/au:+Hseu_J/0/1/0/all/0/1">Jonathan Hseu</a>, <a href="http://arxiv.org/find/cs/1/au:+Kumar_S/0/1/0/all/0/1">Sanjiv Kumar</a>, <a href="http://arxiv.org/find/cs/1/au:+Bhojanapalli_S/0/1/0/all/0/1">Srinadh Bhojanapalli</a>, <a href="http://arxiv.org/find/cs/1/au:+Song_X/0/1/0/all/0/1">Xiaodan Song</a>, <a href="http://arxiv.org/find/cs/1/au:+Demmel_J/0/1/0/all/0/1">James Demmel</a>, <a href="http://arxiv.org/find/cs/1/au:+Hsieh_C/0/1/0/all/0/1">Cho-Jui Hsieh</a>Stokes Inversion based on Convolutional Neural Networks. (arXiv:1904.03714v2 [astro-ph.SR] UPDATED)http://arxiv.org/abs/1904.03714
<p>Spectropolarimetric inversions are routinely used in the field of Solar
Physics for the extraction of physical information from observations. The
application to two-dimensional fields of view often requires the use of
supercomputers with parallelized inversion codes. Even in this case, the
computing time spent on the process is still very large. Our aim is to develop
a new inversion code based on the application of convolutional neural networks
that can quickly provide a three-dimensional cube of thermodynamical and
magnetic properties from the interpretation of two-dimensional maps of Stokes
profiles. We train two different architectures of fully convolutional neural
networks. To this end, we use the synthetic Stokes profiles obtained from two
snapshots of three-dimensional magneto-hydrodynamic numerical simulations of
different structures of the solar atmosphere. We provide an extensive analysis
of the new inversion technique, showing that it infers the thermodynamical and
magnetic properties with a precision comparable to that of standard inversion
techniques. However, it provides several key improvements: our method is around
one million times faster, it returns a three-dimensional view of the physical
properties of the region of interest in geometrical height, it provides
quantities that cannot be obtained otherwise (pressure and Wilson depression)
and the inferred properties are decontaminated from the blurring effect of
instrumental point spread functions at no extra cost. The code is freely
available in a dedicated repository, with options for training and evaluation.
</p>
<a href="http://arxiv.org/find/astro-ph/1/au:+Ramos_A/0/1/0/all/0/1">A. Asensio Ramos</a> (1,2), <a href="http://arxiv.org/find/astro-ph/1/au:+Baso_C/0/1/0/all/0/1">C. Diaz Baso</a> (3) ((1) Instituto de Astrofisica de Canarias, (2) Universidad de La Laguna, (3) Institute for Solar Physics, Dept. of Astronomy, Stockholm University)Are State-of-the-art Visual Place Recognition Techniques any Good for Aerial Robotics?. (arXiv:1904.07967v2 [cs.CV] UPDATED)http://arxiv.org/abs/1904.07967
<p>Visual Place Recognition (VPR) has seen significant advances at the frontiers
of matching performance and computational superiority over the past few years.
However, these evaluations are performed for ground-based mobile platforms and
cannot be generalized to aerial platforms. The degree of viewpoint variation
experienced by aerial robots is complex, with their processing power and
on-board memory limited by payload size and battery ratings. Therefore, in this
paper, we collect $8$ state-of-the-art VPR techniques that have been previously
evaluated for ground-based platforms and compare them on $2$ recently proposed
aerial place recognition datasets with three prime focuses: a) Matching
performance b) Processing power consumption c) Projected memory requirements.
This gives a birds-eye view of the applicability of contemporary VPR research
to aerial robotics and lays down the nature of the challenges for aerial VPR.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zaffar_M/0/1/0/all/0/1">Mubariz Zaffar</a>, <a href="http://arxiv.org/find/cs/1/au:+Khaliq_A/0/1/0/all/0/1">Ahmad Khaliq</a>, <a href="http://arxiv.org/find/cs/1/au:+Ehsan_S/0/1/0/all/0/1">Shoaib Ehsan</a>, <a href="http://arxiv.org/find/cs/1/au:+Milford_M/0/1/0/all/0/1">Michael Milford</a>, <a href="http://arxiv.org/find/cs/1/au:+Alexis_K/0/1/0/all/0/1">Kostas Alexis</a>, <a href="http://arxiv.org/find/cs/1/au:+McDonald_Maier_K/0/1/0/all/0/1">Klaus McDonald-Maier</a>3D Object Recognition with Ensemble Learning --- A Study of Point Cloud-Based Deep Learning Models. (arXiv:1904.08159v2 [cs.CV] UPDATED)http://arxiv.org/abs/1904.08159
<p>In this study, we present an analysis of model-based ensemble learning for 3D
point-cloud object classification and detection. An ensemble of multiple model
instances is known to outperform a single model instance, but ensemble
learning for 3D point clouds has received little study. First, an ensemble
of multiple model instances trained on the same part of the
$\textit{ModelNet40}$ dataset was tested for seven deep learning, point
cloud-based classification algorithms: $\textit{PointNet}$,
$\textit{PointNet++}$, $\textit{SO-Net}$, $\textit{KCNet}$,
$\textit{DeepSets}$, $\textit{DGCNN}$, and $\textit{PointCNN}$. Second, the
ensemble of different architectures was tested. Results of our experiments show
that the tested ensemble learning methods improve over state-of-the-art on the
$\textit{ModelNet40}$ dataset, from $92.65\%$ to $93.64\%$ for the ensemble of
single architecture instances, $94.03\%$ for two different architectures, and
$94.15\%$ for five different architectures. We show that the ensemble of two
models with different architectures can be as effective as the ensemble of 10
models with the same architecture. Third, classic bagging (i.e. training
multiple model instances on different data subsets) was studied, and sources of
ensemble accuracy growth were investigated for the best-performing
architecture, i.e. $\textit{SO-Net}$. We also investigate the ensemble learning
of $\textit{Frustum PointNet}$ approach in the task of 3D object detection,
increasing the average precision of 3D box detection on the $\textit{KITTI}$
dataset from $63.1\%$ to $66.5\%$ using only three model instances. We measure
the inference time of all 3D classification architectures on a $\textit{Nvidia
Jetson TX2}$, a common embedded computer for mobile robots, to gauge the
suitability of these models for real-life applications.
</p>
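At inference time, the instance-ensembling scheme evaluated above reduces to averaging per-class probabilities across models and taking the argmax; a minimal sketch with made-up probabilities (no PointNet-style models involved):

```python
import numpy as np

# Average the per-class probability outputs of several trained model
# instances, then predict the class with the highest mean probability.
def ensemble_predict(prob_list):
    """prob_list: list of (n_samples, n_classes) probability arrays."""
    return np.mean(prob_list, axis=0).argmax(axis=1)

# Two "instances" disagree on sample 1; the more confident one wins on average.
p1 = np.array([[0.9, 0.1], [0.4, 0.6]])
p2 = np.array([[0.8, 0.2], [0.9, 0.1]])
labels = ensemble_predict([p1, p2])
```

Averaging probabilities lets confident instances outvote uncertain ones, which is one mechanism behind the accuracy gains reported for the ModelNet40 ensembles.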
<a href="http://arxiv.org/find/cs/1/au:+Koguciuk_D/0/1/0/all/0/1">Daniel Koguciuk</a>, <a href="http://arxiv.org/find/cs/1/au:+Chechlinski_L/0/1/0/all/0/1">&#x141;ukasz Chechli&#x144;ski</a>, <a href="http://arxiv.org/find/cs/1/au:+El_Gaaly_T/0/1/0/all/0/1">Tarek El-Gaaly</a>Towards Explainable Anticancer Compound Sensitivity Prediction via Multimodal Attention-based Convolutional Encoders. (arXiv:1904.11223v2 [cs.LG] UPDATED)http://arxiv.org/abs/1904.11223
<p>In line with recent advances in neural drug design and sensitivity
prediction, we propose a novel architecture for interpretable prediction of
anticancer compound sensitivity using a multimodal attention-based
convolutional encoder. Our model is based on the three key pillars of drug
sensitivity: compounds' structure in the form of a SMILES sequence, gene
expression profiles of tumors and prior knowledge on intracellular interactions
from protein-protein interaction networks. We demonstrate that our multiscale
convolutional attention-based (MCA) encoder significantly outperforms a
baseline model trained on Morgan fingerprints, a selection of encoders based on
SMILES as well as previously reported state of the art for multimodal drug
sensitivity prediction (R2 = 0.86 and RMSE = 0.89). Moreover, the
explainability of our approach is demonstrated by a thorough analysis of the
attention weights. We show that the attended genes significantly enrich
apoptotic processes and that the drug attention is strongly correlated with a
standard chemical structure similarity index. Finally, we report a case study
of two receptor tyrosine kinase (RTK) inhibitors acting on a leukemia cell
line, showcasing the ability of the model to focus on informative genes and
submolecular regions of the two compounds. The demonstrated generalizability
and interpretability of our model testify to its potential for in-silico
prediction of anticancer compound efficacy on unseen cancer cells, positioning
it as a valid solution for the development of personalized therapies as well as
for the evaluation of candidate compounds in de novo drug design.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Manica_M/0/1/0/all/0/1">Matteo Manica</a>, <a href="http://arxiv.org/find/cs/1/au:+Oskooei_A/0/1/0/all/0/1">Ali Oskooei</a>, <a href="http://arxiv.org/find/cs/1/au:+Born_J/0/1/0/all/0/1">Jannis Born</a>, <a href="http://arxiv.org/find/cs/1/au:+Subramanian_V/0/1/0/all/0/1">Vigneshwari Subramanian</a>, <a href="http://arxiv.org/find/cs/1/au:+Saez_Rodriguez_J/0/1/0/all/0/1">Julio S&#xe1;ez-Rodr&#xed;guez</a>, <a href="http://arxiv.org/find/cs/1/au:+Martinez_M/0/1/0/all/0/1">Mar&#xed;a Rodr&#xed;guez Mart&#xed;nez</a>Landmark-Based Approaches for Goal Recognition as Planning. (arXiv:1904.11739v2 [cs.AI] UPDATED)http://arxiv.org/abs/1904.11739
<p>The task of recognizing goals and plans from missing and full observations
can be done efficiently by using automated planning techniques. In many
applications, it is important to recognize goals and plans not only accurately,
but also quickly. To address this challenge, we develop novel goal recognition
approaches based on planning techniques that rely on planning landmarks. In
automated planning, landmarks are properties (or actions) that must hold (or
occur) at some point in every plan that achieves a goal. We show the
applicability of a number of planning
techniques with an emphasis on landmarks for goal and plan recognition tasks in
two settings: (1) we use the concept of landmarks to develop goal recognition
heuristics; and (2) we develop a landmark-based filtering method to refine
existing planning-based goal and plan recognition approaches. These recognition
approaches are empirically evaluated in experiments over several classical
planning domains. We show that our goal recognition approaches yield not only
accuracy comparable to (and often higher than) other state-of-the-art
techniques, but also substantially faster recognition time over such
techniques.
</p>
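The landmark-counting idea behind such heuristics can be sketched as follows; the goals, landmark sets, and observed facts here are hypothetical, and real landmarks would be extracted from the planning task:

```python
def goal_completion(landmarks_per_goal, observed_facts):
    """Rank candidate goals by the fraction of their landmarks already
    achieved in the observations (a simplified landmark-counting heuristic)."""
    scores = {}
    for goal, landmarks in landmarks_per_goal.items():
        achieved = sum(1 for lm in landmarks if lm in observed_facts)
        scores[goal] = achieved / len(landmarks)
    # The most likely goal is the one with the highest completion ratio.
    return max(scores, key=scores.get), scores

# Hypothetical landmark sets for two candidate goals.
landmarks = {
    "goal_A": {"at_door", "has_key", "door_open"},
    "goal_B": {"at_window", "window_open"},
}
observed = {"has_key", "at_door"}
best, scores = goal_completion(landmarks, observed)
```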
<a href="http://arxiv.org/find/cs/1/au:+Pereira_R/0/1/0/all/0/1">Ramon Fraga Pereira</a>, <a href="http://arxiv.org/find/cs/1/au:+Oren_N/0/1/0/all/0/1">Nir Oren</a>, <a href="http://arxiv.org/find/cs/1/au:+Meneguzzi_F/0/1/0/all/0/1">Felipe Meneguzzi</a>Automatic Support Removal for Additive Manufacturing Post Processing. (arXiv:1904.12117v2 [cs.CG] UPDATED)http://arxiv.org/abs/1904.12117
<p>An additive manufacturing (AM) process often produces a {\it near-net} shape
that closely conforms to the intended design to be manufactured. It sometimes
contains additional support structure (also called scaffolding), which has to
be removed in post-processing. We describe an approach to automatically
generate process plans for support removal using a multi-axis machining
instrument. The goal is to fracture the contact regions between each support
component and the part, and to do so in the most cost-effective order while
avoiding collisions with the evolving near-net shape, including the remaining
support components. A recursive algorithm identifies a maximal collection of
support components whose connection regions to the part are accessible as well
as the orientations at which they can be removed at a given round. For every
such region, the accessible orientations appear as a 'fiber' in the
collision-free space of the evolving near-net shape and the tool assembly. To
order the removal of accessible supports, the algorithm constructs a search
graph whose edges are weighted by the Riemannian distance between the fibers.
The least expensive process plan is obtained by solving a traveling salesman
problem (TSP) over the search graph. The sequence of configurations obtained by
solving TSP is used as the input to a motion planner that finds collision free
paths to visit all accessible features. The resulting part without the support
structure can then be finished using traditional machining to produce the
intended design. The effectiveness of the method is demonstrated through
benchmark examples in 3D.
</p>
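The ordering step can be illustrated with an exact TSP-style path search over a small distance matrix; the entries stand in for the Riemannian distances between accessibility fibers and are hypothetical:

```python
import itertools

def cheapest_order(dist):
    """Exact minimum-cost visiting order over a small set of support regions.

    dist[i][j] is a stand-in for the Riemannian distance between the
    accessibility 'fibers' of supports i and j.
    """
    n = len(dist)
    best_cost, best_path = float("inf"), None
    # Brute force is fine for a handful of supports; the paper solves a TSP
    # over the search graph for larger instances.
    for perm in itertools.permutations(range(1, n)):
        path = (0,) + perm
        cost = sum(dist[path[k]][path[k + 1]] for k in range(n - 1))
        if cost < best_cost:
            best_cost, best_path = cost, path
    return list(best_path), best_cost

dist = [[0, 2, 9],
        [2, 0, 4],
        [9, 4, 0]]
order, cost = cheapest_order(dist)
```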
<a href="http://arxiv.org/find/cs/1/au:+Nelaturi_S/0/1/0/all/0/1">Saigopal Nelaturi</a>, <a href="http://arxiv.org/find/cs/1/au:+Behandish_M/0/1/0/all/0/1">Morad Behandish</a>, <a href="http://arxiv.org/find/cs/1/au:+Mirzendehdel_A/0/1/0/all/0/1">Amir M. Mirzendehdel</a>, <a href="http://arxiv.org/find/cs/1/au:+Kleer_J/0/1/0/all/0/1">Johan de Kleer</a>A Classification of Topological Discrepancies in Additive Manufacturing. (arXiv:1904.13210v2 [cs.CG] UPDATED)http://arxiv.org/abs/1904.13210
<p>Additive manufacturing (AM) enables enormous freedom for design of complex
structures. However, the process-dependent limitations that result in
discrepancies between as-designed and as-manufactured shapes are not fully
understood. The tradeoffs between infinitely many different ways to approximate
a design by a manufacturable replica are even harder to characterize. To
support design for AM (DfAM), one has to quantify local discrepancies
introduced by AM processes, identify the detrimental deviations (if any) to the
original design intent, and prescribe modifications to the design and/or
process parameters to countervail their effects. Our focus in this work will be
on topological analysis. There is ample evidence in many applications that
preserving local topology (e.g., connectivity of beams in a lattice) is
important even when slight geometric deviations can be tolerated. We first
present a generic method to characterize local topological discrepancies due to
material under- and over-deposition in AM, and show how it captures various
types of defects in the as-manufactured structures. We use this information to
systematically modify the as-manufactured outcomes within the limitations of
available 3D printer resolution(s), which often comes at the expense of
introducing more geometric deviations (e.g., thickening a beam to avoid
disconnection). We validate the effectiveness of the method on 3D examples with
nontrivial topologies such as lattice structures and foams.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Behandish_M/0/1/0/all/0/1">Morad Behandish</a>, <a href="http://arxiv.org/find/cs/1/au:+Mirzendehdel_A/0/1/0/all/0/1">Amir M. Mirzendehdel</a>, <a href="http://arxiv.org/find/cs/1/au:+Nelaturi_S/0/1/0/all/0/1">Saigopal Nelaturi</a>You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle. (arXiv:1905.00877v4 [stat.ML] UPDATED)http://arxiv.org/abs/1905.00877
<p>Deep learning achieves state-of-the-art results in many tasks in computer
vision and natural language processing. However, recent works have shown that
deep networks can be vulnerable to adversarial perturbations, which raised a
serious robustness issue of deep networks. Adversarial training, typically
formulated as a robust optimization problem, is an effective way of improving
the robustness of deep networks. A major drawback of existing adversarial
training algorithms is the computational overhead of the generation of
adversarial examples, typically far greater than that of the network training.
This leads to the unbearable overall computational cost of adversarial
training. In this paper, we show that adversarial training can be cast as a
discrete time differential game. Through analyzing the Pontryagin's Maximal
Principle (PMP) of the problem, we observe that the adversary update is only
coupled with the parameters of the first layer of the network. This inspires us
to restrict most of the forward and back propagation within the first layer of
the network during adversary updates. This effectively reduces the total number
of full forward and backward propagations to only one for each group of
adversary updates. Therefore, we refer to this algorithm as YOPO (You Only
Propagate Once). Numerical experiments demonstrate that YOPO can achieve
comparable defense accuracy with approximately 1/5 to 1/4 of the GPU time of
the projected gradient descent (PGD) algorithm. Our code is available at
https://github.com/a1600012888/YOPO-You-Only-Propagate-Once.
</p>
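For context, the PGD baseline that YOPO accelerates repeatedly ascends the loss within a norm ball around the input; a scalar toy sketch under a hypothetical one-parameter model (not the paper's networks):

```python
def pgd_perturb(x0, grad_fn, eps, step, iters):
    """Projected gradient ascent on the loss within an L-inf ball of
    radius eps around x0 (the full PGD adversary that YOPO accelerates)."""
    x = x0
    for _ in range(iters):
        x = x + step * (1 if grad_fn(x) > 0 else -1)  # signed gradient step
        x = max(x0 - eps, min(x0 + eps, x))           # project back into ball
    return x

# Toy model: loss(x) = (w*x - y)^2 with w=2, y=1, so dL/dx = 2*w*(w*x - y).
w, y = 2.0, 1.0
grad = lambda x: 2 * w * (w * x - y)
x_adv = pgd_perturb(x0=1.0, grad_fn=grad, eps=0.3, step=0.1, iters=10)
```

Each of the ten iterations here costs a full gradient evaluation; YOPO's observation is that, for deep networks, most of that propagation can be frozen outside the first layer during adversary updates.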
<a href="http://arxiv.org/find/stat/1/au:+Zhang_D/0/1/0/all/0/1">Dinghuai Zhang</a>, <a href="http://arxiv.org/find/stat/1/au:+Zhang_T/0/1/0/all/0/1">Tianyuan Zhang</a>, <a href="http://arxiv.org/find/stat/1/au:+Lu_Y/0/1/0/all/0/1">Yiping Lu</a>, <a href="http://arxiv.org/find/stat/1/au:+Zhu_Z/0/1/0/all/0/1">Zhanxing Zhu</a>, <a href="http://arxiv.org/find/stat/1/au:+Dong_B/0/1/0/all/0/1">Bin Dong</a>Stochastic quantization with Weighted distribution in discrete time for Filtering and Control. (arXiv:1905.01471v2 [cs.SY] UPDATED)http://arxiv.org/abs/1905.01471
<p>Path integral has been found to be useful as a way to study stochastic
processes such as filtering and control. However, it is not clear how to
directly define the path integral itself as a probability density for the
stochastic processes from the perspective of physics. This paper proposes a
weighted distribution in discrete-time stochastic quantization to describe
nonlinear stochastic processes in filtering and control. Although weighted
distributions have been studied in statistics, we explain that the weighted
distribution endows the path integral with additional potentials while
preserving its role as a probability density. We construct explicit models of
the extended Kalman filter, the extended Kalman filter with a constraint, and
a nonlinear stochastic control model for suitable weighted distributions. It
is typical in nonlinear filtering that observations assume
the existence of the corresponding probability density. In our model, we point
out that the potential involves the observations as external sources which do
not require the probability density. While backward equations are used for
typical control models, the nonlinear control model is described by forward
difference equations in our model. A control variable is naturally given as a
force induced from the potentials. The numerical simulations show that Langevin
equation can be controlled towards a target value given by a local minimum of
the potential.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Sano_M/0/1/0/all/0/1">Masakazu Sano</a>A Latent Variational Framework for Stochastic Optimization. (arXiv:1905.01707v4 [cs.LG] UPDATED)http://arxiv.org/abs/1905.01707
<p>This paper provides a unifying theoretical framework for stochastic
optimization algorithms by means of a latent stochastic variational problem.
Using techniques from stochastic control, the solution to the variational
problem is shown to be equivalent to that of a Forward Backward Stochastic
Differential Equation (FBSDE). By solving these equations, we recover a variety
of existing adaptive stochastic gradient descent methods. This framework
establishes a direct connection between stochastic optimization algorithms and
a secondary Bayesian inference problem on gradients, where a prior measure on
noisy gradient observations determines the resulting algorithm.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Casgrain_P/0/1/0/all/0/1">Philippe Casgrain</a>Learning Optimal Data Augmentation Policies via Bayesian Optimization for Image Classification Tasks. (arXiv:1905.02610v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.02610
<p>In recent years, deep learning has achieved remarkable achievements in many
fields, including computer vision, natural language processing, speech
recognition and others. Adequate training data is the key to ensure the
effectiveness of the deep models. However, obtaining valid data requires a lot
of time and labor resources. Data augmentation (DA) is an effective alternative
approach, which can generate new labeled data based on existing data using
label-preserving transformations. Although DA brings substantial benefits,
designing appropriate DA policies requires considerable expert experience and
time, and evaluating candidate policies during the search is costly. We
therefore raise a new question in this paper: how can automated data
augmentation be achieved at as low a cost as possible? We propose a method
named BO-Aug for
automating the process by finding the optimal DA policies using the Bayesian
optimization approach. Our method can find the optimal policies at a relatively
low search cost, and the searched policies based on a specific dataset are
transferable across different neural network architectures or even different
datasets. We validate the BO-Aug on three widely used image classification
datasets, including CIFAR-10, CIFAR-100 and SVHN. Experimental results show
that the proposed method can achieve state-of-the-art or near-state-of-the-art
classification accuracy. Code to reproduce our experiments is available at
https://github.com/zhangxiaozao/BO-Aug.
</p>
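The core Bayesian-optimisation loop can be sketched with a tiny GP-UCB search over a single hypothetical policy parameter (e.g. an augmentation magnitude), using a synthetic stand-in for validation accuracy; BO-Aug's actual policy space and surrogate are more elaborate:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def bo_maximize(f, candidates, n_init=3, n_iter=10, noise=1e-6, seed=0):
    """Minimal GP-UCB loop: fit a GP to the evaluations so far, then
    evaluate the candidate with the highest upper confidence bound."""
    rng = np.random.default_rng(seed)
    X = list(rng.choice(candidates, size=n_init, replace=False))
    Y = [f(x) for x in X]
    for _ in range(n_iter):
        Xa, Ya = np.array(X), np.array(Y)
        K = rbf(Xa, Xa) + noise * np.eye(len(Xa))
        Kinv = np.linalg.inv(K)
        ks = rbf(np.asarray(candidates), Xa)
        mu = ks @ Kinv @ Ya                            # posterior mean
        var = 1.0 - np.sum((ks @ Kinv) * ks, axis=1)   # posterior variance
        ucb = mu + 2.0 * np.sqrt(np.maximum(var, 0.0))
        x_next = candidates[int(np.argmax(ucb))]
        X.append(x_next)
        Y.append(f(x_next))
    i_best = int(np.argmax(Y))
    return X[i_best], Y[i_best]

# Synthetic objective peaking at magnitude 0.35 (stand-in for val accuracy).
cands = np.linspace(0.0, 1.0, 101)
best_x, best_y = bo_maximize(lambda m: -(m - 0.35) ** 2, cands)
```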
<a href="http://arxiv.org/find/cs/1/au:+Zhang_C/0/1/0/all/0/1">Chunxu Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Cui_J/0/1/0/all/0/1">Jiaxu Cui</a>, <a href="http://arxiv.org/find/cs/1/au:+Yang_B/0/1/0/all/0/1">Bo Yang</a>An Empirical Evaluation of Adversarial Robustness under Transfer Learning. (arXiv:1905.02675v3 [stat.ML] UPDATED)http://arxiv.org/abs/1905.02675
<p>In this work, we evaluate adversarial robustness in the context of transfer
learning from a source trained on CIFAR 100 to a target network trained on
CIFAR 10. Specifically, we study the effects of using robust optimisation in
the source and target networks. This allows us to identify transfer learning
strategies under which adversarial defences are successfully retained, in
addition to revealing potential vulnerabilities. We study the extent to which
features learnt by a fast gradient sign method (FGSM) and its iterative
alternative (PGD) can preserve their defence properties against black and
white-box attacks under three different transfer learning strategies. We find
that using PGD examples during training on the source task leads to more
general robust features that are easier to transfer. Furthermore, under
successful transfer, it achieves 5.2% more accuracy against white-box PGD
attacks than suitable baselines. Overall, our empirical evaluations give
insights into how well adversarial robustness generalises under transfer
learning.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Davchev_T/0/1/0/all/0/1">Todor Davchev</a>, <a href="http://arxiv.org/find/stat/1/au:+Korres_T/0/1/0/all/0/1">Timos Korres</a>, <a href="http://arxiv.org/find/stat/1/au:+Fotiadis_S/0/1/0/all/0/1">Stathi Fotiadis</a>, <a href="http://arxiv.org/find/stat/1/au:+Antonopoulos_N/0/1/0/all/0/1">Nick Antonopoulos</a>, <a href="http://arxiv.org/find/stat/1/au:+Ramamoorthy_S/0/1/0/all/0/1">Subramanian Ramamoorthy</a>Linear Range in Gradient Descent. (arXiv:1905.04561v2 [math.OC] UPDATED)http://arxiv.org/abs/1905.04561
<p>This paper defines linear range as the range of parameter perturbations which
lead to approximately linear perturbations in the states of a network. We
compute linear range from the difference between actual perturbations in states
and the tangent solution. Linear range is a new criterion for estimating the
effectivenss of gradients and thus having many possible applications. In
particular, we propose that the optimal learning rate at the initial stages of
training is such that parameter changes on all minibatches are within linear
range. We demonstrate our algorithm on two shallow neural networks and a
ResNet.
</p>
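The defining comparison, actual state perturbation versus the tangent (first-order) prediction, can be illustrated on a scalar toy function (hypothetical, not the paper's networks):

```python
def linearity_gap(f, grad_f, theta, d, eps):
    """Relative gap between the actual change f(theta + eps*d) - f(theta)
    and the tangent prediction eps * grad_f(theta) * d (scalar toy case).
    Small gaps mean the perturbation is still within the linear range."""
    actual = f(theta + eps * d) - f(theta)
    tangent = eps * grad_f(theta) * d
    return abs(actual - tangent) / max(abs(tangent), 1e-12)

f = lambda t: t ** 3
g = lambda t: 3 * t ** 2

# A small step stays within the linear range; a large one leaves it.
small = linearity_gap(f, g, theta=1.0, d=1.0, eps=1e-3)
large = linearity_gap(f, g, theta=1.0, d=1.0, eps=0.5)
```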
<a href="http://arxiv.org/find/math/1/au:+Ni_A/0/1/0/all/0/1">Angxiu Ni</a>, <a href="http://arxiv.org/find/math/1/au:+Talnikar_C/0/1/0/all/0/1">Chaitanya Talnikar</a>ISBNet: Instance-aware Selective Branching Network. (arXiv:1905.04849v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.04849
<p>Recent years have witnessed growing interests in designing efficient neural
networks and neural architecture search (NAS). Although remarkable efficiency
and accuracy have been achieved, existing expert designed and NAS models
neglect the fact that input instances are of varying complexity thus different
amount of computation is required. Inference with a fixed model that processes
all instances through the same transformations would waste plenty of
computational resources. Therefore, customizing the model capacity in an
instance-aware manner is highly demanded. To address this issue, we propose an
Instance-aware Selective Branching Network-ISBNet, which supports efficient
instance-level inference by selectively bypassing transformation branches of
insignificant importance weight. These weights are determined dynamically by
accompanying lightweight hypernetworks SelectionNets and further recalibrated
by Gumbel-softmax for sparse branch selection. Extensive experiments show that
ISBNet achieves extremely efficient inference in terms of parameter size and
FLOPs compared to existing networks. For example, ISBNet takes only 8.03% of
the parameters and 30.60% of the FLOPs of the state-of-the-art efficient
network
ShuffleNetV2 with comparable accuracy.
</p>
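The Gumbel-softmax relaxation used for sparse branch selection can be sketched as follows; the logits stand in for hypothetical SelectionNet importance weights:

```python
import math
import random

def gumbel_softmax(logits, tau=0.5, rng=random.Random(0)):
    """Sample a (nearly one-hot) branch-selection vector from importance
    logits via the Gumbel-softmax trick; lower tau gives sparser picks."""
    # Gumbel(0, 1) noise makes the soft argmax a reparameterised sample.
    g = [-math.log(-math.log(rng.random())) for _ in logits]
    z = [(l + gi) / tau for l, gi in zip(logits, g)]
    m = max(z)                      # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical importance logits for three transformation branches.
weights = gumbel_softmax([2.0, 0.1, -1.0])
branch = max(range(len(weights)), key=weights.__getitem__)
```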
<a href="http://arxiv.org/find/cs/1/au:+Cai_S/0/1/0/all/0/1">Shaofeng Cai</a>, <a href="http://arxiv.org/find/cs/1/au:+Shu_Y/0/1/0/all/0/1">Yao Shu</a>, <a href="http://arxiv.org/find/cs/1/au:+Wang_W/0/1/0/all/0/1">Wei Wang</a>, <a href="http://arxiv.org/find/cs/1/au:+Ooi_B/0/1/0/all/0/1">Beng Chin Ooi</a>Learning Hierarchical Priors in VAEs. (arXiv:1905.04982v3 [stat.ML] UPDATED)http://arxiv.org/abs/1905.04982
<p>We propose to learn a hierarchical prior in the context of variational
autoencoders to avoid the over-regularisation resulting from a standard normal
prior distribution. To incentivise an informative latent representation of the
data by learning a rich hierarchical prior, we formulate the objective function
as the Lagrangian of a constrained-optimisation problem and propose an
optimisation algorithm inspired by Taming VAEs. We introduce a graph-based
interpolation method, which shows that the topology of the learned latent
representation corresponds to the topology of the data manifold, and present
several examples, where desired properties of latent representation such as
smoothness and simple explanatory factors are learned by the prior.
Furthermore, we validate our approach on standard datasets, obtaining
state-of-the-art test log-likelihoods.
</p>
<a href="http://arxiv.org/find/stat/1/au:+Klushyn_A/0/1/0/all/0/1">Alexej Klushyn</a>, <a href="http://arxiv.org/find/stat/1/au:+Chen_N/0/1/0/all/0/1">Nutan Chen</a>, <a href="http://arxiv.org/find/stat/1/au:+Kurle_R/0/1/0/all/0/1">Richard Kurle</a>, <a href="http://arxiv.org/find/stat/1/au:+Cseke_B/0/1/0/all/0/1">Botond Cseke</a>, <a href="http://arxiv.org/find/stat/1/au:+Smagt_P/0/1/0/all/0/1">Patrick van der Smagt</a>Entity-Relation Extraction as Multi-Turn Question Answering. (arXiv:1905.05529v2 [cs.CL] UPDATED)http://arxiv.org/abs/1905.05529
<p>In this paper, we propose a new paradigm for the task of entity-relation
extraction. We cast the task as a multi-turn question answering problem, i.e.,
the extraction of entities and relations is transformed to the task of
identifying answer spans from the context. This multi-turn QA formalization
comes with several key advantages: firstly, the question query encodes
important information for the entity/relation class we want to identify;
secondly, QA provides a natural way of jointly modeling entity and relation;
and thirdly, it allows us to exploit the well developed machine reading
comprehension (MRC) models. Experiments on the ACE and the CoNLL04 corpora
demonstrate that the proposed paradigm significantly outperforms previous best
models. We are able to obtain the state-of-the-art results on all of the ACE04,
ACE05 and CoNLL04 datasets, increasing the SOTA results on the three datasets
to 49.4 (+1.0), 60.2 (+0.6) and 68.9 (+2.1), respectively. Additionally, we
construct a newly developed dataset RESUME in Chinese, which requires
multi-step reasoning to construct entity dependencies, as opposed to the
single-step dependency extraction in the triplet extraction of previous
datasets.
The proposed multi-turn QA model also achieves the best performance on the
RESUME dataset.
</p>
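The reduction of relation extraction to question answering relies on question templates per relation type; a minimal sketch with hypothetical templates (the paper's templates and datasets differ):

```python
def relation_questions(entity, relations):
    """Turn a head entity plus candidate relation types into natural-language
    queries for a machine reading comprehension model; the MRC model then
    returns answer spans, which become the tail entities."""
    templates = {
        "works_for": "Which organization does {e} work for?",
        "based_in": "Where is {e} based?",
    }
    return [templates[r].format(e=entity) for r in relations]

# First turn identified the entity "Alice"; later turns query her relations.
qs = relation_questions("Alice", ["works_for", "based_in"])
```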
<a href="http://arxiv.org/find/cs/1/au:+Li_X/0/1/0/all/0/1">Xiaoya Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Yin_F/0/1/0/all/0/1">Fan Yin</a>, <a href="http://arxiv.org/find/cs/1/au:+Sun_Z/0/1/0/all/0/1">Zijun Sun</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_X/0/1/0/all/0/1">Xiayu Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Yuan_A/0/1/0/all/0/1">Arianna Yuan</a>, <a href="http://arxiv.org/find/cs/1/au:+Chai_D/0/1/0/all/0/1">Duo Chai</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhou_M/0/1/0/all/0/1">Mingxin Zhou</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_J/0/1/0/all/0/1">Jiwei Li</a>Evaluation Metrics for Unsupervised Learning Algorithms. (arXiv:1905.05667v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.05667
<p>Determining the quality of the results obtained by clustering techniques is a
key issue in unsupervised machine learning. Many authors have discussed the
desirable features of good clustering algorithms. However, Jon Kleinberg
established an impossibility theorem for clustering. As a consequence, a wealth
of studies have proposed techniques to evaluate the quality of clustering
results depending on the characteristics of the clustering problem and the
algorithmic technique employed to cluster data.
</p>
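One widely used internal validity index of this kind is the silhouette coefficient, which needs no ground-truth labels; a self-contained sketch on a toy 2-D dataset:

```python
def silhouette(points, labels):
    """Mean silhouette coefficient: for each point, compare its mean
    intra-cluster distance a with the mean distance b to the nearest
    other cluster; values near 1 indicate well-separated clusters."""
    def dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5
    scores = []
    for i, (p, li) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lj) in enumerate(zip(points, labels))
                if lj == li and j != i]
        a = sum(same) / len(same)
        b = min(
            sum(dist(p, q) for q, lj in zip(points, labels) if lj == lo)
            / labels.count(lo)
            for lo in set(labels) if lo != li)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two tight, well-separated clusters score close to 1.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
score = silhouette(pts, [0, 0, 1, 1])
```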
<a href="http://arxiv.org/find/cs/1/au:+Palacio_Nino_J/0/1/0/all/0/1">Julio-Omar Palacio-Ni&#xf1;o</a>, <a href="http://arxiv.org/find/cs/1/au:+Berzal_F/0/1/0/all/0/1">Fernando Berzal</a>Variational Regret Bounds for Reinforcement Learning. (arXiv:1905.05857v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.05857
<p>We consider undiscounted reinforcement learning in Markov decision processes
(MDPs) where both the reward functions and the state-transition probabilities
may vary (gradually or abruptly) over time. For this problem setting, we
propose an algorithm and provide performance guarantees for the regret
evaluated against the optimal non-stationary policy. The upper bound on the
regret is given in terms of the total variation in the MDP. This is the first
variational regret bound for the general reinforcement learning setting.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Gajane_P/0/1/0/all/0/1">Pratik Gajane</a>, <a href="http://arxiv.org/find/cs/1/au:+Ortner_R/0/1/0/all/0/1">Ronald Ortner</a>, <a href="http://arxiv.org/find/cs/1/au:+Auer_P/0/1/0/all/0/1">Peter Auer</a>Multinomial Distribution Learning for Effective Neural Architecture Search. (arXiv:1905.07529v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.07529
<p>Architectures obtained by Neural Architecture Search (NAS) have achieved
highly competitive performance in various computer vision tasks. However, the
prohibitive computation demand of forward-backward propagation in deep neural
networks and searching algorithms makes it difficult to apply NAS in practice.
In this paper, we propose a Multinomial Distribution Learning for extremely
effective NAS, which considers the search space as a joint multinomial
distribution, i.e., the operation between two nodes is sampled from this
distribution, and the optimal network structure is obtained from the operations
with the highest probabilities in this distribution. Therefore, NAS can be
transformed to a multinomial distribution learning problem, i.e., the
distribution is optimized to have high expectation of the performance. Besides,
a hypothesis that the performance ranking is consistent in every training epoch
is proposed and demonstrated to further accelerate the learning process.
Experiments on CIFAR-10 and ImageNet demonstrate the effectiveness of our
method. On CIFAR-10, the structure searched by our method achieves 2.4\% test
error, while being 6.0 $\times$ (only 4 GPU hours on GTX1080Ti) faster compared
with state-of-the-art NAS algorithms. On ImageNet, our model achieves 75.2\%
top-1 accuracy under MobileNet settings (MobileNet V1/V2), while being
1.2$\times$ faster with measured GPU latency. Test code is available at
https://github.com/tanglang96/MDENAS
</p>
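The distribution-learning idea can be sketched for a single edge of the search space; the per-operation reward signal below is a hypothetical stand-in for the epoch-wise performance ranking used in the paper:

```python
import random

def mdl_search(op_rewards, n_ops, epochs=200, lr=0.05, rng=random.Random(0)):
    """Learn a multinomial distribution over candidate operations for one
    edge: sample an op, observe its performance signal, and shift
    probability mass toward above-average ops."""
    probs = [1.0 / n_ops] * n_ops
    for _ in range(epochs):
        # Sample one operation from the current multinomial distribution.
        r, acc = rng.random(), 0.0
        for op, p in enumerate(probs):
            acc += p
            if r <= acc:
                break
        # Reinforce the sampled op against the distribution's expected reward.
        baseline = sum(pi * ri for pi, ri in zip(probs, op_rewards))
        probs[op] += lr * (op_rewards[op] - baseline) * probs[op]
        probs[op] = max(probs[op], 1e-3)
        s = sum(probs)
        probs = [p / s for p in probs]
    return probs

# Hypothetical per-operation performance signals for a 3-op edge.
probs = mdl_search(op_rewards=[0.6, 0.9, 0.5], n_ops=3)
best = max(range(3), key=probs.__getitem__)
```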
<a href="http://arxiv.org/find/cs/1/au:+Zheng_X/0/1/0/all/0/1">Xiawu Zheng</a>, <a href="http://arxiv.org/find/cs/1/au:+Ji_R/0/1/0/all/0/1">Rongrong Ji</a>, <a href="http://arxiv.org/find/cs/1/au:+Tang_L/0/1/0/all/0/1">Lang Tang</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_B/0/1/0/all/0/1">Baochang Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_J/0/1/0/all/0/1">Jianzhuang Liu</a>, <a href="http://arxiv.org/find/cs/1/au:+Tian_Q/0/1/0/all/0/1">Qi Tian</a>Towards Safety-Aware Computing System Design in Autonomous Vehicles. (arXiv:1905.08453v2 [cs.RO] UPDATED)http://arxiv.org/abs/1905.08453
<p>Recently, autonomous driving development ignited competition among car makers
and technical corporations. Low-level automation cars are already commercially
available. But high automated vehicles where the vehicle drives by itself
without human monitoring is still at infancy. Such autonomous vehicles (AVs)
rely on the computing system in the car to to interpret the environment and
make driving decisions. Therefore, computing system design is essential
particularly in enhancing the attainment of driving safety. However, to our
knowledge, no clear guideline exists so far regarding safety-aware AV computing
system and architecture design. To understand the safety requirement of AV
computing system, we performed a field study by running industrial Level-4
autonomous driving fleets in various locations, road conditions, and traffic
patterns. The field study indicates that traditional computing system
performance metrics, such as tail latency, average latency, maximum latency,
and timeout, cannot fully satisfy the safety requirement for AV computing
system design. To address this issue, we propose a `safety score' as a primary
metric for measuring the level of safety in AV computing system design.
Furthermore, we propose a perception latency model, which helps architects
estimate the safety score of a given architecture and system design without
physically testing them in an AV. We demonstrate the use of our safety score
and latency model, by developing and evaluating a safety-aware AV computing
system computation hardware resource management scheme.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Zhao_H/0/1/0/all/0/1">Hengyu Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhang_Y/0/1/0/all/0/1">Yubo Zhang</a>, <a href="http://arxiv.org/find/cs/1/au:+Meng_P/0/1/0/all/0/1">Pingfan Meng</a>, <a href="http://arxiv.org/find/cs/1/au:+Shi_H/0/1/0/all/0/1">Hui Shi</a>, <a href="http://arxiv.org/find/cs/1/au:+Li_L/0/1/0/all/0/1">Li Erran Li</a>, <a href="http://arxiv.org/find/cs/1/au:+Lou_T/0/1/0/all/0/1">Tiancheng Lou</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_J/0/1/0/all/0/1">Jishen Zhao</a>Compression with Flows via Local Bits-Back Coding. (arXiv:1905.08500v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.08500
<p>Likelihood-based generative models are the backbones of lossless compression,
due to the guaranteed existence of codes with lengths close to negative log
likelihood. However, there is no guaranteed existence of computationally
efficient codes that achieve these lengths, and coding algorithms must be
hand-tailored to specific types of generative models to ensure computational
efficiency. Such coding algorithms are known for autoregressive models and
variational autoencoders, but not for general types of flow models. To fill in
this gap, we introduce local bits-back coding, a new compression technique
compatible with flow models. We present efficient algorithms that instantiate
our technique for many popular types of flows, and we demonstrate that our
algorithms closely achieve theoretical codelengths for state-of-the-art flow
models on high-dimensional data.
</p>
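The codelength target mentioned above is simply the negative log likelihood expressed in bits; a one-liner makes the unit conversion explicit:

```python
import math

def ideal_codelength_bits(log_likelihood_nats):
    """Ideal lossless codelength implied by a generative model:
    -log2 p(x), converted from a log-likelihood given in nats."""
    return -log_likelihood_nats / math.log(2)

# A model assigning probability 1/1024 to a symbol needs about 10 bits.
bits = ideal_codelength_bits(math.log(1 / 1024))
```

A practical coder such as bits-back coding aims to approach this bound; the gap between achieved and ideal codelength is the overhead the paper's algorithms minimize.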
<a href="http://arxiv.org/find/cs/1/au:+Ho_J/0/1/0/all/0/1">Jonathan Ho</a>, <a href="http://arxiv.org/find/cs/1/au:+Lohn_E/0/1/0/all/0/1">Evan Lohn</a>, <a href="http://arxiv.org/find/cs/1/au:+Abbeel_P/0/1/0/all/0/1">Pieter Abbeel</a>Joint embedding of structure and features via graph convolutional networks. (arXiv:1905.08636v2 [cs.LG] UPDATED)http://arxiv.org/abs/1905.08636
<p>The creation of social ties is largely determined by the entangled effects of
people's similarities in terms of individual characteristics and friendships.
However, the feature and structural characteristics of people usually appear
to be correlated, making it difficult to determine which bears greater
responsibility for the formation of the emergent network structure. We propose
\emph{AN2VEC}, a node
embedding method which ultimately aims at disentangling the information shared
by the structure of a network and the features of its nodes. Building on the
recent developments of Graph Convolutional Networks (GCN), we develop a
multitask GCN Variational Autoencoder where different dimensions of the
generated embeddings can be dedicated to encoding feature information, network
structure, and shared feature-network information. We explore the interaction
between these disentangled characteristics by comparing the embedding reconstruction
performance to a baseline case where no shared information is extracted. We use
synthetic datasets with different levels of interdependency between feature
and network characteristics and show (i) that shallow embeddings relying on
shared
information perform better than the corresponding reference with unshared
information, (ii) that this performance gap increases with the correlation
between network and feature structure, and (iii) that our embedding is able to
capture joint information of structure and features. Our method can be relevant
for the analysis and prediction of any featured network structure ranging from
online social systems to network medicine.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Lerique_S/0/1/0/all/0/1">S&#xe9;bastien Lerique</a> (1), <a href="http://arxiv.org/find/cs/1/au:+Abitbol_J/0/1/0/all/0/1">Jacob Levy Abitbol</a> (1), <a href="http://arxiv.org/find/cs/1/au:+Karsai_M/0/1/0/all/0/1">M&#xe1;rton Karsai</a> (1) ((1) IXXI, LIP (UMR 5668, Univ Lyon-ENS de Lyon-Inria-CNRS-UCB Lyon 1))Equilibrium Characterization for Data Acquisition Games. (arXiv:1905.08909v2 [cs.GT] UPDATED)http://arxiv.org/abs/1905.08909
<p>We study a game between two firms in which each provide a service based on
machine learning. The firms are presented with the opportunity to purchase a
new corpus of data, which will allow them to potentially improve the quality of
their products. The firms can decide whether or not they want to buy the data,
as well as which learning model to build with that data. We demonstrate a
reduction from this potentially complicated action space to a one-shot,
two-action game in which each firm only decides whether or not to buy the data.
The game admits several regimes which depend on the relative strength of the
two firms at the outset and the price at which the data is being offered. We
analyze the game's Nash equilibria in all parameter regimes and demonstrate
that, in expectation, the outcome of the game is that the initially stronger
firm's market position weakens whereas the initially weaker firm's market
position becomes stronger. Finally, we consider the perspective of the users of
the service and demonstrate that the expected outcome at equilibrium is not the
one which maximizes the welfare of the consumers.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Dong_J/0/1/0/all/0/1">Jinshuo Dong</a>, <a href="http://arxiv.org/find/cs/1/au:+Elzayn_H/0/1/0/all/0/1">Hadi Elzayn</a>, <a href="http://arxiv.org/find/cs/1/au:+Jabbari_S/0/1/0/all/0/1">Shahin Jabbari</a>, <a href="http://arxiv.org/find/cs/1/au:+Kearns_M/0/1/0/all/0/1">Michael Kearns</a>, <a href="http://arxiv.org/find/cs/1/au:+Schutzman_Z/0/1/0/all/0/1">Zachary Schutzman</a>The Steiner triple systems of order 21 with a transversal subdesign TD(3,6). (arXiv:1905.09081v2 [math.CO] UPDATED)http://arxiv.org/abs/1905.09081
<p>We prove several structural properties of Steiner triple systems (STS) of
order 3w+3 that include one or more transversal subdesigns TD(3,w). Using an
exhaustive search, we find that there are 2004720 isomorphism classes of
STS(21) including a subdesign TD(3,6), or, equivalently, a 6-by-6 latin square.
</p>
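The defining property used throughout such searches is that every pair of points lies in exactly one triple. A small illustrative verifier (not the paper's search code), shown on the Fano plane, the unique STS(7):

```python
# Illustrative check: verify that a set of triples is a Steiner triple
# system, i.e. every pair of points occurs in exactly one triple.
from itertools import combinations

def is_steiner_triple_system(points, triples):
    """True iff each pair of points is covered by exactly one triple."""
    pair_count = {}
    for t in triples:
        for pair in combinations(sorted(t), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    return all(pair_count.get(p, 0) == 1
               for p in combinations(sorted(points), 2))

# The Fano plane: the unique STS(7)
fano = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (1, 3, 5),
        (1, 4, 6), (2, 3, 6), (2, 4, 5)]
print(is_steiner_triple_system(range(7), fano))  # True
```

The exhaustive search in the paper additionally constrains the STS(21) to contain a TD(3,6) subdesign, which corresponds to embedding a 6-by-6 latin square.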
<a href="http://arxiv.org/find/math/1/au:+Guan_Y/0/1/0/all/0/1">Yue Guan</a> (1), <a href="http://arxiv.org/find/math/1/au:+Shi_M/0/1/0/all/0/1">Minjia Shi</a> (1), <a href="http://arxiv.org/find/math/1/au:+Krotov_D/0/1/0/all/0/1">Denis S. Krotov</a> (2) ((1) Anhui University, Hefei, China, (2) Sobolev Institute of Mathematics, Novosibirsk, Russia)Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction. (arXiv:1905.09248v2 [cs.IR] UPDATED)http://arxiv.org/abs/1905.09248
<p>Click-through rate (CTR) prediction is critical for industrial applications
such as recommender systems and online advertising. In these applications,
mining user interest from rich historical behavior data plays an important
role in CTR modeling. Driven by the development of deep learning, deep CTR
models with ingeniously designed architectures for user interest modeling have
been proposed, bringing remarkable improvements in offline metrics. However,
great effort is needed to deploy these complex models to an online serving
system for real-time inference under massive traffic. Things become even more
difficult for long sequential user behavior data, as the system latency and
storage cost increase approximately linearly with the length of the user
behavior sequence. In this paper, we directly face the challenge of long
sequential user behavior modeling and introduce our hands-on practice with the
co-design of the machine learning algorithm and the online serving system for
the CTR prediction task. Theoretically, the co-design of UIC and MIMN enables
us to handle user interest modeling with sequential behavior data of unlimited
length. Comparisons of model performance and system efficiency demonstrate the
effectiveness of the proposed solution. To our knowledge, this is one of the
first industrial solutions capable of handling long sequential user behavior
data with lengths scaling up to thousands. It has now been deployed in the
display advertising system at Alibaba.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Pi_Q/0/1/0/all/0/1">Qi Pi</a>, <a href="http://arxiv.org/find/cs/1/au:+Bian_W/0/1/0/all/0/1">Weijie Bian</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhou_G/0/1/0/all/0/1">Guorui Zhou</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhu_X/0/1/0/all/0/1">Xiaoqiang Zhu</a>, <a href="http://arxiv.org/find/cs/1/au:+Gai_K/0/1/0/all/0/1">Kun Gai</a>FastSpeech: Fast, Robust and Controllable Text to Speech. (arXiv:1905.09263v2 [cs.CL] UPDATED)http://arxiv.org/abs/1905.09263
<p>Neural network based end-to-end text to speech (TTS) has significantly
improved the quality of synthesized speech. Prominent methods (e.g., Tacotron
2) usually first generate mel-spectrogram from text, and then synthesize speech
from mel-spectrogram using a vocoder such as WaveNet. Compared with
traditional concatenative and statistical parametric approaches, neural
network based end-to-end models suffer from slow inference speed, and the
synthesized speech is usually not robust (i.e., some words are skipped or
repeated) and lacks controllability (voice speed or prosody control). In this
work, we propose a
novel feed-forward network based on Transformer to generate mel-spectrogram in
parallel for TTS. Specifically, we extract attention alignments from an
encoder-decoder based teacher model for phoneme duration prediction, which is
used by a length regulator to expand the source phoneme sequence to match the
length of target mel-spectrogram sequence for parallel mel-spectrogram
generation. Experiments on the LJSpeech dataset show that our parallel model
matches autoregressive models in terms of speech quality, nearly eliminates the
problem of word skipping and repeating in particularly hard cases, and can
adjust voice speed smoothly. Most importantly, compared with autoregressive
Transformer TTS, our model speeds up the mel-spectrogram generation by 270x and
the end-to-end speech synthesis by 38x. Therefore, we call our model
FastSpeech. We will release the code on Github (anonymous.url). Synthesized
speech samples can be found in https://speechresearch.github.io/fastspeech/.
</p>
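The length regulator described in the abstract is, at its core, a simple duration-driven expansion: each phoneme's hidden state is repeated for its predicted number of frames so the expanded sequence matches the target mel-spectrogram length. A minimal sketch of that operation (an assumption-level illustration, not the authors' code; real hidden states would be vectors rather than the placeholder strings used here):

```python
# Minimal sketch of a FastSpeech-style length regulator: repeat each
# phoneme state `durations[i]` times (durations are in mel frames) so the
# output length matches the target mel-spectrogram length.

def length_regulate(phoneme_states, durations):
    """Expand a phoneme-level sequence to a frame-level sequence."""
    expanded = []
    for state, frames in zip(phoneme_states, durations):
        expanded.extend([state] * frames)
    return expanded

# 3 phonemes with predicted durations 2, 1, 3 frames -> 6 frame states
frames = length_regulate(["h", "a", "t"], [2, 1, 3])
print(frames)  # ['h', 'h', 'a', 't', 't', 't']
```

Scaling the predicted durations up or down is what gives the model its smooth voice-speed control.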
<a href="http://arxiv.org/find/cs/1/au:+Ren_Y/0/1/0/all/0/1">Yi Ren</a>, <a href="http://arxiv.org/find/cs/1/au:+Ruan_Y/0/1/0/all/0/1">Yangjun Ruan</a>, <a href="http://arxiv.org/find/cs/1/au:+Tan_X/0/1/0/all/0/1">Xu Tan</a>, <a href="http://arxiv.org/find/cs/1/au:+Qin_T/0/1/0/all/0/1">Tao Qin</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_S/0/1/0/all/0/1">Sheng Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Zhao_Z/0/1/0/all/0/1">Zhou Zhao</a>, <a href="http://arxiv.org/find/cs/1/au:+Liu_T/0/1/0/all/0/1">Tie-Yan Liu</a>LazyLedger: A Distributed Data Availability Ledger With Client-Side Smart Contracts. (arXiv:1905.09274v2 [cs.CR] UPDATED)http://arxiv.org/abs/1905.09274
<p>We propose LazyLedger, a design for distributed ledgers where the blockchain
is optimised for solely ordering and guaranteeing the availability of
transactions. Responsibility for executing and validating transactions is
shifted to only the clients that have an interest in certain transactions. As
the core function of the consensus system of a distributed ledger is to order
transactions and ensure their availability, consensus participants do not
necessarily need to be concerned with the content of those transactions. This
reduces the problem of block verification to data availability verification,
which can be achieved probabilistically without downloading the whole block.
The amount of resources required to reach consensus can thus be minimised, as
transaction validity rules can be decoupled from consensus rules. We also
implement and evaluate several example LazyLedger applications, and validate
that the workload of clients of specific applications does not significantly
increase when the workload of other applications increases.
</p>
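The probabilistic availability check mentioned above can be illustrated with a simple calculation: if a light client samples k chunks of a block uniformly at random and an adversary withholds a fraction f of the chunks, the probability that the client notices a missing chunk grows as 1 - (1 - f)^k. A hedged sketch of this reasoning (an illustration of the general sampling argument, not LazyLedger's exact protocol):

```python
# Probability that a light client sampling `num_samples` random chunks of a
# block detects withholding, when a fraction `withheld_fraction` of chunks
# is unavailable. All samples must hit available chunks for the adversary
# to go unnoticed, so detection = 1 - (1 - f)^k.

def detection_probability(withheld_fraction, num_samples):
    """Probability that at least one sampled chunk turns out to be missing."""
    return 1.0 - (1.0 - withheld_fraction) ** num_samples

# withholding 25% of chunks, 16 samples -> roughly 99% chance of detection
p = detection_probability(0.25, 16)
print(round(p, 2))  # 0.99
```

This is why availability can be verified with high confidence without downloading the whole block: detection probability approaches 1 exponentially fast in the number of samples.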
<a href="http://arxiv.org/find/cs/1/au:+Al_Bassam_M/0/1/0/all/0/1">Mustafa Al-Bassam</a>Extended Active Learning Method. (arXiv:1011.2512v2 [cs.AI] CROSS LISTED)http://arxiv.org/abs/1011.2512
<p>Active Learning Method (ALM) is a soft computing method used for modeling
and control, based on fuzzy logic. Although ALM has been shown to act well in
dynamic environments, its operators cannot support it very well in complex
situations because they lose data. ALM can thus find better membership
functions if more appropriate operators are chosen for it. This paper
substitutes two new operators for ALM's original ones, which consequently
improves the derivation of membership functions over conventional ALM. The new
method is called the Extended Active Learning Method (EALM).
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kiaei_A/0/1/0/all/0/1">Ali Akbar Kiaei</a>, <a href="http://arxiv.org/find/cs/1/au:+Shouraki_S/0/1/0/all/0/1">Saeed Bagheri Shouraki</a>, <a href="http://arxiv.org/find/cs/1/au:+Khasteh_S/0/1/0/all/0/1">Seyed Hossein Khasteh</a>, <a href="http://arxiv.org/find/cs/1/au:+Khademi_M/0/1/0/all/0/1">Mahmoud Khademi</a>, <a href="http://arxiv.org/find/cs/1/au:+Samani_A/0/1/0/all/0/1">Alireza Ghatreh Samani</a>Automated shapeshifting for function recovery in damaged robots. (arXiv:1905.09264v1 [cs.RO] CROSS LISTED)http://arxiv.org/abs/1905.09264
<p>A robot's mechanical parts routinely wear out from normal functioning and can
be lost to injury. For autonomous robots operating in isolated or hostile
environments, repair from a human operator is often not possible. Thus, much
work has sought to automate damage recovery in robots. However, every case
reported in the literature to date has accepted the damaged mechanical
structure as fixed, and focused on learning new ways to control it. Here we
show for the first time a robot that automatically recovers from unexpected
damage by deforming its resting mechanical structure without changing its
control policy. We found that, especially in the case of "deep insult", such as
removal of all four of the robot's legs, the damaged machine evolves shape
changes that not only recover the original level of function (locomotion), but
can in fact surpass the original level of performance (speed). This
suggests that shape change, instead of control readaptation, may be a better
method to recover function after damage in some cases.
</p>
<a href="http://arxiv.org/find/cs/1/au:+Kriegman_S/0/1/0/all/0/1">Sam Kriegman</a>, <a href="http://arxiv.org/find/cs/1/au:+Walker_S/0/1/0/all/0/1">Stephanie Walker</a>, <a href="http://arxiv.org/find/cs/1/au:+Shah_D/0/1/0/all/0/1">Dylan Shah</a>, <a href="http://arxiv.org/find/cs/1/au:+Levin_M/0/1/0/all/0/1">Michael Levin</a>, <a href="http://arxiv.org/find/cs/1/au:+Kramer_Bottiglio_R/0/1/0/all/0/1">Rebecca Kramer-Bottiglio</a>, <a href="http://arxiv.org/find/cs/1/au:+Bongard_J/0/1/0/all/0/1">Josh Bongard</a>