2019

In Proceedings of the International Conference on Machine Learning (ICML), June 2019 (inproceedings), accepted

Abstract

Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.
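The reuse of previous rollouts via importance sampling can be illustrated with a minimal sketch. It assumes (this is not stated in the abstract) that policy parameters are drawn from an isotropic Gaussian search distribution and that each rollout yields a scalar return; the function names are hypothetical, and the self-normalized estimator is one common choice rather than necessarily the one used in the paper.

```python
import numpy as np

def gaussian_logpdf(theta, mean, std):
    """Log-density of an isotropic Gaussian, evaluated for each row of theta."""
    theta = np.atleast_2d(theta)
    d = theta.shape[1]
    sq_dist = np.sum((theta - mean) ** 2, axis=1)
    return -0.5 * sq_dist / std**2 - d * np.log(std) - 0.5 * d * np.log(2 * np.pi)

def reuse_rollouts(thetas, returns, old_mean, new_mean, std):
    """Estimate the expected return under a new search distribution
    from rollouts whose parameters were sampled under an old one.

    Returns the self-normalized importance-sampling estimate and the
    effective sample size of the reweighted rollouts.
    """
    log_w = gaussian_logpdf(thetas, new_mean, std) - gaussian_logpdf(thetas, old_mean, std)
    w = np.exp(log_w - np.max(log_w))  # shift by the max for numerical stability
    w /= np.sum(w)                     # self-normalization
    effective_sample_size = 1.0 / np.sum(w ** 2)
    return float(np.sum(w * returns)), float(effective_sample_size)
```

When the new distribution equals the old one, all weights are equal and the estimate reduces to the plain average of the returns; a small effective sample size signals that the old rollouts say little about the new distribution.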

To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.

The Variational Autoencoder (VAE) is a powerful architecture capable of representation learning and generative modeling. When it comes to learning interpretable (disentangled) representations, VAE and its variants show unparalleled performance. However, the reasons for this are unclear, since a very particular alignment of the latent embedding is needed but the design of the VAE does not encourage it in any explicit way. We address this matter and offer the following explanation: the diagonal approximation in the encoder together with the inherent stochasticity force local orthogonality of the decoder. The local behavior of promoting both reconstruction and orthogonality matches closely how the PCA embedding is chosen. Alongside providing an intuitive understanding, we justify the statement with full theoretical analysis as well as with experiments.

Audio-driven 3D facial animation has been widely explored, but achieving realistic, human-like performance is still unsolved. This is due to the lack of available 3D datasets, models, and standard evaluation metrics. To address this, we introduce a unique 4D face dataset with about 29 minutes of 4D scans captured at 60 fps and synchronized audio from 12 speakers. We then train a neural network on our dataset that factors identity from facial motion. The learned model, VOCA (Voice Operated Character Animation) takes any speech signal as input—even speech in languages other than English—and realistically animates a wide range of adult faces. Conditioning on subject labels during training allows the model to learn a variety of realistic speaking styles. VOCA also provides animator controls to alter speaking style, identity-dependent facial shape, and pose (i.e. head, jaw, and eyeball rotations) during animation. To our knowledge, VOCA is the only realistic 3D facial animation model that is readily applicable to unseen subjects without retargeting. This makes VOCA suitable for tasks like in-game video, virtual reality avatars, or any scenario in which the speaker, speech, or language is not known in advance. We make the dataset and model available for research purposes at http://voca.is.tue.mpg.de.

Abstracting complex 3D shapes with parsimonious part-based representations has been a long-standing goal in computer vision. This paper presents a learning-based solution to this problem which goes beyond the traditional 3D cuboid representation by exploiting superquadrics as atomic elements. We demonstrate that superquadrics lead to more expressive 3D scene parses while being easier to learn than 3D cuboid representations. Moreover, we provide an analytical solution to the Chamfer loss which avoids the need for computationally expensive reinforcement learning or iterative prediction. Our model learns to parse 3D objects into consistent superquadric representations without supervision. Results on various ShapeNet categories as well as the SURREAL human body dataset demonstrate the flexibility of our model in capturing fine details and complex poses that could not have been modelled using cuboids.
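To make the notion of a superquadric concrete, the sketch below evaluates the standard inside-outside function for an origin-centered, axis-aligned superquadric. The function name and argument layout are hypothetical, and the parameterization used in the paper may differ; the sketch only shows how two shape exponents interpolate between sphere-like and box-like primitives.

```python
import numpy as np

def superquadric_inside_outside(points, scale, eps):
    """Standard inside-outside function F of a superquadric.

    points: array of shape (..., 3)
    scale:  (a1, a2, a3), the extents along x, y, z
    eps:    (e1, e2), the shape exponents
    F < 1 means inside, F == 1 on the surface, F > 1 outside.
    """
    points = np.asarray(points, dtype=float)
    a1, a2, a3 = scale
    e1, e2 = eps
    # Absolute values keep fractional powers real for negative coordinates.
    x = np.abs(points[..., 0]) / a1
    y = np.abs(points[..., 1]) / a2
    z = np.abs(points[..., 2]) / a3
    xy_term = (x ** (2.0 / e2) + y ** (2.0 / e2)) ** (e2 / e1)
    return xy_term + z ** (2.0 / e1)
```

With `eps = (1, 1)` and unit scale this describes the unit sphere, while small exponents such as `eps = (0.1, 0.1)` approach a cube, so a point like `(0.9, 0.9, 0.9)` lies outside the former but inside the latter.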

The Internet of Things (IoT) interconnects multiple physical devices in large-scale networks. When the 'things' coordinate decisions and act collectively on shared information, feedback is introduced between them. Multiple feedback loops are thus closed over a shared, general-purpose network. Traditional feedback control is unsuitable for the design of IoT control because it relies on high-rate periodic communication and is ignorant of the shared network resource. Therefore, recent event-based estimation methods are applied herein for resource-aware IoT control, allowing agents to decide online whether communication with other agents is needed. While this can reduce network traffic significantly, a severe limitation of typical event-based approaches is the need for instantaneous triggering decisions that leave no time to reallocate freed resources (e.g., communication slots), which hence remain unused. To address this problem, novel predictive and self-triggering protocols are proposed herein. From a unified Bayesian decision framework, two schemes are developed: self triggers that predict, at the current triggering instant, the next one; and predictive triggers that check at every time step whether communication will be needed at a given prediction horizon. The suitability of these triggers for feedback control is demonstrated in hardware experiments on a cart-pole, and scalability is discussed with a multi-vehicle simulation.
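The difference between the two trigger types can be sketched for a toy scalar process. Everything below is an illustrative assumption rather than the paper's Bayesian decision rule: the process is taken to be x[k+1] = a*x[k] + w[k] with noise variance q, the remote agent is assumed to predict open-loop between transmissions, and a simple variance threshold stands in for the communication-versus-error trade-off. All function names are hypothetical.

```python
def open_loop_error_variance(a, q, steps):
    """Prediction-error variance after `steps` steps without communication,
    starting from a perfectly known state (variance 0)."""
    var = 0.0
    for _ in range(steps):
        var = a * a * var + q
    return var

def self_trigger(a, q, threshold, max_horizon=1000):
    """Decided once, at a triggering instant: return the first horizon at
    which the predicted error variance exceeds the threshold."""
    for m in range(1, max_horizon + 1):
        if open_loop_error_variance(a, q, m) > threshold:
            return m
    return max_horizon

def predictive_trigger(a, q, current_error, horizon, threshold):
    """Checked at every time step: will communication be needed `horizon`
    steps from now, given the error observed at the current step?"""
    expected_sq_error = (a ** (2 * horizon)) * current_error ** 2 \
        + open_loop_error_variance(a, q, horizon)
    return expected_sq_error > threshold
```

In both cases the decision is available ahead of time, which is what would allow a scheduler to reassign the communication slot; the predictive variant additionally uses the latest measured error.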

Statistical models of the human body surface are generally learned from thousands of high-quality 3D scans in predefined poses to cover the wide variety of human body shapes and articulations. Acquisition of such data requires expensive equipment, calibration procedures, and is limited to cooperative subjects who can understand and follow instructions, such as adults. We present a method for learning a statistical 3D Skinned Multi-Infant Linear body model (SMIL) from incomplete, low-quality RGB-D sequences of freely moving infants. Quantitative experiments show that SMIL faithfully represents the RGB-D data and properly factorizes the shape and pose of the infants. To demonstrate the applicability of SMIL, we fit the model to RGB-D sequences of freely moving infants and show, with a case study, that our method captures enough motion detail for General Movements Assessment (GMA), a method used in clinical practice for early detection of neurodevelopmental disorders in infants. SMIL provides a new tool for analyzing infant shape and movement and is a step towards an automated system for GMA.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 5411-5417, Montreal, Canada, May 2019, Hyosang Lee and Kyungseo Park contributed equally to this publication (inproceedings)

In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI), Glasgow, Scotland, May 2019 (inproceedings)

Abstract

Creating haptic experiences often entails inventing, modifying, or selecting specialized hardware. However, experience designers are rarely engineers, and 30 years of haptic inventions are buried in a fragmented literature that describes devices mechanically rather than by potential purpose. We conceived of Haptipedia to unlock this trove of examples: Haptipedia presents a device corpus for exploration through metadata that matter to both device and experience designers. It is a taxonomy of device attributes that go beyond physical description to capture potential utility, applied to a growing database of 105 grounded force-feedback devices, and accessed through a public visualization that links utility to morphology. Haptipedia's design was driven by both systematic review of the haptic device literature and rich input from diverse haptic designers. We describe Haptipedia's reception (including hopes it will redefine device reporting standards) and our plans for its sustainability through community participation.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 3804-3810, Montreal, Canada, May 2019 (inproceedings)

Abstract

Humans can form an impression of how a new object feels simply by touching its surfaces with the densely innervated skin of the fingertips. Many haptics researchers have recently been working to endow robots with similar levels of haptic intelligence, but these efforts almost always employ hand-crafted features, which are brittle, and concrete tasks, such as object recognition. We applied unsupervised feature learning methods, specifically K-SVD and Spatio-Temporal Hierarchical Matching Pursuit (ST-HMP), to rich multi-modal haptic data from a diverse dataset. We then tested the learned features on 19 more abstract binary classification tasks that center on haptic adjectives such as smooth and squishy. The learned features proved superior to traditional hand-crafted features by a large margin, almost doubling the average F1 score across all adjectives. Additionally, particular exploratory procedures (EPs) and sensor channels were found to support perception of certain haptic adjectives, underlining the need for diverse interactions and multi-modal haptic data.

In Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, pages: 97-108, April 2019 (inproceedings)

Abstract

Closing feedback loops fast and over long distances is key to emerging applications; for example, robot motion control and swarm coordination require update intervals below 100 ms. Low-power wireless is preferred for its flexibility, low cost, and small form factor, especially if the devices support multi-hop communication. Thus far, however, closed-loop control over multi-hop low-power wireless has only been demonstrated for update intervals on the order of multiple seconds. This paper presents a wireless embedded system that tames imperfections impairing control performance such as jitter or packet loss, and a control design that exploits the essential properties of this system to provably guarantee closed-loop stability for linear dynamic systems. Using experiments on a testbed with multiple cart-pole systems, we are the first to demonstrate the feasibility and to assess the performance of closed-loop control and coordination over multi-hop low-power wireless for update intervals from 20 ms to 50 ms.

Catalytically active colloids are model systems for chemical motors and active matter. It is desirable to replace the inorganic catalysts and the toxic fuels that are often used with biocompatible enzymatic reactions. However, compared to inorganic catalysts, enzyme-coated colloids tend to exhibit less activity. Here, we show that the self-assembly of genetically engineered M13 bacteriophages that bind enzymes to magnetic beads ensures high and localized enzymatic activity. These phage-decorated colloids provide a proteinaceous environment for directed enzyme immobilization. The magnetic properties of the colloidal carrier particle permit repeated enzyme recovery from a reaction solution, while the enzymatic activity is retained. Moreover, localizing the phage-based construct with a magnetic field in a microcontainer allows the enzyme-phage-colloids to function as an enzymatic micropump, where the enzymatic reaction generates a fluid flow. This system shows the fastest fluid flow reported to date by a biocompatible enzymatic micropump. In addition, it is functional in complex media including blood, where the enzyme-driven micropump can be powered at the physiological blood-urea concentration.

We report on an extensive study of the elastic scattering time $\tau_\mathrm{s}$ of matter waves in optical disordered potentials. Using direct experimental measurements, numerical simulations, and comparison with the first-order Born approximation based on the knowledge of the disorder properties, we explore the behavior of $\tau_\mathrm{s}$ over more than 3 orders of magnitude, ranging from the weak to the strong scattering regime. We study in detail the location of the crossover and, as a main result, we reveal the strong influence of the disorder statistics, especially on the relevance of the widely used Ioffe-Regel-like criterion $k l_\mathrm{s}\sim 1$. While it is found to be relevant for Gaussian-distributed disordered potentials, we observe significant deviations for laser speckle disorders that are commonly used with ultracold atoms. Our results are crucial for connecting experimental investigation of complex transport phenomena, such as Anderson localization, to microscopic theories.
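For orientation, the Ioffe-Regel-like criterion can be restated in terms of the scattering time. The short derivation below assumes the textbook definitions, which the abstract does not spell out: a free-particle dispersion $E = \hbar^2 k^2 / 2m$ for atoms of mass $m$, and a scattering mean free path $l_\mathrm{s} = v\,\tau_\mathrm{s}$ with group velocity $v = \hbar k / m$.

```latex
l_\mathrm{s} = v\,\tau_\mathrm{s} = \frac{\hbar k}{m}\,\tau_\mathrm{s}
\quad\Longrightarrow\quad
k\,l_\mathrm{s} = \frac{\hbar k^{2}}{m}\,\tau_\mathrm{s}
              = \frac{2E\,\tau_\mathrm{s}}{\hbar},
\qquad E = \frac{\hbar^{2}k^{2}}{2m}.
```

Under these assumptions, $k l_\mathrm{s}\sim 1$ corresponds to an energy $E$ comparable to $\hbar/\tau_\mathrm{s}$, up to a factor of order one.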

The diffusion of enzymes is of fundamental importance for many biochemical processes. Enhanced or directed enzyme diffusion can alter the accessibility of substrates and the organization of enzymes within cells. Several studies based on fluorescence correlation spectroscopy (FCS) report enhanced diffusion of enzymes upon interaction with their substrate or inhibitor. In this context, major importance is given to the enzyme fructose-bisphosphate aldolase, for which enhanced diffusion has been reported even though the catalysed reaction is endothermic. Additionally, enhanced diffusion of tracer particles surrounding the active aldolase enzymes has been reported. These studies suggest that active enzymes can act as chemical motors that self-propel and give rise to enhanced diffusion. However, fluorescence studies of enzymes can, despite several advantages, suffer from artefacts. Here we show that the absolute diffusion coefficients of active enzyme solutions can be determined with Pulsed Field Gradient Nuclear Magnetic Resonance (PFG-NMR). The advantage of PFG-NMR is that the motion of the molecule of interest is directly observed in its native state without the need for any labelling. Further, PFG-NMR is model-free and thus yields absolute diffusion constants. Our PFG-NMR experiments of solutions containing active fructose-bisphosphate aldolase from rabbit muscle do not show any diffusion enhancement for the active enzymes nor the surrounding molecules. Additionally, we do not observe any diffusion enhancement of aldolase in the presence of its inhibitor pyrophosphate.

In Proceedings of International Workshop on Haptic and Audio Interaction Design (HAID), Lille, France, March 2019 (inproceedings)

Abstract

Generating realistic texture feelings on tactile displays using data-driven methods has attracted a lot of interest in the last decade. However, the need for large data storage and high transmission rates complicates the use of these methods in future commercial displays. In this paper, we propose a new texture rendering approach which can compress the texture data significantly for electrostatic displays. Using three sample surfaces, we first explain how to record, analyze and compress the texture data, and render them on a touchscreen. Then, through psychophysical experiments conducted with nineteen participants, we show that the textures can be reproduced with significantly fewer frequency components than are present in the original signal without inducing perceptual degradation. Moreover, our results indicate that the possible degree of compression is affected by the surface properties.
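The idea of reproducing a texture signal from a reduced set of frequency components can be sketched generically. The abstract does not describe the actual compression scheme, so the snippet below simply keeps the strongest bins of a discrete Fourier transform; the function name and the selection rule are assumptions made for illustration.

```python
import numpy as np

def compress_signal(signal, n_components):
    """Reconstruct a real-valued signal from its n_components strongest
    frequency bins, setting all other bins to zero."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(signal)
    # Indices of the bins with the largest magnitude, strongest first.
    strongest = np.argsort(np.abs(spectrum))[::-1][:n_components]
    compressed = np.zeros_like(spectrum)
    compressed[strongest] = spectrum[strongest]
    return np.fft.irfft(compressed, n=len(signal))
```

A signal made of two sinusoids is recovered essentially exactly from two components, whereas a broadband signal would need more; in practice the acceptable number of components would be set by perceptual testing, as in the psychophysical experiments described above.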

The rheological properties of a colloidal suspension are a function of the concentration of the colloids and their interactions. While suspensions of passive colloids are well studied and have been shown to form crystals, gels, and glasses, examples of energy‐consuming “active” colloidal suspensions are still largely unexplored. Active suspensions of biological matter, such as motile bacteria or dense active actin–motor–protein mixtures, have, respectively, revealed superfluid‐like and gel‐like states. Attractive inanimate systems for active matter are chemically self‐propelled particles. It has so far been challenging to use these swimming particles at high enough densities to affect the bulk material properties of the suspension. Here, it is shown that light‐triggered asymmetric titanium dioxide particles that self‐propel can be obtained in large quantities and self‐organize to make a gram‐scale active medium. The suspension shows an activity‐dependent tenfold reversible change in its bulk viscosity.

The individual shape of the human body, including the geometry of its articulated structure and the distribution of weight over that structure, influences the kinematics of a person’s movements. How sensitive is the visual system to inconsistencies between shape and motion introduced by retargeting motion from one person onto the shape of another? We used optical motion capture to record five pairs of male performers with large differences in body weight, while they pushed, lifted, and threw objects. From these data, we estimated both the kinematics of the actions as well as the performer’s individual body shape. To obtain consistent and inconsistent stimuli, we created animated avatars by combining the shape and motion estimates from either a single performer or from different performers. Using these stimuli we conducted three experiments in an immersive virtual reality environment. First, a group of participants detected which of two stimuli was inconsistent. Performance was very low, and results were only marginally significant. Next, a second group of participants rated perceived attractiveness, eeriness, and humanness of consistent and inconsistent stimuli, but these judgements of animation characteristics were not affected by consistency of the stimuli. Finally, a third group of participants rated properties of the objects rather than of the performers. Here, we found strong influences of shape-motion inconsistency on perceived weight and thrown distance of objects. This suggests that the visual system relies on its knowledge of shape and motion and that these components are assimilated into an altered perception of the action outcome. We propose that the visual system attempts to resist inconsistent interpretations of human animations. 
Actions involving object manipulations present an opportunity for the visual system to reinterpret the introduced inconsistencies as a change in the dynamics of an object rather than as an unexpected combination of body shape and body motion.

Chiral nano- or metamaterials and surfaces enable striking photonic properties, such as negative refractive index and superchiral light, driving promising applications in novel optical components, nanorobotics, and enhanced chiral molecular interactions with light. In characterizing chirality, although nonlinear chiroptical techniques are typically much more sensitive than their linear optical counterparts, separating true chirality from anisotropy is a major challenge. Here, we report the first observation of optical activity in second-harmonic hyper-Rayleigh scattering (HRS). We demonstrate the effect in a 3D isotropic suspension of Ag nanohelices in water. The effect is 5 orders of magnitude stronger than linear optical activity and is well pronounced above the multiphoton luminescence background. Because of its sensitivity, isotropic environment, and straightforward experimental geometry, HRS optical activity constitutes a fundamental experimental breakthrough in chiral photonics for media including nanomaterials, metamaterials, and chemical molecules.

Bayesian optimization is proposed for automatic learning of optimal controller parameters from experimental data. A probabilistic description (a Gaussian process) is used to model the unknown function from controller parameters to a user-defined cost. The probabilistic model is updated with data, which is obtained by testing a set of parameters on the physical system and evaluating the cost. In order to learn fast, the Bayesian optimization algorithm selects the next parameters to evaluate in a systematic way, for example, by maximizing information gain about the optimum. The algorithm thus iteratively finds the globally optimal parameters with only a few experiments. Taking throttle valve control as a representative industrial control example, the proposed auto-tuning method is shown to outperform manual calibration: it consistently achieves better performance with a low number of experiments. The proposed auto-tuning framework is flexible and can handle different control structures and objectives.
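The tuning loop described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it tunes a single scalar parameter on a fixed candidate grid, uses a Gaussian process with a fixed RBF kernel, and swaps the information-gain criterion mentioned above for a simpler lower-confidence-bound rule. The names `tune_controller` and `cost`, the kernel hyperparameters, and the experiment budget are all assumptions.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # prior variance is 1
    return mean, np.sqrt(var)

def tune_controller(cost, candidates, n_experiments=15, kappa=2.0):
    """Choose parameters to test one experiment at a time.

    cost:       callable that runs one experiment and returns a scalar cost
    candidates: 1D array of admissible parameter values
    """
    x = [candidates[0], candidates[-1]]  # two initial experiments
    y = [cost(p) for p in x]
    for _ in range(n_experiments - len(x)):
        y_arr = np.array(y)
        # Standardize the costs so the unit-variance prior is reasonable.
        y_std = (y_arr - y_arr.mean()) / (y_arr.std() + 1e-9)
        mean, std = gp_posterior(np.array(x), y_std, candidates)
        lower_bound = mean - kappa * std  # optimistic estimate of the cost
        next_param = candidates[int(np.argmin(lower_bound))]
        x.append(next_param)
        y.append(cost(next_param))
    best = int(np.argmin(y))
    return x[best], y[best]
```

Calling `tune_controller(cost, np.linspace(0.0, 1.0, 101))` with a `cost` function that wraps one closed-loop experiment returns the best tested parameter and its measured cost. On hardware, each call to `cost` is one physical experiment, which is why the number of evaluations is kept small.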

The split-attention effect refers to learning with related representations in multimedia. Spatial proximity and integration of these representations are crucial for learning processes. The influence of varying amounts of proximity between related and unrelated information has not yet been specified. In two experiments (N1 = 98; N2 = 85), spatial proximity between a pictorial presentation and text labels was manipulated (high vs. medium vs. low). Additionally, in experiment 1, a control group with separated picture and text presentation was implemented. The results revealed a significant effect of spatial proximity on learning performance. In contrast to previous studies, the medium condition leads to the highest transfer, and in experiment 2, the highest retention score. These results are interpreted considering cognitive load and instructional efficiency. Findings indicate that transfer efficiency is optimal at a medium distance between representations in experiment 1. Implications regarding the spatial contiguity principle and the spatial contiguity failure are discussed.

High-speed and high-acceleration movements are inherently hard to control. Applying learning to the control of such motions on anthropomorphic robot arms can improve the accuracy of the control but might damage the system. The inherent exploration of learning approaches can lead to instabilities and the robot reaching joint limits at high speeds. Having hardware that enables safe exploration of high-speed and high-acceleration movements is therefore desirable. To address this issue, we propose to use robots actuated by Pneumatic Artificial Muscles (PAMs). In this paper, we present a four degrees of freedom (DoFs) robot arm that reaches high joint angle accelerations of up to 28000 °/s^2 while avoiding dangerous joint limits thanks to the antagonistic actuation and limits on the air pressure ranges. With this robot arm, we are able to tune control parameters using Bayesian optimization directly on the hardware without additional safety considerations. The achieved tracking performance on a fast trajectory exceeds previous results on comparable PAM-driven robots. We also show that our system can be controlled well on slow trajectories with PID controllers due to careful construction considerations such as minimal bending of cables, lightweight kinematics, and minimal contact among PAMs and between PAMs and the links. Finally, we propose a novel technique to control the co-contraction of antagonistic muscle pairs. Experimental results illustrate that choosing the optimal co-contraction level is vital to reach better tracking performance. Through the use of PAM-driven robots and learning, we take a small step towards the future development of robots capable of more human-like motions.

Voluntary behavior of humans appears to be composed of small, elementary building blocks or behavioral primitives. While this modular organization seems crucial for the learning of complex motor skills and the flexible adaption of behavior to new circumstances, the problem of learning meaningful, compositional abstractions from sensorimotor experiences remains an open challenge. Here, we introduce a computational learning architecture, termed surprise-based behavioral modularization into event-predictive structures (SUBMODES), that explores behavior and identifies the underlying behavioral units completely from scratch. The SUBMODES architecture bootstraps sensorimotor exploration using a self-organizing neural controller. While exploring the behavioral capabilities of its own body, the system learns modular structures that predict the sensorimotor dynamics and generate the associated behavior. In line with recent theories of event perception, the system uses unexpected prediction error signals, i.e., surprise, to detect transitions between successive behavioral primitives. We show that, when applied to two robotic systems with completely different body kinematics, the system manages to learn a variety of complex behavioral primitives. Moreover, after initial self-exploration the system can use its learned predictive models progressively more effectively for invoking model predictive planning and goal-directed control in different tasks and environments.

Magnonic crystals are systems that can be used to design and tune the dynamic properties of magnetization. Here, we focus on one-dimensional Fibonacci magnonic quasicrystals. We confirm the existence of collective spin waves propagating through the structure as well as dispersionless modes; the reprogrammability of the resonance frequencies, dependent on the magnetization order; and dynamic spin-wave interactions. With the fundamental understanding of these properties, we lay a foundation for the scalable and advanced design of spin-wave band structures for spintronic, microwave, and magnonic applications.
