Workshop on the Algorithmic Foundations of Robotics, pages: 22, November 2016 (conference)

Abstract

Stochastic Optimal Control (SOC) typically considers noise only in the process model, i.e. unknown disturbances. However, in many robotic applications that involve interaction with the environment, such as locomotion and manipulation, uncertainty also comes from lack of pre- cise knowledge of the world, which is not an actual disturbance. We de- velop a computationally efficient SOC algorithm, based on risk-sensitive control, that takes into account uncertainty in the measurements. We include the dynamics of an observer in such a way that the control law explicitly depends on the current measurement uncertainty. We show that high measurement uncertainty leads to low impedance behaviors, a result in contrast with the effects of process noise variance that creates stiff behaviors. Simulation results on a simple 2D manipulator show that our controller can create better interaction with the environment under uncertain contact locations than traditional SOC approaches.

Many highly dynamic novel mobile robots have been developed being inspired by animals. In this study, we are inspired by a basilisk lizard's ability to run and steer on water surface for a hexapedal robot. The robot has an active tail with a circular plate, which the robot rotates to steer on water. We dynamically modeled the platform and conducted simulations and experiments on steering locomotion with a bang-bang controller. The robot can steer on water by rotating the tail, and the controlled steering locomotion is stable. The dynamic modelling approximates the robot's steering locomotion and the trends of the simulations and experiments are similar, although there are errors between the desired and actual angles. The robot's maneuverability on water can be improved through further research.

Proceedings of the IEEE/RSJ Conference on Intelligent Robots and Systems, IEEE, IROS, October 2016 (conference)

Abstract

One of the central tasks for a household robot is searching for specific objects. It does not only require localizing the target object but also identifying promising search locations in the scene if the target is not immediately visible. As computation time and hardware resources are usually limited in robotics, it is desirable to avoid expensive visual processing steps that are exhaustively applied over the entire image. The human visual system can quickly select those image locations that have to be processed in detail for a given task. This allows us to cope with huge amounts of information and to efficiently deploy the limited capacities of our visual system. In this paper, we therefore propose to use human fixation data to train a top-down saliency model that predicts relevant image locations when searching for specific objects. We show that the learned model can successfully prune bounding box proposals without rejecting the ground truth object locations. In this aspect, the proposed model outperforms a model that is trained only on the ground truth segmentations of the target object instead of fixation data.

In International Conference on Intelligent Robots and Systems (IROS) 2016, IEEE/RSJ International Conference on Intelligent Robots and Systems, October 2016 (inproceedings)

Abstract

Binary descriptors allow fast detection and matching algorithms in computer vision problems. Though binary descriptors can be computed at almost two orders of magnitude faster than traditional gradient based descriptors, they suffer from poor matching accuracy in challenging conditions. In this paper we propose three improvements for binary descriptors in their computation and matching that enhance their performance in comparison to traditional binary and non-binary descriptors without compromising their speed. This is achieved by learning some weights and threshold parameters that allow customized matching under some variations such as lighting and viewpoint. Our suggested improvements can be easily applied to any binary descriptor. We demonstrate our approach on the ORB (Oriented FAST and Rotated BRIEF) descriptor and compare its performance with the traditional ORB and SIFT descriptors on a wide variety of datasets. In all instances, our enhancements outperform standard ORB and is comparable to SIFT.

We describe the first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image. We estimate a full 3D mesh and show that 2D joints alone carry a surprising amount of information about body shape. The problem is challenging because of the complexity of the human body, articulation, occlusion, clothing, lighting, and the inherent ambiguity in inferring 3D from 2D. To solve this, we first use a recently published CNN-based method, DeepCut, to predict (bottom-up) the 2D body joint locations. We then fit (top-down) a recently published statistical body shape model, called SMPL, to the 2D joints. We do so by minimizing an objective function that penalizes the error between the projected 3D model joints and detected 2D joints. Because SMPL captures correlations in human shape across the population, we are able to robustly fit it to very little data. We further leverage the 3D model to prevent solutions that cause interpenetration. We evaluate our method, SMPLify, on the Leeds Sports, HumanEva, and Human3.6M datasets, showing superior
pose accuracy with respect to the state of the art.

In European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 14th European Conference on Computer Vision, October 2016 (inproceedings)

Abstract

In this paper we propose a CNN architecture for semantic image segmentation. We introduce a new “bilateral inception” module that can be inserted in existing CNN architectures and performs bilateral filtering, at multiple feature-scales, between superpixels in an image. The feature spaces for bilateral filtering and other parameters of the module are learned end-to-end using standard backpropagation techniques. The bilateral inception module addresses two issues that arise with general CNN segmentation architectures. First, this module propagates information between (super) pixels while respecting image edges, thus using the structured information of the problem for improved results. Second, the layer recovers a full resolution segmentation result from the lower resolution solution of a CNN. In the experiments, we modify several existing CNN architectures by inserting our inception modules between the last CNN (1 × 1 convolution) layers. Empirical results on three different datasets show reliable improvements not only in comparison to the baseline networks, but also in comparison to several dense-pixel prediction techniques such as CRFs, while being competitive in time.

The caffe framework is one of the leading deep learning toolboxes in the machine learning and computer vision community. While it offers efficiency and configurability, it falls short of a full interface to Python. With increasingly involved procedures for training deep networks and reaching depths of hundreds of layers, creating configuration files and keeping them consistent becomes an error prone process.
We introduce the barrista framework, offering full, pythonic control over caffe. It separates responsibilities and offers code to solve frequently occurring tasks for pre-processing, training and model inspection. It is compatible to all caffe versions since mid 2015 and can import and export .prototxt files.
Examples are included, e.g., a deep residual network implemented in only 172 lines (for arbitrary depths), comparing to 2320 lines in the official implementation for the equivalent model.

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.

Sperm-shaped microrobots are controlled under the influence of weak oscillating magnetic fields (milliTesla range) to selectively target cell mockups (i.e., gas bubbles with average diameter of 200 μm). The sperm-shaped microrobots are fabricated by electrospinning using a solution of polystyrene, dimethylformamide, and iron oxide nanoparticles. These nanoparticles are concentrated within the head of the microrobot, and hence enable directional control along external magnetic fields. The magnetic dipole moment of the microrobot is characterized (using the flip-time technique) to be 1.4×10-11 A.m2, at magnetic field of 28 mT. In addition, the morphology of the microrobot is characterized using Scanning Electron Microscopy images. The characterized parameters and morphology are used in the simulation of the locomotion mechanism of the microrobot to prove that its motion depends on breaking the time-reversal symmetry, rather than pulling with the magnetic field gradient. We experimentally demonstrate that the microrobot can controllably follow S-shaped, U-shaped, and square paths, and selectively target the cell mockups using image guidance and under the influence of the oscillating magnetic fields.

In 2016 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), pages: 1-5, July 2016 (inproceedings)

Abstract

One of the main challenges in the development of microrobots, i.e. robots at the sub-millimeter scale, is the difficulty of adopting traditional solutions for power, control and, especially, actuation. As a result, most current microrobots are directly manipulated by external fields, and possess only a few passive degrees of freedom (DOFs). We have reported a strategy that enables embodiment, remote powering and control of a large number of DOFs in mobile soft microrobots. These consist of photo-responsive materials, such that the actuation of their soft continuous body can be selectively and dynamically controlled by structured light fields. Here we use finite-element modelling to evaluate the effective number of DOFs that are addressable in our microrobots. We also demonstrate that by this flexible approach different actuation patterns can be obtained, and thus different locomotion performances can be achieved within the very same microrobot. The reported results confirm the versatility of the proposed approach, which allows for easy application-specific optimization and online reconfiguration of the microrobot's behavior. Such versatility will enable advanced applications of robotics and automation at the micro scale.

In this paper, we present the analysis of the torque transmitted to a tilted permanent magnet that is to be embedded in a capsule robot to achieve targeted drug delivery. This analysis is carried out by using an analytical model and experimental results for a small cubic permanent magnet that is driven by an external magnetic system made of an array of arc-shaped permanent magnets (ASMs). Our experimental results, which are in agreement with the analytical results, show that the cubic permanent magnet can safely be actuated for inclinations lower than 75° without having to make positional adjustments in the external magnetic system. We have found that with further inclinations, the cubic permanent magnet to be embedded in a drug delivery mechanism may stall. When it stalls, the external magnetic system's position and orientation would have to be adjusted to actuate the cubic permanent magnet and the drug release mechanism. This analysis of the transmitted torque is helpful for the development of real-time control strategies for magnetically articulated devices.

In 2016 International Conference on Manipulation, Automation and Robotics at Small Scales (MARSS), pages: 1-5, July 2016 (inproceedings)

Abstract

Miniaturized actuators are a key element for the manipulation and automation at small scales. Here, we propose a new miniaturized actuator, which consists of an array of micro gas bubbles immersed in a fluid. Under ultrasonic excitation, the oscillation of micro gas bubbles results in acoustic streaming and provides a propulsive force that drives the actuator. The actuator was fabricated by lithography and fluidic streaming was observed under ultrasound excitation. Theoretical modelling and numerical simulations were carried out to show that lowing the surface tension results in a larger amplitude of the bubble oscillation, and thus leads to a higher propulsive force. Experimental results also demonstrate that the propulsive force increases 3.5 times when the surface tension is lowered by adding a surfactant. An actuator with a 4×4 mm 2 surface area provides a driving force of about 0.46 mN, suggesting that it is possible to be used as a wireless actuator for small-scale robots and medical instruments.

In Proceedings of the American Control Conference (ACC), Boston, MA, USA, July 2016 (inproceedings)

Abstract

Most widely-used state estimation algorithms, such as the Extended Kalman Filter and the Unscented Kalman Filter, belong to the family of Gaussian Filters (GF). Unfortunately, GFs fail if the measurement process is modelled by a fat-tailed distribution. This is a severe limitation, because thin-tailed measurement models, such as the analytically-convenient and therefore widely-used Gaussian distribution, are sensitive to outliers. In this paper, we show that mapping the measurements into a specific feature space enables any existing GF algorithm to work with fat-tailed measurement models. We find a feature function which is optimal under certain conditions. Simulation results show that the proposed method allows for robust filtering in both linear and nonlinear systems with measurements contaminated by fat-tailed noise.

This paper considers the task of articulated human pose estimation of multiple people in real-world images. We propose an approach that jointly solves the tasks of detection and pose estimation: it infers the number of persons in a scene, identifies occluded body parts, and disambiguates body parts between people in close proximity of each other.
This joint formulation is in contrast to previous strategies, that address the problem by first detecting people and subsequently estimating their body pose. We propose a partitioning and labeling formulation of a set of body-part hypotheses generated with CNN-based part detectors. Our formulation, an instance of an integer linear program, implicitly performs non-maximum suppression on the set of part candidates and groups them to form configurations of body parts respecting geometric and appearance constraints. Experiments on four different datasets demonstrate state-of-the-art results for both single person and multi person pose estimation.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract

Video object segmentation is challenging due to fast moving objects, deforming shapes, and cluttered backgrounds. Optical flow can be used to propagate an object segmentation over time but, unfortunately, flow is often inaccurate, particularly around object boundaries. Such boundaries are precisely where we want our segmentation to be accurate. To obtain accurate segmentation across time, we propose an efficient algorithm that considers video segmentation and optical flow estimation simultaneously. For video segmentation, we formulate a principled, multiscale, spatio-temporal objective function that uses optical flow to propagate information between frames. For optical flow estimation, particularly at object boundaries, we compute the flow independently in the segmented regions and recompose the results. We call the process object flow and demonstrate the effectiveness of jointly optimizing optical flow and video segmentation using an iterative scheme. Experiments on the SegTrack v2 and Youtube-Objects datasets show that the proposed algorithm performs favorably against the other state-of-the-art methods.

In IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June 2016 (inproceedings)

Abstract

In this paper, we propose a non-local structured prior for volumetric multi-view 3D reconstruction. Towards this goal, we present a novel Markov random field model based on ray potentials in which assumptions about large 3D surface patches such as planarity or Manhattan world constraints can be efficiently encoded as probabilistic priors. We further derive an inference algorithm that reasons jointly about voxels, pixels and image segments, and estimates marginal distributions of appearance, occupancy, depth, normals and planarity. Key to tractable inference is a novel hybrid representation that spans both voxel and pixel space and that integrates non-local information from 2D image segmentations in a principled way. We compare our non-local prior to commonly employed local smoothness assumptions and a variety of state-of-the-art volumetric reconstruction baselines on challenging outdoor scenes with textureless and reflective surfaces. Our experiments indicate that regularizing over larger distances has the potential to resolve ambiguities where local regularizers fail.

Existing optical flow methods make generic, spatially homogeneous, assumptions about the spatial structure of the flow. In reality, optical flow varies across an image depending on object class.
Simply put, different objects move differently. Here we exploit recent advances in static semantic scene segmentation to segment the image into objects of different types. We define different models of image motion in these regions depending on the type of object. For example, we model the motion on roads with homographies, vegetation with spatially smooth flow, and independently moving objects like cars and planes with affine motion plus deviations. We then pose the flow estimation problem using a novel formulation of localized layers, which addresses limitations of traditional layered models for dealing with complex scene motion. Our semantic flow method achieves the lowest error of any published monocular method in the KITTI-2015 flow benchmark and produces qualitatively better flow and segmentation than recent top methods on a wide range of natural videos.

Bilateral filters have wide spread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows to learn high dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate the use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for the use of high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems