2015

For grasping and manipulation with robot arms, knowing the current pose of the arm is crucial
for successfully controlling its motion. Pose estimates can often be acquired from encoders
inside the arm, but these can be significantly inaccurate, which makes additional estimation
techniques necessary.
In this master's thesis, a novel approach to robot arm pose estimation is presented that works
on single depth images without the need for prior foreground segmentation or other preprocessing
steps.
A random regression forest is used, which is trained solely on synthetically generated data.
The approach improves on previous work by Bohg et al. by considerably reducing the computational
effort at both training and test time: the forest in the new method directly estimates the
desired joint angles, whereas in the previous approach the forest casts 3D position votes for the
joints, which then have to be clustered and fed into an iterative inverse kinematics procedure to
finally obtain the joint angles.
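
To make this difference concrete, the following is a minimal sketch of direct joint-angle regression with a random forest. The depth-difference features, data shapes and forest settings are illustrative assumptions, not the thesis' actual pipeline.

```python
# Minimal sketch: a regression forest mapping depth-image features
# directly to joint angles, skipping the vote-clustering plus inverse
# kinematics pipeline of the earlier approach.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
H, W, N_PAIRS = 64, 64, 200

# Fixed random pixel pairs, shared across all images, so every image
# is described by the same depth-difference features.
pairs = rng.integers(0, [H, W, H, W], size=(N_PAIRS, 4))

def depth_features(depth):
    """Depth differences between fixed pixel pairs, a toy stand-in
    for the features a depth-based forest might use."""
    return depth[pairs[:, 0], pairs[:, 1]] - depth[pairs[:, 2], pairs[:, 3]]

# Synthetic training set: rendered depth maps with known joint angles.
n_train, n_joints = 500, 7
depth_maps = rng.uniform(0.5, 2.0, size=(n_train, H, W))
joint_angles = rng.uniform(-np.pi, np.pi, size=(n_train, n_joints))

X = np.stack([depth_features(d) for d in depth_maps])
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X, joint_angles)

# Test time: one depth image in, the full joint-angle vector out,
# with no vote clustering and no iterative inverse kinematics.
theta_hat = forest.predict(depth_features(depth_maps[0])[None, :])
```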
To improve the estimation accuracy, the standard objective of the forest training is replaced
by a specialized function that makes use of a model-dependent distance metric called DISP.
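
For reference, DISP is commonly defined in the motion-planning literature as the largest displacement that any point of the robot model undergoes between two configurations; the following sketch assumes that reading, with A denoting the set of points on the robot.

```latex
% Sketch of the DISP metric as commonly defined in the motion-planning
% literature, assuming a(q) is the position of model point a in
% configuration q.
\[
  \mathrm{DISP}(q_1, q_2) \;=\; \max_{a \in A} \bigl\| a(q_1) - a(q_2) \bigr\|
\]
% Used inside the training objective, such a metric penalizes angle
% errors in proportion to how far they actually move the physical arm,
% unlike a plain Euclidean distance on joint angles.
```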
Experimental results show that the specialized objective indeed improves pose estimation, and
the method, despite being trained on synthetic data only, is shown to provide reasonable
estimates for real data at test time.

Detecting and identifying the different objects in an image quickly and reliably is an
important skill for interacting with one's environment. The main problem is that, in
theory, all parts of an image have to be searched for objects at many different scales
to make sure that no object instance is missed. However, actually classifying the content
of a given image region takes considerable time and effort, and both the time and the
computational capacity that an agent can spend on classification are limited.
Humans use a process called visual attention to quickly decide which locations of
an image need to be processed in detail and which can be ignored. This allows us
to deal with the huge amount of visual information and to employ the capacities
of our visual system efficiently.
For computer vision, researchers face exactly the same problems, so learning from human
behaviour provides a promising way to improve existing algorithms. In this master's
thesis, a model is trained on eye-tracking data recorded from 15 participants who were
asked to search images for objects from three different categories. It uses a deep
convolutional neural network to extract features from the input image, which are then
combined to form a saliency map. This map provides information about which image regions
are interesting when searching for the given target object and can thus be used to narrow
down the parts of the image that have to be processed in detail. The method is based on a
recent publication by Kümmerer et al., but in contrast to the original method, which
computes general, task-independent saliency, the presented model is designed to respond
differently when searching for different target categories.
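
A minimal sketch of such a category-conditioned saliency readout on top of frozen CNN features is given below. The VGG backbone, the 1x1-convolution readout and the three-category setup are illustrative assumptions in the spirit of deep-feature saliency models, not the thesis' exact architecture.

```python
# Sketch: category-conditioned saliency from frozen CNN features.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

N_CATEGORIES = 3  # one readout per search-target category (assumption)

backbone = vgg16(weights="IMAGENET1K_V1").features.eval()
for p in backbone.parameters():
    p.requires_grad_(False)  # only the readout would be trained on gaze data

# One 1x1 convolution per target category turns the shared features
# into a category-specific saliency map.
readout = torch.nn.Conv2d(512, N_CATEGORIES, kernel_size=1)

def saliency(image, category):
    """image: (1, 3, H, W) tensor; returns an (H, W) probability map."""
    feats = backbone(image)                              # (1, 512, h, w)
    maps = readout(feats)[:, category : category + 1]    # pick one category
    maps = F.interpolate(maps, size=image.shape[-2:],
                         mode="bilinear", align_corners=False)
    # Softmax over all pixels: a fixation-probability map that could be
    # fit to recorded eye-tracking data via log-likelihood.
    flat = maps.flatten(1).softmax(dim=1)
    return flat.view(image.shape[-2:])

probs = saliency(torch.randn(1, 3, 224, 224), category=0)
```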

Current robotics research is largely driven by the vision of creating an intelligent being that can perform dangerous, difficult or unpopular tasks. These can, for example, be exploring the surface of planet Mars or the bottom of the ocean, maintaining a furnace or assembling a car. They can also be more mundane, such as cleaning an apartment or fetching groceries. This vision has been pursued since the 1960s, when the first robots were built. Some of the tasks mentioned above, especially those in industrial manufacturing, are already frequently performed by robots. Others are still completely out of reach. Household robots, in particular, are far from being deployable as general-purpose devices. Although advancements have been made in this research area, robots are not yet able to perform household chores robustly in unstructured and open-ended environments given unexpected events and uncertainty in perception and execution.

In this thesis, we analyze which perceptual and motor capabilities are necessary for a robot to perform common tasks in a household scenario. In that context, an essential capability is to understand the scene that the robot has to interact with. This involves separating objects from the background but also from each other. Once this is achieved, many other tasks become much easier: configurations of objects can be determined; objects can be identified or categorized; their poses can be estimated; free and occupied space in the environment can be outlined. This kind of scene model can then inform grasp planning algorithms to finally pick up objects. However, scene understanding is not a trivial problem, and even state-of-the-art methods may fail. Given an incomplete, noisy and potentially erroneously segmented scene model, the questions remain how suitable grasps can be planned and how they can be executed robustly.

In this thesis, we propose to equip the robot with a set of prediction mechanisms that allow it to hypothesize about parts of the scene it has not yet observed. Additionally, the robot can quantify how uncertain it is about these predictions, allowing it to plan actions for exploring the scene at specifically uncertain places. We consider multiple modalities, including monocular and stereo vision, haptic sensing and information obtained through a human-robot dialog system. We also study several scene representations of different complexity and their applicability to a grasping scenario. Given an improved scene model from this multi-modal exploration, grasps can be inferred for each object hypothesis. Depending on whether the objects are known, familiar or unknown, different methodologies for grasp inference apply. In this thesis, we propose novel methods for each of these cases. Furthermore, we demonstrate the execution of these grasps in both a closed- and open-loop manner, showing the effectiveness of the proposed methods in real-world scenarios.
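
As an illustration of the uncertainty-driven exploration idea, the following is a minimal sketch on a toy 2D occupancy grid. The grid, the entropy-based score and the candidate viewpoints are simple stand-ins for the richer multi-modal scene representations studied in the thesis.

```python
# Sketch: quantify uncertainty in a scene belief and pick the next
# exploration action at the most uncertain place.
import numpy as np

# Occupancy probabilities: 0.5 = completely unknown, near 0/1 = certain.
grid = np.full((20, 20), 0.5)
grid[5:9, 5:9] = 0.95   # an observed object
grid[0:4, :] = 0.02     # observed free space

def entropy(p):
    """Per-cell Shannon entropy of the occupancy belief, in bits."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def best_view(grid, candidates, radius=3):
    """Choose the candidate view whose neighbourhood is most uncertain."""
    h = entropy(grid)
    def score(c):
        y, x = c
        return h[max(0, y - radius): y + radius,
                 max(0, x - radius): x + radius].sum()
    return max(candidates, key=score)

views = [(3, 3), (10, 10), (15, 15)]
print("explore next:", best_view(grid, views))
```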

2007

Autonomous robots that can assist humans in situations of daily life have been a long standing vision of robotics, artificial intelligence, and cognitive sciences. A first step towards this goal is to create robots that can accomplish a multitude of different tasks, triggered by environmental context or higher level instruction. Early approaches to this goal during the heydays of artificial intelligence research in the late 1980s, however, made it clear that an approach purely based on reasoning and human insights would not be able to model all the perceptuomotor tasks that a robot should fulfill. Instead, new hope was put in the growing wake of machine learning that promised fully adaptive control algorithms which learn both by observation and trial-and-error.
However, to date, learning techniques have yet to fulfill this promise, as only few methods manage to scale into the high-dimensional domains of manipulator robotics, or even the newly emerging domain of humanoid robotics, and scaling was usually only achieved in precisely pre-structured domains.
In this thesis, we investigate the ingredients for a general approach to motor skill learning in order to get one step closer to human-like performance. To do so, we study two major components of such an approach: firstly, a theoretically well-founded general approach to representing the required control structures for task representation and execution and, secondly, appropriate learning algorithms which can be applied in this setting.
As a theoretical foundation, we first study a general framework to generate control laws for real robots with a particular focus on skills represented as dynamical systems in differential constraint form. We present a point-wise optimal control framework resulting from a generalization of Gauss' principle and show how various well-known robot control laws can be derived by modifying the metric of the employed cost function. The framework has been successfully applied to task space tracking control for holonomic systems for several different metrics on the anthropomorphic SARCOS Master Arm.
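
A hedged sketch of the point-wise optimal control idea, in generic notation rather than the thesis' exact formulation: at every instant, joint accelerations are chosen to stay close to a default behaviour under a metric N while satisfying the task constraint.

```latex
% Point-wise optimal control sketch (generic notation assumed): q are
% joint angles, J(q) the task Jacobian, x_ref the task-space reference,
% and N a positive-definite metric on accelerations.
\[
  \ddot{q}^{*} = \arg\min_{\ddot{q}}\;
    \tfrac{1}{2}\,(\ddot{q}-\ddot{q}_0)^{\top} N\, (\ddot{q}-\ddot{q}_0)
  \quad \text{s.t.} \quad
  J(q)\,\ddot{q} = \ddot{x}_{\mathrm{ref}} - \dot{J}(q)\,\dot{q},
\]
\[
  \ddot{q}^{*} = \ddot{q}_0 + N^{-1} J^{\top}
    \bigl(J N^{-1} J^{\top}\bigr)^{-1}
    \bigl(\ddot{x}_{\mathrm{ref}} - \dot{J}\dot{q} - J\ddot{q}_0\bigr).
\]
% Different choices of the metric N (e.g., identity vs. inertia-
% weighted) recover different classical task-space control laws.
```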
In order to overcome the limiting requirement of accurate robot models, we then employ learning methods to acquire controllers for task space control.
However, when learning to execute a redundant control problem, we face the general problem of the non-convexity of the solution space, which can force the robot to steer into physically impossible configurations if supervised learning methods are employed without further consideration. This problem can be resolved using two major insights: the learning problem can be treated as locally convex, and the cost function of the analytical framework can be used to ensure global consistency. Thus, we derive an immediate reinforcement learning algorithm from the expectation-maximization point of view, which leads to a reward-weighted regression technique. This method can be used both for operational space control and for general immediate-reward reinforcement learning problems. We demonstrate the feasibility of the resulting framework on the problem of redundant end-effector tracking both for a simulated three-degrees-of-freedom robot arm and for a simulated anthropomorphic SARCOS Master Arm.
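
In generic notation (assumed here, not quoted from the thesis), the resulting reward-weighted regression update for a linear-Gaussian policy reduces to a weighted least-squares fit with rewards as weights:

```latex
% Reward-weighted regression sketch: policy a = theta^T phi(s) + noise,
% rollout samples (s_i, a_i) with rewards r_i; the EM-style M-step is a
% reward-weighted least-squares fit.
\[
  \theta_{\mathrm{new}}
  = \bigl(\Phi^{\top} R\, \Phi\bigr)^{-1} \Phi^{\top} R\, A,
  \qquad
  \Phi = \begin{bmatrix}\phi(s_1)^{\top}\\ \vdots \\ \phi(s_n)^{\top}\end{bmatrix},
  \quad
  R = \mathrm{diag}(r_1,\dots,r_n),
  \quad
  A = \begin{bmatrix}a_1\\ \vdots \\ a_n\end{bmatrix}.
\]
% High-reward samples pull the policy towards themselves, which avoids
% the averaging over inconsistent redundant solutions that plain
% supervised regression would produce.
```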
While learning to execute tasks in task space is an essential component of a general framework for motor skill learning, learning the actual task is of even higher importance, particularly as this issue is more frequently beyond the abilities of analytical approaches than execution is. We focus on the learning of elemental tasks which can serve as the "building blocks of movement generation", called motor primitives. Motor primitives are parameterized task representations based on splines or nonlinear differential equations with desired attractor properties. While imitation learning of parameterized motor primitives is a relatively well-understood problem, the self-improvement of the system by interaction with the environment remains a challenging problem, tackled in the fourth chapter of this thesis.
In pursuing this goal, we highlight the difficulties of current reinforcement learning methods and outline both established and novel algorithms for the gradient-based improvement of parameterized policies. We compare these algorithms in the context of motor primitive learning and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
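
In generic notation (an assumption, not the thesis' exact derivation), the episodic natural actor-critic combines the natural policy gradient with an episodic regression estimate:

```latex
% Natural gradient sketch: pre-multiply the vanilla policy gradient by
% the inverse Fisher information of the policy.
\[
  \widetilde{\nabla}_{\theta} J = F_{\theta}^{-1} \nabla_{\theta} J,
  \qquad
  F_{\theta} = \mathbb{E}\!\left[\nabla_{\theta}\log \pi_{\theta}\,
               \nabla_{\theta}\log \pi_{\theta}^{\top}\right].
\]
% In the episodic case, the natural gradient w can be estimated by
% regressing each rollout's return R^(i) on its summed log-policy
% gradients, with J_0 absorbing the baseline return:
\[
  R^{(i)} \approx
    \Bigl(\textstyle\sum_{t} \nabla_{\theta}
      \log \pi_{\theta}\bigl(a_t^{(i)} \mid s_t^{(i)}\bigr)\Bigr)^{\!\top} w
    + J_0 .
\]
```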
In conclusion, this thesis contributes a general framework for analytically computing robot control laws, from which various previous control approaches can be derived and which serves as both the foundation of and the inspiration for our learning algorithms. We have introduced two classes of novel reinforcement learning methods, the Natural Actor-Critic and the Reward-Weighted Regression algorithm, and used them to replace the analytical components of the theoretical framework with learned representations. Evaluations have been performed on both simulated and real robot arms.

2004

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.