Policy search methods based on reinforcement learning
and optimal control can allow robots to automatically learn
a wide range of tasks. However, practical applications of policy
search tend to require the policy to be supported by hand-engineered
components for perception, state estimation, and low-level
control. We propose a method for learning policies that map
raw, low-level observations, consisting of joint angles and camera
images, directly to the torques at the robot’s joints. The policies
are represented as deep convolutional neural networks (CNNs) with
92,000 parameters; a network of this form is sketched below. The
high dimensionality of such policies
poses a tremendous challenge for policy search. To address
this challenge, we develop a sensorimotor guided policy search
method that can handle high-dimensional policies and partially
observed tasks. We use BADMM to decompose policy search into
an optimal control phase and supervised learning phase, allowing
CNN policies to be trained with standard supervised learning
techniques. This method can learn a number of manipulation
tasks that require close coordination between vision and control,
including inserting a block into a shape sorting cube, screwing
on a bottle cap, fitting the claw of a toy hammer under a nail
with various grasps, and placing a coat hanger on a clothes rack.
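For concreteness, the following is a minimal sketch, in PyTorch, of a visuomotor policy of the kind described above: a small CNN that maps a camera image and the joint angles directly to joint torques. The layer sizes, the spatial-softmax feature-point layer, and the 7-joint output are illustrative assumptions rather than the paper's exact architecture; the sizes are merely chosen so that the parameter count stays small, in the spirit of the roughly 92,000-parameter policies mentioned above.

```python
import torch
import torch.nn as nn

class VisuomotorPolicy(nn.Module):
    """Map a camera image and joint angles directly to joint torques.

    Layer sizes here are illustrative assumptions, not the paper's
    exact architecture.
    """

    def __init__(self, num_joints: int = 7, image_channels: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(image_channels, 32, kernel_size=7, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5), nn.ReLU(),
        )
        # After the spatial softmax below, each of the 32 feature maps is
        # reduced to an expected (x, y) image location, giving 64 numbers
        # that are concatenated with the joint angles.
        self.fc = nn.Sequential(
            nn.Linear(32 * 2 + num_joints, 40), nn.ReLU(),
            nn.Linear(40, 40), nn.ReLU(),
            nn.Linear(40, num_joints),  # one torque per joint
        )

    @staticmethod
    def spatial_softmax(features: torch.Tensor) -> torch.Tensor:
        """Expected (x, y) position of each feature map's activation."""
        n, c, h, w = features.shape
        probs = torch.softmax(features.reshape(n, c, h * w), dim=-1)
        probs = probs.reshape(n, c, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=features.device)
        ys = torch.linspace(-1.0, 1.0, h, device=features.device)
        expected_x = (probs.sum(dim=2) * xs).sum(dim=-1)   # (n, c)
        expected_y = (probs.sum(dim=3) * ys).sum(dim=-1)   # (n, c)
        return torch.cat([expected_x, expected_y], dim=1)  # (n, 2c)

    def forward(self, image: torch.Tensor, joints: torch.Tensor) -> torch.Tensor:
        points = self.spatial_softmax(self.conv(image))
        return self.fc(torch.cat([points, joints], dim=1))

# Example: a 3-channel 64x64 image and 7 joint angles -> 7 torques.
policy = VisuomotorPolicy()
torques = policy(torch.randn(1, 3, 64, 64), torch.randn(1, 7))
```

A feature-point representation of this kind is a natural fit for manipulation, since it summarizes each feature map by where it fires rather than how strongly, which keeps the fully connected layers, and hence the total parameter count, small.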
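The BADMM decomposition can likewise be illustrated with a heavily simplified sketch. Everything below is a stand-in chosen for brevity: the toy double-integrator, the gradient-based trajectory optimizer, and the plain squared ADMM penalty with scaled duals all replace the actual method's fitted time-varying linear-Gaussian controllers, LQR-style backward pass, and Bregman divergence penalties between action distributions.

```python
import torch

# Toy linear system: state (position, velocity), scalar action.
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])
B = torch.tensor([[0.0], [0.1]])
T, rho = 30, 10.0  # horizon and ADMM penalty weight

def rollout(x0, actions):
    """Simulate the linear system; return visited states and task cost."""
    xs, x, cost = [], x0, torch.tensor(0.0)
    for u in actions:
        xs.append(x)
        x = A @ x + B @ u
        cost = cost + x @ x + 1e-3 * (u @ u)  # drive the state to the origin
    return torch.stack(xs), cost

policy = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.Tanh(),
                             torch.nn.Linear(16, 1))
x0 = torch.tensor([1.0, 0.0])
actions = torch.zeros(T, 1, requires_grad=True)  # trajectory's controls
lam = torch.zeros(T, 1)                          # scaled dual variables
pi_opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
u_opt = torch.optim.Adam([actions], lr=1e-1)

for _ in range(50):
    # (1) "Optimal control" phase: improve the action sequence against
    # the task cost plus a penalty for deviating from the current policy.
    for _ in range(20):
        u_opt.zero_grad()
        xs, cost = rollout(x0, actions)
        with torch.no_grad():
            pi_u = policy(xs)
        (cost + (rho / 2) * ((actions - pi_u + lam) ** 2).sum()).backward()
        u_opt.step()
    # (2) Supervised learning phase: regress the policy onto the actions
    # chosen by the trajectory optimizer at the states it visited.
    with torch.no_grad():
        xs, _ = rollout(x0, actions)
    for _ in range(20):
        pi_opt.zero_grad()
        ((rho / 2) * ((actions.detach() - policy(xs) + lam) ** 2).sum()).backward()
        pi_opt.step()
    # (3) Dual update: penalize remaining controller/policy disagreement.
    with torch.no_grad():
        lam += actions - policy(xs)
```

Even in this stripped-down form the structure is the same as in the method described above: the trajectory optimizer supplies supervision, the policy is trained with ordinary regression, and the dual variables gradually force the controller and the policy to agree.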