In many reinforcement learning tasks, the goal is to learn a policy that
controls an agent, whose design is fixed, so as to maximize some notion of
cumulative reward. The design of the agent's physical structure is rarely
optimized for the task at hand. In this work, we explore the possibility of
learning a version of the agent's design that is better suited for its task,
jointly with the policy. We propose a minor alteration to the OpenAI Gym
framework in which we parameterize parts of an environment and allow an agent
to learn to modify these environment parameters jointly with its policy. We
demonstrate that an agent can learn a body structure that is not only better
suited for the task, but also facilitates policy learning. Joint
learning of policy and structure may even uncover design principles that are
useful for assisted-design applications. Videos of results are available at
https://designrl.github.io/
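
To make the idea of a parameterized environment concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. `ParamEnv`, its `design` argument to `reset`, the `leg_length` parameter, the reward, and the hill-climbing search are all hypothetical stand-ins chosen for illustration. The class only mimics a Gym-style `reset`/`step` interface and does not import the library. The point it illustrates is that policy parameters and design parameters can live in one vector and be improved by the same search.

```python
import random


class ParamEnv:
    """Hypothetical Gym-style environment whose 'body' is parameterized."""

    def __init__(self, target=10.0, horizon=20):
        self.target = target
        self.horizon = horizon
        self.leg_length = 1.0  # design parameter (assumed, for illustration)
        self.pos = 0.0
        self.t = 0

    def reset(self, design=None):
        # Assumption: the design is passed in at reset time.
        if design is not None:
            self.leg_length = max(0.1, design[0])  # keep the body valid
        self.pos = 0.0
        self.t = 0
        return self.pos

    def step(self, action):
        a = max(-1.0, min(1.0, action))  # clip the action to [-1, 1]
        self.pos += a * self.leg_length  # longer legs cover more ground
        self.t += 1
        # Reward: stay close to the target, with a small cost on leg length.
        reward = -abs(self.target - self.pos) - 0.01 * self.leg_length ** 2
        done = self.t >= self.horizon
        return self.pos, reward, done, {}


def rollout(theta, env):
    """Return the cumulative reward of one episode for theta = [w, b, leg]."""
    w, b, leg = theta
    obs = env.reset(design=[leg])
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(w * obs + b)  # linear policy
        total += reward
    return total


def joint_search(env, theta, iters=300, sigma=0.1, seed=0):
    """Simple hill climbing over policy and design parameters together."""
    rng = random.Random(seed)
    best, best_ret = list(theta), rollout(theta, env)
    for _ in range(iters):
        cand = [p + rng.gauss(0.0, sigma) for p in best]
        ret = rollout(cand, env)
        if ret > best_ret:  # keep a candidate only if it improves the return
            best, best_ret = cand, ret
    return best, best_ret


env = ParamEnv()
init_theta = [0.0, 0.5, 1.0]  # [policy weight, policy bias, leg length]
init_ret = rollout(init_theta, env)
best_theta, best_ret = joint_search(env, init_theta)
print("initial return:", init_ret)
print("best return:", best_ret, "with parameters:", best_theta)
```

Because the toy environment is deterministic and a candidate is kept only when it improves the return, the final return can never be worse than the initial one. A gradient-based or population-based optimizer could take the place of the hill-climbing loop without changing the environment interface.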
