Reinforcement Learning for Torch: Introducing torch-twrl

Advances in machine learning have been driven by innovations and ideas from many fields. Inspired by the way humans learn, Reinforcement Learning (RL) is concerned with algorithms that use trial-and-error feedback to improve future performance.

Board games and video games often have well-defined reward functions, which allow for straightforward optimization with RL algorithms. Algorithmic advances have allowed RL to be applied to real-world problems with more complex goals, such as high degree-of-freedom robotic manipulation and large-scale recommendation tasks.

Twitter Cortex invests in novel state-of-the-art machine learning methods to improve the quality of our products. We are exploring RL as a learning paradigm, and to that end, Twitter Cortex built a framework for RL development. Today, Twitter is open sourcing torch-twrl to the world.

RL algorithms (or agents) aim to learn to perform complex, novel tasks through interaction with the task (or environment). Developing effective algorithms requires rapid iteration and testing; torch-twrl aims to make implementing and innovating fast and easy.

OpenAI Gym provides an extensive collection of RL environments; torch-twrl interacts with these environments through Gym's HTTP API. torch-twrl provides a simple and modular way for developers to start working with RL within their existing Torch/Lua code.
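As a rough illustration of the environment contract torch-twrl talks to over the HTTP API (shown here in Python with a toy stand-in environment, not a real Gym binding or torch-twrl code), an environment exposes `reset()` returning an initial state and `step(action)` returning the next state, a reward, a done flag, and an info table:

```python
import random

class ToyCartPole:
    """Stand-in environment with a Gym-style interface.

    This is only a sketch of the reset/step contract: reset() -> state,
    step(action) -> (state, reward, done, info). It is not CartPole's
    real dynamics.
    """

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # initial observation

    def step(self, action):
        self.t += 1
        reward = 1.0                     # +1 per step survived
        done = self.t >= self.max_steps  # episode ends after max_steps
        return float(self.t), reward, done, {}

env = ToyCartPole()
state, done, total = env.reset(), False, 0.0
while not done:
    action = random.choice([0, 1])       # a random agent
    state, reward, done, _ = env.step(action)
    total += reward
print(total)  # 10.0: one reward per step over a 10-step episode
```

Any agent that speaks this reset/step protocol can be swapped in against any environment that implements it, which is what makes the interface convenient for iteration.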

torch-twrl makes developing and testing RL algorithms and environments easy. To give you a feel for how easy it is, we have included a convenient script that runs a basic policy gradient agent [Williams, 1992] on the classic Cart-Pole control task [Barto et al., 1983].

To run an experiment, set the experimental parameters for your environment and agent. Agents require a policy, a model, and a learning update, each with its relevant parameters.

Results are tracked on the OpenAI Gym Leaderboard. When you run an algorithm using torch-twrl, there is an option to automatically upload your results to the Leaderboard, which creates a results plot and builds a short GIF of your run.

The leaderboard is also valuable for comparing your results with other implementations.

The basic RL framework has an agent interacting with an environment. The agent is composed of:

model: the agent's model, which maps states to actions;

policy: how actions are selected; and

learning update: how the model is updated from the rewards received.

Note: Many other parameters can be set; they are specific to policies, learning updates, models, and monitoring, and are described more completely in the documentation.
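A minimal sketch of this model/policy/learning-update decomposition (in Python for illustration; torch-twrl itself is Torch/Lua, and none of these names come from its API). The model is a tabular Q-function, the policy is epsilon-greedy selection over it, and the learning update is one-step Q-learning, run on a hypothetical three-state chain environment:

```python
import random

random.seed(0)

# Toy environment: a 3-state chain. Action 1 moves right, action 0 resets
# to state 0. Reaching state 2 gives reward 1 and ends the episode.
def env_step(state, action):
    if action == 1:
        nxt = state + 1
        return nxt, (1.0 if nxt == 2 else 0.0), nxt == 2
    return 0, 0.0, False

# model: maps states to action values (a tabular Q-function here)
Q = {(s, a): 0.0 for s in range(3) for a in (0, 1)}

# policy: epsilon-greedy action selection over the model's values
def policy(state, eps=0.1):
    if random.random() < eps:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(state, a)])

# learning update: one-step Q-learning toward the bootstrapped target
def update(s, a, r, s2, done, alpha=0.5, gamma=0.9):
    target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])

for _ in range(200):                     # episodes
    s, done, t = 0, False, 0
    while not done and t < 100:          # cap episode length for safety
        a = policy(s)
        s2, r, done = env_step(s, a)
        update(s, a, r, s2, done)
        s, t = s2, t + 1

print(Q[(0, 1)] > Q[(0, 0)])  # True: the agent learned to move right
```

The point of the decomposition is that each piece can be swapped independently: a different policy (e.g., softmax) or a different update (e.g., SARSA) reuses the same model and interaction loop.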

We hope torch-twrl continues to grow as an RL framework for developers working in Torch and Lua, similar to RLLab [Duan et al. 2016]. RL research is an active field with a wide variety of environments and implementations of state-of-the-art algorithms, and we plan to keep growing our library with new RL algorithms.

While there are other good RL frameworks based on Torch [rl-torch, Kaixhin/Atari], we wanted a framework built from scratch with minimal external dependencies, so that it is easily compatible with our internal stack at Twitter. To help you get started, we include a minimal random agent, a policy gradient agent based on REINFORCE [Williams 1992], and TD(λ) with SARSA and Q-Learning [Sutton and Barto, 1998]. If you want to contribute, we accept pull requests and issues at torch-twrl on GitHub.
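To give a flavor of the simplest of these algorithms, here is a hedged sketch of the REINFORCE update [Williams 1992] on a two-armed bandit (plain Python for illustration; torch-twrl's actual implementation is in Torch/Lua, and this mirrors only the update rule, not its code). Action preferences are pushed in the direction of the log-probability gradient, scaled by the received reward:

```python
import math
import random

random.seed(1)

# Two-armed bandit: arm 1 pays off more often than arm 0.
PAYOFF = (0.2, 0.8)

def pull(arm):
    return 1.0 if random.random() < PAYOFF[arm] else 0.0

def softmax(h):
    e = [math.exp(x) for x in h]
    z = sum(e)
    return [x / z for x in e]

prefs = [0.0, 0.0]   # policy parameters (action preferences)
alpha = 0.1          # learning rate

for _ in range(2000):
    p = softmax(prefs)
    # sample an action from the current stochastic policy
    arm = 0 if random.random() < p[0] else 1
    r = pull(arm)
    # REINFORCE: prefs[a] += alpha * r * d(log pi(arm)) / d(prefs[a]),
    # which for a softmax policy is (indicator(a == arm) - p[a])
    for a in (0, 1):
        grad = (1.0 if a == arm else 0.0) - p[a]
        prefs[a] += alpha * r * grad

print(softmax(prefs)[1] > 0.9)  # True: policy strongly prefers arm 1
```

Without a baseline this estimator is high-variance, which is one reason practical policy gradient implementations (including those built on Torch) subtract a learned baseline from the reward.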

Enjoy torch-twrl. We are excited for your feedback and future development.