Over the past few years, reinforcement learning (RL) research has seen a number of significant advances. This progress matters beyond RL itself, because the algorithms behind these advances also apply to other domains, such as robotics.

Developing these kinds of advances often requires iterating quickly on a design, frequently with no clear direction, and disrupting the structure of established methods. However, most existing RL frameworks do not provide the combination of flexibility and stability that researchers need to iterate on RL methods effectively and to explore new research directions whose benefits may not be immediately obvious.

Further, reproducing results from existing frameworks is often time-consuming, which can lead to scientific reproducibility issues down the line.
This is the motivation for a new TensorFlow-based framework: Dopamine, a research framework for fast prototyping of reinforcement learning algorithms, which aims to provide flexibility, stability, and reproducibility for new and experienced RL researchers alike.

What are the principles that Dopamine is based on?

Dopamine takes its name from one of the main components of reward-motivated behaviour in the brain, reflecting the strong historical connection between neuroscience and reinforcement learning research. It aims to enable the kind of speculative research that can drive radical discoveries.

To that end, the framework was designed with the following considerations in mind:

1- Ease Of Use:

The two key considerations in the design of this framework are clarity and simplicity. The code provided is compact and well-documented. This is achieved by focusing on a mature, well-understood benchmark, the Arcade Learning Environment, and on four value-based agents:

DQN,

C51,

A simplified, carefully curated variant of the Rainbow agent,

The Implicit Quantile Network agent, which was presented at the International Conference on Machine Learning (ICML).

2- Reproducibility:

The team is particularly sensitive to the importance of reproducibility in reinforcement learning research. To this end, their code is provided with full test coverage that serves as an additional form of documentation.

3- Benchmarking:

It is very important for new researchers to be able to benchmark their ideas against established methods quickly. As such, the team provides the full training data of the four provided agents, across the 60 games supported by the Arcade Learning Environment, available as Python pickle files for agents trained with their framework and as JSON data files for comparison with agents trained in other frameworks.

In addition, they provide a website where users can quickly visualize the training runs for all provided agents on all 60 games.
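To make the benchmarking idea concrete, here is a minimal Python sketch of how results from another framework might be compared against JSON baseline data. The record layout and field names ("agent", "iteration", "return") are assumptions made for illustration and are not the actual schema of the published files. The inline sample values are also made up.

```python
import json
from collections import defaultdict
from statistics import mean

# Hypothetical baseline records. The field names ("agent", "iteration",
# "return") and the values are assumptions for illustration only, not the
# real schema or data of the published JSON files.
BASELINE_JSON = """
[
  {"agent": "DQN", "iteration": 0, "return": 10.0},
  {"agent": "DQN", "iteration": 0, "return": 14.0},
  {"agent": "DQN", "iteration": 1, "return": 20.0},
  {"agent": "DQN", "iteration": 1, "return": 24.0}
]
"""


def mean_return_per_iteration(records, agent):
    """Average the returns of all runs of `agent` at each iteration."""
    by_iteration = defaultdict(list)
    for record in records:
        if record["agent"] == agent:
            by_iteration[record["iteration"]].append(record["return"])
    return {it: mean(vals) for it, vals in sorted(by_iteration.items())}


def compare(baseline_curve, my_curve):
    """Return (mine - baseline) for every iteration present in both curves."""
    return {
        it: my_curve[it] - baseline_curve[it]
        for it in baseline_curve
        if it in my_curve
    }


baseline = mean_return_per_iteration(json.loads(BASELINE_JSON), "DQN")
my_results = {0: 11.0, 1: 25.0}  # made-up returns from one's own agent
print(compare(baseline, my_results))
```

With real data, the inline string would be replaced by the contents of a downloaded baseline file, and the field names adjusted to whatever that file actually contains.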

Conclusion:

Now that we know what Dopamine is and what it relies on, the design principles of the framework can be summarized as:

Easy experimentation: Make it easier for new users to run benchmark experiments.

Flexible development: Make it easier for new users to try out research ideas.

Compact and reliable: Provide implementations for a few battle-tested algorithms.

Reproducible: Facilitate reproducibility in results.

Currently, the team is actively using it for their research and finds that it gives them the flexibility to iterate over many ideas quickly.

With such a design, it is hoped that the framework’s flexibility and ease of use will empower researchers to try out new ideas. It will also be exciting to see what the larger community makes of it!