MuJoCo Environments

Overview

MuJoCo (Multi-Joint dynamics with Contact) is a proprietary physics engine for detailed, efficient rigid-body simulation with contacts. It can be used to build environments for continuous control tasks such as walking or running, which is why many policy gradient methods (such as TRPO and PPO) have been evaluated on MuJoCo environments.

Environments

InvertedPendulum

This is a MuJoCo version of CartPole. The agent’s goal is to balance a pole on a cart.

InvertedDoublePendulum

This is a harder version of InvertedPendulum, where the pole has another pole on top of it. The agent’s goal is to balance a pole on a pole on a cart.

Reacher

Make a 2D robot reach a randomly located target.

Hopper

Make a two-dimensional one-legged robot hop forward as fast as possible.

Swimmer

Make a 2D robot swim.

Walker2d

Make a two-dimensional bipedal robot walk forward as fast as possible.

Ant

Make a four-legged creature walk forward as fast as possible.

HalfCheetah

Make a 2D cheetah robot run.

Humanoid

Make a three-dimensional bipedal robot walk forward as fast as possible, without falling over.

HumanoidStandup

Make a three-dimensional bipedal robot stand up as fast as possible.
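As a rough illustration of how these environments are typically driven, the sketch below rolls out one episode with random actions. It assumes the classic Gym interface, where reset() returns an observation and step() returns (observation, reward, done, info); newer Gym releases changed these return values. The helper names run_random_episode and run_on_gym_env are hypothetical, and the environment ID "Hopper-v1" is borrowed from the result tables on this page.

```python
def run_random_episode(env, max_steps=1000):
    """Roll out one episode with random actions and return the total reward.

    Assumes a classic Gym-style env: reset() -> obs,
    step(action) -> (obs, reward, done, info).
    """
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # Sample a random action from the environment's action space.
        action = env.action_space.sample()
        _obs, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def run_on_gym_env(env_id="Hopper-v1", max_steps=1000):
    """Hypothetical convenience wrapper; requires Gym and MuJoCo to be installed."""
    import gym  # imported lazily so the helper above works without Gym

    env = gym.make(env_id)
    try:
        return run_random_episode(env, max_steps)
    finally:
        env.close()
```

With Gym and MuJoCo installed, calling run_on_gym_env("Hopper-v1") should return the total reward of one random-action episode. A random policy usually scores far below the trained agents reported on this page.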

State of the Art

Many papers have experimented with the MuJoCo continuous control environments, but most report performance curves rather than exact scores. The results below are therefore all taken from Deep Reinforcement Learning that Matters, a paper on reproducing state-of-the-art policy gradient methods.
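The results are reported as a bootstrap mean with 95% confidence bounds. As a minimal sketch of how such numbers can be obtained from a set of per-run returns, the code below resamples the returns with replacement and takes percentiles of the resampled means. The percentile method, the number of resamples, and the function name bootstrap_mean_ci are assumptions made for illustration; the paper's exact procedure may differ.

```python
import random


def bootstrap_mean_ci(samples, n_resamples=2000, confidence=0.95, seed=0):
    """Return (bootstrap mean, lower bound, upper bound) for the sample mean.

    Uses the percentile bootstrap: resample with replacement, compute the
    mean of each resample, then read the bounds off the sorted means.
    """
    rng = random.Random(seed)
    n = len(samples)
    means = []
    for _ in range(n_resamples):
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()

    # Indices of the lower and upper percentiles, clamped to a valid range.
    lo_idx = max(0, int((1.0 - confidence) / 2.0 * n_resamples))
    hi_idx = min(n_resamples - 1, int((1.0 + confidence) / 2.0 * n_resamples) - 1)

    boot_mean = sum(means) / n_resamples
    return boot_mean, means[lo_idx], means[hi_idx]


# Hypothetical final returns from five training runs (not real data).
example_returns = [2900.0, 3100.0, 2750.0, 3050.0, 2980.0]
mean, lower, upper = bootstrap_mean_ci(example_returns)
```

Wide bounds, like those reported for DDPG on Hopper-v1, indicate large variation between runs, so comparing algorithms on the mean alone can mislead.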

If you know of other papers that report results on the MuJoCo environments, please email me!

HalfCheetah-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
5037.26           (3664.11, 6574.01)       DDPG
3888.85           (2288.13, 5131.96)       ACKTR
3043.1            (1920.4, 4165.86)        PPO
1254.55           (999.52, 1464.86)        TRPO

Hopper-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
2965.33           (2854.66, 3076.00)       TRPO
2715.72           (2589.06, 2847.93)       PPO
2546.89           (1875.79, 3217.98)       ACKTR
1632.13           (607.98, 2370.21)        DDPG

Walker2d-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
3072.97           (2957.94, 3183.10)       TRPO
2926.92           (2514.83, 3361.43)       PPO
2285.49           (1246.00, 3235.96)       ACKTR
1582.04           (901.66, 2174.66)        DDPG

Swimmer-v1

Bootstrap Mean    95% Confidence Bounds    Algorithm
214.69            (141.52, 287.92)         TRPO
107.88            (101.13, 118.56)         PPO
50.22             (42.47, 55.37)           ACKTR
31.92             (21.68, 46.23)           DDPG

Installation

Prerequisites

To install the MuJoCo environment, you need the OpenAI Gym toolkit. Read this page to learn how to install OpenAI Gym.

You also need to purchase a MuJoCo license. MuJoCo offers a 30-day trial license for everyone, and a free license for students using MuJoCo for personal projects only. Visit their license page for more information.