Training AI models on synthetic data is all the rage these days. Large tech companies and startups alike are hunting for training methods that lower the barriers to entering the self-driving car game and make annotation less of a pain. Companies like Udacity have been steadily releasing real-world driving data, but the community's needs continue to grow in volume and specificity.

OpenAI’s Universe came about late last year to address some of this market need, launching with Atari 2600 games, 1,000 Flash games and 80 browser environments to help democratize access to training resources. The addition of GTA V is a significant improvement over Flash racing games and opens the door for computer vision and autonomous car researchers.

“Simulation is essential if you really want to do a self-driving car,” said Zhaoyin Jia, tech lead for Google self-driving, at today’s AI Frontiers Conference.

While I can’t say I would want any self-driving car trained on my GTA V driving, the game is realistic enough for slightly more disciplined individuals to get the job done.

One of the biggest advantages of training in a virtual environment is that it’s primed for harvesting labeled data. Objects in GTA V, whether traffic signs or cyclists, can easily be bounded and analyzed.

One Princeton student, Artur Filipowicz, specifically made use of this to collect 1.4 million images of stop signs in a variety of conditions to predict the chance a given intersection will have a sign present and to estimate the distance to the given sign.

Pulled from research entitled “Driving School II, Video Games for Autonomous Driving” completed by Artur Filipowicz of Princeton University
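
To make the labeling idea concrete, here is a minimal Python sketch of how a simulated object could be turned into a labeled bounding box. It assumes the simulator already reports the on-screen pixel coordinates of an object's corners; the function names and the label layout are hypothetical illustrations, not part of GTA V, Universe, or the Princeton research.

```python
def bounding_box(points):
    """Return (x_min, y_min, x_max, y_max) enclosing a list of (x, y) pixels."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def make_label(category, points):
    """Build a label record with an [x, y, width, height] box.

    `points` are assumed to be the object's projected screen-space corners,
    as a simulator could supply them (hypothetical input format).
    """
    x_min, y_min, x_max, y_max = bounding_box(points)
    return {
        "category": category,
        "bbox": [x_min, y_min, x_max - x_min, y_max - y_min],
    }


# Example: four projected corners of a stop sign in a rendered frame.
label = make_label("stop_sign", [(10, 20), (30, 20), (30, 45), (10, 45)])
```

Because the game engine knows where every object is, a record like this could be produced for each frame without a human drawing boxes by hand.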

Starting today, those with a penchant for DIY training can obtain the source code and AMI for GTA V and nab a pre-trained starter agent with TensorFlow and Caffe versions. The kit also includes reward functions for collision avoidance, minimizing destination distance, and maximizing adherence to the road. By expressing these goals as numerical rewards, reinforcement learning lets an agent learn to weigh outcomes against one another, loosely emulating how a human driver thinks.
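
As a rough illustration of how those three goals might be folded into a single reward signal, consider the Python sketch below. It is not the code shipped in the kit: the `StepInfo` fields, the weights, and the penalty value are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step measurements a driving simulator might report."""
    collided: bool          # did the car hit something this step?
    prev_distance_m: float  # distance to destination before the step
    distance_m: float       # distance to destination after the step
    lane_offset_m: float    # lateral offset from the lane center


def reward(step, collision_penalty=10.0, progress_weight=1.0, lane_weight=0.5):
    """Combine progress, lane keeping, and collision avoidance into one number.

    The weights are illustrative assumptions, not tuned values.
    """
    # Positive when the car moved closer to its destination.
    progress = step.prev_distance_m - step.distance_m
    r = progress_weight * progress - lane_weight * abs(step.lane_offset_m)
    if step.collided:
        r -= collision_penalty
    return r


# A step that gains 2 m of progress while drifting 0.5 m from the lane center.
clean_step = reward(StepInfo(False, 100.0, 98.0, 0.5))
crash_step = reward(StepInfo(True, 100.0, 98.0, 0.5))
```

With these example weights, the same step scores much lower when it ends in a collision, which is the kind of pressure that steers an agent toward safer driving.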

The team is running the game on a Windows virtual machine hosted in the cloud that communicates with Universe via WebSockets and VNC. That makes it accessible to engineers on both Mac and Linux. Oh, and don’t forget, before you get started you’ll need to buy yourself a copy of the game too.