This computer program can teach itself to beat Super Mario Bros.

2013-05-13 11:00 a.m.
| Last updated 2015-12-09 08:03 a.m.

Computer scientist Dr. Tom Murphy has reportedly developed a program that can beat you at your favorite childhood game.


It's fairly well known that the first part of the first level of Super Mario Bros. was designed to teach new players how to jump on bad guys, eat mushrooms, and so forth. But what about players who aren't human? Could a computer be programmed to learn how to play classic games? Work done by computer scientist Dr. Tom Murphy suggests that yes, AI can learn to game.

The video above features Murphy explaining the results of a paper he recently published and presented at the 2013 SIGBOVIK conference. (SIGBOVIK is held annually on April 1 and prominently features spoof research, although Murphy clearly states that his research isn't fake. For the record, I've yet to find any sources claiming this is a prank.) In short, his goal was to develop a program that could learn from a user what it takes to beat a game, and then apply that knowledge to its own play.

"The basic idea is to deduce an objective function from a short recording of a player's inputs to the game," he writes. "The objective function is then used to guide search over possible inputs, using an emulator. This allows the player's notion of progress to be generalized in order to produce novel gameplay."

As you can see from the early gameplay testing (starting around the 6:00 mark in the above video), the first iteration of Murphy's program (a pair of them, actually) basically mashed buttons and got nowhere. But as Murphy refined its goals and methods (scoring a lot of points was one static goal, aside from simply "winning"), it slowly got better.

As he notes in the paper's title, Murphy's system is based on lexicographic ordering, the mathematical rule for comparing sequences one element at a time, the way a dictionary orders words letter by letter. The first of his two programs, called LearnFun, recorded all of the output data from his gameplay (everything from the number of coins he had to how far right he scrolled) to learn which values could be manipulated and which ones changed as a result.
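To make the idea of lexicographic ordering concrete, here is a minimal Python sketch. It is not Murphy's code: the recorded values, their layout, and the helper names `is_nondecreasing` and `find_orderings` are all assumptions made up for illustration. It looks for pairs of recorded values that, compared in dictionary order, never go down over a short recording, which is one simple way a program might guess what "progress" means.

```python
from itertools import permutations

# Hypothetical snapshots from a short recording, one tuple per frame:
# (world, level, x_position, coins). These numbers are invented.
frames = [
    (1, 1, 10, 0),
    (1, 1, 40, 1),
    (1, 1, 90, 1),
    (1, 2, 5, 3),
]

def is_nondecreasing(frames, order):
    """True if the values picked out by `order` never go down,
    comparing frame to frame in lexicographic (dictionary) order."""
    keys = [tuple(frame[i] for i in order) for frame in frames]
    # Python compares tuples lexicographically, element by element.
    return all(a <= b for a, b in zip(keys, keys[1:]))

def find_orderings(frames, size=2):
    """All orderings of `size` value indices that behave like progress."""
    n = len(frames[0])
    return [order for order in permutations(range(n), size)
            if is_nondecreasing(frames, order)]

print(find_orderings(frames))
```

In this toy recording, the horizontal position on its own drops when a new level starts, but the pair (level, x_position) never goes down, so it works as a rough measure of progress.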

He then fed that data into his second program, PlayFun, which essentially tried to figure out which combinations of inputs would result in the most desirable outputs. The main goals were scoring lots of points and scrolling as far right as possible.
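The search step can be sketched in the same spirit. The Python below is an illustrative toy, not PlayFun itself: the `step` function is a made-up stand-in for a real emulator, and the button names, the `(x_position, score)` state, and the two-move lookahead are all assumptions. It tries every short sequence of button presses, scores the result with an objective that ranks distance first and points second, and commits to the first press of the best sequence.

```python
from itertools import product

BUTTONS = ("left", "right", "jump", "none")  # hypothetical input set

def step(state, button):
    """Toy stand-in for an emulator. State is (x_position, score)."""
    x, score = state
    if button == "right":
        return (x + 1, score)
    if button == "left":
        return (max(0, x - 1), score)
    if button == "jump":
        return (x, score + 1)  # pretend a jump earns one point
    return state

def objective(state):
    """Rank states by distance first, then score (lexicographic)."""
    x, score = state
    return (x, score)

def simulate(state, buttons):
    """Apply a sequence of button presses and return the final state."""
    for button in buttons:
        state = step(state, button)
    return state

def play(state, steps, lookahead=2):
    """Greedy search: try every short input sequence, keep the best start."""
    pressed = []
    for _ in range(steps):
        best = max(product(BUTTONS, repeat=lookahead),
                   key=lambda seq: objective(simulate(state, seq)))
        state = step(state, best[0])
        pressed.append(best[0])
    return state, pressed

final_state, pressed = play((0, 0), steps=5)
print(final_state, pressed)
```

Because the objective puts distance ahead of score, this toy player always runs right and never stops to jump for points. Swapping the order of the tuple in `objective` would make it chase points instead, which hints at why the choice of objective matters so much.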