Without being given any rules or prior information, a simple computer has learnt how to play 49 classic Atari games in just two weeks - and it's learnt to play them pretty damn well. But what's most impressive is that the Google-built algorithm it uses wasn't even built specifically to play games, just to learn from its own experience.
What does that mean, other than the fact computers can now beat us at Space Invaders and Breakout, as well as Chess, Texas hold'em poker and solving Rubik's Cubes? It turns out we now have the early stages of a general learning algorithm that could help robots and computers to become experts at any task we throw at them, and that's a pretty huge deal.
"This is the first time that anyone has built a single general learning system that can learn directly from experience to master a wide range of challenging tasks," Demis Hassabis, one of the lead researchers, told William Herkewitz from Popular Mechanics. Hassabis was one of the co-founders of DeepMind Technologies, the company that began developing the algorithm and was bought by Google last year for a reported US$400 million.
Publishing their results today in Nature, the team explains how the deep learning algorithm, called Deep Q-Network, or DQN, was able to master games such as Boxing, Space Invaders and Stargunner without any background information: it wasn't told which "bad guys" to look out for, or how to use the controls. It only had access to the score and the pixels on the screen, and had to work out from those alone how to become an expert player.
By playing the games over and over and over again, and learning from its mistakes, the algorithm learnt first how to play each game properly and then, within a fortnight, how to win.
Of course, this isn't the first program to teach a computer to become an expert gamer. Just over 20 years ago, a program known as TD-Gammon mastered Backgammon. The difference is that TD-Gammon never did as well at similar games such as Chess and Checkers, as Toby Walsh, a computer scientist from National ICT Australia and UNSW who wasn't involved in the research, explains over at The Conversation.
The DQN algorithm, on the other hand, could master a range of different games, thanks to two technological advances.
First of all, DQN relies on a reinforcement learning method called Q-learning. This basically means that the algorithm tries everything it can (pressing every button and moving the joystick around like a crazy person) and keeps track of a value, called "Q", that estimates how much reward each action is likely to earn from a given situation. Over time it favours the actions with the highest Q values. In the case of this experiment, the reward was the game score, and the higher the better.
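For readers who want to see the idea in code, here is a minimal sketch of plain (tabular) Q-learning, assuming a made-up five-square "corridor" game rather than an Atari title. It is not DeepMind's code: DQN replaces the lookup table `q` with a deep neural network that reads the screen pixels, but the core step, nudging Q(state, action) towards the reward just earned plus the discounted best Q value of the next state, is the standard Q-learning update. The game, the names `step` and `train`, and the parameter values are illustrative assumptions.

```python
import random

# Hypothetical toy "game" (not from the DQN paper): a corridor of five squares.
# Action 0 moves left, action 1 moves right; reaching the last square ends the
# episode and scores 1 point. Every other move scores 0.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the total future score for taking action a in state s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of each episode
            # Epsilon-greedy: usually pick the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward now plus the discounted best value afterwards.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-terminal square after training.
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1] (always move right, towards the point)
```

Nobody tells the agent that "right" is good; it discovers this only because the score-earning square is to the right, which is the same score-only setup the DQN experiments used, on a vastly smaller scale.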
"To understand how to maximise your score in a game like Space Invaders, you have to recognise a thousand different facts: how the pixelated aliens move, the fact that shooting them gets you points, when to shoot, what shooting does, the fact that you control the tank, and many more assumptions, most of which a human player understands intuitively. And then, if the algorithm changes to a racing game, a side-scroller, or Pac-Man, it must learn an entirely new set of facts."
But this is where the second advance comes in: DQN is built on a deep neural network inspired by the way the human brain separates background noise from important information. This means DQN can pick out the valuable parts of what it sees and learn from clumps of its own prior experience.
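One way to picture "learning from clumps of prior experience" is the technique the Nature paper calls experience replay: the agent stores its recent moves in a memory and trains on random handfuls of them instead of only on the move it just made. The sketch below is a hypothetical helper written for illustration; the class name `ReplayBuffer`, its methods and the capacity are assumptions, not DeepMind's implementation, and a real agent would feed each sampled batch to its neural network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-size memory of (state, action, reward, next_state, done) steps."""

    def __init__(self, capacity, seed=0):
        # deque(maxlen=...) silently discards the oldest step once it is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # A random "clump" of old steps, rather than only the latest ones,
        # breaks up the strong correlation between consecutive game frames.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=t % 2, reward=float(t), next_state=t + 1, done=False)

print(len(buffer))  # → 3 (steps 0 and 1 were pushed out)
batch = buffer.sample(2)
print(sorted(s for s, *_ in batch))  # two of the remaining states 2, 3 and 4
```

In a DQN-style agent, each sampled step would be scored against the Q-learning target (reward plus the discounted best Q value of the next state), so old experiences can be reused many times rather than seen once and forgotten.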
While this is an awesome breakthrough, it's important to note that this isn't a true general learning algorithm just yet. Programmers still had to tell the program what reward to aim for (here, the game score) in order for it to learn; a truly intelligent system would be able to work out its own objectives in order to master a new skill.
And DQN never truly understands the games it's playing the way a human would; it just learns what to do in order to get a better score. Because of this, there were some games that DQN couldn't master, such as Montezuma's Revenge (you can read more about these over at The Washington Post).
In the future, the team hopes to extend the algorithm so that it can help to sift through large amounts of scientific data and come to its own conclusions. "This system that we've developed is just a demonstration of the power of the general algorithms," one of the developers, Koray Kavukcuoglu, told Herkewitz. "The idea is for future versions of the system to be able to generalise to any sequential decision-making problem."