Microsoft's "Divide And Conquer" AI Just Obliterated Ms. Pac-Man

16 JUNE 2017

The notoriously tricky video game Ms. Pac-Man has proved no match for artificial intelligence software, with Microsoft's latest bots able to achieve the maximum possible score of 999,990 - something neither human nor machine has managed before.


Researchers developed a new learning technique to beat the game - using multiple AI bots instead of just one to tackle the different challenges that Ms. Pac-Man throws up.

According to the team from Microsoft-owned startup Maluuba, this approach is particularly suited to Ms. Pac-Man. Not only do gamers have to find their way around a maze, they also need to find bonus items and avoid (or eat) ghosts.

Each aspect of Ms. Pac-Man - avoiding a ghost, eating pellets that make ghosts edible, picking up point-boosting pieces of fruit - was assigned a weight reflecting its importance within the game. Then 163 bots known as "agents" used trial and error to work out the best approach for each element.

One agent might be tasked with finding a fruit, for instance, while another might have the job of avoiding a ghost.

A master agent then used all of the feedback from its subagents to plot the best course through the game. The researchers found that the subagents worked best when they focussed on their own goals, leaving the "senior manager" to see the big picture.
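As a rough illustration of that arrangement (a sketch, not Maluuba's actual code), the Python below has each sub-agent score the possible moves for its own goal only, while a master agent adds up those scores using a weight per goal and picks the move with the highest total. The agent names, scores and weights are all invented for the example.

```python
# Hypothetical sketch: each sub-agent scores moves for one goal only,
# and a master agent combines the scores using a weight per goal.
ACTIONS = ["up", "down", "left", "right"]


def aggregate(preferences, weights):
    """Return the move with the highest weighted sum of sub-agent scores."""
    totals = {action: 0.0 for action in ACTIONS}
    for agent, scores in preferences.items():
        for action in ACTIONS:
            # A sub-agent with no opinion on a move contributes nothing.
            totals[action] += weights[agent] * scores.get(action, 0.0)
    best = max(ACTIONS, key=lambda a: totals[a])
    return best, totals


# Invented numbers: the fruit agent likes "left", but the ghost agent
# strongly dislikes it because a ghost is waiting there.
preferences = {
    "fruit": {"left": 1.0, "up": 0.2},
    "ghost": {"left": -1.0, "right": 0.1},
    "pellet": {"up": 0.5, "right": 0.3},
}
weights = {"fruit": 2.0, "ghost": 5.0, "pellet": 1.0}

best_move, totals = aggregate(preferences, weights)
print(best_move, totals)
```

With these made-up numbers, the heavily weighted ghost agent overrules the fruit agent, so the master agent steers away from "left" even though the fruit is there.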


"There's this nice interplay between how [the agents] have to, on the one hand, cooperate based on the preferences of all the agents, but at the same time each agent cares only about one particular problem," says one of the team, Harm Van Seijen.

With so many agents out in the field, as it were, the AI could weigh up the best approach when choosing between avoiding a ghost or heading towards a fruit, or any other decision. It eventually worked out how to pick up maximum points.

The so-called Hybrid Reward Architecture (HRA) system applies the 'divide and conquer' approach to AI, using separate algorithms to assess separate tasks, then crunching all that data together to make a final decision.

It's also based on the AI practice of reinforcement learning, where software finds out for itself which decisions are good (getting the most Ms. Pac-Man points) and which decisions are bad (getting Ms. Pac-Man eaten by ghosts).
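To give a flavour of what learning by trial and error looks like in code, here is a minimal sketch of Q-learning, one common reinforcement learning method. It is a single-agent toy, not the HRA system: the five-cell corridor, the rewards and every parameter value are assumptions made up for this example.

```python
import random

# Toy corridor, invented for illustration: cell 0 holds a ghost (reward -1),
# cell 4 holds a pellet (reward +1), and the agent starts in the middle.
N_CELLS, START = 5, 2
ACTIONS = [-1, +1]  # step left or step right


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a table of action values for the corridor by trial and error."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        while state not in (0, N_CELLS - 1):
            # Explore occasionally, otherwise take the best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt = state + action
            if nxt == N_CELLS - 1:
                reward = 1.0   # reached the pellet
            elif nxt == 0:
                reward = -1.0  # ran into the ghost
            else:
                reward = 0.0
            terminal = nxt in (0, N_CELLS - 1)
            future = 0.0 if terminal else max(q[(nxt, a)] for a in ACTIONS)
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next move.
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q


q = train()
# The learned policy: the best-known move in each non-terminal cell.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(1, N_CELLS - 1)}
print(policy)
```

Nothing in the code says that the ghost is bad or the pellet is good; the agent only sees the rewards, and after enough episodes it prefers stepping right, towards the pellet, from every cell.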

There is one caveat, though: the programmers pre-coded the rules of Ms. Pac-Man into the system first, so the agents didn't learn for themselves that ghosts were bad - they knew already.


They then used that knowledge, and the HRA system, to work out how to score maximum points.

In other words, the team of researchers has designed an AI approach specifically for Ms. Pac-Man. Eventually, they hope the same techniques could be used on other games and in advancing artificial intelligence in general.

The paper describing the research has yet to be peer-reviewed, so we'll have to wait and see what other AI experts make of the system the Maluuba team has put together.

According to the researchers, this combination of reinforcement learning and having separate bots work on separate goals in parallel could help in a variety of situations, from financial models to robotics.

"This idea of having [agents] work on different pieces to achieve a common goal is very interesting," Doina Precup from McGill University in Canada, who wasn't involved in the research, told Allison Linn at Microsoft.

She says it could eventually teach AI to do complex tasks with limited information, like the brain does: "That would be really, really exciting because it's another step toward more general intelligence."

Meanwhile, human Ms. Pac-Man players remain stuck on their high score of 266,330.

The research is available to read at the pre-print website