Computer scientists from the University of Alberta in Canada have programmed an AI poker player that can never lose across a series of hands of heads-up limit Texas hold'em.
Named Cepheus, the program uses a strategy for a two-player game of heads-up limit Texas holdâem poker thatâs so brilliant, statistical analysis says that even if a person spends their entire lifetime playing poker against this program, the program will never lose. The AI will only ever come out on top, or break even. It will never make a mistake, even not knowing what cards its opponent is holding.
And itâs not about winning every hand - it will get dealt dud cards as often as the next guy - but the AI has figured out how to turn even the worst situations around to come out on top. â[It] will lose if it's dealt an inferior hand, but it will minimise its losses as best as is mathematically possible and will slowly but surely take your money by making the "perfect" decision in any given scenario,â says Jason Koebler at Motherboard. "Heads-up limit hold'em, it can be said, has been âsolvedâ."
The difference between âlimitâ and âno limitâ poker comes down to money - youâre either limited to set amounts you can bet, or you can bet as much as you like. The latter would be impossible for a computer program to solve, because predicting random and limitless amounts of money is a feat in and of itself. But that doesnât make what this program can do any less impressive.
Compare what itâs doing to other games that have been âsolvedâ by AI players. Even our feeble human brains can figure out how to be unbeatable in naughts and crosses. Games like chess and checkers take the complexity way up a notch because of all the different possibilities at every turn, but all the information is still there on the board. The opponent canât hide anything, other than their strategy, but that doesnât matter, because the AI already knows every possible play, and has already figured out the perfect strategy for countering each move before the game has even started.
But what about poker? The program would be cheating if it knew what was on the two hidden cards its opponent holds in every round. As Koebler notes, Cepheus must somehow know how to navigate the 3 x 10^14 possible decisions in a game of limit poker, in which, at any given moment, not all the information is known to it. The University of Alberta team refers to this kind of game, where not all information is known, as an âimperfect-informationâ game.
âThe solutions for imperfect-information games require computers to handle the additional complication of not knowing exactly what the gameâs state is, such as not knowing an opponentâs hand,â one of the team, Neil Burch, told Jeremy Hsu at IEEE Spectrum. âSuch techniques require more computer memory and computing power.â
How much memory, exactly? Roughly 262 terabytes. Yikes. And what does it do with all that memory? Cepheus runs an algorithm called CFR+, which was invented by the team as an enhancement of an existing algorithm known as the counterfactual regret minimisation (CFR).
Regret minimisation is, essentially, about learning from your mistakes. So, as Arielle Duhaime-Ross explains at The Verge, if Cepheus thinks about the possibility of raising a bet, and decides to play randomly instead and loses, it will retrace its steps, figure out how much it could have won if it had raised, and will store that amount as a âregret valueâ.
This value is placed on every possible opportunity that this same decision is available to the computer, so it will avoid making the same mistake. This is very different from the way humans play - if we take a big loss, we focus on trying to win that back, rather than on how to use that information to perfect the rest of our game. Cepheus will continue to update itself with these regret values until it reaches what the team calls âperfect playâ.
"CFR+ still works like the old CFR algorithms by gradually developing better solutions through playing thousands and hundreds of thousands of hands of poker,â Hsu writes at IEEE Spectrum. "But it can develop a very good solution much faster than any past CFR algorithm by being more efficient; basically the equivalent of taking fewer, bigger steps toward the best solution.â
According to IEEE Spectrum, once the team had figured out the strategy, they managed to whittle the memory requirements down to less than 11 terabytes to store the counterfactual values, and 6 terabytes to compute the main strategy. It ended up taking 24 trillion hands of poker over 70 days, and 200 computers running the CFR+ algorithm with 32 GB of RAM and 24 central processing units to train Cepheus to âsolveâ the game. "We could continue to train it, and it would continue to get better,â one of the team, Michael Bowlin, told Arielle Duhaime-Ross at The Verge. "But we stopped at this point because we canât tell it apart from being perfect.â
The team is now going to work on adjusting the algorithm to play heads-up no limit poker. They know, because of the variables, it will probably be impossible to create an âunbeatableâ AI player, but they hope to create a player that can beat the best human players in the world. Thatâs the goal. And theyâre also thinking about how to use the technology to help governments and private companies create better security systems that canât be hacked.
Want to try your hand against Cepheus? Click here to play against it. I just hope you enjoy the feeling of losing.