Owing to advances in artificial intelligence, computers can beat humans at increasingly complex games. Nevertheless, algorithms in this space are often developed with a single game in mind. More importantly, existing algorithms can play games of either perfect information, in which each player knows or can see the other players' moves (such as chess and Go), or imperfect information, in which some aspects of the game are unknown or hidden to some or all players (such as poker), but never both, because the two game types mostly require different strategies. In a recent study, Michael Bowling and colleagues introduce an algorithm, called Student of Games (SoG), that performs well with both perfect and imperfect information.
The search algorithm in SoG is based on counterfactual regret minimization (CFR), an iterative approach that converges to a Nash equilibrium in two-player zero-sum games and is often used to solve games of imperfect information. In a nutshell, SoG trains agents via self-play: each player uses a CFR-based search to generate a policy, a set of probability distributions over game actions, for the current game state, and then samples an action to take from that policy. The authors evaluated SoG on four games: two of perfect information (chess and Go) and two of imperfect information (poker and Scotland Yard). In chess and Go, SoG does not perform as well as a previously proposed algorithm (AlphaZero), but it still plays strongly and scales with increasing availability of computational resources. Notably, SoG beats state-of-the-art agents in poker and Scotland Yard. The ability of the proposed algorithm to perform well in fundamentally different game types will likely prove useful in the continued development of artificial intelligence and game theory.
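To give a flavour of the idea behind CFR, the sketch below implements plain regret matching, the building block on which CFR rests, for rock-paper-scissors. This toy game is not from the study, and the code is a minimal illustration rather than the SoG implementation: two self-playing agents accumulate regret for the actions they did not take, play in proportion to positive regret, and their time-averaged strategies drift toward the game's Nash equilibrium (uniform play at probability 1/3 each).

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

def payoff(a, b):
    """Payoff for playing action a against action b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1

def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly

def train(iterations=20000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        actions = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for p in range(2):
            opp = actions[1 - p]
            earned = payoff(actions[p], opp)
            for a in range(ACTIONS):
                # Regret = what action a would have earned minus what we earned.
                regrets[p][a] += payoff(a, opp) - earned
                strategy_sums[p][a] += strategies[p][a]
    # The *average* strategy over all iterations is what converges to equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]

avg = train()
print(avg[0])  # each probability should be near 1/3
```

SoG goes well beyond this sketch, combining CFR-style search with learned value functions so that the same machinery also handles perfect-information games, but the regret-driven update above is the core loop that makes the imperfect-information side work.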