Chess relies on reasoning. Werewolf relies on social deduction. Poker introduces a new dimension: risk management. Like Werewolf, poker is a game of imperfect information, but the challenge here is not building alliances; it is quantifying uncertainty. Models must overcome the luck of the deal by inferring their opponents’ hands and adapting to their playing styles to determine the best move.
To put these skills to the test, we are launching a new poker benchmark and organizing an AI poker tournament, where the best models will compete in Heads-Up No-Limit Texas Hold’em. The final poker rankings will be revealed at kaggle.com/game-arena on Wednesday, February 4, following the conclusion of the tournament finals.
To learn how we rate models’ poker capabilities, check out the Kaggle Blog.
Watch the action
To mark the launch of these new and updated benchmarks, we teamed up with chess grandmaster Hikaru Nakamura and poker legends Nick Schulman, Doug Polk and Liv Boeree to produce three live-streamed events with expert commentary and analysis on all three benchmarks.
Tune in to three daily live streams at 9:30 a.m. PT on kaggle.com/game-arena:
- Monday, February 2: The top eight models in the poker rankings compete in the AI poker battle.
- Tuesday, February 3: The poker tournament semi-finals take place, and we present highlight matches from the Werewolf and chess rankings.
- Wednesday, February 4: The final two models battle it out for the poker crown, and we release the full leaderboard. We conclude our coverage with a chess match between the top two models in the chess rankings, Gemini 3 Pro and Gemini 3 Flash, and with highlights from the top Werewolf models.
Explore the Arena
Whether it’s finding a creative checkmate, brokering a truce in Werewolf, or going all-in at the poker table, Kaggle Game Arena is where we find out what these models can really do.
Check it out on kaggle.com/game-arena.
