Saturday, November 26, 2005

Bayesian Poker

Here is an interesting name for what we know and do intuitively: Bayesian Poker.

From the page:

Let's start by describing how you would play Hold'Em if you had all the skills of a top player and also an unlimitted amount of computer processing power.

First, you need to put your opponents on various cards. You don't actually put them on certain cards, rather you assign them a probability of having each possible hole. There are (52*51)/2 holes before you see any cards, and the number goes down as more cards are revealed. Basically you are doing a Bayesian estimation here - given their actions, which you have seen, what is the probability they would have acted that way with a given hole? Since all holes are a-priori equally likely, the probability of a hole is just the probability that they would have done those actions with that hole. In conditional probability this can be written :

P(H|A) = P(A|H)

Here "H" is some hole (two cards) and "A" is the set of their actions you have seen (eg. check, bet, call, etc.). The "|" means "given", so this reads "The Probability of hole H given actions A is equal to the probability of actions A given hole H".

Let's consider this in practice. At the start of action pre-flop everyone has equal probability of every hole. Now, you can see your own hole, so all holes containing your two cards are impossible, so they get a probability of zero. Now imagine the player under the gun limps in. What is the probability he would do that with various cards? Well, with something like the 24o (two-four-offsuit), it's very unlikely indeed. So, you adjust your probability, like : P(24o) = 0.000001 , or something. With something like AA (aces) it's pretty likely he would limp; maybe he would limp 75% of the time with Aces and raise 25% of the time. So P(AA) = 0.75

When the flop comes out on the board, their probabilites don't change, but they do change as they make more actions. Let's say after the flop, they check. So, they're probably weak *or* trapping. So, you can lower the probability of medium level hands, like top pair with a bad kicker or middle pair, and the probability of checking with a very weak hand is roughly 100%, and the probability of checking a good hand is maybe 25% or something; so you multiply that probability for each hole onto the probability you already had.

For example, if the big blind just checks pre-flop, that doesn't mean that all holes are equally likely. The reason is that he would have raised with some of his better holes, and you've seen that he hasn't raised, so that means he's more likely to have junk, or possibly a monster that he's slow-playing. Simply good holes like AJ or 88 are then less likely than pure junk like 72o or monsters like AA.

Now, assuming you could estimate all these probabilities right, you would have a perfect model of your opponent. That model is simply a probability for each hole. In order to estimate the probabilities, you need a good "opponent model" - that's imagining what your opponent would do in all situations. Of course, what your opponent will do is based on what he thinks you will do, so you have to imagine "what does he think that I think?".

The full Bayesian model allows you to predict future actions :

P(F|S) = Sum[H] P(F|H) * P(H|S)
P(F|S) = Sum[H] P(F|H) * P(S|H)

This reads : The Probability of a Future action (F), given some Seen actions (S) is equal to the sum on all possible hole cards (H), of the Probability of the future action given the hole H times the probability of the seen actions given the hole. This all just uses one function, the opponent model function P(A|H), the probability of some action A given a specific set of hole cards H.

Given this model, playing poker is pretty easy. As each opponent acts, you adjust their probabilities. On your turn, you always have three choices - { fold, call/check , bet/raise }. I'll refer to these as just FCR - {fold,call,raise}. To choose one, you want the one that gives you the highest EV. To compute EV, you try each one, and then simulate what your opponents will do for all their hole cards, weighted by the probability of all their future actions. That is, you form the full EV equation :

add/subtract the cost of my action (calling/raising/etc)
for the next opponent
sum over all possible hole cards by him; weighted by the probability of each hole
he can choose {f,c,r}
for that hole, compute his probabilities of each P(f),P(c),P(r)
branch 3 ways and sum with the weights of his probabilities of each
adjust the pot for his actions in each case
continue

In most cases this will come back to me and I'll have a choice again. Each point in the tree where I have a choice, you take the best (highest EV) of the 3 choices. At each point where an opponent has a choice (FCR), you sum all three weighted by the probability of each action given that hole. For all opponents you always do a weighted sum on all the holes they could have.

On each branch of this sum, he has specific hole cards, so when you get to a showdown, you just see who has the best hand, and they win the pot.

One nice way to do this weighted sum is to sum over all possible hole card assignments to the entire table; this brings the sum out of the whole thing and you just wind up with this big trinary-tree of branching choices for you and your opponent.

Note that this EV calculation contains every fancy or smart move you can imagine - the idea of semi-bluffing is in there, the idea of betting when you think he will call and you have a better hand, the idea of checking it off if you think he'll bluff and you can beat him, etc. etc. All you do is pick the move with the biggest EV and you're golden.

The only exception to that is when you may want to maximize your long term EV instead of your EV for just this hand. For example, when you first sit down at the table, you may want to make some sub-optimal moves to establish a certain image. Similarly, you may want to fold slightly less than optimal on the river to prevent people from bluffing you out too often, etc.

Now, the big difficult piece here was the opponent model. If you have a good opponent model, then the EV tree is just a matter of having a huge amount of processing. It is, of course, too much processing in most cases. There are simple cases, like heads-up play, where it can actually be evaluated (or approximated well), but in practice, you wind up doing some heuristics and estimation.

You can see there's almost always some very small chance of any opponent hole being there.

So, this is the "ideal" way to play poker. How do you get close in practice? Well, you usually keep a sort of simplified opponent model going in your head. For example you might have the idea that "this guy probably has Ax, or maybe KQ, and he also has like a 10% chance of any two cards". That's a form of Bayesian poker, and all good poker players are tracking their opponents like that. Then, using that opponent model you can start doing some prediction stuff. Like say if you bet, he'll call with Ax and fold the rest of his hands, and you have A8, so when he calls you beat him about half the time. That makes your bet +EV, though you still need to compare to calling. For most practical purposes you usually don't need to look very far ahead to maximize EV. Top do heuristic forms of this analysis all the time, which we'll talk about more in the next section.

One thing to note is that the holes we're summing on here are all the holes of all suits, that is all 52*51/2 = 1326 cards (you must divide by two since the order doesn't matter). Usually humans think in terms of holes by their 169-card groupings, that is, AKo or QJs or TT, instead of thinking of AhKc, AdKs, etc. The exact suits only really matter when some flush draw is possible. When you do this reduction from the full 1326-card hole list to the 169-card hole list, you have to remember that there are more ways to make some of the holes. There are 6 ways to make a pocket pair, there are 12 ways to make two offsuit non-paired cards, and 4 ways to make suited cards; that is, there are 6 TT's, 12 AKo's, and 4 AKs's, so a total of 16 AK's if you count both suited and non-suited. This means that the a-priori chance of someone having a pocket pair is muck lower, because there are many fewer hole cards that make them.



Other links of interest: