The Genealogical World of Phylogenetic Networks: Tournament success is not poker success

Let us suppose for a moment that we wish to list the world's best professional poker players. This might be of some interest, because poker is partly a game of luck (the cards are dealt at random) and partly a game of skill (players choose how to play their cards). Indeed, put simply, the idea is to convince your opponents that you have a weak hand when they have a strong one (so that they will bet against you) and a strong hand when they have a weak one (so that they will fold).

One well-known way to assess poker success is to look at tournament winnings. Indeed, Nathan Williams recently did this for The Top 50 Best Poker Players of All Time by simply listing the 50 greatest money earners from The Hendon Mob database. This database accumulates data on the lifetime money winnings for all of those participants who have ever cashed in a live poker tournament.

However, this approach does not work. In fact, there are at least five reasons why this is not appropriate:

1. Inflation continues unabated. After all, $1 now is not worth as much as $1 was 30 years ago. In fact, something that cost $1 in 1990 would cost a bit more than $2 now (i.e. the money has been devalued to 50%). So, the value of current winnings cannot be compared to those of the past.
2. There are more tournaments now than there have ever been. So, there are more opportunities to play them now, and to thereby potentially accumulate more money for the same tournament success rate.
3. The tournament fields are now generally bigger. This means that the average prize money for each tournament is now much greater than before (since the money is provided by the participants themselves). In particular, the top prizes now provide more money than whole tournaments did 20 years ago.
4. Some of the best players play online rather than live. Obviously, this is a bit more difficult these days, due to the banning of online poker in the USA, but it is still a significant source of poker income for many people.
5. Some of the best players do not play many tournaments; instead, they play cash games. Indeed, if you want to make a living playing poker, you may be better off playing for cash rather than for prize money, as tournament success is much more of a lottery.
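
To make the inflation point in the list above concrete, here is a minimal Python sketch of how a past prize could be restated in current dollars. The function name, the index values and the prize amount are all hypothetical; the index values are simply chosen so that prices double, in line with the "$1 then costs about $2 now" example.

```python
def adjust_for_inflation(amount, price_index_then, price_index_now):
    """Express a past amount in current dollars using a price-index ratio."""
    return amount * price_index_now / price_index_then


# Hypothetical index values, chosen so that prices double between the two dates.
INDEX_1990 = 100.0
INDEX_NOW = 200.0

# Hypothetical tournament prize won in 1990 (not taken from any database).
past_win = 500_000

adjusted = adjust_for_inflation(past_win, INDEX_1990, INDEX_NOW)
print(adjusted)  # 1000000.0
```

A real adjustment would need a published price index for each year in which a player cashed, applied cash by cash rather than to the lifetime total.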

The first three reasons all mean that we would have to adjust the tournament winnings, if we wish to have a meaningful assessment of lifetime earnings. As one example of the need to do this, we can look at point no. 3 in a simple way. The first graph shows the current top-100 money earners from The Hendon Mob. For each player, it shows how much of their total earnings came from their biggest single tournament cash.

Note that for the majority of players, a large part of their lifetime winnings came from a single tournament — the median percentage is 18.4% (range 3.8–97.7%). Indeed, for some of the players it is >50%, and for a few it is almost all of their money. Bigger fields mean more money per tournament, and thus bigger cashes when you do well. Note, incidentally, that this graph does contain the top 17 biggest cashes in history (to date).
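
The percentage plotted in the graph is simple to compute. The sketch below shows one way to do it in Python, using three made-up players; the names and dollar amounts are assumptions for illustration only, and are not values from The Hendon Mob.

```python
from statistics import median

# Hypothetical players: (lifetime tournament earnings, biggest single cash), in dollars.
players = {
    "Player A": (10_000_000, 1_000_000),
    "Player B": (8_000_000, 4_000_000),
    "Player C": (5_000_000, 4_500_000),
}


def biggest_cash_share(total, biggest):
    """Percentage of lifetime earnings that came from the biggest single cash."""
    return 100.0 * biggest / total


shares = {name: biggest_cash_share(total, biggest)
          for name, (total, biggest) in players.items()}
median_share = median(shares.values())

print(shares)        # {'Player A': 10.0, 'Player B': 50.0, 'Player C': 90.0}
print(median_share)  # 50.0
```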

An alternative approach

So, in order to evaluate players, we actually need a list of criteria that is independent of money won. That is, we need a list of the poker skills of each player. There are several different skills involved in playing poker, and presumably some people are good at some of them, and other people are good at some of the others. A comparison of relative skills is what we need.

This approach was actually tried by Barry Greenstein back in c. 2005. What he did was try to rate a group of 33 of the poker players that he had played against in cash games. He rated these players by style of play, based on ten playing criteria (each scored on a 1–10 scale):

Aggressiveness
Looseness
Short-handed play
Limit poker
No-limit poker
Tournaments
Side games
Steam control
Against weak players
Against strong players

Given the time at which this analysis was done (2005), the modern crop of young players is obviously not included, and a few of the people who are included are no longer playing. However, it is worthwhile looking at the data to see just what can be done with this approach.

Greenstein himself notes: "I don’t think you can add up the ratings in the skill categories to get an accurate comparison of players." He is right; but first let's do it anyway. So, the next graph shows the total score (out of 100) for each player.

The problem here is that we are comparing apples with oranges. That is, the rank ordering of the sum does not make much sense, because it does not group players with similar playing strengths. The rank order would make sense when comparing each feature one at a time, but not for the total. For example, ranking by total winnings does make sense, because we have only one criterion: money (although it is not a useful criterion). This is the basic weakness of having a single rank order.

As one example of how the "overall score" misses important points, note that Eric Seidel and John Juanda have the same total. However, Seidel exceeds Juanda on Steam control, while Juanda exceeds Seidel on Looseness. These are actually two rather different players.
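
The way a total can hide such differences is easy to demonstrate. The following Python sketch uses two hypothetical score profiles over the ten criteria; the numbers are invented for illustration and are not Greenstein's actual ratings for any player.

```python
CRITERIA = [
    "Aggressiveness", "Looseness", "Short-handed play", "Limit poker",
    "No-limit poker", "Tournaments", "Side games", "Steam control",
    "Against weak players", "Against strong players",
]

# Hypothetical scores (1-10 per criterion), NOT Greenstein's actual ratings.
player_x = [8, 9, 7, 6, 8, 9, 7, 5, 8, 8]
player_y = [8, 5, 7, 6, 8, 9, 7, 9, 8, 8]

total_x = sum(player_x)
total_y = sum(player_y)

# Criteria on which the two profiles disagree, despite the equal totals.
differing = [name for name, x, y in zip(CRITERIA, player_x, player_y) if x != y]

print(total_x, total_y)  # 75 75
print(differing)         # ['Looseness', 'Steam control']
```

The two totals are identical, yet the profiles differ by four points on each of two criteria, which is exactly the information that a single rank order throws away.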

A better way to look at the data is to use a network, as we often do in this blog. The final graph is a NeighborNet (based on the Manhattan distance) of Greenstein's data. Each point represents one of the 33 people. Those people that are near each other in the network have a similar set of scores, while people further apart are progressively more different from each other as poker players.
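
For readers unfamiliar with the Manhattan distance, it is the sum of the absolute differences between two players' scores, taken one criterion at a time. The Python sketch below computes it for every pair in a small set of hypothetical profiles (again, invented numbers rather than Greenstein's ratings). It only builds the pairwise distances; constructing the NeighborNet itself is left to dedicated network software.

```python
from itertools import combinations


def manhattan(a, b):
    """Sum of absolute score differences across all criteria."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical scores (1-10 on each of the ten criteria), NOT Greenstein's ratings.
scores = {
    "Player X": [8, 9, 7, 6, 8, 9, 7, 5, 8, 8],
    "Player Y": [8, 5, 7, 6, 8, 9, 7, 9, 8, 8],
    "Player Z": [8, 8, 7, 6, 8, 9, 7, 6, 8, 8],
}

# One distance per unordered pair of players.
distances = {
    (p, q): manhattan(scores[p], scores[q])
    for p, q in combinations(scores, 2)
}

for pair, d in distances.items():
    print(pair, d)
# ('Player X', 'Player Y') 8
# ('Player X', 'Player Z') 2
# ('Player Y', 'Player Z') 6
```

All three hypothetical players have the same total of 75, but the distances show that Player Z is much closer to Player X than to Player Y.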

As you can see, there is no simple trend from "best" to "worst", but instead a complex set of relationships, just as we would expect. However, the network does show an overall trend of decreasing total score from top to bottom (compare this to the previous graph).

Note, first, that Eric Seidel and John Juanda are on opposite sides of the network (Juanda left, Seidel right). This illustrates how much better the network is as a display of the data, compared to simply summing the scores (as in the previous graph). The network accurately shows the differences in the relative playing styles.

There are some players who are actually gathered together in the network, indicating that they have similar scores across all 10 criteria. For example, Barry Greenstein, Eric Seidel and Howard Lederer rarely differ by more than 1 point on any of the criteria; according to Greenstein, these people have very similar playing styles.

Alternatively, Phil Hellmuth and T.J. Cloutier have scores that differ from the other players: both have low scores on Side games and Steam control. Gus Hansen is near these two because all three have high scores for Against weak players. Similarly, the legendary Stu Ungar and Patrik Antonius both have high Aggressiveness and Looseness.

There is one final point worth mentioning. As Michel Bettane once said (The absurdity and flattery of scores):

It doesn't take a genius to appreciate the absurdity of giving a number score to a work of art or, worse still, an artist. Salvador Dalí had huge fun scoring great artists (including himself) on the basis of design, color, and composition — but that says far more for his sense of provocation and irony than it does for the principle itself.

Is poker an art, a science or a sport? If it is either of the first two, then scoring players may actually be a Bad Idea.

Monday, April 15, 2019
