Porting a Poker Hand Evaluator from C to Factor

To help teach myself Factor, I started working through the problems at Project Euler, approaching each one not only as a way to learn a new language but also as a means of exploring unfamiliar programming and mathematical methods, structures, and algorithms. This article explains the process I went through to port a fairly complicated poker hand evaluation algorithm from C to Factor.

Problem 54

To solve Problem 54, I needed a way to programmatically determine the winner when comparing two poker hands. I had recently read an extensive blog post covering various poker hand evaluators, so I knew that I didn’t want to bother with a naive solution when there were much better options out there. After reading through the different techniques, one stood out to me in particular: it was a relatively small amount of code, it didn’t require a ton of storage space, and it was plenty fast…

Cactus Kev’s Poker Hand Evaluator

Through the power of mathematics (combinatorics in particular), you can ascertain that out of all 2,598,960 possible poker hands (52 choose 5), there are actually only 7462 distinct hand values that you need to be concerned with. The basic idea behind Cactus Kev’s Poker Hand Evaluator is to take advantage of this fact: store each card in a compact representation, do some basic bit twiddling and some multiplication (which is computationally cheap), add in a couple of lookup tables, and you can determine a hand’s equivalence class value very quickly.

The key concept here is that each of the 13 possible card ranks (deuce through ace) is stored as a different prime number. Because every integer has exactly one prime factorization, multiplying a hand’s five primes together gives a product that is unique to that combination of ranks, regardless of card order, and that product can then be used to determine the hand’s overall value. If you handle the special cases of flushes, straights, and high cards, the rest can be boiled down to a simple binary search. You can read the gory details at Kevin Suffecool’s site.
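To make the prime trick concrete, here is a minimal C sketch that multiplies the primes for five rank indices. The names RANK_PRIMES and rank_product are mine, chosen for illustration; they are not taken from Cactus Kev's code or from the Factor vocabulary.

```c
#include <stdint.h>

/* One prime per rank, deuce (index 0) through ace (index 12). */
static const int RANK_PRIMES[13] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41};

/* Hypothetical helper: multiply the primes for five rank indices.
 * The largest possible product (four aces and a king) is
 * 41*41*41*41*37 = 104,553,157, which fits in a signed 32-bit integer. */
int32_t rank_product(const int ranks[5])
{
    int32_t product = 1;
    for (int i = 0; i < 5; i++)
        product *= RANK_PRIMES[ranks[i]];
    return product;
}
```

Since multiplication is commutative, the order of the cards doesn't matter: three deuces and two treys give 2*2*2*3*3 = 72 however the hand is arranged.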

The Porting Process

Bitfield Representation

Each card will be stored as a bitfield inside an integer that’s 4 bytes long:
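The layout in the comment below is the one I recall from Cactus Kev's write-up, and it agrees with the masks used in his evaluation code (a shift by 16 for the rank bits, 0xF000 for the suit, 0xFF for the prime). The accessor names are hypothetical; treat this as a sketch rather than his actual macros.

```c
#include <stdint.h>

/*
 * +--------+--------+--------+--------+
 * |xxxbbbbb|bbbbbbbb|cdhsrrrr|xxpppppp|
 * +--------+--------+--------+--------+
 *
 * p    = prime number for the rank (deuce = 2, trey = 3, ..., ace = 41)
 * r    = rank index (deuce = 0, trey = 1, ..., ace = 12)
 * cdhs = one bit set for the suit (clubs, diamonds, hearts, spades)
 * b    = one bit set for the rank (bit 16 = deuce, ..., bit 28 = ace)
 * x    = unused
 */

/* Hypothetical accessors for each field of a packed card. */
uint32_t card_prime(uint32_t card)    { return card & 0xFF; }
uint32_t card_rank(uint32_t card)     { return (card >> 8) & 0xF; }
uint32_t card_suit(uint32_t card)     { return card & 0xF000; }
uint32_t card_rank_bit(uint32_t card) { return card >> 16; }
```

For example, the King of Diamonds packs to 0x08004B25: prime 37 (0x25) in the low byte, rank index 11, the diamond bit 0x4000, and rank bit 27 set.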

Prime Numbers In Factor:

A minor change I made here was to establish a RANK_STR constant to make it easier to develop ways to parse and print card representations later (see the section on printing below for the C equivalent). That said, card-rank-prime in the Factor version has the same purpose as the primes array in the C version, but it performs the lookup for you.

Initializing Cards

Now that we have some basic infrastructure in place, we can tackle initializing cards in the desired format. Cactus Kev’s code glosses over this process and only demonstrates how to generate an entire deck, so I’ll just cover the Factor code here…

To use this code, you’d start with a string representing a card, like "AS" for the Ace of Spades or "7H" for the Seven of Hearts, and then pass it to >ckf to get its integer representation. The parse-cards word simply does this conversion for an entire hand; given a string like "7C 5D 4H 3S 2C", it will output an array of the proper integers:

Due to Factor’s concatenative nature, you tend to write a lot of small words that do one specific thing, and then tie them together. Words are expected to be defined before you reference them (unless they’ve been deferred), so when reading code from top to bottom, you’ll see big-picture words defined after the more-focused words they use. It’s also a convention to name helper words the same as the word from which they’re called, but with parentheses around the name.

One special thing to note is the use of locals in the (>ckf) helper word…if you start a word definition with :: instead of just :, Factor will bind the named inputs (from the stack declaration) to lexical variables from left to right. It’s used in this case because we need to pass these values to the card-bitfield word in a specific order, and shufflers would make this a bit more confusing.

The last thing to explain is the card-bitfield word itself…it looks up the proper bit values for the different parts of our card representation, and then constructs the final 4-byte integer from those values. See the bitfield documentation for more details on its syntax.
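For comparison, a rough C counterpart to the string-to-card conversion might look like the sketch below. It assumes the usual Cactus Kev layout (prime in the low byte, rank index in bits 8-11, a suit bit in bits 12-15, and a rank bit in bits 16-28). The function name card_from_string and the SUIT_STR constant are my own inventions, and RANK_STR here only mirrors the idea of the Factor constant.

```c
#include <stdint.h>
#include <string.h>

static const int  RANK_PRIMES[13] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41};
static const char RANK_STR[] = "23456789TJQKA";  /* index = rank, deuce..ace */
static const char SUIT_STR[] = "CDHS";           /* clubs, diamonds, hearts, spades */

/* Hypothetical parser: "AS" -> packed card integer, or 0 if the
 * string is not exactly one known rank followed by one known suit. */
uint32_t card_from_string(const char *s)
{
    if (s == NULL || strlen(s) != 2)
        return 0;

    const char *r = strchr(RANK_STR, s[0]);
    const char *u = strchr(SUIT_STR, s[1]);
    if (r == NULL || u == NULL)
        return 0;

    uint32_t rank     = (uint32_t)(r - RANK_STR);     /* 0..12 */
    uint32_t suit_bit = 0x8000u >> (u - SUIT_STR);    /* C=0x8000 ... S=0x1000 */

    return (uint32_t)RANK_PRIMES[rank]   /* bits 0-7:   prime      */
         | (rank << 8)                   /* bits 8-11:  rank index */
         | suit_bit                      /* bits 12-15: suit flag  */
         | (1u << (16 + rank));          /* bits 16-28: rank flag  */
}
```

With this sketch, "KD" becomes 0x08004B25 and "5S" becomes 0x00081307. A whole hand can be parsed by splitting on spaces and calling the function on each token.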

Shuffling Decks and Printing Hands

Initializing and shuffling a deck wasn’t strictly necessary for solving the Project Euler problem, but it was a useful abstraction and would be expected in a general poker library. Printing out a poker hand falls under this same category, but I’ve included it here as well since it was part of Cactus Kev’s original code. This is where the languages start to noticeably deviate in their implementations…

I’m cheating a bit here with the shuffle word, as Factor already implements randomize using the Fisher-Yates algorithm; so rather than re-inventing the wheel, I just aliased the wheel. With Factor, you’ll also notice that count-controlled loops are much less common in comparison to using sequence operators directly on the elements of a data structure.
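For anyone who hasn't seen it, the Fisher-Yates shuffle is short enough to sketch in C. This is a generic illustration, not Cactus Kev's shuffle routine and not Factor's implementation of randomize; the name shuffle_deck is hypothetical, and rand() with a modulo is used only for brevity (it has a slight bias and is not suitable for real card games).

```c
#include <stdint.h>
#include <stdlib.h>

/* Fisher-Yates shuffle: walk from the end of the array to the front,
 * swapping each slot with a randomly chosen slot at or before it. */
void shuffle_deck(uint32_t *deck, size_t n)
{
    for (size_t i = n; i > 1; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j <= i - 1 */
        uint32_t tmp = deck[i - 1];
        deck[i - 1]  = deck[j];
        deck[j]      = tmp;
    }
}
```

Note that this is exactly the kind of count-controlled loop that the Factor version avoids by handing the whole sequence to a library word.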

Lookup Tables

We can almost get to the actual evaluation of hands to determine their relative value, but before that, we’ll need the proper lookup tables. These are all straightforward arrays, and we have four to consider:

a table for all flushes (including straight flushes)

a table for all non-flush hands consisting of five unique ranks (i.e. either straights or high card hands)

a table of the product of prime values associated with a hand

a table covering the final hand values of all hands not covered by the flushes/unique5 tables

A zero means that specific combination is not possible for that type of hand…

Hand Evaluation

Finally, we can get to the actual evaluation of hands…

Hand Evaluation In C:

    int findit(int key)
    {
        int low = 0, high = 4887, mid;

        while (low <= high)
        {
            mid = (high + low) >> 1;    // divide by two
            if (key < products[mid])
                high = mid - 1;
            else if (key > products[mid])
                low = mid + 1;
            else
                return (mid);
        }
        fprintf(stderr, "ERROR: no match found; key = %d\n", key);
        return (-1);
    }

    short eval_5cards(int c1, int c2, int c3, int c4, int c5)
    {
        int q;
        short s;

        q = (c1 | c2 | c3 | c4 | c5) >> 16;

        /* check for Flushes and StraightFlushes */
        if (c1 & c2 & c3 & c4 & c5 & 0xF000)
            return (flushes[q]);

        /* check for Straights and HighCard hands */
        s = unique5[q];
        if (s)
            return (s);

        /* let's do it the hard way */
        q = (c1 & 0xFF) * (c2 & 0xFF) * (c3 & 0xFF) * (c4 & 0xFF) * (c5 & 0xFF);
        q = findit(q);
        return (values[q]);
    }

Each language’s implementation checks for flushes first, then for hands with five unique ranks, and if all else fails, multiplies the primes together, searches for that product in a lookup table, and then uses that index to determine the hand’s final value. The findit function in C is the binary search, and once again I used Factor’s built-in libraries to do the same thing with the sorted-index word.
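The first two checks are easy to trace by hand without the tables. The C sketch below computes the table index and the flush test for a hand, assuming the standard Cactus Kev card layout; make_card, table_index, and is_flush are hypothetical names introduced only for this illustration.

```c
#include <stdint.h>

static const int RANK_PRIMES[13] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41};

/* Hypothetical helper: pack a card from a rank index (0 = deuce ... 12 = ace)
 * and a suit mask (0x8000 clubs, 0x4000 diamonds, 0x2000 hearts, 0x1000 spades). */
uint32_t make_card(int rank, uint32_t suit_bit)
{
    return (uint32_t)RANK_PRIMES[rank] | ((uint32_t)rank << 8)
         | suit_bit | (1u << (16 + rank));
}

/* Index used for the flushes and unique5 tables: the OR of the five
 * one-hot rank bits, shifted down so the deuce sits at bit 0. */
uint32_t table_index(const uint32_t hand[5])
{
    return (hand[0] | hand[1] | hand[2] | hand[3] | hand[4]) >> 16;
}

/* Nonzero only when the same suit bit is set on all five cards. */
int is_flush(const uint32_t hand[5])
{
    return (hand[0] & hand[1] & hand[2] & hand[3] & hand[4] & 0xF000) != 0;
}
```

A royal flush in spades (ten through ace) yields index 0x1F00 with the flush test passing, so its value comes straight from the flushes table. A hand such as 2-2-3-4-5 in mixed suits yields index 0x000F, which has only four bits set; that slot in unique5 holds zero, and the evaluator falls through to the prime product.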

Hand Ranking

We’re basically done at this point…the hand with the lowest value (from 1 to 7462) wins. If the values are the same, the hands are tied. A simple comparison function can tell you which hand is best.

One last handy function, though, is one that outputs what that value actually means…
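Both ideas fit in a few lines of C. The sketch below is my own illustration: compare_hands and hand_rank_name are hypothetical names, and the thresholds are the category boundaries as I recall them from Cactus Kev's hand_rank function. As a sanity check, the category sizes they imply (10 + 156 + 156 + 1277 + 10 + 858 + 858 + 2860 + 1277) do add up to 7462.

```c
#include <stddef.h>

/* Hypothetical comparison: lower values are stronger hands.
 * Returns 1 if a beats b, -1 if b beats a, and 0 for a tie. */
int compare_hands(short a, short b)
{
    return (a < b) - (a > b);
}

/* Hypothetical naming function: map a hand value (1..7462) to its category. */
const char *hand_rank_name(short value)
{
    if (value < 1 || value > 7462) return "Invalid";
    if (value > 6185) return "High Card";        /* 1277 values */
    if (value > 3325) return "One Pair";         /* 2860 values */
    if (value > 2467) return "Two Pair";         /*  858 values */
    if (value > 1609) return "Three of a Kind";  /*  858 values */
    if (value > 1599) return "Straight";         /*   10 values */
    if (value > 322)  return "Flush";            /* 1277 values */
    if (value > 166)  return "Full House";       /*  156 values */
    if (value > 10)   return "Four of a Kind";   /*  156 values */
    return "Straight Flush";                     /*   10 values */
}
```

So a hand valued at 1 (a royal flush) beats anything else, and a value of 7462 is the worst possible high-card hand.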

The source-054 word simply converts the provided text file into the format our evaluator expects and the hand comparisons are subsequently handled by the euler054 word and counted up! Let’s just say that Player 2 is much better at poker :-)

Vocabulary Improvements

Since porting the initial version of the poker vocabulary, many improvements have been made to generalize the code for use beyond just solving Project Euler problems. I ended up dropping the binary search in favor of using the Senzee Perfect Hash Optimization, and other Factor contributors have added code specifically for Texas Hold’em and Omaha evaluation. I didn’t want to cover all of that here, as the focus of this article is the direct comparison between C and Factor, but you can view the current poker vocabulary to see all of these enhancements.

As always, feel free to drop me a line if you’d like to discuss things further.