Hash function to map combinations of 5 to 7 cards

Question 1

Paul Senzee has a great post on this for 7 cards (link removed: it is broken and now points to an NSFW site).

His code is basically a bunch of pre-computed tables and then one function to look up the array index for a given 7-card hand (represented as a 64-bit number with the lowest 52 bits signifying cards):

inline unsigned index52c7(unsigned __int64 x)
{
    const unsigned short *a = (const unsigned short *)&x;
    unsigned A    = a[3],                B    = a[2],                        C    = a[1],            D   = a[0],
             bcA  = _bitcount[A],        bcB  = _bitcount[B],                bcC  = _bitcount[C],    bcD = _bitcount[D],
             mulA = _choose48x[7 - bcA], mulB = _choose32x[7 - (bcA + bcB)], mulC = _choose16x[bcD];
    return _offsets52c[bcA]                      + _table4[A] * mulA + 
           _offsets48c[ (bcA << 4)        + bcB] + _table [B] * mulB +
           _offsets32c[((bcA + bcB) << 4) + bcC] + _table [C] * mulC + 
                                                   _table [D];
}

In short, it's a bunch of lookups and bitwise operations powered by pre-computed lookup tables based on perfect hashing.
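
To make the input format concrete, here is a minimal Python sketch of how a hand could be packed into that 64-bit number and split into the four 16-bit words that the C function reads as A, B, C and D. The card numbering (0 to 51, one bit per card) and the helper names `hand_to_mask` and `split_words` are assumptions for illustration; they are not part of Senzee's code, and his pre-computed tables are not reproduced here. Note that the pointer cast in the C version only yields a[3] as the most significant word on a little-endian machine.

```python
def hand_to_mask(cards):
    """Pack card numbers (assumed to be 0..51) into an integer bitmask."""
    mask = 0
    for card in cards:
        if not 0 <= card < 52:
            raise ValueError(f"card out of range: {card}")
        if (mask >> card) & 1:
            raise ValueError(f"duplicate card: {card}")
        mask |= 1 << card
    return mask


def split_words(mask):
    """Return the 16-bit words (A, B, C, D), most significant first."""
    return tuple((mask >> shift) & 0xFFFF for shift in (48, 32, 16, 0))


mask = hand_to_mask([51, 35, 19, 3, 2, 1, 0])
words = split_words(mask)                       # (8, 8, 8, 15)
popcounts = [bin(w).count("1") for w in words]  # plays the role of _bitcount[...]
```

Only bits 48 to 51 of the top word can ever be set, which is presumably why the C code uses a separate, smaller table (`_table4`) for A.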

If you go back and look at this website, you can get the perfect hash code that Senzee used to create the 7-card hash and repeat the process for 5- and 6-card tables (essentially creating a new index52c7.h for each). You might be able to smash all 3 into one table, but I haven't tried that.

All told, that should be ~628 MB (4 bytes * ~157 M entries, where 157 M is the number of 5-, 6- and 7-card hands combined: C(52,5) + C(52,6) + C(52,7) = 156,742,040). Or, if you want to split it up, you can map each hand to a 16-bit number (since I believe most poker hand evaluators only need 7,462 distinct hand scores) and then keep a separate, tiny map from those 7,462 hand scores to whatever hand categories you want. That would be ~314 MB.
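
The size estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the helper name `table_megabytes` is made up for illustration, and it assumes one table entry per hand with no padding (`math.comb` needs Python 3.8+).

```python
from math import comb

# Number of distinct k-card hands from a 52-card deck, for k = 5, 6, 7.
counts = {k: comb(52, k) for k in (5, 6, 7)}
total = sum(counts.values())


def table_megabytes(entries, bytes_per_entry):
    """Size in MB (10**6 bytes) of a flat table with the given entry width."""
    return entries * bytes_per_entry / 1_000_000


print(counts)                     # {5: 2598960, 6: 20358520, 7: 133784560}
print(total)                      # 156742040
print(table_megabytes(total, 4))  # about 627 MB with 32-bit entries
print(table_megabytes(total, 2))  # about 313 MB with 16-bit entries
```

A 16-bit entry is wide enough because 7,462 is well below 65,536.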

Question 2

Here's a different answer based on the colexicographic (colex) ranking of combinations. It works on a bitset given as the positions of its set bits, sorted in descending order. The main concept is that, given a bitset, you can always calculate how many bitsets have the same number of set bits but are less than it (lexicographically, comparing the descending bit lists, or equivalently as integers). For set bits c1 > c2 > ... > cm, that count is C(c1, m) + C(c2, m-1) + ... + C(cm, 1), so every m-card hand gets a unique index between 0 and C(52, m) - 1. Here's a Python implementation, both recursive (so you can see the logic) and iterative. I got the idea from this paper on hand isomorphisms.

from math import factorial


def n_choose_k(n, k):
    return 0 if n < k else factorial(n) // (factorial(k) * factorial(n - k))


def indexset_recursive(bitset, lowest_bit=0):
    """Return number of bitsets with same number of set bits but less than
    given bitset.

    Args:
      bitset (sequence) - Sequence of set bits in descending order.
      lowest_bit (int) - Name of the lowest bit. Default = 0.

    >>> indexset_recursive([51, 50, 49, 48, 47, 46, 45])
    133784559
    >>> indexset_recursive([52, 51, 50, 49, 48, 47, 46], lowest_bit=1)
    133784559
    >>> indexset_recursive([6, 5, 4, 3, 2, 1, 0])
    0
    >>> indexset_recursive([7, 6, 5, 4, 3, 2, 1], lowest_bit=1)
    0

    """
    m = len(bitset)
    first = bitset[0] - lowest_bit
    if m == 1:
        return first
    else:
        t = n_choose_k(first, m)
        return t + indexset_recursive(bitset[1:], lowest_bit)


def indexset(bitset, lowest_bit=0):
    """Return number of bitsets with same number of set bits but less than
    given bitset.

    Args:
      bitset (sequence) - Sequence of set bits in descending order.
      lowest_bit (int) - Name of the lowest bit. Default = 0.

    >>> indexset([51, 50, 49, 48, 47, 46, 45])
    133784559
    >>> indexset([52, 51, 50, 49, 48, 47, 46], lowest_bit=1)
    133784559
    >>> indexset([6, 5, 4, 3, 2, 1, 0])
    0
    >>> indexset([7, 6, 5, 4, 3, 2, 1], lowest_bit=1)
    0

    """
    m = len(bitset)
    g = enumerate(bitset)
    return sum(n_choose_k(bit - lowest_bit, m - i) for i, bit in g)
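

As a usage sketch, the same ranking can be applied directly to a 52-bit hand mask and extended to cover 5-, 6- and 7-card hands in one index space by adding a per-size offset. The names `colex_rank`, `mask_to_bits_desc`, `combined_index` and `OFFSETS`, and the layout (5-card hands first, then 6, then 7), are assumptions for illustration rather than anything from the paper or from Senzee's code. The block re-implements the ranking with `math.comb` (Python 3.8+) so that it runs on its own, and includes a brute-force check on a small universe.

```python
from itertools import combinations
from math import comb


def colex_rank(bits_desc):
    """Colex rank of a combination given as set-bit positions, highest first."""
    m = len(bits_desc)
    # comb(n, k) is 0 when k > n, which matches n_choose_k above.
    return sum(comb(bit, m - i) for i, bit in enumerate(bits_desc))


def mask_to_bits_desc(mask):
    """List the set-bit positions of an integer mask, highest first."""
    return [b for b in range(mask.bit_length() - 1, -1, -1) if (mask >> b) & 1]


# Assumed layout: all 5-card hands, then all 6-card hands, then all 7-card hands.
OFFSETS = {5: 0, 6: comb(52, 5), 7: comb(52, 5) + comb(52, 6)}


def combined_index(mask):
    """Index of a 5-, 6- or 7-card hand mask in one table of 156,742,040 entries."""
    if mask < 0 or mask >> 52:
        raise ValueError("mask must fit in 52 bits")
    bits = mask_to_bits_desc(mask)
    if len(bits) not in OFFSETS:
        raise ValueError("expected 5, 6 or 7 set bits")
    return OFFSETS[len(bits)] + colex_rank(bits)


def ranks_are_perfect(n, k):
    """Brute-force check: ranks of all k-subsets of range(n) are exactly 0..C(n,k)-1."""
    ranks = sorted(
        colex_rank(sorted(c, reverse=True)) for c in combinations(range(n), k)
    )
    return ranks == list(range(comb(n, k)))


print(combined_index(0b11111))        # 0, the smallest 5-card hand
print(combined_index(0b1111111))      # 22957480, the first 7-card slot
print(combined_index(0b1111111 << 45))  # 156742039, the last slot
print(ranks_are_perfect(8, 3))        # True
```

Unlike the table-driven approach in Question 1, this needs no pre-computed perfect-hash tables, at the cost of a handful of binomial evaluations per hand (which could themselves be cached in a small 52 x 8 table).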