Question

This is a part of a program that analyzes the odds of poker, specifically Texas Hold'em. I have a program I'm happy with, but it needs some small optimizations to be perfect.

I use this type (among others, of course):

  type
    T7Cards = array[0..6] of integer;

There are two things about this array that may be important when deciding how to sort it:

  1. Every item is a value from 0 to 51. No other values are possible.
  2. There are no duplicates. Never.

With this information, what is the absolutely fastest way to sort this array? I use Delphi, so pascal code would be the best, but I can read C and pseudo, albeit a bit more slowly :-)

At the moment I use quicksort, but the funny thing is that this is almost no faster than bubblesort! Possible because of the small number of items. The sorting counts for almost 50% of the total running time of the method.

EDIT:

Mason Wheeler asked why it's necessary to optimize. One reason is that the method will be called 2118760 times.

Basic poker information: All players are dealt two cards (the pocket) and then five cards are dealt to the table (the 3 first are called the flop, the next is the turn and the last is the river. Each player picks the five best cards to make up their hand)

If I have two cards in the pocket, P1 and P2, I will use the following loops to generate all possible combinations:

for C1 := 0 to 51-4 do
  if (C1<>P1) and (C1<>P2) then
     for C2 := C1+1 to 51-3 do
       if (C2<>P1) and (C2<>P2) then
         for C3 := C2+1 to 51-2 do
           if (C3<>P1) and (C3<>P2) then
             for C4 := C3+1 to 51-1 do
               if (C4<>P1) and (C4<>P2) then
                 for C5 := C4+1 to 51 do
                   if (C5<>P1) and (C5<>P2) then
                   begin
                     //This code will be executed 2 118 760 times
                     inc(ComboCounter[GetComboFromCards([P1,P2,C1,C2,C3,C4,C5])]);
                   end;

As I write this I notice one thing more: The last five elements of the array will always be sorted, so it's just a question of putting the first two elements in the right position in the array. That should simplify matters a bit.

So, the new question is: What is the fastest possible way to sort an array of 7 integers when the last 5 elements are already sorted. I believe this could be solved with a couple (?) of if's and swaps :-)

Was it helpful?

Solution

I don't know that much about Texas Hold'em: Does it matter what suit P1 and P2 are, or does it only matter if they are of the same suit or not? If only suit(P1)==suit(P2) matters, then you could separate the two cases, you have only 13x12/2 different possibilities for P1/P2, and you can easily precalculate a table for the two cases.

Otherwise, I would suggest something like this:

(* C1 < C2 < P1 *)
for C1:=0 to P1-2 do 
   for C2:=C1+1 to P1-1 do 
      Cards[0] = C1;
      Cards[1] = C2;
      Cards[2] = P1;
      (* generate C3...C7 *)

(* C1 < P1 < C2 *)
for C1:=0 to P1-1 do 
   for C2:=P1+1 to 51 do 
      Cards[0] = C1;
      Cards[1] = P1;
      Cards[2] = C2;
      (* generate C3...C7 *)

(* P1 < C1 < C2 *)
for C1:=P1+1 to 51 do 
   for C2:=C1+1 to 51 do 
      Cards[0] = P1;
      Cards[1] = C1;
      Cards[2] = C2;
      (* generate C3...C7 *)

(this is just a demonstration for one card P1, you would have to expand that for P2, but I think that's straightforward. Although it'll be a lot of typing...) That way, the sorting doesn't take any time at all. The generated permutations are already ordered.

OTHER TIPS

For a very small set, insertion sort can usually beat quicksort because it has very low overhead.

WRT your edit, if you're already mostly in sort order (last 5 elements are already sorted), insertion sort is definitely the way to go. In an almost-sorted set of data, it'll beat quicksort every time, even for large sets. (Especially for large sets! This is insertion sort's best-case scenario and quicksort's worst case.)

Don't know how you are implementing this, but what you could do is have an array of 52 instead of 7, and just insert the card in its slot directly when you get it since there can never be duplicates, that way you never have to sort the array. This might be faster depending on how its used.

There are only 5040 permutations of 7 elements. You can programmaticaly generate a program that finds the one represented by your input in a minimal number of comparisons. It will be a big tree of if-then-else instructions, each comparing a fixed pair of nodes, for example if (a[3]<=a[6]).

The tricky part is deciding which 2 elements to compare in a particular internal node. For this, you have to take into account the results of comparisons in the ancestor nodes from root to the particular node (for example a[0]<=a[1], not a[2]<=a[7], a[2]<=a[5]) and the set of possible permutations that satisfy the comparisons. Compare the pair of elements that splits the set into as equal parts as possible (minimize the size of the larger part).

Once you have the permutation, it is trivial to sort it in a minimal set of swaps.

Since the last 5 items are already sorted, the code can be written just to reposition the first 2 items. Since you're using Pascal, I've written and tested a sorting algorithm that can execute 2,118,760 times in about 62 milliseconds.

procedure SortT7Cards(var Cards: T7Cards);
const
  CardsLength = Length(Cards);
var
  I, J, V: Integer;
  V1, V2: Integer;
begin
  // Last 5 items will always be sorted, so we want to place the first two into
  // the right location.
  V1 := Cards[0];
  V2 := Cards[1];
  if V2 < V1 then
  begin
    I := V1;
    V1 := V2;
    V2 := I;
  end;

  J := 0;
  I := 2;
  while I < CardsLength do
  begin
    V := Cards[I];
    if V1 < V then
    begin
      Cards[J] := V1;
      Inc(J);
      Break;
    end;
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  while I < CardsLength do
  begin
    V := Cards[I];
    if V2 < V then
    begin
      Cards[J] := V2;
      Break;
    end;
    Cards[J] := V;
    Inc(J);
    Inc(I);
  end;
  if J = (CardsLength - 2) then
  begin
    Cards[J] := V1;
    Cards[J + 1] := V2;
  end
  else if J = (CardsLength - 1) then
  begin
    Cards[J] := V2;
  end;
end;

Use min-sort. Search for minimal and maximal element at once and place them into resultant array. Repeat three times. (EDIT: No, I won't try to measure the speed theoretically :_))

var
cards,result: array[0..6] of integer;
i,min,max: integer;

begin
   n=0;
   while (n<3) do begin
      min:=-1;
      max:=52;
      for i from 0 to 6 do begin
          if cards[i]<min then min:=cards[i]
          else if cards[i]>max then max:=cards[i]
      end
      result[n]:=min;
      result[6-n]:=max;
      inc(n);
   end
   for i from 0 to 6 do 
       if (cards[i]<52) and (cards[i]>=0) then begin
           result[3] := cards[i];
           break;
       end
    { Result is sorted here! }
end

This is the fastest method: since the 5-card list is already sorted, sort the two-card list (a compare & swap), and then merge the two lists, which is O(k * (5+2). In this case (k) will normally be 5: the loop test(1), the compare(2), the copy(3), the input-list increment(4) and the output list increment(5). That's 35 + 2.5. Throw in loop initialization and you get 41.5 statements, total.

You could also unroll the loops which would save you maybe 8 statements or execution, but make the whole routine about 4-5 times longer which may mess with your instruction cache hit ratio.

Given P(0 to 2), C(0 to 5) and copying to H(0 to 6) with C() already sorted (ascending):

If P(0) > P(1) Then
    // Swap:
    T = P(0)
    P(0) = P(1)
    P(1) = T
    // 1stmt + (3stmt * 50%) = 2.5stmt
End

P(2), C(5) = 53    \\ Note these are end-of-list flags
k = 0     \\ P() index
J = 0     \\ H() index
i = 0     \\ C() index
// 4 stmt

Do While (j) < 7 
    If P(k) < C(I) then
        H(j) = P(k)
        k = k+1
    Else
        H(j) = C(i)
        j = j+1
    End if
    j = j+1
    // 5stmt * 7loops = 35stmt
Loop

And note that this is faster than the other algorithm that would be "fastest" if you had to truly sort all 7 cards: use a bit-mask(52) to map & bit-set all 7 cards into that range of all possible 52 cards (the bit-mask), and then scan the bit-mask in order looking for the 7 bits that are set. That takes 60-120 statements at best (but is still faster than any other sorting approach).

For seven numbers, the most efficient algorithm that exists with regards to the number of comparisons is Ford-Johnson's. In fact, wikipedia references a paper, easily found on google, that claims Ford-Johnson's the best for up to 47 numbers. Unfortunately, references to Ford-Johnson's aren't all that easy to found, and the algorithm uses some complex data structures.

It appears on The Art Of Computer Programming, Volume 3, by Donald Knuth, if you have access to that book.

There's a paper which describes FJ and a more memory efficient version here.

At any rate, because of the memory overhead of that algorithm, I doubt it would be worth your while for integers, as the cost of comparing two integers is rather cheap compared to the cost of allocating memory and manipulating pointers.

Now, you mentioned that 5 cards are already sorted, and you just need to insert two. You can do this with insertion sort most efficiently like this:

Order the two cards so that P1 > P2
Insert P1 going from the high end to the low end
(list) Insert P2 going from after P1 to the low end
(array) Insert P2 going from the low end to the high end

How you do that will depend on the data structure. With an array you'll be swapping each element, so place P1 at 1st, P2 and 7th (ordered high to low), and then swap P1 up, and then P2 down. With a list, you just need to fix the pointers as appropriate.

However once more, because of the particularity of your code, it really is best if you follow nikie suggestion and just generate the for loops apropriately for every variation in which P1 and P2 can appear in the list.

For example, sort P1 and P2 so that P1 < P2. Let's make Po1 and Po2 the position from 0 to 6, of P1 and P2 on the list. Then do this:

Loop Po1 from 0 to 5
Loop Po2 from Po1 + 1 to 6
If (Po2 == 1) C1start := P2 + 1; C1end := 51 - 4
If (Po1 == 0 && Po2 == 2) C1start := P1+1; C1end := P2 - 1
If (Po1 == 0 && Po2 > 2) C1start := P1+1; C1end := 51 - 5
If (Po1 > 0) C1start := 0; C1end := 51 - 6
for C1 := C1start to C1end
  // Repeat logic to compute C2start and C2end
  // C2 can begin at C1+1, P1+1 or P2+1
  // C2 can finish at P1-1, P2-1, 51 - 3, 51 - 4 or 51 -5
  etc

You then call a function passing Po1, Po2, P1, P2, C1, C2, C3, C4, C5, and have this function return all possible permutations based on Po1 and Po2 (that's 36 combinations).

Personally, I think that's the fastest you can get. You completely avoid having to order anything, because the data will be pre-ordered. You incur in some comparisons anyway to compute the starts and ends, but their cost is minimized as most of them will be on the outermost loops, so they won't be repeated much. And they can even be more optimized at the cost of more code duplication.

For 7 elements, there are only few options. You can easily write a generator that produces method to sort all possible combinations of 7 elements. Something like this method for 3 elements:

if a[1] < a[2] {
    if a[2] < a[3] {
        // nothing to do, a[1] < a[2] < a[3]
    } else {
         if a[1] < a[3] {
             // correct order should be a[1], a[3], a[2]
             swap a[2], a[3]
         } else {
             // correct order should be a[3], a[1], a[2]
             swap a[2], a[3]
             swap a[1], a[3]
         }
    }
} else {
    // here we know that a[1] >= a[2]
    ...
}

Of course method for 7 elements will be bigger, but it's not that hard to generate.

The code below is close to optimal. It could be made better by composing a list to be traversed while making the tree, but I'm out of time right now. Cheers!

object Sort7 {
  def left(i: Int) = i * 4
  def right(i: Int) = i * 4 + 1
  def up(i: Int) = i * 4 + 2
  def value(i: Int) = i * 4 + 3

  val a = new Array[Int](7 * 4)
  def reset = {
    0 until 7 foreach { 
      i => {
        a(left(i)) = -1
        a(right(i)) = -1
        a(up(i)) = -1
        a(value(i)) = scala.util.Random.nextInt(52)
      }
    }
  }

  def sortN(i : Int) {
    var index = 0
    def getNext = if (a(value(i)) < a(value(index))) left(index) else right(index)
    var next = getNext
    while(a(next) != -1) {
      index = a(next)
      next = getNext
    }
    a(next) = i
    a(up(i)) = index
  }

  def sort = 1 until 7 foreach (sortN(_))

  def print {
    traverse(0)
    def traverse(i: Int): Unit = {
      if (i != -1) {
        traverse(a(left(i)))
        println(a(value(i)))
        traverse(a(right(i)))
      }
    }
  }
}

In pseudo code:

int64 temp = 0;
int index, bit_position;

for index := 0 to 6 do
   temp |= 1 << cards[index];

for index := 0 to 6 do
begin
   bit_position = find_first_set(temp);
   temp &= ~(1 << bit_position);
   cards[index] = bit_position;
end;

It's an application of bucket sort, which should generally be faster than any of the comparison sorts that were suggested.

Note: The second part could also be implemented by iterating over bits in linear time, but in practice it may not be faster:

index = 0;
for bit_position := 0 to 51 do
begin
   if (temp & (1 << bit_position)) > 0 then
   begin
      cards[index] = bit_position;
      index++;
   end;
end;

Assuming that you need an array of cards at the end of it.

Map the original cards to bits in a 64 bit integer ( or any integer with >= 52 bits ).

If during the initial mapping the array is sorted, don't change it.

Partition the integer into nibbles - each will correspond to values 0x0 to 0xf.

Use the nibbles as indices to corresponding sorted sub-arrays. You'll need 13 sets of 16 sub-arrays ( or just 16 sub-arrays and use a second indirection, or do the bit ops rather than looking the answer up; what is faster will vary by platform ).

Concatenate the non-empty sub-arrays into the final array.

You could use larger than nibbles if you want; bytes would give 7 sets of 256 arrays and make it more likely that the non-empty arrays require concatenating.

This assumes that branches are expensive and cached array accesses cheap.

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

// for general case of 7 from 52, rather than assuming last 5 sorted
uint32_t card_masks[16][5] = {
    { 0, 0, 0, 0, 0 },
    { 1, 0, 0, 0, 0 },
    { 2, 0, 0, 0, 0 },
    { 1, 2, 0, 0, 0 },
    { 3, 0, 0, 0, 0 },
    { 1, 3, 0, 0, 0 },
    { 2, 3, 0, 0, 0 },
    { 1, 2, 3, 0, 0 },
    { 4, 0, 0, 0, 0 },
    { 1, 4, 0, 0, 0 },
    { 2, 4, 0, 0, 0 },
    { 1, 2, 4, 0, 0 },
    { 3, 4, 0, 0, 0 },
    { 1, 3, 4, 0, 0 },
    { 2, 3, 4, 0, 0 },
    { 1, 2, 3, 4, 0 },
};

void sort7 ( uint32_t* cards) {
    uint64_t bitset = ( ( 1LL << cards[ 0 ] ) | ( 1LL << cards[ 1LL ] ) | ( 1LL << cards[ 2 ] ) | ( 1LL << cards[ 3 ] ) | ( 1LL << cards[ 4 ] ) | ( 1LL << cards[ 5 ] ) | ( 1LL << cards[ 6 ] ) ) >> 1;

    uint32_t*   p    = cards;
    uint32_t    base = 0;

    do {
        uint32_t* card_mask = card_masks[ bitset & 0xf ];

        // you might remove this test somehow, as well as unrolling the outer loop
        // having separate arrays for each nibble would save 7 additions and the increment of base
        while ( *card_mask )
            *(p++) = base + *(card_mask++);

        bitset >>= 4;
        base += 4;
    } while ( bitset );
}

void print_cards ( uint32_t* cards ) {
    printf ( "[ %d %d %d %d %d %d %d ]\n", cards[0], cards[1], cards[2], cards[3], cards[4], cards[5], cards[6] );
}

int main ( void ) {
    uint32_t cards[7] = { 3, 9, 23, 17, 2, 42, 52 };

    print_cards ( cards );
    sort7 ( cards );
    print_cards ( cards );

    return 0;
}

Take a look at this:

http://en.wikipedia.org/wiki/Sorting_algorithm

You would need to pick one that will have a stable worst case cost...

Another option could be to keep the array sorted the whole time, so an addition of a card would keep the array sorted automatically, that way you could skip to sorting...

What JRL is referring to is a bucket sort. Since you have a finite discrete set of possible values, you can declare 52 buckets and just drop each element in a bucket in O(1) time. Hence bucket sort is O(n). Without the guarantee of a finite number of different elements, the fastest theoretical sort is O(n log n) which things like merge sort an quick sort are. It's just a balance of best and worst case scenarios then.

But long answer short, use bucket sort.

If you like the above mentioned suggestion to keep a 52 element array which always keeps your array sorted, then may be you could keep another list of 7 elements which would reference the 7 valid elements in the 52 element array. This way we can even avoid parsing the 52 element array.

I guess for this to be really efficient, we would need to have a linked list type of structure which be supports operations: InsertAtPosition() and DeleteAtPosition() and be efficient at that.

There are a lot of loops in the answers. Given his speed requirement and the tiny size of the data set I would not do ANY loops.

I have not tried it but I suspect the best answer is a fully unrolled bubble sort. It would also probably gain a fair amount of advantage from being done in assembly.

I wonder if this is the right approach, though. How are you going to analyze a 7 card hand?? I think you're going to end up converting it to some other representation for analysis anyway. Would not a 4x13 array be a more useful representation? (And it would render the sorting issue moot, anyway.)

Considering that last 5 elements are always sorted:


for i := 0 to 1 do begin
  j := i;
  x := array[j];
  while (j+1 <= 6) and (array[j+1] < x) do begin
    array[j] := array[j+1];
    inc(j);
  end;
  array[j] := X;
end;

bubble sort is your friend. Other sorts have too many overhead codes and not suitable for small number of elements

Cheers

Here is your basic O(n) sort. I'm not sure how it compares to the others. It uses unrolled loops.

char card[7]; // the original table of 7 numbers in range 0..51
char table[52]; // workspace

// clear the workspace
memset(table, 0, sizeof(table));

// set the 7 bits corresponding to the 7 cards
table[card[0]] = 1;
table[card[1]] = 1;
...
table[card[6]] = 1;

// read the cards back out
int j = 0;
if (table[0]) card[j++] = 0;
if (table[1]) card[j++] = 1;
...
if (table[51]) card[j++] = 51;

If you are looking for a very low overhead, optimal sort, you should create a sorting network. You can generate the code for a 7 integer network using the Bose-Nelson algorithm.

This would guarentee a fixed number of compares and an equal number of swaps in the worst case.

The generated code is ugly, but it is optimal.

Your data is in a sorted array and I'll assume you swap the new two if needed so also sorted, so a. if you want to keep it in place then use a form of insertion sort; b. if you want to have it the result in another array do a merging by copying.

With the small numbers, binary chop is overkill, and ternary chop is appropriate anyway: One new card will mostly like split into two and three, viz. 2+3 or 3+2, two cards into singles and pairs, e.g. 2+1+2.

So the most time-space efficient approach to placing the smaller new card is to compare with a[1] (viz. skip a[0]) and then search left or right to find the card it should displace, then swap and move right (shifting rather than bubbling), comparing with the larger new card till you find where it goes. After this you'll be shifting forward by twos (two cards have been inserted). The variables holding the new cards (and swaps) should be registers.

The look up approach would be faster but use more memory.

Use a sorting network, like in this C++ code:

template<class T> 
inline void sort7(T data) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
//DD = Define Data, create a local copy of the data to aid the optimizer.
#define DD1(a)   register auto data##a=*(data+a);
#define DD2(a,b) register auto data##a=*(data+a);register auto data##b=*(data+b);
//CB = Copy Back
#define CB1(a)   *(data+a)=data##a;
#define CB2(a,b) *(data+a)=data##a;*(data+b)=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(3,4) SORT2(3,4)
  DD2(5,6) SORT2(5,6)
  DD1(0) SORT2(0,2)
  SORT2(3,5) 
  SORT2(4,6) 
  SORT2(0,1)
  SORT2(4,5) 
  SORT2(2,6) CB1(6)
  SORT2(0,4) 
  SORT2(1,5)
  SORT2(0,3) CB1(0) 
  SORT2(2,5) CB1(5)
  SORT2(1,3) CB1(1) 
  SORT2(2,4) CB1(4)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}

Use the function above if you want to pass it an iterator or a pointer and use the function below if you want to pass it the seven arguments one by one. BTW, using templates allows compilers to generate really optimized code so don't get ride of the template<> unless you want C code (or some other language's code).

template<class T>
inline void sort7(T& e0, T& e1, T& e2, T& e3, T& e4, T& e5, T& e6) {
#define SORT2(x,y) {if(data##x>data##y)std::swap(data##x,data##y);}
#define DD1(a)   register auto data##a=e##a;
#define DD2(a,b) register auto data##a=e##a;register auto data##b=e##b;
#define CB1(a)   e##a=data##a;
#define CB2(a,b) e##a=data##a;e##b=data##b;
  DD2(1,2) SORT2(1,2)
  DD2(3,4) SORT2(3,4)
  DD2(5,6) SORT2(5,6)
  DD1(0) SORT2(0,2)
  SORT2(3,5)
  SORT2(4,6)
  SORT2(0,1)
  SORT2(4,5)
  SORT2(2,6) CB1(6)
  SORT2(0,4)
  SORT2(1,5)
  SORT2(0,3) CB1(0)
  SORT2(2,5) CB1(5)
  SORT2(1,3) CB1(1)
  SORT2(2,4) CB1(4)
  SORT2(2,3) CB2(2,3)
#undef CB1
#undef CB2
#undef DD1
#undef DD2
#undef SORT2
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top