Domanda

how can I manage a huge list of 100+ million strings? How can i begin to work with such a huge list?

example large list:

cards = [
            "2s","3s","4s","5s","6s","7s","8s","9s","10s","Js","Qs","Ks","As"
            "2h","3h","4h","5h","6h","7h","8h","9h","10h","Jh","Qh","Kh","Ah"
            "2d","3d","4d","5d","6d","7d","8d","9d","10d","Jd","Qd","Kd","Ad"
            "2c","3c","4c","5c","6c","7c","8c","9c","10c","Jc","Qc","Kc","Ac"
           ]

from itertools import combinations

cardsInHand = 7
hands = list(combinations(cards,  cardsInHand))

print str(len(hands)) + " hand combinations in texas holdem poker"
È stato utile?

Soluzione

With lots and lots of memory. Python lists and strings are actually reasonably efficient, so provided you've got the memory, it shouldn't be an issue.

That said, if what you're storing are specifically poker hands, you can definitely come up with more compact representations. For example, you can use one byte to encode each card, which means you only need one 64 bit int to store an entire hand. You could then store these in a NumPy array, which would be significantly more efficient than a Python list.

For example:

>>> cards_to_bytes = dict((card, num) for (num, card) in enumerate(cards))
>>> import numpy as np
>>> hands = np.zeros(133784560, dtype='7int8') # 133784560 == 52c7
>>> for num, hand in enumerate(itertools.combinations(cards, 7)):
...     hands[num] = [cards_to_bytes[card] for card in hand]

And to speed up that last line a bit: hands[num] = map(cards_to_bytes.__getitem__, hand)

This will only require 7 * 133784560 = ~1gb of memory… And that could be cut down if you pack four cards into each byte (I don't know the syntax for doing that off the top of my head…)

Altri suggerimenti

If you just want to loop over all possible hands to count them or to find one with a certain property, there is no need to store them all in memory.

You can just use the iterator and not convert to a list:

from itertools import combinations

cardsInHand = 7
hands = combinations(cards,  cardsInHand)

n = 0
for h in hands:
    n += 1
    # or do some other stuff here

print n, "hand combinations in texas holdem poker."

85900584 hand combinations in texas holdem poker.

Another memory-less option which allow you to create a stream of data for processing however you like is to use generators. For example.

Print the total number of hands:

sum (1 for x in combinations(cards, 7))

Print the number of hands with the ace of clubs in it:

sum (1 for x in combinations(cards, 7) if 'Ac' in x)

There's often a trade-off between how long you spend coding and how long your code takes to run. If you're just trying to get something done quickly and don't expect it to run frequently, an approach like you're suggesting is fine. Just make the list huge -- if you don't have enough RAM, your system will churn virtual memory, but you'll probably get your answer faster than learning how to write a more sophisticated solution.

But if this is a system that you expect to be used on a regular basis, you should figure out something other than storing everything in RAM. An SQL database is probably what you want. They can be very complex, but because they are nearly ubiquitous there are plenty of excellent tutorials out there.

You might look to a well-documented framework like django which simplifies access to a database through an ORM layer.

My public domain OneJoker library has some combinatoric functions that would be handy. It has an Iterator class that can give you information about the set of combinations without storing them or even running though them. For example:

  import onejoker as oj
  deck = oj.Sequence(52)
  deck.fill()

  hands = oj.Iterator(deck, 5)    # I want combinations of 5 cards out of that deck

  t = hands.total                 # How many are there?
  r = hands.rank("AcKsThAd3c")    # At what position will this hand appear?
  h = hands.hand_at(1000)         # What will the 1000th hand be?

  for h in hands.all():           # Do something with all of them
     dosomething(h)               

You could use the Iterator.rank() function to reduce each hand to a single int, store those in a compact array, then use Iterator.hand_at() to produce them on demand.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top