How to effectively store a large set of permutations?

https://stackoverflow.com/questions/8717375

13-04-2021
|

Frage

Let's say we have a list of elements:

[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]

I would like to store all possible permutations of this list in the RAM.

Since the list can be pretty long (10 elements and more), it takes a lot of space to store it (factorial N).

For instance, if I have a list, which consumes about 70 bytes of space and has 12 elements, then I need 12! * 70 ~ 31 GB. If I add just one more element to the list, then it could become unfeasible to store the permutations in the RAM.

Is there any more efficient representation to keep all the permutations in the memory than the following Erlang representation?

[{dog,1},{dog,2},{cat,1},{cat,2},{bird,1},{bird,2},...]

(I know that the atom dog is stored only once in the atoms table, but since it is repeated in every permutation, it takes N memory).

Maybe these permutations could be stored in some kind of byte representation? (Sorry, I am a newbie in bytes and binaries).

After all, it is just the same elements, but rearranged in different ways.

Lösung

Why not produce them lazily? Keep an index from each list, and when asked for a new element, you produce the combination on the fly. That way, you only need to store the initial list of source elements in memory plus indices at any time.

For example (if you need to iterate over the permutations):

-record(perm, {list_a, list_b, index_a, index_b}).

Everytime you reach the maximum of index_b, you reset it to 0 and increment index_a with one. Then, accessing the Nth element of the lists (where N is the indices) you can recreate any permutation instance.

Of course, this implies that you would have to traverse the lists each time a permutation is produced. To avoid this, you could use the lists as indices themselves:

-record(perm2, {list_a, list_b, list_b_orig}).

To generate the next permutation, pop the new element from list_b and append it to the head of list_a. If list_b is empty, remove the head of list_a and start over by setting list_b to the original which is saved in list_b_orig.

Andere Tipps

If you have a List of N elements, there are N! permutations. So if we are able to produce a mapping from the numbers 1 to N! (or 0 to N!-1) to these permutations in a reproducable way, we wouldn't need to store N! lists of elements, but only N! numbers.

But stop - why should we store N! numbers? We don't need to store them, because they don't change. We only need the upper bound, which is defined by the biggest element index, which is N, which should be stored already in your code.

I'm sorry not to know Erlang, but I wrote a functional algorithm in Scala which allows to produce permutations of arbitrary size in a reproducable manner.

For example, the 123456790th permutation of the numbers (1 to 12) is List (4, 2, 1, 5, 12, 7, 10, 8, 11, 9, 3, 6).

As a special bonus, this algorithm produces the permutations in a sorted way. Just finding all permutations in reproduceable way but without order is more simple:

def permutationIndex (idx: Int, list: List [Int]) : List [Int] = {
  if (list.isEmpty) list else {
    val el = list (idx % list.size) 
    el :: permutationIndex (idx / list.size, list.remove (_ == el))}}

Maybe compressing it would do the job?

Zlib module seems to do something like this.

Lizenziert unter: CC-BY-SA mit Zuschreibung

Nicht verbunden mit StackOverflow