Constrained Sequence to Index Mapping

https://stackoverflow.com/questions/369417

21-08-2019
|

Question

I'm puzzling over how to map a set of sequences to consecutive integers.

All the sequences follow this rule:

A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1

I'm looking for a solution that will be able to, given such a sequence, compute a integer for doing a lookup into a table and given an index into the table, generate the sequence.

Example: for length 3, there are 5 the valid sequences. A fast function for doing the following map (preferably in both direction) would be a good solution

1,1,1   0
1,1,2   1
1,2,1   2
1,2,2   3
1,2,3   4

The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set in bounded only by the number of unique sequences possible.
I don't know now what the length of the sequence will be but it will be a small, <12, constant known in advance.
I'll get to this sooner or later, but though I'd throw it out for the community to have "fun" with in the meantime.

these are different valid sequences

1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2

these are not

1,2,2,4
2,
1,1,2,3,5

Related to this

Solution

There is a natural sequence indexing, but no so easy to calculate.

Let look for A_n for n>0, since A_0 = 1.

Indexing is done in 2 steps.

Part 1:

Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.

On steps are consecutive numbers (2,3,4,5,...).
On non-step places we can put numbers from 1 to number of steps with index less than k.

Each group can be represent as binary string where 1 is step and 0 non-step. E.g. 001001010 means group with 112aa3b4c, a<=2, b<=3, c<=4. Because, groups are indexed with binary number there is natural indexing of groups. From 0 to 2^length - 1. Lets call value of group binary representation group order.

Part 2:

Index sequences inside a group. Since groups define step positions, only numbers on non-step positions are variable, and they are variable in defined ranges. With that it is easy to index sequence of given group inside that group, with lexicographical order of variable places.

It is easy to calculate number of sequences in one group. It is number of form 1^i_1 * 2^i_2 * 3^i_3 * ....

Combining:

This gives a 2 part key: <Steps, Group> this then needs to be mapped to the integers. To do that we have to find how many sequences are in groups that have order less than some value. For that, lets first find how many sequences are in groups of given length. That can be computed passing through all groups and summing number of sequences or similar with recurrence. Let T(l, n) be number of sequences of length l (A_0 is omitted ) where maximal value of first element can be n+1. Than holds:

T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n

Because l + n <= sequence length + 1 there are ~sequence_length^2/2 T(l,n) values, which can be easily calculated.

Next is to calculate number of sequences in groups of order less or equal than given value. That can be done with summing of T(l,n) values. E.g. number of sequences in groups with order <= 1001010 binary, is equal to

T(7,1) +         # for 1000000
2^2 * T(4,2) +   # for 001000
2^2 * 3 * T(2,3) # for 010

Optimizations:

This will give a mapping but the direct implementation for combining the key parts is >O(1) at best. On the other hand, the Steps portion of the key is small and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).

I'm not 100% sure about upper formula, but it should be something like it.

With these remarks and recurrence it is possible to make functions sequence -> index and index -> sequence. But not so trivial :-)

OTHER TIPS

I think hash with out sorting should be the thing.

As A0 always start with 0, may be I think we can think of the sequence as an number with base 12 and use its base 10 as the key for look up. ( Still not sure about this).

This is a python function which can do the job for you assuming you got these values stored in a file and you pass the lines to the function

def valid_lines(lines):
    for line in lines:
        line = line.split(",")
        if line[0] == 1 and line[-1] and line[-1] <= max(line)+1:
            yield line

lines = (line for line in open('/tmp/numbers.txt'))
for valid_line in valid_lines(lines):
    print valid_line

Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow