Reduce Permutation

https://stackoverflow.com/questions/480397

20-08-2019
|

Question

I need an algorithm that can map the runs in a permutation to a single number, but also reduce the subsequent numbers.

So a run is a sequential set of numbers in a permutation that is sorted and in-order. In the list, 1;2;3;5;6;4 there are two runs, 1;2;3 and 5;6. I want to replace these with a single number, a minimum, so if, after removing runs, we have a permutation of 4 elements, the permutation uses the numbers 1 ... 4. In the above, we have to reduce the two runs. the first run would be 1, 4 transforms to 2, and [5;6] transforms to 3, to hold the second criteria. If I sort the reduced permutation then expand the elements inside from the mapping, and sort the original permutation I will get the same result. The resultant permutation shouldn't have any runs in it.

For example (i've highlighted the runs):

(1;2;3;4;5;6) => 1 //Mappings: 1->1;2;3;4;5;6
(2;3;4);1;(5;6) => 2 1 3 // Mappings: 2->2;3;4, and 3->5;6
(3;4);(1;2);(5;6) => 2 1 3 // Mappings: 2->3;4, and 1->1;2 and 3->5;6

for now, I am passing over the list and making a list of lists, grouping the runs. Really the second part is the hard part to make a clean solution. I have thought of the naive approach, curious if anyone has some clever trick that can do it better then mine, I'm at like O( 2n + n log n),

replacing the runs with the head element of the run and inserting that data into a hashtable (for recoverability)
sorting to create a map to the missing digits with it's sorted index. [1;6;5;4] would output [(1,1);(4,2);(5,3);(6,4)]
replacing the list in step1 with the map created in step2 and updating the hashtable for translation.

running through an example, again:

step 1: replace runs with the head element of the run and inserting data into a hash table  
    [1;3;4;2;5;6;] -> [1;3;2;5]  
step 2: sort array to create map  
    [1235], so we know that, in the previous array, 1 maps to 1, 2 to 2, 3 to 3, 4 to 5.  
step 3: do above translation on array from step one. 
    [1;3;2;4]

If I sort this permutation and reconstruct: [1;2;3;4], 3->3;4 and 4->5;6 I get, 1;2;3;4;5;6. Also sorted.

I'm using lists, so a functional approach would be preferred. No code necessary. All ideas, of course, welcome.

EDIT:

mweerden:

It's not clear to me what the precise conditions on the mapping are. Why exactly isn't it allowed to just produce the permutation [1,2,...,N] for a permutation with N runs? You seem to prefer to map a run to a number from that run, but (as this isn't always possible) you seem to allow some freedom. –

I don't prefer to map a run to a number within that run (look above!), but I need to preserve an ordering. The permutation [1;2;3...N] is a run, and thus can be reduced further. That is why it is invalid. The order will matter in further operations in another algorithm, but the individual elements can be expanded at the end to rescue the original permutation.

Solution

Notation:

counting starts at 1
l.i is element i of list l
l+m is the concatenation of lists l and m
a run is a maximal sublist that is an list [n,n+1,n+2,...,m] for some n and m with n <= m

So we are given a permutation p of the list [1,2,...,N]. We divide p up into runs r_1,r_2,...,r_M such that p = r_1+r_2+...+r_M. We are then looking for a permutation q of [1,2,...,M] such that r_(q.1)+r_(q.2)+...+r_(q.M) = [1,2,...,N].

For example, if p = [1,3,4,2,5,6], we have N=6, M=4, r_1 = [1], r_2 = [3,4], r_3 = [2] and r_4 = [5,6]. In this case we need q to be [1,3,2,4] as r_1+r_3+r_2+r_4 = [1]+[2]+[3,4]+[5,6] = [1,2,3,4,5,6].

Note that q cannot contain runs of length greater than one per definition; if it would, then there is an i < M with q.i + 1 = q.(i+1) and thus a sublist r_(q.i)+r_(q.(i+1)) = r_(q.i)+r_(q.i + 1) of [1,2,...,N], but r_(q.i)+r_(q.i + 1) is also a sublist of p which contradicts that r_(q.i) and r_(q.i + 1) are runs.

Now, to find a q given a p, we assume the availability of a mapping data structure with O(1) inserts and lookups of numbers and lists with O(1) appends and forward traversal. Then we do the following.

First we construct the list runheads = [r_1.1,r_2.1,...,r_M.1]. This can be done trivially by traversing p whilst maintaining the current run. During this step we also remember the maximal number encountered to obtain N at the end and construct a mapping heads with the elements of runheads as keys. This step is clearly O(n). Note that it is not relevant what the values of heads are, so we can just use run r_i as value for key r_i.1.
Next we iterate from 1 to (and including) N maintaining a value k with initial value 1. For each value i we check to see if i is in heads. If this is the case we add i -> k to a mapping f and increase k. This step is also clearly O(n). To be able to get back from q to p we can also store an additional mapping rev with k -> i or even k -> runheads(i) at no extra big-O cost.
Finally we apply mapping f to the elements of runheads to get q. Again O(n).

To illustrate with an example we look at the case that p = [1,3,4,2,5,6].

In the first step we get runheads = [1,3,2,5], N=6 and heads = { 1 -> [1], 3 -> [3,4], 2 -> [2], 5 -> [5,6] }.
For the second steps we four cases for which we have to do something: 1, 2, 3 and 5. For these cases we have values for k that are 1, 2, 3 and 4, respectively. This means that f will be { 1 -> 1, 2 -> 2, 3 -> 3, 5 -> 4 }. (And rev would be { 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 5 } or { 1 -> [1], 2 -> [2], 3 -> [3,4], 4 -> [5,6] }, depending on what you chose to store.)
To get q we calculate map(f,runheads) which is [f(1),f(3),f(2),f(5)] or, equivalently, [1,3,2,4].

So, if I didn't make a mistake and if the data structures satisfy the above requirements, this solution should be O(n). Note that in practice it might actually be more useful to use your own (O(n*log(n))) solution. If you have a big p but with just a couple of runs, sorting runheads and using that to build the mappings might be more efficient. I do think that p would have to be quite big for this to be the case.

OTHER TIPS

Edited for clarificaton

step 1: Run the algorithm but instead of producing only one hash table produce a Hash Table (D1) indexed by the head of the set it is mapping to (for example, for [3,4] that will be 3) and a List (L1) with the set itself

[3;4;6;8;1;2]:

   D1              L1

3 -> [3,4]     1 -> [3,4]

6 -> [6]       2 -> [6]

8 -> [8]       3 -> [8]

1 -> [1,2]     4 -> [1,2]

Step 2: I you look at the collections we now have you'll see that, for a given set we have the index in which we found it (in L1) and the Head value. The correct map value will be the minimum integer between them which is not already taken. For example, for the [3,4] we'll have that the value must be between 1 and 3, and, since 1 is already taken, the corresponding value is 2. Take into account that, as D1 is indexed by the Head value, lower values will be always taking if the corresponding set exists. In the example, [1,2] is mapped to 1, so that this key is already "taken". So, in pseudocode:

for (int Current = 1; Current  < L1.Length; Current++)
{
  GetHead(L1[Current]);
  Index = Current;
  While Head > Index
  {
    if D1.Empty(Index)
    {
      D1.Add(Index,D2[Current])
      D1.DeleteIfNotEmpty(Head);
    }
    else
      Index++;
  }
}

For example

we take the first value in L1 -> [3,4]...
get the Head = 3
Starting on 1 we lookup D1[1] which is already taken, so we increment to 2.
We look for D1[2] which is empty so we assign D1[2] = [3,4] and delete D[3]

That does not provide O(n) but something like O(n+log(n)) (n for the first step, log(n) for the second)

For the example above that gets you:

1 -> [1,2]
2 -> [3,4]
3 -> [6]
4 -> [8]

Now, if you have [3;4;8;6;1;2], that will result in

1 -> [1,2]
2 -> [3,4]
3 -> [8]
4 -> [6]

because it is using absolute ordering in the original array, I don't know if that is all right or you'll want 6 to be in index 3 and 8 to be in index 4, in that case you will probably had to preorder L1 based on the head which will increment your complexity by Log(n)

If you have to preorder you'll have 0(n+log^2(n)) which isn't so bad (maybe less assuming a QuickSort has O(Log n) ordering L1 will be O(log(log n))

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow