Reduce Permutation
-
20-08-2019 - |
Question
I need an algorithm that can map the runs in a permutation to a single number, but also reduce the subsequent numbers.
So a run is a sequential set of numbers in a permutation that is sorted and in-order. In the list, 1;2;3;5;6;4 there are two runs, 1;2;3 and 5;6. I want to replace these with a single number, a minimum, so if, after removing runs, we have a permutation of 4 elements, the permutation uses the numbers 1 ... 4. In the above, we have to reduce the two runs. the first run would be 1, 4 transforms to 2, and [5;6] transforms to 3, to hold the second criteria. If I sort the reduced permutation then expand the elements inside from the mapping, and sort the original permutation I will get the same result. The resultant permutation shouldn't have any runs in it.
For example (i've highlighted the runs):
(1;2;3;4;5;6) => 1 //Mappings: 1->1;2;3;4;5;6
(2;3;4);1;(5;6) => 2 1 3 // Mappings: 2->2;3;4, and 3->5;6
(3;4);(1;2);(5;6) => 2 1 3 // Mappings: 2->3;4, and 1->1;2 and 3->5;6
for now, I am passing over the list and making a list of lists, grouping the runs. Really the second part is the hard part to make a clean solution. I have thought of the naive approach, curious if anyone has some clever trick that can do it better then mine, I'm at like O( 2n + n log n),
- replacing the runs with the head element of the run and inserting that data into a hashtable (for recoverability)
- sorting to create a map to the missing digits with it's sorted index. [1;6;5;4] would output [(1,1);(4,2);(5,3);(6,4)]
- replacing the list in step1 with the map created in step2 and updating the hashtable for translation.
running through an example, again:
step 1: replace runs with the head element of the run and inserting data into a hash table [1;3;4;2;5;6;] -> [1;3;2;5] step 2: sort array to create map [1235], so we know that, in the previous array, 1 maps to 1, 2 to 2, 3 to 3, 4 to 5. step 3: do above translation on array from step one. [1;3;2;4]
If I sort this permutation and reconstruct: [1;2;3;4], 3->3;4 and 4->5;6 I get, 1;2;3;4;5;6. Also sorted.
I'm using lists, so a functional approach would be preferred. No code necessary. All ideas, of course, welcome.
EDIT:
mweerden:
It's not clear to me what the precise conditions on the mapping are. Why exactly isn't it allowed to just produce the permutation [1,2,...,N] for a permutation with N runs? You seem to prefer to map a run to a number from that run, but (as this isn't always possible) you seem to allow some freedom. –
I don't prefer to map a run to a number within that run (look above!), but I need to preserve an ordering. The permutation [1;2;3...N] is a run, and thus can be reduced further. That is why it is invalid. The order will matter in further operations in another algorithm, but the individual elements can be expanded at the end to rescue the original permutation.
Solution
Notation:
- counting starts at 1
l.i
is elementi
of listl
l+m
is the concatenation of listsl
andm
- a run is a maximal sublist that is an list
[n,n+1,n+2,...,m]
for somen
andm
withn <= m
So we are given a permutation p
of the list [1,2,...,N]
. We divide p
up into runs r_1,r_2,...,r_M
such that p = r_1+r_2+...+r_M
. We are then looking for a permutation q
of [1,2,...,M]
such that r_(q.1)+r_(q.2)+...+r_(q.M) = [1,2,...,N]
.
For example, if p = [1,3,4,2,5,6]
, we have N=6
, M=4
, r_1 = [1]
, r_2 = [3,4]
, r_3 = [2]
and r_4 = [5,6]
. In this case we need q
to be [1,3,2,4]
as r_1+r_3+r_2+r_4 = [1]+[2]+[3,4]+[5,6] = [1,2,3,4,5,6]
.
Note that q
cannot contain runs of length greater than one per definition; if it would, then there is an i < M
with q.i + 1 = q.(i+1)
and thus a sublist r_(q.i)+r_(q.(i+1)) = r_(q.i)+r_(q.i + 1)
of [1,2,...,N]
, but r_(q.i)+r_(q.i + 1)
is also a sublist of p
which contradicts that r_(q.i)
and r_(q.i + 1)
are runs.
Now, to find a q
given a p
, we assume the availability of a mapping data structure with O(1)
inserts and lookups of numbers and lists with O(1)
appends and forward traversal. Then we do the following.
First we construct the list
runheads = [r_1.1,r_2.1,...,r_M.1]
. This can be done trivially by traversingp
whilst maintaining the current run. During this step we also remember the maximal number encountered to obtainN
at the end and construct a mappingheads
with the elements ofrunheads
as keys. This step is clearlyO(n)
. Note that it is not relevant what the values ofheads
are, so we can just use runr_i
as value for keyr_i.1
.Next we iterate from
1
to (and including)N
maintaining a valuek
with initial value1
. For each valuei
we check to see ifi
is inheads
. If this is the case we addi -> k
to a mappingf
and increasek
. This step is also clearlyO(n)
. To be able to get back fromq
top
we can also store an additional mappingrev
withk -> i
or evenk -> runheads(i)
at no extra big-O cost.Finally we apply mapping
f
to the elements ofrunheads
to getq
. AgainO(n)
.
To illustrate with an example we look at the case that p = [1,3,4,2,5,6]
.
In the first step we get
runheads = [1,3,2,5]
,N=6
andheads = { 1 -> [1], 3 -> [3,4], 2 -> [2], 5 -> [5,6] }
.For the second steps we four cases for which we have to do something:
1
,2
,3
and5
. For these cases we have values fork
that are1
,2
,3
and4
, respectively. This means thatf
will be{ 1 -> 1, 2 -> 2, 3 -> 3, 5 -> 4 }
. (Andrev
would be{ 1 -> 1, 2 -> 2, 3 -> 3, 4 -> 5 }
or{ 1 -> [1], 2 -> [2], 3 -> [3,4], 4 -> [5,6] }
, depending on what you chose to store.)To get
q
we calculatemap(f,runheads)
which is[f(1),f(3),f(2),f(5)]
or, equivalently,[1,3,2,4]
.
So, if I didn't make a mistake and if the data structures satisfy the above requirements, this solution should be O(n)
. Note that in practice it might actually be more useful to use your own (O(n*log(n))
) solution. If you have a big p
but with just a couple of runs, sorting runheads
and using that to build the mappings might be more efficient. I do think that p
would have to be quite big for this to be the case.
OTHER TIPS
Edited for clarificaton
step 1: Run the algorithm but instead of producing only one hash table produce a Hash Table (D1) indexed by the head of the set it is mapping to (for example, for [3,4] that will be 3) and a List (L1) with the set itself
[3;4;6;8;1;2]:
D1 L1
3 -> [3,4] 1 -> [3,4]
6 -> [6] 2 -> [6]
8 -> [8] 3 -> [8]
1 -> [1,2] 4 -> [1,2]
Step 2: I you look at the collections we now have you'll see that, for a given set we have the index in which we found it (in L1) and the Head value. The correct map value will be the minimum integer between them which is not already taken. For example, for the [3,4] we'll have that the value must be between 1 and 3, and, since 1 is already taken, the corresponding value is 2. Take into account that, as D1 is indexed by the Head value, lower values will be always taking if the corresponding set exists. In the example, [1,2] is mapped to 1, so that this key is already "taken". So, in pseudocode:
for (int Current = 1; Current < L1.Length; Current++)
{
GetHead(L1[Current]);
Index = Current;
While Head > Index
{
if D1.Empty(Index)
{
D1.Add(Index,D2[Current])
D1.DeleteIfNotEmpty(Head);
}
else
Index++;
}
}
For example
- we take the first value in L1 -> [3,4]...
- get the Head = 3
- Starting on 1 we lookup D1[1] which is already taken, so we increment to 2.
- We look for D1[2] which is empty so we assign D1[2] = [3,4] and delete D[3]
That does not provide O(n) but something like O(n+log(n)) (n for the first step, log(n) for the second)
For the example above that gets you:
1 -> [1,2]
2 -> [3,4]
3 -> [6]
4 -> [8]
Now, if you have [3;4;8;6;1;2], that will result in
1 -> [1,2]
2 -> [3,4]
3 -> [8]
4 -> [6]
because it is using absolute ordering in the original array, I don't know if that is all right or you'll want 6 to be in index 3 and 8 to be in index 4, in that case you will probably had to preorder L1 based on the head which will increment your complexity by Log(n)
If you have to preorder you'll have 0(n+log^2(n)) which isn't so bad (maybe less assuming a QuickSort has O(Log n) ordering L1 will be O(log(log n))