Question

You are given an array of $2n$ elements

$$a_1, a_2, \dots, a_n, b_1, b_2, \dots, b_n$$

The task is to interleave the array, using an in-place algorithm such that the resulting array looks like

$$b_1, a_1, b_2, a_2, \dots , b_n, a_n$$

If the in-place requirement wasn't there, we could easily create a new array and copy elements giving an $\mathcal{O}(n)$ time algorithm.

With the in-place requirement, a divide-and-conquer algorithm bumps the running time up to $\Theta(n \log n)$.

So the question is:

Is there an $\mathcal{O}(n)$ time algorithm, which is also in-place?

(Note: You can assume the uniform cost WORD RAM model, so in-place translates to $\mathcal{O}(1)$ space restriction).


Solution

Here is the answer which elaborates upon the algorithm from the paper linked by Joe: http://arxiv.org/abs/0805.1598

First let us consider a $\Theta(n \log n)$ algorithm which uses divide and conquer.

1) Divide and Conquer

We are given

$$a_1, a_2, \dots, a_n, b_1, b_2, \dots, b_n$$

Now to use divide and conquer, for some $m = \Theta(n)$, we try to get the array

$$ [a_1, a_2, \dots , a_m, b_1, b_2, \dots, b_m], [a_{m+1}, \dots, a_n, b_{m+1}, \dots b_n]$$

and recurse.

Notice that the portion $$ b_1, b_2, \dots, b_m, a_{m+1}, \dots, a_n$$ is a cyclic shift of

$$ a_{m+1}, \dots, a_n, b_1, \dots, b_m$$

by $m$ places.

This is a classic operation that can be done in place with three reversals, in $\mathcal{O}(n)$ time.
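The three-reversal rotation can be sketched in C as follows (a standalone sketch; the names `reverse` and `rotate_left` are illustrative, not from the paper):

```c
#include <assert.h>

/* Reverse a[lo..hi] in place. */
static void reverse(int a[], int lo, int hi) {
    while (lo < hi) {
        int t = a[lo];
        a[lo++] = a[hi];
        a[hi--] = t;
    }
}

/* Cyclic left shift of a[0..len-1] by s places via three reversals:
   reverse the first s elements, reverse the remaining len - s,
   then reverse the whole array. */
static void rotate_left(int a[], int len, int s) {
    reverse(a, 0, s - 1);
    reverse(a, s, len - 1);
    reverse(a, 0, len - 1);
}
```

For example, `rotate_left` applied to `{3, 4, 5, 1, 2}` with `s = 3` yields `{1, 2, 3, 4, 5}`, using only a constant amount of extra space.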

Thus divide and conquer gives a $\Theta(n \log n)$ algorithm, with a recurrence of the form $T(n) = 2T(n/2) + \Theta(n)$.

2) Permutation Cycles

Now, another approach to the problem is to consider the permutation as a set of disjoint cycles.

The permutation is given by (with positions numbered starting at $1$)

$$ j \mapsto 2j \bmod (2n+1)$$

If we somehow knew exactly what the cycles were, we could realize the permutation using constant extra space: pick an element $A$, compute where it goes (using the above formula), save the element at that target location in temporary space, put $A$ into the target location, and continue along the cycle. Once we are done with one cycle, we move on to an element of the next cycle, follow that cycle, and so on.

This would give us an $\mathcal{O}(n)$ time algorithm, but it assumes that we "somehow knew what the exact cycles were" and trying to do this book-keeping within the $\mathcal{O}(1)$ space limitation is what makes this problem hard.

This is where the paper uses number theory.

It can be shown that, in the case when $2n + 1 = 3^k$, the elements at positions $1, 3, 3^2, \dots, 3^{k-1}$ are in different cycles, and every cycle contains an element at a position $3^m$, $m \ge 0$.

This uses the fact that $2$ is a generator of $(\mathbb{Z}/3^k)^*$.

Thus when $2n+1 = 3^k$, the follow-the-cycle approach gives us an $\mathcal{O}(n)$ time algorithm: for each cycle we know exactly where to begin, namely at the powers of $3$ (including $1$), and those can be computed in $\mathcal{O}(1)$ space.
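As a concrete illustration, here is a C sketch of this cycle-leader pass for the case $2n+1 = 3^k$ (the function name and 0-based array indexing are mine; positions in the formula are 1-based):

```c
#include <assert.h>

/* In-place interleave of a[0..2n-1] = a_1..a_n, b_1..b_n into
   b_1, a_1, b_2, a_2, ..., b_n, a_n, assuming 2n + 1 is a power of 3.
   The cycle leaders are the 1-based positions 1, 3, 9, ..., 3^(k-1). */
static void interleave_pow3(int a[], int n) {
    int mod = 2 * n + 1;
    for (int leader = 1; leader < mod; leader *= 3) {
        int j = leader;
        int tmp = a[j - 1];          /* element we are carrying forward */
        do {
            j = (2 * j) % mod;       /* position j maps to 2j mod (2n+1) */
            int t = a[j - 1];
            a[j - 1] = tmp;          /* drop the carried element here */
            tmp = t;                 /* pick up the displaced one */
        } while (j != leader);
    }
}
```

For $n = 4$ (so $2n+1 = 9 = 3^2$), applying this to `{1,2,3,4,5,6,7,8}` (i.e. $a = 1..4$, $b = 5..8$) produces `{5,1,6,2,7,3,8,4}`.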

3) Final Algorithm

Now we combine the above two: Divide and Conquer + Permutation Cycles.

We do a divide and conquer, but pick $m$ so that $2m+1$ is a power of $3$ and $m = \Theta(n)$.

So instead of recursing on both "halves", we recurse on only one and do $\Theta(n)$ extra work.

This gives us the recurrence $T(n) = T(cn) + \Theta(n)$ (for some $0 \lt c \lt 1$) and thus gives us an $\mathcal{O}(n)$ time, $\mathcal{O}(1)$ space algorithm!
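Putting the pieces together, the full algorithm can be sketched as below. This is a self-contained C sketch under my own naming (it repeats the three-reversal and cycle-leader helpers so it stands alone); the tail recursion is written as a loop:

```c
#include <assert.h>

static void reverse(int a[], int lo, int hi) {
    while (lo < hi) { int t = a[lo]; a[lo++] = a[hi]; a[hi--] = t; }
}

/* Cyclic left shift of a[0..len-1] by s places (three reversals). */
static void rotate_left3(int a[], int len, int s) {
    reverse(a, 0, s - 1);
    reverse(a, s, len - 1);
    reverse(a, 0, len - 1);
}

/* Cycle-leader interleave of a[0..2m-1]; valid when 2m+1 is a power of 3. */
static void interleave_pow3(int a[], int m) {
    int mod = 2 * m + 1;
    for (int leader = 1; leader < mod; leader *= 3) {
        int j = leader, tmp = a[j - 1];
        do {
            j = (2 * j) % mod;
            int t = a[j - 1]; a[j - 1] = tmp; tmp = t;
        } while (j != leader);
    }
}

/* In-place O(n) interleave: a_1..a_n, b_1..b_n -> b_1, a_1, ..., b_n, a_n. */
void interleave(int a[], int n) {
    while (n > 0) {
        int p = 3;                     /* largest 3^k with (3^k - 1)/2 <= n */
        while ((3 * p - 1) / 2 <= n) p *= 3;
        int m = (p - 1) / 2;           /* m = Theta(n) and 2m+1 = 3^k */
        if (m < n)                     /* bring b_1..b_m next to a_1..a_m */
            rotate_left3(a + m, n, n - m);
        interleave_pow3(a, m);         /* first 2m elements are now done */
        a += 2 * m;                    /* "recurse" on the remaining part */
        n -= m;
    }
}
```

Each iteration places $2m = \Theta(n)$ elements with $\Theta(n)$ work, matching the recurrence $T(n) = T(cn) + \Theta(n)$.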

OTHER TIPS

I'm pretty sure I've found an algorithm that does not rely on number theory or cycle theory. Note that there are a few details to work out (possibly tomorrow), but I'm quite confident they will. I handwave as I'm supposed to be sleeping, not because I'm trying to hide problems :)

Let A be the first array, B the second, |A| = |B| = N, and assume N = 2^k for some k, for simplicity. Let A[i..j] be the subarray of A with indices i through j, inclusive. Arrays are 0-based. Let RightmostBitPos(i) return the 0-based position of the lowest set bit of i, counting from the right. The algorithm works as follows.

GetIndex(i) {
    int rightPos = RightmostBitPos(i) + 1;
    return i >> rightPos;
}

Interleave(A, B, N) {
    if (N == 1) {
        swap(A[0], B[0]);
    }
    else {
        for (i = 0; i < N; i++)
            swap(A[i], B[GetIndex(i+1)]);

        for (i = 1; i <= N/2; i*=2)
            Interleave(B[0..i/2-1], B[i/2..i-1], i/2);

        Interleave(B[0..N/2-1], B[N/2..N-1], N/2);
    }
}

Let's take an array of 16 numbers, and let's just start interleaving them using swaps, and see what happens:

1 2 3 4 5 6 7 8    | 9 10 11 12 13 14 15 16
9 2 3 4 5 6 7 8    | 1 10 11 12 13 14 15 16
9 1 3 4 5 6 7 8    | 2 10 11 12 13 14 15 16
9 1 10 4 5 6 7 8   | 2 3 11 12 13 14 15 16
9 1 10 2 5 6 7 8   | 4 3 11 12 13 14 15 16
9 1 10 2 11 6 7 8  | 4 3 5 12 13 14 15 16
9 1 10 2 11 3 7 8  | 4 6 5 12 13 14 15 16
9 1 10 2 11 3 12 8 | 4 6 5 7 13 14 15 16
9 1 10 2 11 3 12 4 | 8 6 5 7 13 14 15 16

Of particular interest is the first part of the second array:

|
| 1
| 2
| 2 3
| 4 3
| 4 3 5
| 4 6 5
| 4 6 5 7
| 8 6 5 7

The pattern should be clear: we alternately add a number to the end and replace the lowest number by a high number. Note that we always add a number that is one higher than the highest number we already have. If we were somehow able to figure out exactly which number is the lowest at any given time, we can do that easily.

Now, we go for bigger examples to see if we can see a pattern. Note that we don't need to fix the size of the array to construct the above example. At some point, we get this configuration (the second line subtracts 16 from every number):

16 24 20 28 18 22 26 30 17 19 21 23 25 27 29 31
0   8  4 12  2  6 10 14  1  3  5  7  9 11 13 15

Now this clearly shows a pattern: "1 3 5 7 9 11 13 15" are all 2 apart, "2 6 10 14" are all 4 apart and "4 12" are 8 apart. We can therefore devise an algorithm that tells us what the next smallest number will be: the mechanism is pretty much exactly how binary numbers work. You have a bit for the last half of the array, a bit for the second quarter, and so on.

If we are therefore allowed enough space to store these bits (we need $\log n$ bits, but our computational model allows this - a pointer into the array also needs $\log n$ bits), we can figure out which number to swap in $O(1)$ time amortized.

We can therefore get the first half of the array into its interleaved state in $O(n)$ time and $O(n)$ swaps. However, we do have to fix the second half of our array, which seems all messed up ("8 6 5 7 13 14 15 16").

Now, if we can 'sort' the first half of this second part, we end up with "5 6 7 8 13 14 15 16", and recursively interleaving this half will do the trick: we interleave the array in $O(n)$ time ($O(\log n)$ recursive calls, each of which halves the input size). Note we don't need a stack as these calls are tail-recursive, so our space usage remains $O(1)$.

Now, the question is: is there some pattern in the part we need to sort? Trying 32 numbers gives us "16 12 10 14 9 11 13 15" to fix up. Note that we have the exact same pattern here! "9 11 13 15", "10 14" and "12" are grouped together in the same fashion we saw earlier.

Now, the trick is to recursively interleave these subparts. We interleave "16" and "12" to "12 16". We interleave "12 16" and "10 14" to "10 12 14 16". We interleave "10 12 14 16" and "9 11 13 15" to "9 10 11 12 13 14 15 16". This sorts the first part.

Just like above, the total cost of this operation is $O(n)$. Adding all of these up, we still manage to get a total running time of $O(n)$.

An example:

Interleave the first half:
1 2 3 4 5 6 7 8    | 9 10 11 12 13 14 15 16
9 2 3 4 5 6 7 8    | 1 10 11 12 13 14 15 16
9 1 3 4 5 6 7 8    | 2 10 11 12 13 14 15 16
9 1 10 4 5 6 7 8   | 2 3 11 12 13 14 15 16
9 1 10 2 5 6 7 8   | 4 3 11 12 13 14 15 16
9 1 10 2 11 6 7 8  | 4 3 5 12 13 14 15 16
9 1 10 2 11 3 7 8  | 4 6 5 12 13 14 15 16
9 1 10 2 11 3 12 8 | 4 6 5 7 13 14 15 16
9 1 10 2 11 3 12 4 | 8 6 5 7 13 14 15 16
Sort out the first part of the second array (recursion not explicit):
8 6 5 7 13 14 15 16
6 8 5 7 13 14 15 16
5 8 6 7 13 14 15 16
5 6 8 7 13 14 15 16
5 6 7 8 13 14 15 16
Interleave again:
5 6 7 8   | 13 14 15 16
13 6 7 8  | 5 14 15 16
13 5 7 8  | 6 14 15 16
13 5 14 8 | 6 7 15 16
13 5 14 6 | 8 7 15 16
Sort out the first part of the second array:
8 7 15 16
7 8 15 16
Interleave again:
7 8 | 15 16
15 8 | 7 16
15 7 | 8 16
Interleave again:
8 16
16 8
Merge all the above:
9 1 10 2 11 3 12 4 | 13 5 14 6 | 15 7 | 16 8

Here is a non-recursive, in-place, linear-time algorithm to interleave two halves of an array with no extra storage.

The general idea is simple: Walk through the first half of the array from left to right, swapping the correct values into place. As you advance, the yet to be used left values get swapped into the space vacated by the right values. The only trick is figuring out how to pull them out again.

We start with an array of size N split into two nearly equal halves:
[ left_items | right_items ]
As we process it, it becomes
[ placed_items | remaining_left_items | swapped_left_items | remaining_right_items ]

The swap space grows with the following pattern: A) grow the space by removing the adjacent right item and swapping in a new item from the left; B) swap the oldest item with a new item from the left. If the left items are numbered 1..N, this pattern looks like

step swapspace index changed
1    A: 1         0
2    B: 2         0
3    A: 2 3       1
4    B: 4 3       0     
5    A: 4 3 5     2
6    B: 4 6 5     1
7    A: 4 6 5 7   3
...

The sequence of which index changed is exactly OEIS A025480, which can be calculated with a simple process. This allows the swap location to be found given only the number of items added so far, which is also the index of the current item being placed.

That's all the info we need to populate the 1st half of the sequence in linear time.

When we get to the midpoint, the array will have three parts: [ placed_items | swapped_left_items | remaining_right_items] If we can unscramble the swapped items, we have reduced the problem to half the size, and can repeat.

To unscramble the swap space, we use the following property: A sequence built by N alternating append and swap_oldest operations will contain N/2 items where their ages are given by A025480(N/2)..A025480(N-1). (Integer division, smaller values are older).

For example, if the left half originally held the values 1..19, then the swap space would contain [16, 12, 10, 14, 18, 11, 13, 15, 17, 19]. A025480(9..18) is [2, 5, 1, 6, 3, 7, 0, 8, 4, 9], which is exactly the list of indexes of the items from oldest to newest.

So we can unscramble our swap space by advancing through it and swapping S[i] with S[ A(N/2 + i)]. This is also linear time.

The remaining complication is that eventually you will reach a position where the correct value should be at a lower index, but it has already been swapped out. It is easy to find the new location: just do the index calculation again to discover where the item was swapped to. It may be necessary to follow the chain a few steps until you find an unswapped location.

At this point, we have merged half the array, and maintained the order of the unmerged parts in the other half, with exactly N/2 + N/4 swaps. We can continue through the rest of the array for a total of N + N/4 + N/8 + .... swaps which is strictly less than 3N/2.

How to calculate A025480:
This is defined in OEIS as a(2n) = n, a(2n+1) = a(n). An alternate formulation is a(n) = isEven(n) ? n/2 : a((n-1)/2). This leads to a simple algorithm using bitwise operations:

index_t a025480(index_t n){
    while (n&1) n=n>>1;
    return n>>1;  
}

This is an amortized O(1) operation over all possible values for N. (1/2 need 1 shift, 1/4 need 2, 1/8 need 3, ...). There is an even faster method which uses a small lookup table to find the position of the least significant zero bit.
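In the same spirit, the position of the least significant zero bit can also be obtained with a compiler intrinsic instead of a lookup table (this variant is my own sketch, assuming GCC/Clang; the original author only mentions the table approach):

```c
#include <assert.h>
#include <stdint.h>

/* Constant-time A025480: strip the trailing run of 1-bits from n,
   then shift once more. __builtin_ctz(~n) is the length of that run,
   i.e. the position of the least significant zero bit of n.
   Valid as long as ~n != 0 (so not for the all-ones value). */
static inline uint32_t a025480_fast(uint32_t n) {
    return n >> (__builtin_ctz(~n) + 1);
}
```

For example, `a025480_fast(6)` is `3` (6 is even, so a(6) = 6/2) and `a025480_fast(5)` is `1` (a(5) = a(2) = 1), matching the loop-based version above.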

Given that, here's an implementation in C:

static inline index_t larger_half(index_t sz) {return sz - (sz / 2); }
static inline bool is_even(index_t i) { return ((i & 1) ^ 1); }

index_t unshuffle_item(index_t j, index_t sz)
{
  index_t i = j;
  do {
    i = a025480(sz / 2 + i);
  }
  while (i < j);
  return i;
}

void interleave(value_t a[], index_t n_items)
{
  index_t i = 0;
  index_t midpt = larger_half(n_items);
  while (i < n_items - 1) {

    //for out-shuffle, the left item is at an even index
    if (is_even(i)) { i++; }
    index_t base = i;

    //emplace left half.
    for (; i < midpt; i++) {
      index_t j = a025480(i - base);
      SWAP(a + i, a + midpt + j);
    }

    //unscramble swapped items
    index_t swap_ct  = larger_half(i - base);
    for (index_t j = 0; j + 1 < swap_ct ; j++) {
      index_t k = unshuffle_item(j, i - base);
      if (j != k) {
        SWAP(a + midpt + j, a + midpt + k);
      }
    }
    midpt += swap_ct;
  }
}

This should be a fairly cache-friendly algorithm, since 2 of the 3 data locations are accessed sequentially and the amount of data being processed is strictly decreasing. This method can be turned from an out-shuffle to an in-shuffle by negating the is_even test at the start of the loop.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange