Question

I want to merge two arrays of sorted values into one. Since both source arrays are stored as succeeding parts of one large array, I wonder if you know a way to merge them within that large storage, i.e. an in-place merge.

All the methods I have found need some external storage; they often require temporary arrays of size sqrt(n). Is there an efficient way to do it without?

I'm using C#, but other languages are welcome too. Thanks in advance!


Solution

AFAIK, merging two (even sorted) arrays does not work in place without considerably increasing the necessary number of comparisons and moves of elements; see merge sort. However, blocked variants exist which can merge two sorted runs of total length n using only a temporary array of length sqrt(n) - as you wrote - while still keeping the number of operations reasonably low. That is not bad, but it is also not "nothing", and it appears to be the best you can get.

For practical situations, and if you can afford it, you are better off using a temporary array to merge your lists.
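For reference, a merge with a temporary buffer needs only as much scratch space as the left run: copy the left run aside, then do a standard two-pointer merge back into the large array. A minimal C# sketch (the class and method names are illustrative, not from the original answer):

```csharp
using System;

static class MergeHelper
{
    // Merge two adjacent sorted runs a[lo..mid) and a[mid..hi) in O(hi - lo)
    // time, using a temporary copy of the left run as scratch space.
    public static void MergeRuns(int[] a, int lo, int mid, int hi)
    {
        int[] left = new int[mid - lo];
        Array.Copy(a, lo, left, 0, mid - lo);

        int i = 0;   // index into the left copy
        int j = mid; // index into the right run
        int k = lo;  // write position in a

        while (i < left.Length && j < hi)
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];

        // Copy any leftover elements of the left run; the right run's
        // remainder is already in its final place.
        while (i < left.Length)
            a[k++] = left[i++];
    }
}
```

For example, `MergeRuns(a, 0, 3, 6)` on `{ 1, 4, 7, 2, 3, 9 }` merges the two runs of length 3, leaving `{ 1, 2, 3, 4, 7, 9 }`.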

OTHER TIPS

If the values are stored as succeeding parts of a larger array, you can just sort the whole array and then remove consecutive values which are equal.

// Requires: using System; using System.Collections.Generic;
// Returns the number of distinct elements, which end up in a[0..result).
int SortAndDedupe<T>(T[] a)
{
    // Do an efficient in-place sort
    Array.Sort(a);
    if (a.Length == 0)
        return 0;
    // Now deduplicate
    int lwm = 0; // low water mark
    int hwm = 1; // high water mark
    var cmp = EqualityComparer<T>.Default;
    while (hwm < a.Length)
    {
        // If the lwm and hwm elements are equal, hwm points at a duplicate.
        if (cmp.Equals(a[lwm], a[hwm]))
        {
            hwm++;
        }
        else
        {
            // Not a duplicate entry - move the lwm up
            // and copy the hwm element down over the gap.
            lwm++;
            if (lwm < hwm)
            {
                a[lwm] = a[hwm];
            }
            hwm++;
        }
    }
    // The distinct elements now occupy a[0..lwm], so the new length is
    // lwm + 1 and (a.Length - lwm - 1) elements were removed.
    return lwm + 1;
}

Before you conclude that this will be too slow, implement it and profile it. That should take about ten minutes.

Edit: This can of course be improved by using a different sort rather than the built-in sort, e.g. Quicksort, Heapsort or Smoothsort, depending on which gives better performance in practice. Note that hardware architecture issues mean that the practical performance comparisons may very well be very different from the results of big O analysis.

Really you need to profile it with different sort algorithms on your actual hardware/OS platform.
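As a starting point for such profiling, a quick harness around System.Diagnostics.Stopwatch is enough to compare candidates on your own hardware (the array size and random data here are arbitrary; substitute your own sort implementations for the built-in one):

```csharp
using System;
using System.Diagnostics;

class SortBenchmark
{
    static void Main()
    {
        var rng = new Random(42);
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.Length; i++)
            data[i] = rng.Next();

        // Time the built-in sort on a fresh copy; repeat this block with
        // your own Quicksort/Heapsort/Smoothsort to compare them fairly.
        int[] copy = (int[])data.Clone();
        var sw = Stopwatch.StartNew();
        Array.Sort(copy);
        sw.Stop();
        Console.WriteLine($"Array.Sort: {sw.ElapsedMilliseconds} ms");
    }
}
```

Always sort a fresh copy for each candidate, since sorting an already-sorted array measures a very different workload.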

Note: I am not attempting to give an academic answer here; I am trying to give a practical one, on the assumption that you are trying to solve a real problem.

Don't worry about external storage. sqrt(n), or even more, should not hurt your performance. You just have to make sure the storage is pooled - especially for large data, and especially when merging in loops. Otherwise the GC will get stressed and eat up a considerable part of your CPU time / memory bandwidth.
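In .NET, System.Buffers.ArrayPool&lt;T&gt; provides exactly this kind of pooling; a sketch of renting a scratch buffer for repeated merges (the buffer size and the elided merge body are illustrative):

```csharp
using System;
using System.Buffers;

class PooledScratchExample
{
    static void Main()
    {
        int n = 1024;
        // Rent a scratch buffer from the shared pool instead of allocating
        // a new array on every merge; this keeps GC pressure low in loops.
        int[] temp = ArrayPool<int>.Shared.Rent(n);
        try
        {
            // ... use temp[0..n) as the merge scratch space ...
        }
        finally
        {
            ArrayPool<int>.Shared.Return(temp);
        }
    }
}
```

Note that Rent may hand back an array larger than requested and does not clear its contents, so always work with the length you asked for, not temp.Length.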

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow