Problem

In CLRS chapter 2 there is an exercise that asks whether the worst-case running time of insertion sort can be improved to O(n lg n). I looked into this question and concluded that it cannot be done.

The worst-case complexity cannot be improved, but would using memmove make the real running time better than moving the array elements individually?

Code for individually moving elements

void insertion_sort(int arr[], int length)
{
    /*
    Sorts into increasing order.
    For decreasing order, flip the comparison in the inner for-loop.
    */
    for (int j = 1; j < length; j++)
    {
        int temp = arr[j];
        int k;
        for (k = j - 1; k >= 0 && arr[k] > temp; k--){
            arr[k + 1] = arr[k];
        }
        arr[k + 1] = temp;
    }
}

Code for moving elements using memmove

#include <string.h>   /* for memmove */

void insertion_sort(int arr[], int length)
{
    for (int j = 1; j < length; j++)
    {
        int temp = arr[j];
        int k;
        /* Find the insertion point; the loop body is empty because
           the shifting is done in a single memmove below. */
        for (k = j - 1; k >= 0 && arr[k] > temp; k--)
            ;
        if (k != j - 1) {
            /* Shift arr[k+1 .. j-1] one slot to the right: that is
               j - k - 1 elements. The regions overlap, so memmove
               (not memcpy) is required. */
            memmove(&arr[k + 2], &arr[k + 1], sizeof(int) * (j - k - 1));
        }
        arr[k + 1] = temp;
    }
}

I couldn't get the second version to run perfectly at first, but it shows what I am thinking of doing.
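
For what it's worth, here is a minimal driver I use to sanity-check either version (compile it together with one of the sorts above; the test values are arbitrary):

#include <stdio.h>

void insertion_sort(int arr[], int length);

int main(void)
{
    int a[] = {5, 2, 4, 6, 1, 3};
    int n = (int)(sizeof a / sizeof a[0]);

    insertion_sort(a, n);
    for (int i = 0; i < n; i++)
        printf("%d ", a[i]);   /* expected: 1 2 3 4 5 6 */
    printf("\n");
    return 0;
}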

Would there be any visible speed improvement from using memmove?


Solution 3

It all depends on your compiler and other implementation details. It is true that memmove can be implemented in some tricky, super-optimized way. But at the same time, a smart compiler might be able to figure out what your per-element copying code is doing and optimize it in the same (or a very similar) way. Try it and see for yourself.
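
For instance, a minimal timing harness along these lines could settle it. This sketch assumes the two versions from the question are renamed insertion_sort_loop and insertion_sort_memmove so they can be compiled together; the input size and seed are arbitrary choices:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Prototypes for the two versions from the question, assumed renamed
   so both can live in one file. */
void insertion_sort_loop(int arr[], int length);
void insertion_sort_memmove(int arr[], int length);

int main(void)
{
    enum { N = 20000 };            /* input size: arbitrary choice */
    static int data[N], work[N];

    srand(42);
    for (int i = 0; i < N; i++)
        data[i] = rand();

    /* Time the per-element version on a fresh copy of the input. */
    memcpy(work, data, sizeof data);
    clock_t t0 = clock();
    insertion_sort_loop(work, N);
    printf("loop:    %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    /* Time the memmove version on an identical copy. */
    memcpy(work, data, sizeof data);
    t0 = clock();
    insertion_sort_memmove(work, N);
    printf("memmove: %.3f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    return 0;
}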

Other tips

The implementation behind memmove() might be more optimized in your C library. Some architectures have instructions for moving whole blocks of memory at once very efficiently. The theoretical running-time complexity won't be improved, but it may still run faster in real life.

memmove would be perfectly tuned to make maximum use of the available system resources (unique for each implementation, of course).

Here is a short quote from Expert C Programming: Deep C Secrets on the difference between using a loop and using memcpy (it is preceded in the book by two code snippets, one copying a source into a destination with a for loop and the other with memcpy):

In this particular case both the source and destination use the same cache line, causing every memory reference to miss the cache and stall the processor while it waited for regular memory to deliver. The library memcpy() routine is especially tuned for high performance. It unrolls the loop to read for one cache line and then write, which avoids the problem. Using the smart copy, we were able to get a huge performance improvement. This also shows the folly of drawing conclusions from simple-minded benchmark programs.

This dates back to 1994, but it still illustrates how much better optimized the standard library functions are compared to anything you roll on your own. In the book's test, the loop version took around 7 seconds to run, versus 1 second for memcpy.

While memmove will be only slightly slower than memcpy due to the assumptions it needs to make about the source and destination (memcpy requires that they not overlap), it should still be far superior to any standard loop.
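
To see why the distinction matters for the insertion-sort use case: the shifted region overlaps its destination, which is exactly the situation memcpy is not required to handle. A tiny sketch of the overlapping shift:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Shift elements 0..3 one slot to the right, as the insertion
       sort does. Source and destination overlap, so memmove is the
       correct call; memcpy would be undefined behaviour here. */
    int a[5] = {1, 2, 3, 4, 99};

    memmove(&a[1], &a[0], 4 * sizeof(int));
    for (int i = 0; i < 5; i++)
        printf("%d ", a[i]);   /* prints: 1 1 2 3 4 */
    printf("\n");
    return 0;
}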

Note that this does not affect complexity (as another poster has pointed out). Complexity does not depend on having a bigger cache or an unrolled loop :)

As requested, here are the code snippets (slightly changed):

#include <string.h>

#define DUMBCOPY for (i = 0; i < 65536; i++) destination[i] = source[i]
#define SMARTCOPY memcpy(destination, source, 65536)

int main()
{
    char source[65536], destination[65536];
    int i, j;

    for (j = 0; j < 100; j++)
        DUMBCOPY; /* or put SMARTCOPY here instead */
    return 0;
}

On my machine (32 bit, Linux Mint, GCC 4.6.3) I got the following times:

Using SMARTCOPY:

$ time ./a.out 
real    0m0.002s
user    0m0.000s
sys     0m0.000s

Using DUMBCOPY:

$ time ./a.out 
real    0m0.050s
user    0m0.036s
sys     0m0.000s

You cannot beat memcpy with a plain C implementation, because it is written in assembly with good algorithms.

If you write assembly code with a specific CPU in mind, and develop a good algorithm that takes the cache into account, you may have a chance.

Standard library functions are so well optimized that it is almost always better to use them.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow