Question

I need to compare a block of memory to a fixed value in C. Can I do this with memcmp? Something like:

memcmp (starting_address , fixed_value , num_byte)

I need fixed_value to be a fixed value not the starting address of a block.

  1. Writing the fixed value to an entire temporary memory block is not an option because I have limited space.
  2. Using a loop to write and check memory one by one is not an option because it's very slow.

If it's not possible can anyone tell me a solution that is as fast (or faster) than memcmp?

Thanks,

EDIT: Let's say I have 5GB of memory that holds 0's. And I'm trying to make sure they're all 0's. Is it safe to check the first byte of the block then do this:

memcmp (starting_address , starting_address + ONE_BYTE , FIVE_GB); ?

EDIT: This is why I need to use memcmp and not a user defined loop:

This code took 546 clock ticks to run:

memset(0x80000000 , 0x1 , 0x10000000);
memset(0x90000000 , 0x1 , 0x10000000);
memcmp(0x80000000 , 0x90000000 , 0x10000000);

vs this one that took 7669 clock ticks:

unsigned int i;
int flag = 0;
int *p = 0x80000000;
int *q = 0x90000000;
while(p < 0x90000000)
{
    if(*p++ != *q++)
    {
        flag = 1;
    }
}
Was it helpful?

Solution

I just tested this loop on my Mac, and it beats memcmp:

uint64_t *p = (uint64_t *)buffer1;
uint64_t compare;
memset(&compare, 1, sizeof compare);
for (i = 0; i < length/sizeof compare; i++)
{
    if (p[i] != compare)
        break;
}

Complete example code:

#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <time.h>
#include <stdlib.h>
#include <stdint.h>

// from: http://www.gnu.org/software/libc/manual/html_node/Elapsed-Time.html
void timeval_subtract(struct timeval *result, struct timeval *x, struct timeval *y)
{
    /* Perform the carry for the later subtraction by updating y. */
    if (x->tv_usec < y->tv_usec)
    {
        int nsec = (y->tv_usec - x->tv_usec) / 1000000 + 1;
        y->tv_usec -= 1000000 * nsec;
        y->tv_sec += nsec;
    }

    if (x->tv_usec - y->tv_usec > 1000000)
    {
        int nsec = (x->tv_usec - y->tv_usec) / 1000000;
        y->tv_usec += 1000000 * nsec;
        y->tv_sec -= nsec;
    }

    /* Compute the time remaining to wait. tv_usec is certainly positive. */
    result->tv_sec = x->tv_sec - y->tv_sec;
    result->tv_usec = x->tv_usec - y->tv_usec;
}

int main(int argc, char **argv)
{
    struct rusage before;
    struct rusage after;
    struct timeval diff;
    size_t i;

    size_t length = strtoull(argv[1], NULL, 0);

    char *buffer1 = malloc(length);
    char *buffer2 = malloc(length);

    printf("filling...");
    fflush(stdout);
    memset(buffer1, 1, length);
    memset(buffer2, 1, length);
    printf(" done\n");

    getrusage(RUSAGE_SELF, &before);
    uint64_t *p = (uint64_t *)buffer1;
    uint64_t compare;
    memset(&compare, 1, sizeof compare);
    for (i = 0; i < length/sizeof compare; i++)
    {
        if (p[i] != compare)
            break;
    }
    if (i == length/sizeof compare)
        i = 0;
    getrusage(RUSAGE_SELF, &after);

    printf("\nloop (returned %zu):\n", i);
    timeval_subtract(&diff, &after.ru_utime, &before.ru_utime);
    printf("User:   %ld.%06d s\n", diff.tv_sec, diff.tv_usec);

    timeval_subtract(&diff, &after.ru_stime, &before.ru_stime);
    printf("System: %ld.%06d s\n", diff.tv_sec, diff.tv_usec);

    getrusage(RUSAGE_SELF, &before);
    i = memcmp(buffer1, buffer2, length);
    getrusage(RUSAGE_SELF, &after);

    printf("\nmemcmp (returned %zu):\n", i);
    timeval_subtract(&diff, &after.ru_utime, &before.ru_utime);
    printf("User:   %ld.%06d s\n", diff.tv_sec, diff.tv_usec);

    timeval_subtract(&diff, &after.ru_stime, &before.ru_stime);
    printf("System: %ld.%06d s\n", diff.tv_sec, diff.tv_usec);

    return 0;
}

And run results:

$ make
clang -Wall -Wextra -Werror -O3 -g -o example example.c
./example 0x10000000
filling... done

loop (returned 0):
User:   0.024078 s
System: 0.000011 s

memcmp (returned 0):
User:   0.036752 s
System: 0.000017 s

Maybe you can do something similar?

Note: For those concerned about cache warming, I also tried with the memcmp before the loop and got the same results.

OTHER TIPS

One solution:

Create a buffer containing all the same value and compare against it iteratively. That way you get the advantage of an efficient memcmp implementation without having to write too much code:

static char val[4096]; // tune the size of the buffer if desired
/* at initialization: memset(val, 0x01, sizeof(val)) */

char *start, *ptr, *end;
// initialize start and end
for(ptr = start; ptr < end-sizeof(val); ptr += sizeof(val)) {
    if(memcmp(ptr, val, sizeof(val))
        goto not_all_val;
}
if(memcmp(ptr, val, end - ptr))
    goto not_all_val;

/* all val */
puts("all val");
return;

not_all_val:
puts("not all val");
return;

Performance comparison:

10000 iterations of memcmp(buf, buf2, 4000000) (two buffers of the same length, same as the first method in the question):

real    0m7.480s
user    0m7.375s
sys 0m0.103s

10000 iterations of character-by-character comparison over 4000000 bytes (same as the second method):

real    0m27.004s
user    0m26.908s
sys 0m0.094s

10000 iterations of the proposed method above over 4000000 bytes:

real    0m3.194s
user    0m3.151s
sys 0m0.042s

YMMV (I'm on a three-year-old Macbook Pro), but this method is twice as fast as comparing a complete buffer (likely due to superior cache performance), and nearly ten times as fast as character-by-character comparison.

memcmp with an address is the best option for comparing blocks of memory. Whether you used a block inline or used the memory address of a block, you'd still have to store the block somewhere.

You can create such a block at compile time with something like:

int block[] = {3, 1, 4, 1, 5, 9};

and then just use block in your memcmp.

If you're just wanting to ensure a block of memory is set to specific values, use the for loop solution. Any other solution you come up with is going to have to do the same thing, check the entire block.

An alternative, if it's a really huge block of memory and it's taking too long, is to remove the requirement altogether. By that, I mean re-engineer your algorithms so that it becomes unnecessary. Let's say you have a 1G block.

An example: don't set them all to zeroes. Instead only set the bit at the front you're actively using and maintain a separate length variable to indicate how much you're using.

A heavily optimised memcmp may outperform a user loop but you may also find that it doesn't, simply because it has to cater to the general case - your specific case of checking against zero may allow a compiler to introduce optimisations that defeat memcmp.

As with all optimisations, measure, don't guess!

One option would be to start with the source code for memcmp, and modify it to compare against a fixed buffer, iteratively. That way you will retain the optimizations built into memcmp, avoid the overhead of an external loop, and still achieve your goal. You could have a function like the following:

int memcmp2(const void *s1, size_t n1, const void *s2, size_t n2);

Where n1 is the size of buffer s1, and n2 that of s2.

If you have no control over who writes to that memory block, then there can't possibly exist a smart algorithm that would allow efficient comparison against a single value. You will need to iterate over the whole block and you can't skip even a single word. The only thing you can do is compare more data at once, possibly utilizing machine instructions that can process multiple words at once.

If you do have control over that memory and only you can write to it, then you can be smarter about determining what's in there. For example by "wasting" some space to hold a bit-pattern that determines which words are zeroes. For example, if your words are 32-bit, then you will have a separate memory block where you keep as many words that sum up to the same amount of bits as there are words in your actual memory block. In this case, this is going to cost you 1 byte per 32 bytes of usable memory, which isn't terrible. If you really do need byte-granularity, then the cost is much higher: 1 per 8. But you usually don't need that; you can narrow down the search once you've found a word that isn't zeroed, and search only in that one for the first non-zero byte.

If, after running memcmp(), why would you expect memory changes? If the memory belongs only to your process nothing will modify it. If this is shared memory the problem becomes very different.

As an alternate suggestion I was thinking of using memset() to put all of the memory a known value- which you've already done in less than 546 ticks.

The reason is: memset() will set memory to a known value in one pass - making a second pass thru the same memory to validate takes circa twice as long.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top