Question

I can't seem to find the answer to this anywhere: how do I memset an array to the maximum value of the array's type? I would have thought memset(ZBUFFER, 0xFFFF, size) would work, where ZBUFFER is a 16-bit integer array. Instead I get -1s throughout.

Also, the idea is to have this work as fast as possible (it's a z-buffer that needs to be initialized every frame), so if there is a better way (that is still as fast or faster), let me know.

Edit: as clarification, I do need a signed int array.


Solution

In C++, you would use std::fill and std::numeric_limits.

#include <algorithm>
#include <iterator>
#include <limits>

template <typename IT>
void FillWithMax( IT first, IT last )
{
    typedef typename std::iterator_traits<IT>::value_type T;
    T const maxval = std::numeric_limits<T>::max();
    std::fill( first, last, maxval );
}

size_t const size=32;
short ZBUFFER[size];
FillWithMax( ZBUFFER, &ZBUFFER[0]+size );

This will work with any type.

In C, you should stay away from memset, which sets the value of individual bytes. To initialize an array of any type other than char (possibly unsigned), you have to resort to a manual for loop.

Other tips

-1 and 0xFFFF are the same thing in a 16-bit integer using a two's complement representation. You are only getting -1 because either you have declared your array as short instead of unsigned short, or because you are converting the values to signed when you output them.

BTW, your assumption that you can set anything other than bytes using memset is wrong: memset(ZBUFFER, 0xFF, size) would have done the same thing.

In C++ you can fill an array with some value with the std::fill algorithm.

std::fill(ZBUFFER, ZBUFFER+size, std::numeric_limits<short>::max());

This is neither faster nor slower than your current approach. It does have the benefit of working, though.

Don't attribute speed to a language; that's a property of implementations of C. There are C compilers that produce fast, optimal machine code and C compilers that produce slow, suboptimal machine code, and likewise for C++. A "fast, optimal" implementation might be able to optimise code that seems slow, so it doesn't make sense to call one solution faster than another. I'll talk about correctness first, and then about performance, however insignificant it is. It would be a better idea to profile your code to be sure that this is in fact the bottleneck, but let's continue.

Let us consider the most sensible option first: a loop that copies int values. It is clear just by reading the code that the loop will correctly assign SHRT_MAX to each int item. You can see a test case of this loop below, which will attempt to use the largest possible array allocatable by malloc at the time.

#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);

    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }

    printf("Copying into %zu bytes\n", size);

    for (size_t n = 0; n < size / sizeof *array; n++) {
        array[n] = SHRT_MAX;
    }

    puts("Done!");
    return 0;
}

I ran this on my system, compiled with various optimisations enabled (-O3 -march=core2 -funroll-loops). Here's the output:

Copying into 1073741823 bytes
Done!

Process returned 0 (0x0)   execution time : 1.094 s
Press any key to continue.

Note the "execution time"... That's pretty fast! If anything, the bottleneck here is the cache locality of such a large array, which is why a good programmer will try to design systems that don't use so much memory... Well, then let us consider the memset option. Here's a quote from the memset manual:

The memset() function copies c (converted to an unsigned char) into each of the first n bytes of the object pointed to by s.

Hence, it'll convert 0xFFFF to an unsigned char (truncating it to 0xFF), then assign the converted value to each of the first size bytes. This results in incorrect behaviour. I don't like relying upon the value SHRT_MAX being represented as a sequence of bytes each storing the value (unsigned char)0xFF, because that's relying upon coincidence. In other words, the main problem here is that memset isn't suitable for your task. Don't use it.

#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void) {
    size_t size = SIZE_MAX;
    volatile int *array = malloc(size);

    /* Allocate largest array */
    while (array == NULL && size > 0) {
        size >>= 1;
        array = malloc(size);
    }

    printf("Copying into %zu bytes\n", size);

    memset(array, 0xFFFF, size);

    puts("Done!");
    return 0;
}

A trivial byte-copying memset loop will iterate sizeof (int) times more than the loop in my first example. Considering that my implementation uses a fairly optimal memset, here's the output:

Copying into 1073741823 bytes
Done!

Process returned 0 (0x0)   execution time : 1.060 s
Press any key to continue.

These tests are likely to vary between runs, however; I only ran them once each to get a rough idea. Hopefully you've come to the same conclusion that I have: common compilers are pretty good at optimising simple loops, and it's not worth postulating about micro-optimisations here.

In summary:

  1. Don't use memset to fill ints with values (with an exception for the value 0), because it's not suitable.
  2. Don't postulate about optimisations prior to running tests. Don't run tests until you have a working solution. By working solution I mean "A program that solves an actual problem". Once you have that, use your profiler to identify more significant opportunities to optimise!

This is because of two's complement. You have to change your array's type to unsigned short to get the maximum value, or use 0x7FFF.

for (size_t i = 0; i < SIZE / sizeof(short); ++i) {
    ZBUFFER[i] = SHRT_MAX;
}

Note that this does not initialize the last couple of bytes if (SIZE % sizeof(short)) is nonzero.

In C, you can do it like Adrian Panasiuk said, and you can also unroll the copy loop. Unrolling means copying larger chunks at a time. The extreme end of loop unrolling is copying the whole frame over from a pre-initialized frame, like this:

void init(void)
{
    for (size_t i = 0; i < sizeof(empty_ZBUFFER) / sizeof(empty_ZBUFFER[0]); ++i) {
        empty_ZBUFFER[i] = SHRT_MAX;
    }
}

actual clearing:

memcpy(ZBUFFER, empty_ZBUFFER, SIZE);

(You can experiment with different sizes of the empty ZBUFFER, from four bytes and up, and then have a loop around the memcpy.)

As always, test your findings to see a) whether it's worth optimizing this part of the program at all, and b) what difference the different initialization techniques make. It will depend on a lot of factors. For the last few per cent of performance, you may have to resort to assembler code.

#include <algorithm>
#include <limits>

std::fill_n(ZBUFFER, size, std::numeric_limits<FOO>::max());

where FOO is the type of ZBUFFER's elements.

When you say "memset", do you actually have to use that function? It only does a byte-by-byte assignment, so it won't work with signed arrays.

If you want to set each value to the maximum you would use something like:

std::fill( ZBUFFER, ZBUFFER+len, std::numeric_limits<short>::max() )

where len is the number of elements (not the size in bytes of your array).
