Question

I was reading this pull request on dablooms

To top it off, Murmur doesn't return the hash value on the stack/registers but writes it directly to a provided buffer. This makes it exceedingly easy to fill the bloom->hashes buffer with a lot of random data and perform the modularization incrementally.

for (i = 0; i < bloom->nsalts; i++, hashes += 4) {
    MurmurHash3_x64_128(key, key_len, bloom->salts[i], hashes);
    hashes[0] = hashes[0] % bloom->counts_per_func;
    hashes[1] = hashes[1] % bloom->counts_per_func;
    hashes[2] = hashes[2] % bloom->counts_per_func;
    hashes[3] = hashes[3] % bloom->counts_per_func;
}

I recently noticed that some libraries (at least in C++, my knowledge is pretty limited as I'm quite a newbie) seem not to return values but to instead expect an output argument on which they will write the result. I'm more used to see thing like this:

Matrix leftOperand = { /* some values */ };
Matrix rightOperand = { /* some values */ };
Matrix result;
result = multiplyMatrices( leftOperand, rightOperand );

Where result is the return value of multiplyMatrices. But learning to use OpenGL, GLEW and freeglut in my most recent project, I see more often than not calls like this:

Matrix leftOperand = { /* some values */ };
Matrix rightOperand = { /* some values */ };
Matrix result;
multiplyMatrices( leftOperand, rightOperand, result );

I understand what it does, but I find the notation to be odd. However I see that more and more often, and when I saw it 'praised' by the author of above's pull request, I thought there might be a good reason to do that.

So in my quest to write less bad and smelly code, I'd like to know if this is good practice. I assume it must be for performance reasons, but I don't see clearly if pushing/poping from the stack is slower than writing directly to a given memory address.

I'm looking for some guidelines on the reasons to use one way over another.

Was it helpful?

Solution

Let's take a simple example in C:

struct X
{
    /* a lot of data */
};

struct X func1(/*args*/)
{
    struct X result;
    /* fill in result */
    return result;
}

int func2(/*args*/, struct X *result)
{
    if (result == NULL)
        return -1;
    /* fill in *result */
    return 0;
}

/* in the code */
struct X x;
x = func1(/* args */);
func2(/* args */, &x);

func1, fills in necessary data on its own stack frame, then the line x = func1(...) copies it over to x. func2 gets the pointer to x as its argument and fills it in.

As you can see here, there are two disadvantages using func1:

  1. You can't report error, unless setting x to an invalid state (which may not exist for all structs, e.g. a matrix)
  2. You have to copy possibly large amount of data.

In some cases, error may not be possible at all, so point 1 may not necessarily hold. Point 2 however is a performance killer if your struct is large.

Now imagine the same thing in C++, except struct X is now class X, with a copy constructor and a bunch of std::string and std::list etc inside it. The performance gain of using func2 becomes even more because copying an object of class X now requires calling multiple copy constructors, deep copying everything, and then destructing the local object inside func1. (With C++11 this would be less bad (as bad as in C) because the object can be moved).

The only reason one would want to use func1 is for readability and ease of writing. For example, compare this:

string fix(const string &s);
void print(const string &s);

string str;
print(fix(str));

vs this:

void fix(const string &s, string &res);
void print(const string &s);

string str;
string fixed;
fix(str, fixed);
print(fixed);

The first one is clearly more understandable, while the second one is more efficient. Note that in a language like C, the equivalent of the first code may cause a memory leak, so it's not even possible.

So the question comes down to which of these two are you more concerned with. If you talk with a Java guy, they probably tell you "the performance difference is so small you won't even notice it". If you talk with a C guy they will probably tell you "the performance difference may be small on some architectures, but that doesn't mean you can go around doing unnecessary stuff".

In the post you mentioned, the programmer is trying to improve performance of the library. The functionality he is improving is "filling in a buffer". This cries for passing the buffer as argument:

  • You don't need to copy the contents of the buffer
  • You can provide your own buffer, whether it is statically or dynamically allocated.
    • The function doesn't need to allocate memory each time
    • You can reuse the buffer without having to reallocate memory
    • If statically allocated, you don't need to free the buffer
  • You can partially fill in a larger buffer
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top