Return vs. Not Return of functions?

Question 1

As always when someone brings the argument that one thing is faster than the other, did you take timings? In fully optimized code, in every language and every compiler you plan to use? Without that, any argument based on performance is moot.

I’ll come back to the performance question in a second, just let me address what I think is more important first: There are good reasons to pass function parameters by reference, of course. The primary one I can think of right now is that the parameter is actually input and output, i.e., the function is supposed to operate on the existing data. To me, that is what a function signature taking a non-const reference indicates. If such a function then ignores what is already in that object (or, even worse, clearly expects to only ever get a default-constructed one), that interface is confusing.
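To make the distinction concrete, here is a minimal sketch. The names scale_in_place and make_ramp are made up for illustration and do not come from the question. The first function genuinely reads and rewrites the caller's data, so a non-const reference is an honest signature; the second produces something new and simply returns it.

```cpp
#include <cstddef>
#include <vector>

// In/out parameter: the function reads the existing contents and rewrites
// them in place, so the non-const reference tells the truth about the interface.
void scale_in_place(std::vector<double>& values, double factor)
{
    for (double& v : values)
        v *= factor;
}

// Pure output: nothing about the caller's existing state matters,
// so the result is returned instead of written through a reference.
std::vector<double> make_ramp(std::size_t count)
{
    std::vector<double> ramp;
    ramp.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        ramp.push_back(static_cast<double>(i));
    return ramp;
}
```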

Now, to come back to performance. I cannot speak for C# or Java (though I believe returning an object in Java would not cause a copy in the first place, just passing around a reference), and in C, you do not have references but might need to resort to passing pointers around (and then, I do agree that passing in a pointer to uninitialized memory is ok). But in C++, compilers have for a long time done return value optimization, RVO, which basically just means that in most calls like A a = f(b);, the copy constructor is bypassed and f will create the object directly in the right place. In C++11, we even got move semantics to make this explicit and use it in more places.

Should you just return an A* instead? Only if you really long for the old days of manual memory management. At the very least, return an std::shared_ptr<A> or an std::unique_ptr<A>.
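As a rough sketch of what returning a smart pointer looks like (Widget and make_widget are hypothetical names, and I am sticking to plain C++11, so no std::make_unique):

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical type, only here to have something to allocate.
struct Widget {
    std::string name;
    explicit Widget(std::string n) : name(std::move(n)) {}
};

// The return type documents ownership: the caller receives the only owner,
// and the Widget is deleted automatically when that owner goes out of scope.
std::unique_ptr<Widget> make_widget(const std::string& name)
{
    return std::unique_ptr<Widget>(new Widget(name));
}
```

Compared with a raw Widget*, nobody has to remember to call delete.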

Now, with multiple outputs, you get additional complications, of course. The first thing to do is to check whether your design is actually proper: each function should have a single responsibility, and usually, that means returning a single value as well. But there are of course exceptions to this; e.g., a partitioning function will have to return two or more containers. In that situation, you may find that the code is easier to read with non-const reference arguments; or, you may find that returning a tuple is the way to go.
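Here is one way the two styles might look for a partitioning function. This is only a sketch; split_by_return and split_by_reference are names I invented, and std::pair stands in for a tuple since there are exactly two outputs.

```cpp
#include <utility>
#include <vector>

// Style 1: return both containers together.
std::pair<std::vector<int>, std::vector<int>>
split_by_return(const std::vector<int>& input)
{
    std::vector<int> evens, odds;
    for (int x : input) {
        if (x % 2 == 0) evens.push_back(x);
        else            odds.push_back(x);
    }
    // The locals are moved into the pair, so the element buffers are not copied.
    return std::make_pair(std::move(evens), std::move(odds));
}

// Style 2: non-const reference outputs. Note that the function now has to
// decide what happens to whatever the caller's vectors already contain.
void split_by_reference(const std::vector<int>& input,
                        std::vector<int>& evens,
                        std::vector<int>& odds)
{
    evens.clear();
    odds.clear();
    for (int x : input) {
        if (x % 2 == 0) evens.push_back(x);
        else            odds.push_back(x);
    }
}
```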

I urge you to write your code both ways, and come back the next day or after a weekend and look at the two versions again. Then, decide what is easier to read. In the end, that is the primary criterion for good code. For those few places where you can see a performance difference from an end-user workflow, that is an additional factor to consider, but only in very rare cases should it ever take precedence over readable code – and with a little more effort, you can usually get both to work anyway.

Question 2

Due to Return Value Optimization, the second form (passing a reference and modifying it) is almost certainly slower and less amenable to optimization, as well as less legible.

Let us consider a simple example function:

return_value foo( void );

Here are the possibilities that may occur:

Return Value Optimization (RVO)
Named Return Value Optimization (NRVO)
Move semantic return
Copy semantic return

What is Return Value Optimization? Consider this function:

return_value foo( void ) { return return_value(); }

In this example, an unnamed temporary variable is returned from a single exit point. Because of this, the compiler can easily (and is free to) completely remove any traces of this temporary value, and instead construct it directly in place, in the calling function:

void call_foo( void )
{
    return_value tmp = foo();
}

In this example, tmp is actually directly used in foo as if foo defined it, removing all copies. This is a HUGE optimization if return_value is a non-trivial type.

When can RVO be used? That's up to the compiler, but in general, with a single return code point, it will always be used. Multiple return code points make it more iffy, but if they are all anonymous, your chances increase.
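If you want to see this for yourself, a small instrumented type helps. The sketch below uses a made-up Tracked struct that counts copy and move constructions. Since C++17 the elision is mandatory when an unnamed temporary is returned like this; under earlier standards it is merely permitted, but even then the copy constructor is not used, because a move constructor is available.

```cpp
// Hypothetical type that counts how often it is copied or moved.
struct Tracked {
    static int copies;
    static int moves;
    int payload;

    explicit Tracked(int p) : payload(p) {}
    Tracked(const Tracked& other) : payload(other.payload) { ++copies; }
    Tracked(Tracked&& other) noexcept : payload(other.payload) { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// An unnamed temporary returned from a single exit point: the classic RVO case.
Tracked make_tracked(int p)
{
    return Tracked(p);
}
```

After Tracked t = make_tracked(7); the copy counter is still zero, and with elision in effect the move counter is as well.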

What about Named Return Value Optimization?

This one is a bit trickier; if you name the variable before you return it, it's now an l-value. This means the compiler has to do more work to prove that the in place construction will be possible:

return_type foo( void )
{
    return_type bar;
    // do stuff
    return bar;
}

In general, this optimization is still possible, but less likely with multiple code paths, unless each code path returns the same object; returning multiple different objects from multiple different code paths tends to be difficult to optimize out:

return_type foo( void )
{
    if(some_condition)
    {
        return_type bar = value;
        return bar;
    }
    else
    {
        return_type bar2 = val2;
        return bar2;
    }
}

This is not going to be as well received. It's still possible NRVO could kick in, but it's getting less and less likely. If at all possible, construct a single return_value and tweak it in different code paths, rather than returning wholly different ones.

If NRVO is possible, this will get rid of any overhead; it will be as if it was constructed directly in the calling function.
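A sketch of the friendlier shape, with one named object that is adjusted on each code path and returned at the end. The function describe is invented for the example, and whether NRVO actually fires is still up to the compiler.

```cpp
#include <string>

std::string describe(int n)
{
    std::string result = "value is ";   // the single object every path returns
    if (n < 0)
        result += "negative";
    else if (n == 0)
        result += "zero";
    else
        result += "positive";
    return result;                      // same named object on all paths
}
```

If the compiler does not elide here, result is still moved rather than copied, since std::string has a move constructor.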

If neither form of return value optimization is possible, Move return may be possible.

C++11 and C++03 both have the possibility to do move semantics; rather than copying the information out of one object into another, move semantics allow one object to steal the data in another, setting it to some default state. For C++03 move semantics, you need boost.move, but the concept is still sound.
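To illustrate the stealing, here is a small C++11 sketch (move_reuses_buffer is a name made up for this answer). It checks that a move-constructed std::vector takes over the source's heap buffer instead of allocating a new one and copying the elements.

```cpp
#include <utility>
#include <vector>

// Returns true if moving a vector handed its storage over to the destination.
bool move_reuses_buffer()
{
    std::vector<int> source(1000, 7);
    const int* original_storage = source.data();

    // Move construction: destination takes over source's buffer.
    std::vector<int> destination = std::move(source);

    return destination.data() == original_storage
        && destination.size() == 1000;
}
```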

Move return isn't as fast as RVO return, but it's drastically faster than a copy. For a compliant C++11 compiler, of which there are many today, all STL and STD structures should support move semantics. Your own objects may not have a default move constructor/assignment operator (MSVC does not currently generate default move operations for user-defined types), but adding move semantics is not hard: just use the copy-and-swap idiom to add it!

What is the copy-and-swap idiom?
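In short: write a swap for your type, take the assignment operand by value, and swap with it. A minimal sketch, assuming a hypothetical Buffer class that owns a raw array:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning class used to illustrate the idiom.
class Buffer {
public:
    explicit Buffer(std::size_t n)
        : size_(n), data_(n ? new int[n]() : nullptr) {}

    Buffer(const Buffer& other)
        : size_(other.size_),
          data_(other.size_ ? new int[other.size_] : nullptr)
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Move constructor: start empty, then steal the other object's contents.
    Buffer(Buffer&& other) noexcept : size_(0), data_(nullptr)
    {
        swap(*this, other);
    }

    // One assignment operator serves both cases: the by-value parameter is
    // copy-constructed from lvalues and move-constructed from rvalues.
    Buffer& operator=(Buffer other) noexcept
    {
        swap(*this, other);
        return *this;
    }

    ~Buffer() { delete[] data_; }

    friend void swap(Buffer& a, Buffer& b) noexcept
    {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```

The old contents of the assigned-to object end up in the by-value parameter and are released when it is destroyed.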

Finally, if your return_value does not support move and your function is too hard to RVO, you will default to copy semantics, which is what your friend said to avoid.

However, in a large number of cases, this will not be significantly slower!

For primitive types, such as float, int, or bool, copying is a single assignment or move, hardly the sort of thing to complain about. Passing such things by reference without a really good reason is likely to make your code slower, as references are implemented internally as pointers. For something like your bool example, there's no reason to waste time or energy passing a bool by reference; returning it is the fastest possible way.

When you return something that fits in a register, it's usually returned in a register for exactly that reason; it's fast, and as noted, easiest to maintain.

If your type is a POD type, such as a simple struct, this can often be passed through registers via a fastcall mechanism, or optimized away into direct assignments.

If your type is a large and imposing type, such as std::string or something with a lot of data behind it, requiring lots of deep copies, and your code is sufficiently complex as to make RVO unlikely, then perhaps passing by reference is a better idea.

Summary

Anonymous (rvalue) values of any kind should be returned by value
Small or primitive types should be returned by value.
Any type supporting move semantics (the STL, STD, etc) should be returned by value
Named (lvalue) values that are easy to reason about should be returned by value
Large data types in complex functions should be profiled or passed by reference

Always return by value when possible, if you are using C++11. It's more legible, and faster.

Question 3

There's no single answer to this question, but as you already stated, the central part is: It depends.

Clearly, for simple types, such as ints or bools, the return value is generally the preferred solution. It is easier to write and also less error-prone (e.g. because you cannot pass something undefined to the function and you don't need to define the variable separately before the call). For complex types, such as a collection, call-by-reference might be preferred because it avoids, as you say, the extra copy step. But you could also return a vector<int>* instead of just a vector<int>, which achieves the same (at the cost of some extra memory management, though). All this, however, also depends on the language used. The above will mostly hold true for C or C++, but in managed languages such as Java or C#, most complex types are reference types anyway, so returning a vector does not involve any copying there.

Of course, there are situations where you do want the copy to happen, e.g. if you want to return a copy of an internal vector so that the caller cannot modify the internal data structure of the called class.
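A small C++ sketch of that last case (the Registry class is invented for illustration): the accessor returns the vector by value on purpose, so the caller gets a snapshot and cannot reach the internal state through it.

```cpp
#include <vector>

// Hypothetical class that keeps its vector private.
class Registry {
public:
    void add(int id) { ids_.push_back(id); }

    // Deliberately returns a copy: changes to the result do not
    // affect the Registry's own data.
    std::vector<int> ids() const { return ids_; }

private:
    std::vector<int> ids_;
};
```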

So again: It depends.

Question 4

This is a distinction between methods and functions.

Methods (a.k.a. subroutines) are called primarily for their side effects, which is to modify one or more of the objects passed in as parameters. In languages that support OOP, the object to be modified is usually passed implicitly as the this/self parameter.

Functions, on the other hand, are called primarily for their return value. A function calculates something new; it shouldn't modify its parameters at all and should avoid side effects. Functions should be pure in the functional programming sense.

If a function/method is meant to create a new object (i.e. a factory), then the object should be returned. If you pass in a reference to a variable, then it isn't clear who will be responsible for cleaning up the object previously contained in the variable: the caller or the factory? With a factory function, it's clear that the caller is responsible for ensuring cleanup of the previous object; with a factory method, it's not so clear, because the factory can do the cleanup, although that's often a bad idea for various reasons.

If a function/method is meant to modify an object or objects, then the object(s) should be passed in as arguments, and the modified object(s) shouldn't be returned (an exception to this is if you're designing for a fluent interface/method chaining in a language that supports it).

If your objects are immutable, then you should always use functions, because every operation on an immutable object must create a new object.

Adding two vectors should be a function (use return value), because the return value is a new vector. If you're adding another vector to an existing vector then that should be a method since you're modifying an existing vector and not allocating a new one.
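In C++ terms, the vector example might be sketched like this. Vec2, add, and add_in_place are names chosen for illustration only.

```cpp
// Hypothetical 2D vector type.
struct Vec2 {
    double x;
    double y;

    // Method: called for its side effect on *this; returns nothing.
    void add_in_place(const Vec2& other)
    {
        x += other.x;
        y += other.y;
    }
};

// Function: leaves both arguments untouched and returns a new vector.
Vec2 add(const Vec2& a, const Vec2& b)
{
    return Vec2{a.x + b.x, a.y + b.y};
}
```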

In a language that doesn't support exceptions, the return value is often used to signal an error; however, in languages that support exceptions, error conditions should always be signaled with an exception, and there should never be a method that returns a value or a function that modifies its arguments. In other words, don't do side effects and return a value within the same function/method.

Question 5

What should be returned by functions and what should not (or try to avoid)? It depends on what your method is supposed to do.

When your method modifies the list or returns new data, you should use the return value. It's much easier to understand what your code does that way than with a ref parameter.

Another benefit of return values is the ability to use method chaining.

You can write code like this which passes the list parameter from one method to another:

method1(list).method2(list)...
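One possible way to get this kind of chaining, sketched in C++ with a made-up IntList wrapper: each operation returns a reference to the object it was called on, so the next call can follow directly.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Hypothetical wrapper whose operations return *this to allow chaining.
class IntList {
public:
    explicit IntList(std::vector<int> items) : items_(std::move(items)) {}

    IntList& sort()
    {
        std::sort(items_.begin(), items_.end());
        return *this;
    }

    // Removes adjacent duplicates, so it is meant to be called after sort().
    IntList& remove_duplicates()
    {
        items_.erase(std::unique(items_.begin(), items_.end()), items_.end());
        return *this;
    }

    const std::vector<int>& items() const { return items_; }

private:
    std::vector<int> items_;
};
```

With that in place, list.sort().remove_duplicates(); reads as a single pipeline.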

Question 6

As has been said, there is no general answer. But no one has talked about the machine level, so I'll do that and try some examples.

For operands that fit in a register, the answer is obvious. Every compiler I've seen will use a register for the return value (even if it's a struct). This is as efficient as you'll get.

So the remaining question is large operands.

At this point it's up to the compiler. It is true that some (especially older) compilers would emit a copy to implement return of a value larger than a register. But this is dark ages technology.

Modern compilers - primarily because RAM is much bigger these days, and that makes life much better - are not so stupid. When they see "return foo;" in a function body and foo does not fit in a register, they mark foo as a reference to memory. This is memory allocated by the caller to hold the return value. Consequently, the compiler ends up generating almost exactly the same code as it would if you had passed a reference to the return value yourself.

Let's verify this. Here's a simple program.

struct Big {
  int a[10000];
};

Big process(int n, int c)
{
  Big big;
  for (int i = 0; i < 10000; i++)
    big.a[i] = n + i;
  return big;
}

void process(int n, int c, Big& big)
{
  for (int i = 0; i < 10000; i++)
    big.a[i] = n + i;
}

Now I'll compile it with the Xcode compiler on my MacBook. Here's the relevant output for the return version:

    xorl    %eax, %eax
    .align  4, 0x90
LBB0_1:                                 ## =>This Inner Loop Header: Depth=1
    leal    (%rsi,%rax), %ecx
    movl    %ecx, (%rdi,%rax,4)
    incq    %rax
    cmpl    $10000, %eax            ## imm = 0x2710
    jne     LBB0_1
## BB#2:
    movq    %rdi, %rax
    popq    %rbp
    ret

and for the reference version:

    xorl    %eax, %eax
    .align  4, 0x90
LBB1_1:                                 ## =>This Inner Loop Header: Depth=1
    leal    (%rdi,%rax), %ecx
    movl    %ecx, (%rdx,%rax,4)
    incq    %rax
    cmpl    $10000, %eax            ## imm = 0x2710
    jne     LBB1_1
## BB#2:
    popq    %rbp
    ret

Even if you don't read assembly language code, you can see the similarity. There is perhaps one instruction's difference. This is with -O1. With optimization off, the code is longer, but still almost identical. With gcc version 4.2, the results are very similar.

So you should tell your friends "no". Using a return value with a modern compiler has no penalty.

Question 7

To me, the passing of a non-const pointer means two things:

The parameter may be changed in-place (you can pass a pointer to a struct member and obviate assignment);
The parameter need not be returned if null is passed.

The latter may allow the function to skip a whole, possibly expensive, branch of code that calculates its output value, because that value is not desired anyway.
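A sketch of what I mean, written in C++ with names of my own choosing (count_matches, positions_out): the count is always returned, while the list of positions is only built when the caller passes a non-null pointer.

```cpp
#include <cstddef>
#include <vector>

// Counts occurrences of target. If positions_out is non-null, the matching
// indices are appended to it; passing nullptr skips that work entirely.
std::size_t count_matches(const std::vector<int>& values,
                          int target,
                          std::vector<std::size_t>* positions_out)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == target) {
            ++count;
            if (positions_out != nullptr)
                positions_out->push_back(i);
        }
    }
    return count;
}
```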

I see this as an optimization, that is, something which is done when performance impact is measured or at least estimated. Otherwise I prefer as immutable data as possible, and as pure functions as possible, to simplify correct reasoning about the program's flow.

Usually correctness beats performance, so I'd stay with a clear separation of (const) input parameters and a return struct, unless it obviously or provably hampers performance or code readability.

(Disclaimer: I don't usually write in C.)