This is not an answer, but it is also an answer in some sense...
Given a function that takes an argument by pointer, there is a trivial transformation that will yield a function that returns by value and is trivially optimizable by the compiler.
void f(T *ptr) {
// uses ptr->...
}
Add a reference to the object in the function and replace all uses of ptr with the reference:
void f(T *ptr) {
T & obj = *ptr;
/* uses obj. instead of ptr-> */
}
Now remove the argument, add the return type, replace T& obj with T obj, and change all return statements to yield obj:
T f() {
T obj; // No longer a ref!
/* code does not change */
return obj;
}
At this point you have a function that returns by value for which NRVO is trivial, since all of the return statements refer to the same object.
This transformed function has some of the same shortcomings as the pass-by-pointer version, but it is never worse than it. It demonstrates that whenever pass by pointer is an option, return by value is also an option with the same cost.
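To make the transformation concrete, here is a minimal sketch with a made-up type. Config, fill_config, make_config and check_equivalence are hypothetical names chosen for illustration; they are not part of the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical type, used only for illustration.
struct Config {
    std::string name;
    int retries = 0;
};

// Before: the caller supplies the object through a pointer.
void fill_config(Config* ptr) {
    ptr->name = "default";
    ptr->retries = 3;
}

// After: same body, but obj is a local and every return statement yields it.
Config make_config() {
    Config obj;  // no longer a reference
    obj.name = "default";
    obj.retries = 3;
    return obj;  // a single named object is returned, so NRVO is straightforward
}

// Both forms leave the caller with the same value.
void check_equivalence() {
    Config a;
    fill_config(&a);
    Config b = make_config();
    assert(a.name == b.name && a.retries == b.retries);
}
```

Calling check_equivalence() from main should pass silently. Whether NRVO actually fires is up to the compiler, since the standard permits it but does not require it.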
Exactly the same cost?
This goes beyond the language itself, but when the compiler generates code it follows an ABI (Application Binary Interface) that allows code built by different runs of the compiler (or even by different compilers on the same platform) to interact. All currently used ABIs share a common trait for functions that return by value: for large return types (those that do not fit in registers), memory for the returned object is allocated by the caller, and the function takes an extra pointer with the location of that memory. That is, when the compiler sees
T f();
The calling convention transforms that into:
void mangled_name_for_f( T* __result )
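As a rough illustration of that lowering, the hidden pointer can be imitated by hand with placement new. This is only a sketch of the idea, not what any particular ABI emits; Big, f_lowered and call_lowered are hypothetical names.

```cpp
#include <cassert>
#include <new>

// Hypothetical type, large enough that common ABIs return it through memory.
struct Big {
    long data[8];
};

// What the programmer writes.
Big f() {
    Big obj{};
    obj.data[0] = 1;
    return obj;
}

// Hand-written approximation of the lowered form: the caller owns the
// storage and the callee constructs the result directly inside it.
Big* f_lowered(void* result) {
    Big* obj = new (result) Big{};
    obj->data[0] = 1;
    return obj;
}

// Caller side: reserve space in the caller's frame, then pass its address.
long call_lowered() {
    alignas(Big) unsigned char storage[sizeof(Big)];
    Big* t = f_lowered(storage);
    assert(static_cast<void*>(t) == storage);  // the object lives in the caller's buffer
    return t->data[0];  // Big is trivially destructible, so no cleanup is needed
}
```

The point is only that the callee never allocates anything: it writes into memory the caller already set aside.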
So compare the two alternatives, T t; f(&t); and T t = f();. In both cases the generated code allocates the space in the caller's frame, [1], and calls a function passing a pointer to that space. At the end of the function the compiler will [2] return. Here [1] and [2] mark the point where the object's constructor is actually called: [1] for the pass-by-pointer alternative and [2] for return by value. The costs of both alternatives are the same, with one difference: in [1] the object must be default constructed before its final values are known, while in [2] you might already know the final values of the object and might be able to construct it more efficiently.
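The difference in where construction happens can be observed with an instrumented type. The following is a sketch under the assumption that the function returns a temporary directly; Tracked, fill, make and compare_alternatives are hypothetical names.

```cpp
#include <cassert>

// Hypothetical instrumented type: records which constructor ran.
struct Tracked {
    static int default_ctors;
    static int value_ctors;
    int value;
    Tracked() : value(0) { ++default_ctors; }
    explicit Tracked(int v) : value(v) { ++value_ctors; }
};
int Tracked::default_ctors = 0;
int Tracked::value_ctors = 0;

// Pass by pointer: the caller must construct the object before the call.
void fill(Tracked* ptr) { ptr->value = 42; }

// Return by value: the function constructs the object with its final value.
Tracked make() { return Tracked(42); }

void compare_alternatives() {
    Tracked::default_ctors = 0;
    Tracked::value_ctors = 0;

    Tracked a;  // default construction happens in the caller
    fill(&a);   // the value is overwritten afterwards
    assert(Tracked::default_ctors == 1 && Tracked::value_ctors == 0);

    Tracked b = make();  // construction happens inside make(), with the final value
    assert(Tracked::default_ctors == 1 && Tracked::value_ctors == 1);
    assert(a.value == b.value);
}
```

For a type whose default constructor is expensive, or that has no sensible default state, the first alternative pays for work that the second one never does.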
Regarding performance, is that all there is?
Not really. If you later need to pass that object to a function that takes its argument by value, say void g(T value), then in the pass-by-pointer case there is a named object in the caller's stack, so the object must be copied (or moved) to the location where the calling convention requires the value argument to be. In the return-by-value case, the compiler sees g(f()) and knows that the only use of the object returned from f() is as the argument of g(), so it can pass f() a pointer to the location where g() expects its argument, and no copies are made. At this point the manual approach falls well behind the compiler's approach, even if the implementation of f uses the dumb transformation above!
T obj; // default initialize
f(&obj); // assign (or modify in place)
g(obj); // copy
g(f()); // single object is returned and passed to g(), no copies
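The copy counts in the two call chains can be checked with a type that counts its own copies. This is a sketch with hypothetical names (Payload, fill, make, consume, compare_call_chains), and make() returns a temporary so that the elision is guaranteed in C++17; with a named local it would depend on NRVO, which compilers usually perform but are not required to.

```cpp
#include <cassert>

int copies = 0;  // hypothetical global counter, for illustration only

struct Payload {
    int value = 0;
    Payload() = default;
    explicit Payload(int v) : value(v) {}
    Payload(const Payload& other) : value(other.value) { ++copies; }
    Payload& operator=(const Payload& other) {
        value = other.value;
        ++copies;
        return *this;
    }
};

void fill(Payload* ptr) { ptr->value = 7; }  // pass-by-pointer producer
Payload make() { return Payload(7); }        // return-by-value producer
int consume(Payload p) { return p.value; }   // takes its argument by value

void compare_call_chains() {
    copies = 0;
    Payload obj;             // named object in the caller's frame
    fill(&obj);              // modified in place
    int a = consume(obj);    // must be copied into the parameter
    assert(copies == 1);

    copies = 0;
    int b = consume(make()); // result is built directly in the parameter
    assert(copies == 0);
    assert(a == b);
}
```

Using std::move(obj) in the first chain would turn the copy into a move for types that have a move constructor, but there would still be two objects instead of one.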