Why should one rely on Named Return Value Optimization?

Question 1

Dealing with return values is simply easier than dealing with functions that "return" by writing to a reference parameter. Consider the following two functions:

C GetByRet() { ... }
void GetByParam(C& returnValue) { ... }

The first problem is that the out-parameter version makes it impossible to chain calls:

Method(GetByRet());  
// vs. 
C temp;
GetByParam(temp);
Method(temp);

It also makes features like auto impossible to use, because the variable has to be declared before the call. That is not much of a problem for a type like C, but it matters for types like std::map<std::string, std::list<std::string>*>:

auto ret = GetByRet();
// vs.
auto value; // Error! 
GetByParam(value);
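
A small sketch of the same point with a longer type. The names Index, BuildIndexByRet, BuildIndexByParam, CountWithAuto and CountWithOutParam are made up for illustration, and the map stores the lists by value rather than by pointer to keep the example free of manual memory management.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

// Hypothetical alias standing in for a long container type.
using Index = std::map<std::string, std::list<std::string>>;

// Return by value: the caller can let auto deduce the type.
Index BuildIndexByRet() {
    Index index;
    index["fruit"].push_back("apple");
    index["fruit"].push_back("pear");
    return index;
}

// Out-parameter: the caller must declare the variable first.
void BuildIndexByParam(Index& out) {
    out["fruit"].push_back("apple");
    out["fruit"].push_back("pear");
}

std::size_t CountWithAuto() {
    auto index = BuildIndexByRet();  // type deduced from the return value
    return index["fruit"].size();
}

std::size_t CountWithOutParam() {
    // The full type has to be written out by hand; auto cannot help here.
    std::map<std::string, std::list<std::string>> index;
    BuildIndexByParam(index);
    return index["fruit"].size();
}
```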

Also, as GMacNickG pointed out, what if normal code cannot default construct the type C? Maybe the constructor is private, or there just isn't a default constructor. Once again GetByRet works like a champ and GetByParam fails:

C ret = GetByRet();  // Score! 
// vs.
C temp; // Error! Can't access the constructor 
GetByParam(temp);
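
To make this concrete, here is a minimal sketch of such a type. The class body, the friend declaration and the helper NameFromFactory are assumptions added for illustration; only the names C and GetByRet come from the snippets above.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class: its only constructor is private, so outside code
// cannot write "C temp;". Copy and move construction stay public.
class C {
public:
    const std::string& name() const { return name_; }

private:
    explicit C(std::string name) : name_(std::move(name)) {}
    friend C GetByRet();  // the factory is allowed to construct C
    std::string name_;
};

C GetByRet() {
    C result("made by factory");
    return result;
}

// A GetByParam(C&) caller would first need "C temp;", which does not
// compile here because C has no accessible default constructor.

std::string NameFromFactory() {
    C ret = GetByRet();  // works: the caller never default constructs C
    return ret.name();
}
```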

Question 2

This is not an answer, but it is also an answer in some sense...

Given a function that takes an argument by pointer, there is a trivial transformation that will yield a function that returns by value and is trivially optimizable by the compiler.

void f(T *ptr) {     
   // uses ptr->...
}

Add a reference to the object in the function and replace all uses of ptr with the reference

void f(T *ptr) {
   T & obj = *ptr;
   /* uses obj. instead of ptr-> */
}

Now remove the argument, add the return type, replace T& obj with T obj and change all returns to yield 'obj'

T f() {
   T obj; // No longer a ref!
   /* code does not change */
   return obj;
}

At this point you have a function that returns by value for which NRVO is trivial, since all of the return statements refer to the same object.
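
The two ends of that transformation can be sketched side by side. This is only an illustration: the instrumented type T, its counters, and the names f_ptr, f_val, ValueViaPointer and ValueViaReturn are all hypothetical. The copy counter stays at zero whether or not the compiler applies NRVO, because a named local that is returned is moved rather than copied when elision does not happen.

```cpp
#include <cassert>

// Hypothetical type that counts copy and move constructions.
struct T {
    static int copies;
    static int moves;
    int value = 0;
    T() = default;
    T(const T& other) : value(other.value) { ++copies; }
    T(T&& other) noexcept : value(other.value) { ++moves; }
};
int T::copies = 0;
int T::moves = 0;

// Original form: the result is written through a pointer.
void f_ptr(T* ptr) {
    ptr->value = 42;
}

// Transformed form: same body, but on a local object returned by name.
T f_val() {
    T obj;
    obj.value = 42;
    return obj;  // every return names obj, so NRVO can apply
}

int ValueViaPointer() {
    T t;  // must be default constructed before the call
    f_ptr(&t);
    return t.value;
}

int ValueViaReturn() {
    T t = f_val();  // initialized directly from the returned object
    return t.value;
}
```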

This transformed function has some of the same shortcomings as the pass-by-pointer version, but it is never worse than it. It demonstrates that whenever pass by pointer is an option, return by value is also an option with the same cost.

Exactly the same cost?

This is beyond the language, but when the compiler generates code it does so following an ABI (Application Binary Interface) that allows code built by different runs of the compiler (or even different compilers on the same platform) to interact. The ABIs in common use share a trait for functions that return by value: for large return types (those that do not fit in registers), memory for the returned object is allocated by the caller, and the function takes an extra pointer with the location of that memory. That is, when the compiler sees

T f();

The calling convention transforms that into:

void mangled_name_for_f( T* __result )

So compare the two alternatives, T t; f(&t); and T t = f();. In both cases the generated code allocates the space in the caller's frame, [1], and calls a function passing a pointer to that space. At the end of the function the compiler will [2] return. Here [1] and [2] mark the point where the object's constructor is actually called: [1] for pass by pointer, [2] for return by value. The costs of both alternatives are the same, with the difference that at [1] the object must be default constructed, while at [2] you might already know the final values of the object and might be able to do something more efficient.
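
A hand-written imitation of that lowering, as a sketch only. Real compilers do this internally, and the details vary by ABI. The type T with a string member and the names f_lowered, result and CallLowered are illustrative, not real mangled symbols. The caller provides raw storage, and the callee constructs the object there at the point marked [2].

```cpp
#include <cassert>
#include <new>
#include <string>
#include <utility>

// Hypothetical type without a default constructor.
struct T {
    std::string text;
    explicit T(std::string s) : text(std::move(s)) {}
};

// What the programmer writes.
T f() {
    return T("built in place");
}

// Imitation of the lowered form: the caller passes the address of
// uninitialized storage and the callee constructs the result in it.
T* f_lowered(void* result) {
    return ::new (result) T("built in place");  // [2]: constructed here
}

std::string CallLowered() {
    alignas(T) unsigned char storage[sizeof(T)];  // space in caller's frame
    T* obj = f_lowered(storage);
    std::string text = obj->text;
    obj->~T();  // the caller owns the object, so it destroys it
    return text;
}
```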

Regarding performance, is that all there is?

Not really. Suppose you later need to pass that object to a function that takes its argument by value, say void g(T value). In the pass-by-pointer case there is a named object in the caller's stack, so the object must be copied (or moved) to the location where the calling convention requires the value argument to be. In the return-by-value case, the compiler sees the call g(f()) and knows that the only use of the object returned from f() is as the argument of g(), so it can pass a pointer to the appropriate location when calling f(), which means that no copies are made. At this point the manual approach starts falling well behind the compiler's approach, even if the implementation of f uses the dumb transformation above!

T obj;    // default initialize
f(&obj);  // assign (or modify in place)
g(obj);   // copy

g(f());   // single object is returned and passed to g(), no copies
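
The difference can be observed with an instrumented type. This is a sketch under assumptions: the counting type T and the names f_ptr, CopiesManual and CopiesChained are hypothetical. In the chained call a compiler that does not elide may still perform a move, but it never needs the copy constructor.

```cpp
#include <cassert>

// Hypothetical type that counts copy constructions.
struct T {
    static int copies;
    int value = 0;
    T() = default;
    T(const T& other) : value(other.value) { ++copies; }
    T(T&& other) noexcept : value(other.value) {}
};
int T::copies = 0;

void f_ptr(T* out) { out->value = 7; }  // out-parameter version

T f() {  // return-by-value version
    T obj;
    obj.value = 7;
    return obj;
}

int g(T value) { return value.value; }  // takes its argument by value

int CopiesManual() {
    T::copies = 0;
    T obj;         // default initialize
    f_ptr(&obj);   // modify in place
    (void)g(obj);  // obj is a named lvalue, so it is copied into g
    return T::copies;
}

int CopiesChained() {
    T::copies = 0;
    (void)g(f());  // the returned object becomes g's argument
    return T::copies;
}
```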

Question 3

It is NOT in fact possible (or desirable) to always return a value by reference (think about operator+ as a basic counter-example).
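
A minimal sketch of the operator+ case. Vec2 and SumOfThree are made-up names for illustration. The sum is a brand new object, so there is no longer-lived object for a returned reference to refer to, and the by-value signature is also what lets the calls chain.

```cpp
#include <cassert>

// Hypothetical two-component value type.
struct Vec2 {
    int x;
    int y;
};

// The result does not exist before the call, so it is returned by value.
// Returning Vec2& here would mean referring to a local that is destroyed
// when the function returns.
Vec2 operator+(const Vec2& a, const Vec2& b) {
    Vec2 sum{a.x + b.x, a.y + b.y};
    return sum;
}

// Chaining works because each operator+ yields a value that can be used
// directly as the next operand.
Vec2 SumOfThree(const Vec2& a, const Vec2& b, const Vec2& c) {
    return a + b + c;
}
```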

To answer your question: you typically don't rely on or expect NRVO to always occur, but you do expect the compiler to do a reasonable job of optimizing. Only if and when profiling indicates that copying a return value is expensive do you need to worry about helping the compiler out with hints or an alternate interface.

EDIT, regarding the point that some functions could be optimized just by using a return parameter:

First, remember that if the function isn't called often, or the compiler has sufficient smarts, you can't guarantee that return-by-out-parameter is an optimization. Second, remember that you will have future maintainers of the code, and that writing clear, grokkable code is one of the biggest helps you can provide (it doesn't matter how fast broken code is). Third, take a moment to read http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ and see if it changes your mind.

Question 4

Many argue that passing in non-const reference parameters to functions and then changing those parameters in the function is not very intuitive.

Also, there are many pre-defined operators that return their results by value (e.g., the arithmetic operators such as operator+ and operator-). Since you want to keep the default semantics (and signature) of such operators, you are forced to rely on copy elision (RVO/NRVO) to optimize out the temporary object that is returned by value.

Finally, returning by value allows for easier chaining in many cases than passing in parameters to be changed by non-const reference (or pointer).