C++: returning by reference and copy constructors

https://stackoverflow.com/questions/2273811

20-09-2019

Question

References in C++ are baffling me. :)

The basic idea is that I'm trying to return an object from a function. I'd like to do it without returning a pointer (because then I'd have to manually delete it), and without calling the copy constructor, if possible (for efficiency, naturally, and also because I wonder whether I can avoid writing a copy constructor at all).

So, all in all, here are the options for doing this that I have found:

The function return type can be either the class itself (MyClass fun() { ... }) or a reference to the class (MyClass& fun() { ... }).
The function can either construct the variable at the line of return (return MyClass(a,b,c);) or return an existing variable (MyClass x(a,b,c); return x;).
The variable that receives the result can be of either type (MyClass x = fun(); or MyClass& x = fun();).
The receiving code can either create a new variable on the fly (MyClass x = fun();) or assign the result to an existing variable (MyClass x; x = fun();).

And some thoughts on that:

It seems to be a bad idea to have the return type MyClass& because that always results in the variable being destroyed before it gets returned.
The copy constructor only seems to get involved when I return an existing variable. When returning a variable constructed in the line of return, it never gets called.
When I assign the result to an existing variable, the destructor also always kicks in before the value is returned. Also, no copy constructor gets called, yet the target variable does receive the member values of the object returned from the function.

These results are so inconsistent that I feel totally confused. So, what EXACTLY is happening here? How should I properly construct and return an object from a function?

Solution

The best way to understand copying in C++ is often NOT to try to produce an artificial example and instrument it - the compiler is allowed to both remove and add copy constructor calls, more or less as it sees fit.

Bottom line - if you need to return a value, return a value and don't worry about any "expense".
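The observation in the question about assigning to an existing variable (MyClass x; x = fun();) has a simple explanation: x already exists, so the result reaches it through the assignment operator (the move assignment operator since C++11), not through any constructor, and the temporary returned by fun() is destroyed afterwards. A small sketch, assuming a hypothetical MyClass instrumented with a global trace, shows the sequence with a C++17 compiler:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records which special member functions run (illustrative instrumentation only).
std::vector<std::string> trace;

struct MyClass {
    int value = 0;
    MyClass() { trace.push_back("default ctor"); }
    explicit MyClass(int v) : value(v) { trace.push_back("value ctor"); }
    MyClass(const MyClass& o) : value(o.value) { trace.push_back("copy ctor"); }
    MyClass(MyClass&& o) noexcept : value(o.value) { trace.push_back("move ctor"); }
    MyClass& operator=(const MyClass& o) { value = o.value; trace.push_back("copy assign"); return *this; }
    MyClass& operator=(MyClass&& o) noexcept { value = o.value; trace.push_back("move assign"); return *this; }
    ~MyClass() { trace.push_back("dtor"); }
};

MyClass fun() { return MyClass(42); }

void demo() {
    trace.clear();
    MyClass x;   // x is constructed here, before fun() is called
    x = fun();   // temporary built by fun(), move-assigned into x, then destroyed
    assert(x.value == 42);
}                // x itself is destroyed here
```

Calling demo() records default ctor, value ctor, move assign, dtor (the temporary) and finally dtor (x). No copy or move constructor appears; if MyClass declared only a copy assignment operator, copy assign would show up instead of move assign.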

OTHER TIPS

Recommended reading: Effective C++ by Scott Meyers. You will find a very good explanation of this topic (and a lot more) in there.

In brief, if you return by value, the copy constructor and the destructor will be involved by default (unless the compiler optimizes them away - that's what happens in some of your cases).

If you return by reference (or pointer) a variable which is local (constructed on the stack), you invite trouble because the object is destructed upon return, so you have a dangling reference as a result.

The canonical way to construct an object in a function and return it is by value, like:

MyClass fun() {
    return MyClass(a, b, c);
}

MyClass x = fun();

If you use this, you don't need to worry about ownership issues, dangling references etc. And the compiler will most likely optimize out the extra copy constructor / destructor calls for you, so you don't need to worry about performance either.
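To make this concrete, here is a sketch that counts copy and move constructions for a hypothetical MyClass (a, b and c become function parameters so it compiles). Since C++17, returning a prvalue such as MyClass(a, b, c) is guaranteed copy elision, so both counters stay at zero; older compilers usually elided it as well, but were not required to:

```cpp
#include <cassert>

// Copy/move counters: hypothetical instrumentation, not part of the answer.
struct Counts { int copies = 0; int moves = 0; };
Counts counts;

class MyClass {
public:
    MyClass(int a, int b, int c) : sum_(a + b + c) {}
    MyClass(const MyClass& o) : sum_(o.sum_) { ++counts.copies; }
    MyClass(MyClass&& o) noexcept : sum_(o.sum_) { ++counts.moves; }
    int sum() const { return sum_; }
private:
    int sum_;
};

// Returns a prvalue: the object is constructed directly in the caller's x,
// so neither the copy nor the move constructor runs.
MyClass fun(int a, int b, int c) {
    return MyClass(a, b, c);
}

int demo() {
    counts = Counts{};
    MyClass x = fun(1, 2, 3);
    assert(counts.copies == 0 && counts.moves == 0);
    return x.sum();
}
```

demo() returns 6 with both counters still at zero.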

It is possible to return by reference an object constructed by new (i.e. on the heap) - this object will not be destroyed upon returning from the function. However, you have to destroy it explicitly somewhere later by calling delete.

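It is also technically possible to bind an object returned by value to a reference. In standard C++ this has to be a const reference (or, since C++11, an rvalue reference); a plain MyClass& cannot bind to the returned temporary, although some compilers accept it as an extension:

const MyClass& x = fun();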

However, AFAIK there is not much point in doing this, especially because one can easily pass this reference on to other parts of the program outside the current scope, while the object referenced by x is a local object that will be destroyed as soon as you leave the current scope. So this style can lead to nasty bugs.
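For reference, this is how binding a returned temporary to a reference behaves in standard C++: a non-const MyClass& cannot bind to it at all, while a const reference can, and the temporary's lifetime is then extended to the end of the reference's scope. The destructor counter below is hypothetical instrumentation:

```cpp
#include <cassert>

int destroyed = 0;  // counts destructor calls (illustrative instrumentation)

struct MyClass {
    int value;
    explicit MyClass(int v) : value(v) {}
    ~MyClass() { ++destroyed; }
};

MyClass fun() { return MyClass(7); }

int demo() {
    destroyed = 0;
    int seen = 0;
    {
        // MyClass& r = fun();     // ill-formed: a non-const lvalue reference cannot bind to a temporary
        const MyClass& x = fun();  // OK: the temporary lives as long as x
        assert(destroyed == 0);    // still alive inside this scope
        seen = x.value;
    }                              // the temporary is destroyed when x goes out of scope
    assert(destroyed == 1);
    return seen;
}
```

The extension only applies to the reference bound directly to the temporary; any reference or pointer derived from x and kept beyond that scope dangles, which is exactly the bug described above.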

Read about RVO and NRVO (in short, Return Value Optimization and Named Return Value Optimization: optimization techniques the compiler uses to do what you're trying to achieve).

You'll find plenty of questions about them here on Stack Overflow.
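A minimal sketch of the difference, using a hypothetical MyClass that counts copies and moves: makeRvo returns an unnamed temporary (RVO, guaranteed since C++17), while makeNrvo returns a named local (NRVO, permitted but never guaranteed). If NRVO is not applied, C++11 and later move the named local instead of copying it, so the copy counter stays at zero either way:

```cpp
#include <cassert>
#include <string>
#include <utility>

int copies = 0;
int moves = 0;

struct MyClass {
    std::string data;
    explicit MyClass(std::string d) : data(std::move(d)) {}
    MyClass(const MyClass& o) : data(o.data) { ++copies; }
    MyClass(MyClass&& o) noexcept : data(std::move(o.data)) { ++moves; }
};

// RVO: an unnamed temporary is returned (elision guaranteed since C++17).
MyClass makeRvo() { return MyClass("rvo"); }

// NRVO: a named local is returned. Elision is allowed but not required;
// without it, the local is treated as an rvalue and moved, not copied.
MyClass makeNrvo() {
    MyClass x("nrvo");
    x.data += "!";
    return x;
}

void demo() {
    copies = moves = 0;
    MyClass a = makeRvo();
    MyClass b = makeNrvo();
    assert(a.data == "rvo" && b.data == "nrvo!");
    assert(copies == 0);  // neither path copies
    assert(moves <= 1);   // at most one move, only if NRVO was not applied
}
```

GCC and Clang normally apply NRVO in a case this simple, but the assertion deliberately allows one move because the standard does not require it.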

If you create an object like this:

MyClass foo(a, b, c);

then it will be on the stack in the function's frame. When that function ends, its frame is popped off the stack and all the objects in that frame are destructed. There is no way to avoid this.

So if you want to return an object to a caller, your only options are:

Return by value - an accessible copy constructor (or, since C++11, move constructor) is required, but the call to it may be optimised out.
Return a pointer and make sure you either use smart pointers to deal with it or carefully delete it yourself when done with it.
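The smart-pointer variant of the second option might look like this, assuming a hypothetical MyClass and a factory named make: std::make_unique (C++14) allocates the object, and std::unique_ptr deletes it when its owner goes out of scope, so no manual delete is needed and MyClass itself is never copied:

```cpp
#include <cassert>
#include <memory>

int alive = 0;  // number of live MyClass objects (illustrative instrumentation)

struct MyClass {
    int a, b, c;
    MyClass(int a_, int b_, int c_) : a(a_), b(b_), c(c_) { ++alive; }
    ~MyClass() { --alive; }
};

// The object lives on the heap; ownership passes to the caller through
// std::unique_ptr, which calls delete automatically.
std::unique_ptr<MyClass> make(int a, int b, int c) {
    return std::make_unique<MyClass>(a, b, c);
}

int demo() {
    int sum = 0;
    {
        std::unique_ptr<MyClass> p = make(1, 2, 3);
        assert(alive == 1);
        sum = p->a + p->b + p->c;
    }   // p goes out of scope: the MyClass object is deleted here
    assert(alive == 0);
    return sum;
}
```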

Attempting to construct a local object and then return a reference to that local memory to a calling context is not coherent: a calling scope cannot access memory that is local to the called scope. That local memory is only valid for the duration of the function that owns it, or, put another way, while execution remains in that scope. You must understand this to program in C++.

About the only time it makes sense to return a reference is if you're returning a reference to a pre-existing object. For an obvious example, nearly every iostream member function returns a reference to the iostream. The iostream itself exists before any of the member functions is called, and continues to exist after they're called.
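A sketch of that pattern with a hypothetical TextBuffer class: each member function returns *this by reference, which is safe because the object exists before the call and outlives it, and it enables iostream-style chaining:

```cpp
#include <cassert>
#include <string>

// Hypothetical class for illustration: member functions return a reference
// to the already existing object, like iostream's operator<<.
class TextBuffer {
public:
    TextBuffer& append(const std::string& s) {
        text_ += s;
        return *this;  // *this outlives the call, so the reference stays valid
    }
    TextBuffer& operator<<(int n) {
        text_ += std::to_string(n);
        return *this;
    }
    const std::string& str() const { return text_; }
private:
    std::string text_;
};

std::string demo() {
    TextBuffer buf;
    buf.append("x = ") << 42;            // chained calls on the same object
    TextBuffer& same = buf.append(";");
    assert(&same == &buf);               // the returned reference is buf itself
    return buf.str();
}
```

demo() returns "x = 42;".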

The standard allows "copy elision", which means the copy constructor doesn't need to be called when you return an object. This comes in two forms: Named Return Value Optimization (NRVO) and anonymous Return Value Optimization (usually just RVO).

From what you're saying, your compiler implements RVO but not NRVO -- which means it's probably a somewhat older compiler. Most current compilers implement both. The unmatched dtor in this case means it's probably something like gcc 3.4 or thereabouts -- though I don't remember the version for sure, there was one around then that had a bug like this. Of course, it's also possible that your instrumentation isn't quite right, so a ctor that you didn't instrument is being used, and a matching dtor is being invoked for that object.

In the end, you're stuck with one simple fact though: if you need to return an object, you need to return an object. In particular, a reference can only give access to a (possibly modified version of) an existing object -- but that object had to be constructed at some point as well. If you can modify some existing object without causing a problem, that's fine and well, go ahead and do it. If you need a new object different and separate from those you already have, go ahead and do that -- pre-creating the object and passing in a reference to it may make the return itself faster, but won't save any time overall. Creating the object has about the same cost whether done inside or outside the function. Any reasonably modern compiler will include RVO, so you won't pay any extra cost for creating it in the function, then returning it -- the compiler will just automate allocating space for the object where it's going to be returned, and have the function construct it "in place", where it'll still be accessible after the function returns.
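The "construct it in place" behaviour can be observed directly. In this sketch (hypothetical MyClass and global constructedAt), the constructor records its this pointer and the caller checks that it equals the address of its own variable. The user-provided copy constructor matters: for trivially copyable types the standard permits a temporary to be created when returning (for example, to pass the value in registers), so the addresses could legitimately differ:

```cpp
#include <cassert>

// Where the object was constructed (illustrative instrumentation).
const void* constructedAt = nullptr;

struct MyClass {
    int value;
    explicit MyClass(int v) : value(v) { constructedAt = this; }
    // User-provided, so MyClass is not trivially copyable and is returned
    // through caller-provided storage rather than a temporary.
    MyClass(const MyClass& o) : value(o.value) {}
};

// With RVO the caller supplies the storage and the function constructs the
// object directly in it; since C++17 this is guaranteed for a prvalue.
MyClass fun(int v) { return MyClass(v); }

void demo() {
    MyClass x = fun(5);
    assert(constructedAt == &x);  // built in the caller's storage
    assert(x.value == 5);
}
```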

Basically, returning a reference only makes sense if the object still exists after leaving the method. Most compilers warn if you return a reference to a local variable, but they cannot catch every case.

Returning a reference rather than an object by value saves copying the object which might be significant.
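A typical safe case is an accessor returning a reference to a data member, since the member lives as long as its owning object. A sketch with a hypothetical Registry class, contrasting it with a by-value accessor that copies:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical class: names_ outlives every call to names(), so returning a
// reference to it is safe and avoids copying the vector.
class Registry {
public:
    void add(const std::string& name) { names_.push_back(name); }
    const std::vector<std::string>& names() const { return names_; }  // no copy
    std::vector<std::string> namesCopy() const { return names_; }     // copies the vector
private:
    std::vector<std::string> names_;
};

std::size_t demo() {
    Registry r;
    r.add("alpha");
    r.add("beta");
    const std::vector<std::string>& view = r.names();  // refers to r's member
    std::vector<std::string> copy = r.namesCopy();      // independent object
    assert(&view == &r.names());
    r.add("gamma");
    assert(copy.size() == 2);  // the copy does not see the new element
    return view.size();        // the reference does
}
```

demo() returns 3. The caller must still not keep view after r has been destroyed.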

References are safer than pointers because they have different semantics (they cannot be null or reseated), but behind the scenes they are usually implemented as pointers.

One potential solution, depending on your use case, is to default-construct the object outside of the function, take in a reference to it, and initialize the referenced object within the function, like so:

void initFoo(Foo& foo) 
{
  foo.setN(3);
  foo.setBar("bar");
  // ... etc ...
}

int main() 
{
  Foo foo;
  initFoo(foo);

  return 0;
}

Now this of course does not work if it is not possible (or does not make sense) to default-construct a Foo object and then initialize it later. If that is the case, then your only real option to avoid copy-construction is to return a pointer to a heap-allocated object.
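A self-contained version of the initFoo approach, assuming a hypothetical Foo that provides the setN and setBar members used in the example plus getters for checking the result:

```cpp
#include <cassert>
#include <string>

// Hypothetical Foo; only the out-parameter pattern comes from the answer.
class Foo {
public:
    Foo() = default;  // the pattern requires a default constructor
    void setN(int n) { n_ = n; }
    void setBar(const std::string& bar) { bar_ = bar; }
    int n() const { return n_; }
    const std::string& bar() const { return bar_; }
private:
    int n_ = 0;
    std::string bar_;
};

// Initializes an object the caller already owns; nothing is returned,
// so Foo is never copied or moved.
void initFoo(Foo& foo) {
    foo.setN(3);
    foo.setBar("bar");
}

int demo() {
    Foo foo;        // default-constructed by the caller
    initFoo(foo);   // filled in through the reference
    assert(foo.bar() == "bar");
    return foo.n();
}
```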

But then think about why you are trying to avoid copy-construction in the first place. Is the "expense" of copy construction really affecting your program, or is this a case of premature optimization?

You are stuck with either:

1) returning a pointer

MyClass* func() {
    // some stuff
    return new MyClass(a, b, c);
}

2) returning a copy of the object

MyClass func() {
    return MyClass(a, b, c);
}

Returning a reference is not valid here, because the object is destroyed when func's scope ends, unless the function is a member function and the reference refers to a data member of the class.

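Not a direct answer, but a viable suggestion: you could also return a pointer wrapped in a smart pointer such as std::unique_ptr or std::shared_ptr (std::auto_ptr, the standard option at the time, was deprecated in C++11 and removed in C++17). Then you'll be in control of which constructors and destructors get called and when.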
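For shared ownership, a sketch with std::shared_ptr and a hypothetical MyClass that counts live instances shows that copying the pointer never copies MyClass, and that the destructor runs exactly when the last owner releases the object:

```cpp
#include <cassert>
#include <memory>

int alive = 0;  // live MyClass objects (illustrative instrumentation)

struct MyClass {
    MyClass() { ++alive; }
    ~MyClass() { --alive; }
};

std::shared_ptr<MyClass> make() { return std::make_shared<MyClass>(); }

void demo() {
    std::shared_ptr<MyClass> first = make();
    std::shared_ptr<MyClass> second = first;  // shared ownership, no MyClass copy
    assert(alive == 1 && first.use_count() == 2);
    first.reset();                            // one owner left: object still alive
    assert(alive == 1 && second.use_count() == 1);
    second.reset();                           // last owner gone: destructor runs here
    assert(alive == 0);
}
```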

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow