Return value copying issue (to improve debug timing) — What's the solution here?

https://stackoverflow.com/questions/7519486

26-01-2021
|

Question

The most interesting C++ question I've encountered recently goes as follows:

We determined (through profiling) that our algorithm spends a lot of time in debug mode in MS Visual Studio 2005 with functions of the following type:

MyClass f(void)
{
      MyClass retval;
      // some computation to populate retval

      return retval;
}

As most of you probably know, the return here calls a copy constructor to pass out a copy of retval and then the destructor on retval. (Note: the reason release mode is very fast for this is because of the return value optimization. However, we want to turn this off when we debug so that we can step in and nicely see things in the debugger IDE.)

So, one of our guys came up with a cool (if slightly flawed) solution to this, which is, create a conversion operator:

MyClass::MyClass(MyClass *t)
{
      // construct "*this" by transferring the contents of *t to *this   
      // the code goes something like this
      this->m_dataPtr = t->m_dataPtr;  

      // then clear the pointer in *t so that its destruction still works
      // but becomes 'trivial'
      t->m_dataPtr = 0;
}

and also changing the function above to:

MyClass f(void)
{
      MyClass retval;
      // some computation to populate retval

      // note the ampersand here which calls the conversion operator just defined
      return &retval;      
}

Now, before you cringe (which I am doing as I write this), let me explain the rationale. The idea is to create a conversion operator that basically does a "transfer of contents" to the newly constructed variable. The savings happens because we're no longer doing a deep copy, but simply transferring the memory by its pointer. The code goes from a 10 minute debug time to a 30 second debug time, which, as you can imagine, has a huge positive impact on productivity. Granted, the return value optimization does a better job in release mode, but at the cost of not being able to step in and watch our variables.

Of course, most of you will say "but this is abuse of a conversion operator, you shouldn't be doing this kind of stuff" and I completely agree. Here's an example why you shouldn't be doing it too (this actually happened:)

  void BigFunction(void)
  {
        MyClass *SomeInstance = new MyClass;

        // populate SomeInstance somehow

        g(SomeInstance);

        // some code that uses SomeInstance later
        ...
  }

where g is defined as:

  void g(MyClass &m)
  {
      // irrelevant what happens here.
  }

Now this happened accidentally, i.e., the person who called g() should not have passed in a pointer when a reference was expected. However, there was no compiler warning (of course). The compiler knew exactly how to convert, and it did so. The problem is that the call to g() will (because we've passed it a MyClass * when it was expecting a MyClass &) called the conversion operator, which is bad, because it set the internal pointer in SomeInstance to 0, and rendered SomeInstance useless for the code that occured after the call to g(). ... and time consuming debugging ensued.

So, my question is, how do we gain this speedup in debug mode (which has as direct debugging time benefit) with clean code that doesn't open the possibility to make such other terrible errors slip through the cracks?

I'm also going to sweeten the pot on this one and offer my first bounty on this one once it becomes eligible. (50 pts)

Solution

If your definition of g is written the same as in your code base I'm not sure how it compiled since the compiler isn't allowed to bind unnamed temporaries to non-const references. This may be a bug in VS2005.

If you make the converting constructor explicit then you can use it in your function(s) (you would have to say return MyClass(&retval);) but it won't be allowed to be called in your example unless the conversion was explicitly called out.

Alternately move to a C++11 compiler and use full move semantics.

(Do note that the actual optimization used is Named Return Value Optimization or NRVO).

OTHER TIPS

You need to use something called "swaptimization".

MyClass f(void)
{
      MyClass retval;
      // some computation to populate retval

      return retval;
}
int main() {
    MyClass ret;
    f().swap(ret);
}

This will prevent a copy and keep the code clean in all modes.

You can also try the same trick as auto_ptr, but that's more than a little iffy.

The problem is occuring because you're using MyClass* as a magic device, sometimes but not always. Solution: use a different magic device.

class MyClass;
class TempClass { //all private except destructor, no accidental copies by callees
    friend MyClass;

    stuff* m_dataPtr; //unfortunately requires duplicate data
                      //can't really be tricked due to circular dependancies.

    TempClass() : m_dataPtr(NULL) {}
    TempClass(stuff* p) : m_dataPtr(p) {}
    TempClass(const TempClass& p) : m_dataPtr(p) {}
public:
    ~TempClass() {delete m_dataPtr;}
};

class MyClass {
    stuff* m_dataPtr;

    MyClass(const MyClass& b) {
        m_dataPtr = new stuff();
    }
    MyClass(TempClass& b) {
        m_dataPtr = b.m_dataPtr ;
        b.m_dataPtr = NULL;
    }
    ~MyClass() {delete m_dataPtr;}
    //be sure to overload operator= too.
    TempClass f(void) //note: returns hack.  But it's safe
    {
        MyClass retval;
        // some computation to populate retval   
        return retval;
    }
    operator TempClass() {
        TempClass r(m_dataPtr);
        m_dataPtr = nullptr;
        return r;
    }

Since TempClass is almost all private (friending MyClass), other objects cannot create, or copy TempClass. This means the hack can only be created by your special functions when clearly told to, preventing accidental usage. Also, since this doesn't use pointers, memory can't be accidentally leaked.

Move semantics have been mentioned, you've agreed to look them up for education, so that's good. Here's a trick they use.

There's a function template std::move which turns an lvalue into an rvalue reference, that is to say it gives "permission" to move from an object[*]. I believe you can imitate this for your class, although I won't make it a free function:

struct MyClass;
struct MovableMyClass {
    MyClass *ptr;
    MovableMyClass(MyClass *ptr) : ptr(ptr) {}
};

struct MyClass {
    MyClass(const MovableMyClass &tc) {
        // unfortunate, we need const reference to bind to temporary
        MovableMyClass &t = const_cast<MovableMyClass &>(tc);
        this->m_dataPtr = t.ptr->m_dataPtr;
        t.ptr->m_dataPtr = 0;
    }
    MovableMyClass move() {
        return MovableMyClass(this);
    }
};

MyClass f(void)
{
    MyClass retval;
    return retval.move();
}

I haven't tested this, but something along those lines. Note the possibility of doing something const-unsafe with a MovableMyClass object that actually is const, but it should be easier to avoid ever creating one of those than it is to avoid creating a MyClass* (which you've found out is quite difficult!)

[*] Actually I'm pretty sure I've over-simplified that to the point of being wrong, it's actually about affecting what overload gets chosen rather than "turning" anything into anything else as such. But causing a move instead of a copy is what std::move is for.

A different approach, given your special scenario:

Change MyClass f(void) (or operator+) to something like the following:

MyClass f(void)
{
   MyClass c;
   inner_f(c);
   return c;
}

And let inner_f(c) hold the actual logic:

#ifdef TESTING
#   pragma optimize("", off)
#endif

inline void inner_f(MyClass& c)
{
   // actual logic here, setting c to whatever needed
}

#ifdef TESTING
#   pragma optimize("", on)
#endif

Then, create an additional build configurations for this kind of testing, in which TESTING is included in the preprocessor definitions.

This way, you can still take advantage of RVO in f(), but the actual logic will not be optimized on your testing build. Note that the testing build can either be a release build or a debug build with optimizations turned on. Either way, the sensitive parts of the code will not be optimized (you can use the #pragma optimize in other places too, of course - in the code above it only affects inner_f itself, and not code called from it).

Possible solutions

Set higher optimization options for the compiler so it optimizes out the copy construction
Use heap allocation and return pointers or pointer wrappers, preferably with garbage collection
Use the move semantics introduced in C++11; rvalue references, std::move, move constructors
Use some swap trickery, either in the copy constructor or the way DeadMG did, but I don't recommend them with a good conscience. An inappropriate copy constructor like that could cause problems, and the latter is a bit ugly and needs easily destructible default objects which might not be true for all cases.
+1: Check and optimize your copy constructors, if they take so long then something isn't right about them.

I would prefer to simply pass the object by reference to the calling function when MyClass is too big to copy:

void f(MyClass &retval)  // <--- no worries !
{
  // some computation to populate retval
}

Just simple KISS principle.

Okay I think I have a solution to bypass the Return Value Optimization in release mode, but it depends on the compiler and not guaranteed to work. It is based on this.

MyClass f (void)
{
    MyClass retval;
    MyClass dummy;

    //  ...

    volatile bool b = true;
    if b ? retval : dummy;
}

As for why the copy construction takes so long in DEBUG mode, I have no idea. The only possible way to speed it up while remaining in DEBUG mode is to use rvalue references and move semantics. You already discovered move semantics with your "move" constructor that accepts pointer. C++11 gives a proper syntax for this kind of move semantics. Example:

//  Suppose MyClass has a pointer to something that would be expensive to clone.
//  With move construction we simply move this pointer to the new object.

MyClass (MyClass&& obj) :
    ptr (obj.ptr)
{
    //  We set the source object to some trivial state so it is easy to delete.
    obj.ptr = NULL;
}

MyClass& operator = (MyClass&& obj) :
{
    //  Here we simply swap the pointer so the old object will be destroyed instead of the temporary.
    std::swap(ptr, obj.ptr);
    return *this;
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow