Question

My understanding is that C++ allows the compiler to reorder anything that is not IO or a call to an external function. This is starting to frustrate my efforts at writing an RAII-style function timer.

Edit

Here is a self-contained example.

Code is for VS 2012, compiled with optimizations:

#include <chrono>
#include <iostream>
#include <atomic>
#include <string>
using namespace std;
class TimeSlice
{
public:
    TimeSlice(std::string myname) : start(timestamp()), name(myname)
    {
        fency(); // don't optimize me out!
    }
    ~TimeSlice()
    {
        fency();
        auto elapsed = timestamp()-start;
        cout << name << (int)elapsed << endl;
    }
    static inline long long timestamp()
    {
        return chrono::duration_cast<chrono::milliseconds>(chrono::system_clock::now().time_since_epoch()).count();
    }
private:
    const long long start;
    const std::string name;

    static inline void fency()
    {
        std::atomic_signal_fence(std::memory_order_seq_cst);
    }
};
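
Here is a minimal driver I would add to exercise it (sketched here; the 10 ms sleep just stands in for the _sleep(10) call in my real code):

// Hypothetical driver, not part of the class above: times a short sleep so the
// destructor prints the elapsed milliseconds when the scope ends.
#include <thread>

int main()
{
    {
        TimeSlice t("sleep ms: ");
        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // stands in for _sleep(10)
    } // ~TimeSlice runs here and prints the elapsed time
}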

Solution

The function calls are not optimized out; they are reordered. The code is compiled as if it were:

    _sleep(10);
    start = timestamp();
    elapsed = timestamp()-start;
    cout << name<<(int) elapsed << endl;

This reordering is legal, as neither _sleep nor timestamp constitutes IO. You can prevent it with a compiler-level memory fence, std::atomic_signal_fence, if you have C++11 support. Include <atomic> and insert:

    std::atomic_signal_fence(std::memory_order_seq_cst);

at the beginning of the constructor and destructor bodies. With GCC 4.7 and 4.8 on x86_64, such a memory fence does not produce any code; it only constrains the compiler's reordering.
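
To make the placement concrete, here is a condensed sketch (not the asker's exact class, and using steady_clock rather than system_clock) of a scoped timer with the two fences in place:

    #include <atomic>
    #include <chrono>
    #include <iostream>

    struct ScopedTimer
    {
        std::chrono::steady_clock::time_point start;
        ScopedTimer() : start(std::chrono::steady_clock::now())
        {
            // Compiler-only barrier: keeps the timed work from being hoisted
            // above the initial clock read.
            std::atomic_signal_fence(std::memory_order_seq_cst);
        }
        ~ScopedTimer()
        {
            // Compiler-only barrier: keeps the timed work from sinking below
            // this point, past the final clock read.
            std::atomic_signal_fence(std::memory_order_seq_cst);
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::cout << ms << " ms" << std::endl;
        }
    };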

OTHER TIPS

I don't think the C++ optimizer is actually the problem here. More likely, the amount of time you are attempting to measure is in some cases smaller than the minimum granularity of the clock you are using -- that is, timestamp() returns the same value in the TimeSlice constructor and in the destructor, because the second call executes so quickly after the first that the clock value has not yet advanced by its next tick.

If that's the case, then the solution is either to find a higher-precision clock API to use instead, to measure longer events (e.g. run a loop of 10000 iterations of the operation and time the whole loop), or simply to accept that very small time intervals may be effectively rounded down to zero by the clock's tick granularity.
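
As a rough sketch of the loop approach (my own illustration; do_work() is just a placeholder for the operation being measured, and steady_clock is used for the measurement):

    #include <chrono>
    #include <iostream>

    volatile long long sink = 0;
    static void do_work() { sink += 1; }   // placeholder for the measured operation

    int main()
    {
        constexpr int iterations = 10000;
        const auto start = std::chrono::steady_clock::now();
        for (int i = 0; i < iterations; ++i)
            do_work();
        const auto elapsed = std::chrono::steady_clock::now() - start;
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
        // Report the average cost per iteration; the total elapsed time is now
        // large enough that clock granularity does not round it down to zero.
        std::cout << "average per iteration: " << ns / iterations << " ns" << std::endl;
    }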

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow