Question

Here is my problem. I have a base class and a derived class which overrides some methods from the base class. For simplicity consider the following example:

struct base
{
  virtual void fn()
  {/*base definition here*/}
};

struct derived : base 
{
  void fn() override
  {/*derived definition here*/}
};

In my actual program, objects of these classes are passed as arguments to other classes and their methods are called from other methods, but for simplicity let's write a function that takes either a base or a derived object as its argument. I can simply write

void call_fn(base& obj)
{obj.fn();}

and the call to the appropriate function will be resolved at run-time due to the virtual functions.

I am worried, however, that if call_fn is called millions of times (which in my case it will be, as my actual application is a simulation experiment), I will get a significant overhead that I would like to avoid.

So, I was wondering if using a static_cast could actually tackle the problem. Maybe something like this:

template <typename T>
void call_fn(base& obj)
{(static_cast<T*>(&obj))->fn();}

In this case, I would call call_fn<base>(obj) to invoke the base method, or call_fn<derived>(obj) to invoke the derived method.

Will this solution avoid the vtable overhead or will it still be affected? Thanks in advance for any replies!

By the way, I am aware of the CRTP but not very familiar with it. That is why I would like to know the answer to this simple question first :)


Solution

Will this solution avoid the vtable overhead or will it still be affected?

It will still use dynamic dispatch (whether that causes any noticeable overhead is a completely different question). You can disable dynamic dispatch by qualifying the function call as in:

static_cast<T&>(obj).T::fn();

That said, I would not even try to do so. Leave the dynamic dispatch in place, then test the performance of the application, profile it, and profile it further. Profile again to make sure that you understand what the profiler is telling you. Only then consider making a single change, and profile again to verify whether your assumptions were correct.

OTHER TIPS

This isn't really an answer to your actual question, but I was curious about what the overhead of calling a virtual function really is compared with calling a regular member function. To make it "fair", I created a classes.cpp that implements a very simple function, but in a separate file that is compiled separately from main, so the compiler cannot inline the calls (absent link-time optimization).

classes.h:

#ifndef CLASSES_H
#define CLASSES_H

class base
{
    virtual int vfunc(int x) = 0;
};

class vclass : public base
{
public:
    int vfunc(int x);
};


class nvclass
{
public:
    int nvfunc(int x);
};


nvclass *nvfactory();
vclass* vfactory();


#endif

classes.cpp:

#include "classes.h"

int vclass::vfunc(int x)
{
    return x+1;
}


int nvclass::nvfunc(int x)
{
    return x+1;
}

nvclass *nvfactory()
{
    return new nvclass;
}

vclass* vfactory()
{
    return new vclass;
}

This is called from:

#include <cstdio>
#include <cstdlib>
#include "classes.h"

#if 0
#define ASSERT(x) do { if(!(x)) { assert_fail( __FILE__, __LINE__, #x); } } while(0)
static void assert_fail(const char* file, int line, const char *cond)
{
    fprintf(stderr, "ASSERT failed at %s:%d condition: %s \n",  file, line, cond); 
    exit(1);
}
#else
#define ASSERT(x) (void)(x)
#endif

#define SIZE 10000000

static __inline__ unsigned long long rdtsc(void)
{
    unsigned hi, lo;
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
}


void print_avg(const char *str, const int *diff, int size)
{
    int i;
    long long sum = 0;  // 64-bit accumulator: long is only 32 bits on some platforms
    for(i = 0; i < size; i++)
    {
        int t = diff[i];
        sum += t;
    }

    printf("%s average =%f clocks\n", str, (double)sum / size);
}


int diff[SIZE]; 

int main()
{
    unsigned long long a, b;
    int i;
    int sum = 0;
    int x;

    vclass *v = vfactory();
    nvclass *nv = nvfactory();


    for(i = 0; i < SIZE; i++)
    {
        a = rdtsc();

        x = 16;
        sum += x;
        b = rdtsc();

        diff[i] = (int)(b - a);
    }

    print_avg("Empty", diff, SIZE);


    for(i = 0; i < SIZE; i++)
    {
        a = rdtsc();

        x = 0;
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        x = v->vfunc(x);
        ASSERT(x == 16);  // 16 calls, each adding 1
        sum += x;
        b = rdtsc();

        diff[i] = (int)(b - a);
    }

    print_avg("Virtual", diff, SIZE);

    for(i = 0; i < SIZE; i++)
    {
        a = rdtsc();
        x = 0;
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        x = nv->nvfunc(x);
        ASSERT(x == 16);  // 16 calls, each adding 1
        sum += x;
        b = rdtsc();
        diff[i] = (int)(b - a);
    }
    print_avg("no virtual", diff, SIZE);

    printf("sum=%d\n", sum);

    delete v;
    delete nv;

    return 0;
}

The real difference in the generated code is this. Virtual call (an indirect call through the function pointer stored in the vtable):

40066b: ff 10                   callq  *(%rax)

Non-virtual call (a direct call to a fixed address):

4006d3: e8 78 01 00 00          callq  400850 <_ZN7nvclass6nvfuncEi>

And the results:

Empty average =78.686081 clocks
Virtual average =144.732567 clocks
no virtual average =122.781466 clocks
sum=480000000

Remember that these are the totals for 16 calls per loop iteration. So the difference between calling a function and not calling one is around 44 clock cycles per iteration, or about 2.8 clocks per call [including adding up the results and other processing required], and the virtual call adds about 22 clocks per iteration, so roughly 1.4 clocks per call.
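The per-call figures follow from the averages above: subtract the baseline, then divide by the 16 calls per iteration (values rounded to two decimals):

```latex
\begin{aligned}
\text{cost of a call} &\approx \frac{122.78 - 78.69}{16} = \frac{44.09}{16} \approx 2.76\ \text{clocks} \\
\text{extra cost of a virtual call} &\approx \frac{144.73 - 122.78}{16} = \frac{21.95}{16} \approx 1.37\ \text{clocks}
\end{aligned}
```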

I doubt you will notice, assuming you do something a bit more meaningful than return x + 1 in your function.

The vtable belongs to your class, and in typical implementations each object of that class carries a hidden pointer to it. If you have virtual members, calls to them go through the vtable. The cast does not affect whether the vtable exists, nor how members are accessed.

If you have a polymorphic array whose elements all have the same dynamic type, you can also externalize the vtable. This lets you look up the function once and then call it directly on each element. C++ doesn't help you with that, though; you have to do it manually.

This is also useful if you are micro-optimizing. I believe that Boost.Function (boost::function) uses a similar technique: it only needs two functions in its vtable (call and release reference), whereas a compiler-generated vtable would also contain RTTI and other entries. Hand-coding a vtable that holds only those two function pointers avoids that.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow