How to synchronize C & C++ libraries with minimal performance penalty?

https://stackoverflow.com/questions/286105

08-07-2019

Question

I have a C library with numerous math routines for dealing with vectors, matrices, quaternions and so on. It needs to remain in C because I often use it for embedded work and as a Lua extension. In addition, I have C++ class wrappers that allow for more convenient object management and operator overloading for math operations using the C API. The wrapper consists only of a header file, and it makes as much use of inlining as possible.

Is there an appreciable penalty for wrapping the C code versus porting and inlining the implementation directly into the C++ class? This library is used in time-critical applications. So, does the boost from eliminating indirection compensate for the maintenance headache of two ports?

Example of C interface:

typedef float VECTOR3[3];

void v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

Example of C++ wrapper:

class Vector3
{
private:
    VECTOR3 v_;

public:
    // copy constructors, etc...

    Vector3& operator+=(const Vector3& rhs)
    {
        v3_add(&this->v_, this->v_, const_cast<float*>(rhs.v_));
        return *this;
    }

    Vector3 operator+(const Vector3& rhs) const
    {
        Vector3 tmp(*this);
        tmp += rhs;
        return tmp;
    }

    // more methods...
};

Solution

Your wrapper itself will be inlined; your calls into the C library, however, typically will not. (That would require link-time optimization, which is technically possible but, AFAIK, rudimentary at best in today's tools.)

Generally, a function call as such is not very expensive. The cycle cost has decreased considerably over the last few years, and it can be predicted easily, so the call penalty as such is negligible.

However, inlining opens the door to more optimizations: if you have v = a + b + c, your wrapper class forces the generation of stack variables, whereas for inlined calls, the majority of the data can be kept in the FPU stack. Also, inlined code allows simplifying instructions, considering constant values, and more.
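To make the a + b + c case concrete, here is a minimal sketch. It assumes a hypothetical Vec3 struct (not the VECTOR3 from the question) whose operator+ is defined in the header, so the compiler can see both additions at once. Whether the temporaries really disappear depends on your compiler and flags, so check the generated assembly.

```cpp
// Hypothetical value type for illustration; not part of the original library.
struct Vec3 {
    float x, y, z;
};

// Defined in the header, so the body is visible at every call site and the
// compiler is free to inline it.
inline Vec3 operator+(const Vec3& a, const Vec3& b)
{
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}

// With both additions visible, an optimizer can compute each component as
// a.x + b.x + c.x without materializing an intermediate Vec3 in memory.
inline Vec3 sum3(const Vec3& a, const Vec3& b, const Vec3& c)
{
    return a + b + c;
}
```

If operator+ instead forwarded to a C function compiled in another translation unit, the intermediate result of a + b would normally have to be written to memory before the second call.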

So while the "measure before you invest" rule holds true, I would expect some room for improvement here.

A typical solution is to bring the C implementation into a form that can be used either as inline functions or as a "C" body:

// V3impl.inl
void V3DECL v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs)
{
    // here you maintain the actual implementations
    // ...
}

// C header
#define V3DECL 
void V3DECL v3_add(VECTOR3 *out, VECTOR3 lhs, VECTOR3 rhs);

// C body
#include "V3impl.inl"


// CPP Header
#define V3DECL inline
namespace v3core {
  #include "V3impl.inl"
} // namespace

class Vector3D { ... }

This likely makes sense only for selected methods with comparatively simple bodies. I'd move the methods to a separate namespace for the C++ implementation, as you will usually not need them directly.

(Note that inline is just a compiler hint; it doesn't force the method to be inlined. But that's good: if the code size of an inner loop exceeds the instruction cache, inlining easily hurts performance.)

Whether the pass/return-by-reference can be resolved depends on the strength of your compiler. I've seen many where foo(X * out) forces stack variables, whereas X foo() does keep values in registers.
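A minimal sketch of the two signature styles, using a hypothetical Vec3 struct and hypothetical function names (add_out, add_ret). Both compute the same thing; the difference is only in how the result is handed back, which is what you would compare in the generated assembly.

```cpp
// Hypothetical value type for illustration; not part of the original library.
struct Vec3 {
    float x, y, z;
};

// Out-parameter style, like foo(X* out): the result must live in addressable
// memory, which can force a stack slot when the call is not inlined.
inline void add_out(Vec3* out, const Vec3& a, const Vec3& b)
{
    out->x = a.x + b.x;
    out->y = a.y + b.y;
    out->z = a.z + b.z;
}

// Return-by-value style, like X foo(): for a small aggregate the compiler
// may be able to keep the result in registers.
inline Vec3 add_ret(const Vec3& a, const Vec3& b)
{
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}
```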

OTHER TIPS

If you're just wrapping the C library calls in C++ class functions (in other words, the C++ functions do nothing but call C functions), then the compiler will optimize these calls so that it's not a performance penalty.

As with any question about performance, you'll be told to measure to get your answer (and that's the strictly correct answer).

But as a rule of thumb, for simple inline methods that can actually be inlined, you'll see no performance penalty. In general, an inline method that does nothing but pass the call on to another function is a great candidate for inlining.

However, even if your wrapper methods were not inlined, I suspect you'd notice no performance penalty - not even a measurable one - unless the wrapper method was being called in some critical loop. Even then it would likely only be measurable if the wrapped function itself didn't do much work.

This type of thing is about the last thing to be concerned about. First worry about making your code correct and maintainable, and about using appropriate algorithms.

As usual with everything related to optimization, the answer is that you have to measure the performance itself before you know if the optimization is worthwhile.

1. Benchmark two different functions, one calling the C-style functions directly and another calling through the wrapper. See which one runs faster, or if the difference is within the margin of error of your measurement (which would mean there is no difference you can measure).
2. Look at the assembly code generated by the two functions in the previous step (on gcc, use -S or -save-temps). See if the compiler did something stupid, or if your wrappers have any performance bug.
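The benchmark step could be sketched roughly as follows. This is only an outline under stated assumptions: v3_add is a stand-in defined here so the snippet compiles on its own, Vector3 is a cut-down version of the wrapper from the question, and time_seconds, run_direct and run_wrapped are hypothetical helper names.

```cpp
#include <chrono>

typedef float VECTOR3[3];

// Stand-in so the sketch compiles on its own (assumption). For a real
// measurement, link against the separately compiled C library instead.
void v3_add(VECTOR3* out, VECTOR3 lhs, VECTOR3 rhs)
{
    for (int i = 0; i < 3; ++i) (*out)[i] = lhs[i] + rhs[i];
}

// Cut-down version of the wrapper from the question.
class Vector3 {
public:
    Vector3(float x, float y, float z) { v_[0] = x; v_[1] = y; v_[2] = z; }
    Vector3& operator+=(const Vector3& rhs)
    {
        v3_add(&v_, v_, const_cast<float*>(rhs.v_));
        return *this;
    }
    float operator[](int i) const { return v_[i]; }
private:
    VECTOR3 v_;
};

// Runs body() `iterations` times and returns the elapsed wall time in seconds.
template <typename F>
double time_seconds(F body, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) body();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Candidate 1: call the C API directly.
float run_direct(int iterations, double* seconds)
{
    VECTOR3 acc = {0.0f, 0.0f, 0.0f};
    VECTOR3 step = {1.0f, 2.0f, 3.0f};
    *seconds = time_seconds([&] { v3_add(&acc, acc, step); }, iterations);
    return acc[0] + acc[1] + acc[2];  // use the result so the loop is not dead code
}

// Candidate 2: go through the C++ wrapper.
float run_wrapped(int iterations, double* seconds)
{
    Vector3 acc(0.0f, 0.0f, 0.0f);
    const Vector3 step(1.0f, 2.0f, 3.0f);
    *seconds = time_seconds([&] { acc += step; }, iterations);
    return acc[0] + acc[1] + acc[2];
}
```

Run both with the same iteration count several times and compare the spread, not a single number. Because the stand-in v3_add lives in the same translation unit here, the compiler may inline it, which is exactly what would not happen with the real library unless link-time optimization is in play.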

Unless the performance difference is strongly in favor of not using the wrapper, reimplementing is not a good idea, since you risk introducing bugs (which could even produce results that look sane but are wrong). Even if the difference is big, it would be simpler and less risky to remember that C++ is very compatible with C and use your library in the C style even within C++ code.

I don't think you'll notice much perf difference, assuming your target platform supports all your data types.

I'm coding for the DS and a few other ARM devices, and floating point is evil... I had to typedef float to FixedPoint<16,8>

If you are worried that the overhead of calling functions is slowing you down, why not test inlining the C code or turning it into macros?
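A rough sketch of both options, assuming the VECTOR3 typedef from the question. The names v3_add_inline and V3_ADD_MACRO are made up for illustration. The code is written so that it is valid as both C99 and C++.

```cpp
typedef float VECTOR3[3];

/* Option 1: a static inline function in the header. The body is visible to
   every caller, and arguments are still type-checked. */
static inline void v3_add_inline(VECTOR3* out, const float* lhs, const float* rhs)
{
    (*out)[0] = lhs[0] + rhs[0];
    (*out)[1] = lhs[1] + rhs[1];
    (*out)[2] = lhs[2] + rhs[2];
}

/* Option 2: a macro. Always expanded in place, but there is no type checking
   and each argument is evaluated several times, so avoid side effects in
   the arguments. Here `out` is the array itself, not a pointer to it. */
#define V3_ADD_MACRO(out, lhs, rhs)        \
    do {                                   \
        (out)[0] = (lhs)[0] + (rhs)[0];    \
        (out)[1] = (lhs)[1] + (rhs)[1];    \
        (out)[2] = (lhs)[2] + (rhs)[2];    \
    } while (0)
```

The inline function is usually the safer experiment. The macro mostly makes sense for compilers that do not support inline.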

Also, why not improve the const correctness of the C code while you are at it - const_cast should really be used sparingly, especially on interfaces you control.
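For instance, if the inputs of v3_add were const-qualified, the wrapper could drop the const_cast entirely. The signature below is an assumed revision of the C API, not the library's actual one, and the function body is a stand-in so the sketch compiles on its own.

```cpp
typedef float VECTOR3[3];

// Assumed const-correct revision of the C function: the inputs are read-only.
// As parameters, `const VECTOR3` adjusts to `const float*` in both C and C++.
void v3_add(VECTOR3* out, const VECTOR3 lhs, const VECTOR3 rhs)
{
    for (int i = 0; i < 3; ++i) (*out)[i] = lhs[i] + rhs[i];
}

// Cut-down version of the wrapper from the question.
class Vector3 {
public:
    Vector3(float x, float y, float z) { v_[0] = x; v_[1] = y; v_[2] = z; }

    Vector3& operator+=(const Vector3& rhs)
    {
        v3_add(&v_, v_, rhs.v_);  // rhs.v_ is const and is passed as-is
        return *this;
    }

    float operator[](int i) const { return v_[i]; }

private:
    VECTOR3 v_;
};
```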

Licensed under: CC-BY-SA with attribution
