Question

I'm interested in knowing whether the Standard Template Library incurs performance/speed overheads in code for numerical/scientific computing.

For example, is declaring an array as

double matrix2d[10][10];

going to give me more performance than

std::vector<std::vector<double> > matrix2d(10, std::vector<double>(10, 0.0));

?

I would also appreciate some general ideas as to whether C has better performance than C++ for scientific computing. I have written my code in a very object-oriented style using the STL, and I use C++11 a lot. I am beginning to wonder whether I should start looking into pure C, if it's going to run faster.

Any thoughts on this are welcome.

Solution

Given the abstraction it provides, the C++ std::vector is as efficient as it gets: 3 pointers on the stack, plus dynamically allocated storage whose capacity grows geometrically (by a factor of 1.5 to 2 on each reallocation). Thanks to this growth policy, appending an element costs amortized constant time: across all reallocations, each element is copied only a constant number of times on average.
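
A minimal sketch to observe this behavior (the exact growth factor is implementation-defined, so the printed capacities will vary by compiler):

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<double> v;
    std::size_t cap = 0;
    for (int i = 0; i < 1000; ++i) {
        v.push_back(0.0);
        if (v.capacity() != cap) {   // a reallocation just happened
            cap = v.capacity();
            std::cout << "size " << v.size()
                      << " -> capacity " << cap << '\n';
        }
    }
    // Only a handful of lines print: the number of reallocations
    // grows logarithmically with the number of elements.
}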

The C equivalent using malloc() and realloc() would be at least as expensive, and more cumbersome (manual resizing, etc.). Moreover, std::vector allows user-defined performance tuning through special allocators (pool-based, stack-allocated, etc.), which in C++11 are not as hard to use as they were in C++98.
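
As a sketch of how lightweight a C++11 allocator can be (LoggingAllocator is a hypothetical name; a real tuning allocator would hand out memory from a pool or arena instead of calling malloc/free):

#include <cstdio>
#include <cstdlib>
#include <new>
#include <vector>

template <typename T>
struct LoggingAllocator {
    using value_type = T;

    LoggingAllocator() = default;
    template <typename U>
    LoggingAllocator(const LoggingAllocator<U>&) {}   // rebind support

    T* allocate(std::size_t n) {
        std::printf("allocating %zu elements\n", n);
        if (void* p = std::malloc(n * sizeof(T)))
            return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};

template <typename T, typename U>
bool operator==(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const LoggingAllocator<T>&, const LoggingAllocator<U>&) { return false; }

int main() {
    // Drop-in replacement: the container interface is unchanged.
    std::vector<double, LoggingAllocator<double>> v;
    for (int i = 0; i < 100; ++i) v.push_back(i);
}

std::allocator_traits fills in everything else, which is exactly what makes this easier than it was in C++98.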

If you don't need dynamic resizing, you can use a static array in both C and C++ (or std::array in C++).

In general, for high-performance computing, C++ has more potential for optimization, in particular through the use of function objects that can be inlined (in contrast to plain C function pointers). The canonical example is sorting:

int comp( const void* a, const void* b ) {
    return /* your comparison here */;
}

// C style sorting: comp is passed as a function pointer, which
// the compiler generally cannot inline through
qsort( arr, LARGE_SIZE, sizeof( int ), comp );

// C++11 style sorting (use a hand-made function object for C++98):
// the lambda has a unique type known to std::sort, so the call
// can be fully inlined
std::sort( std::begin(arr), std::end(arr), [](int a, int b) {
    return comp(&a, &b) < 0;
});

OTHER TIPS

The overhead of std::vector is:

  • 3 pointers on the stack
  • dynamic allocation (lazily, i.e. it doesn't allocate anything until required)

A stack-allocated array might be faster in some cases (for small amounts of data). For this you can use std::array<T, Length>.

If you need a 2-dimensional grid, I would allocate the data in a single vector: std::vector<T>(width * height). Then you can write a number of helper functions to obtain elements by x and y coordinates, or write a wrapper class, as sketched below.
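
A minimal sketch of such a wrapper (Grid2D is a hypothetical name; row-major indexing is assumed):

#include <cstddef>
#include <vector>

template <typename T>
class Grid2D {
public:
    Grid2D(std::size_t width, std::size_t height)
        : width_(width), data_(width * height) {}   // one contiguous block

    T& operator()(std::size_t x, std::size_t y) {
        return data_[y * width_ + x];               // row-major indexing
    }
    const T& operator()(std::size_t x, std::size_t y) const {
        return data_[y * width_ + x];
    }

private:
    std::size_t width_;
    std::vector<T> data_;
};

int main() {
    Grid2D<double> m(10, 10);
    m(3, 4) = 1.0;
}

A single allocation also keeps all elements contiguous in memory, which is friendlier to the cache than a vector of vectors.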

If you have no reason to resize an array and you know its size at compile time (as you do in your first example), the better STL choice is the std::array template. It provides all the same benefits as a C-style array.

double matrix2d[10][10];

// would become

std::array<std::array<double, 10>, 10> matrix2d;

If you know the sizes beforehand and performance is a bottleneck, use std::array from C++11. Its performance is exactly the same as that of C-style arrays, because internally it looks like

template<typename T, std::size_t N>
struct array {
  T _data[N];
};

This is the preferred way to use stack-allocated arrays in modern C++. Never use C-style arrays if you have a modern compiler.

People are going to say "It depends on what you're doing".

And they are right.

There's an example here where a conventionally-designed program using std::vector was performance tuned, through a series of six stages, and its execution time was reduced from 2700 microseconds per unit of work to 3.7, for a speedup factor of 730x.

The first thing done was to notice that a large percentage of time was going into growing arrays and removing elements from them. So a different array class was used, which reduced time by a large amount.

The second thing done was to notice that a large percentage of time was still going into array-related activities. So the arrays were eliminated altogether, and linked lists used instead, producing another large speedup.

Then other things were taking a large percentage of the remaining time, such as new-ing and delete-ing objects, so those objects were recycled in free lists (a sketch of the technique follows), producing another large speedup. After a couple more stages, a decision was made to stop trying, because it was getting harder to find things to improve and the speedup was deemed sufficient.
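
A sketch of that recycling idea (Node and NodePool are hypothetical names): released objects go onto a singly linked free list and are reused, so the steady state performs no new/delete at all.

#include <memory>
#include <vector>

struct Node {
    double value = 0.0;
    Node*  next  = nullptr;   // doubles as the free-list link
};

class NodePool {
public:
    Node* acquire() {
        if (free_) {                  // fast path: reuse a recycled node
            Node* n = free_;
            free_ = n->next;
            *n = Node{};              // reset to a fresh state
            return n;
        }
        storage_.push_back(std::unique_ptr<Node>(new Node()));  // slow path
        return storage_.back().get();
    }
    void release(Node* n) {           // no delete: push onto the free list
        n->next = free_;
        free_ = n;
    }

private:
    std::vector<std::unique_ptr<Node>> storage_;  // owns every node
    Node* free_ = nullptr;                        // recycled, reusable nodes
};

int main() {
    NodePool pool;
    Node* a = pool.acquire();
    pool.release(a);
    Node* b = pool.acquire();  // same storage as 'a', no allocation
    (void)b;
}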

The point is, don't just sort of choose something that's highly recommended and then hope for the best. Rather get it built one way or another and then do performance tuning like this, and be willing to make major changes in your data structure design, based on what you see a high percentage of time being spent on. And iterate it. You might change your storage scheme from A to B, and later from B to C. That's perfectly OK.

In scientific computing, bugs and sub-optimal code are especially frustrating, because large amounts of data are incorrectly processed and precious time is wasted.

std::vector might be your bottleneck or your best performer depending on your knowledge of its inner workings. Pay special attention to reserve(), insert(), and erase(); consider learning about alignment and processor caching if your program is threaded.
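
For example, when the final size is known up front, a single reserve() replaces every intermediate reallocation (compute() here is a placeholder workload):

#include <vector>

double compute(int i) { return i * 0.5; }   // placeholder workload

int main() {
    std::vector<double> samples;
    samples.reserve(1000000);            // one allocation up front
    for (int i = 0; i < 1000000; ++i)
        samples.push_back(compute(i));   // never reallocates inside the loop
}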

Think about the time you would have to spend ensuring consistency, and later hunting for bugs, if you tried to do all the memory management yourself, particularly as you progressively add features to your software. At the end of the day, the overhead of std::vector will be the least of your problems.

For scientific computing you'd be much better off using a dedicated C++ matrix library, such as Armadillo. This not only gives you fast array processing, but also many linear algebra operations that have been thoroughly debugged.

Apart from performance reasons, using a dedicated C++ matrix library will also allow you to considerably reduce the verbosity of your code, make less mistakes, and hence speed up development. One example is that with a C++ matrix library you don't need to worry about memory management.

Lastly, if you really need to go low-level (i.e. use memory directly via pointers), C++ allows you to drop down to the C level. In Armadillo this is done via the .memptr() member function.
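
A brief sketch of what this looks like in Armadillo (assuming the library is installed and linked, e.g. with -larmadillo):

#include <armadillo>

int main() {
    arma::mat A(10, 10, arma::fill::randu);  // 10x10 matrix, random entries
    arma::vec b(10, arma::fill::ones);

    arma::vec x = arma::solve(A, b);   // solve the linear system A*x = b
    arma::mat C = A.t() * A;           // expression syntax, no manual loops

    double* raw = C.memptr();          // drop down to raw column-major memory
    (void)raw;
    (void)x;
}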

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow