Вопрос

I have the following function:

double 
neville (double xx, size_t n, const double *x, const double *y, double *work);

which performs Lagrange interpolation at xx using the n points stored in x and y. The work array has size 2 * n. Since this is polynomial interpolation, n is in the ballpark of ~5, very rarely more than 10.

This function is aggressively optimized, and supposed to be called in tight loops. Profiling suggests that heap allocating the work array in the loop is bad. Unfortunately, I'm supposed to package this into a function-like class, and clients must be unaware of the work array.

For now, I use a template integer argument for the degree and std::array to avoid dynamic allocation of the work array:

template <size_t n>
struct interpolator
{
    double operator() (double xx) const
    {
        std::array<double, 2 * n> work;
        size_t i = locate (xx); // not shown here, no performance impact
                                // due to clever tricks + nice calling patterns

        return neville (xx, n, x + i, y + i, work.data ());
    }        

    const double *x, *y;
};

It would have been possible to store the work array as a mutable member of the class, but operator() is supposed to be used concurrently by several threads. This version is OK provided you know n at compile time.

Now, I need the n parameter to be specified at run time. I am wondering about something like this:

double operator() (double xx) const
{
    auto work = static_cast<double*> (alloca (n * sizeof (double)));
    ...

Some bells ring when using alloca: I'm of course going to have a cap on n to avoid the alloca call to overflow (anyway it's quite stupid to use degree 100 polynomial interpolation).

I'm quite unconfortable with the approach however:

  • Am I missing some obvious danger of alloca ?
  • Is there a better way to avoid heap allocation here ?
Это было полезно?

Решение

I'm quite unconfortable with the approach however:

  • Am I missing some obvious danger of alloca ?

You pointed the one real danger out: stack overflow behaviour is undefined for alloca. In addition, alloca isn’t actually standardised. For instance, Visual C++ has _alloca instead, and GCC by default defines it as a macro. That problem can be circumvented fairly easily, however, by providing a thin wrapper around the few existing implementations.

  • Is there a better way to avoid heap allocation here ?

Not really. C++14 will have a (potentially!) stack allocated variable-length array type. But until then, and when you consider std::array not to be a good fit, go for alloca in cases such as yours.

Minor nitpick though: your code is missing a cast of the return value of alloca. It shouldn’t even compile.

Другие советы

There are always a bunch of notes to add to any use of stack memory. As you point out, stacks have finite size and fairly serious misbehavior when that space runs out. A stack overflow will hopefully crash if there are guard pages, but on some platforms and threading environments can sometimes be a silent corruption (bad) or security issue (worse).

Also remember that stack allocation is very fast compared to malloc, (it's just an subtraction from the stack pointer register). But the use of that memory may not be. The side effect of pushing your stack frame down by large amounts is that the cache lines of the leaf functions you are about to call are no longer resident. So any use of that memory needs to go out to the SMP environment to bring the cache lines back into an exclusive (in the MESI sense) state. The SMP bus is a much (!) more constrained environment than L1 cache, and if you are spamming your stack frames around this can be a real scalability issue.

Also, as far as syntax goes, note that both gcc and clang (and Intel's compiler I believe) support the C99 variable length array syntax as a C++ extension. You may not need to actually call the libc alloca() routine at all.

Finally, note that malloc really isn't that slow. If you're dealing with single buffers of the realm of dozens of kilobytes or bigger, the memory bandwidth required to service whatever work you are going to do on them is going to swamp any overhead from malloc.

Basically: alloca() is cute, and has its uses, but unless you have a benchmark ready to prove that you need it, you probably don't and should just stick with traditional allocation.

How about this:

double operator() (double xx) const
{
    double work_static[STATIC_N_MAX];
    double* work = work_static;
    std::vector<double> work_dynamic;

    if ( n > STATIC_N_MAX ) {
        work_dynamic.resize(n);
        work = &work_dynamic[0];
    }

    ///...

No non-portable features, exception-safe, and degrades gracefully when n is too large. Of course, you may make work_static a std::array, but I'm not sure what benefit you see in that.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top