On the use and abuse of alloca

https://stackoverflow.com/questions/5807612

24-10-2019

Question

I am working on a soft-realtime event processing system. I would like to minimise the number of calls in my code that have non-deterministic timing. I need to construct a message that consists of strings, numbers, timestamps and GUIDs, probably as a std::vector of boost::variant values.

I have always wanted to use alloca in code of this kind. However, the systems programming literature always carries heavy warnings against this function. Personally, I can't think of a server-class machine from the last 15 years that doesn't have virtual memory, and I know for a fact that the Windows stack grows one virtual-memory page at a time, so I assume Unices do the same. There is no brick wall here (anymore); the stack is just as likely to run out of space as the heap, so what gives? Why aren't people going gaga over alloca? I can think of many responsible uses of alloca (string processing, anyone?).

Anyhow, I decided to test the performance difference (see below), and alloca is about five times faster than malloc (the test captures how I would use alloca). So, have things changed? Should we just throw caution to the wind and use alloca (wrapped in a std::allocator) whenever we can be absolutely certain of the lifetime of our objects?

I am tired of living in fear!

Edit:

OK, so there are limits: on Windows it is a link-time limit, and on Unix it seems to be tunable. It seems a page-aligned memory allocator is in order :D Does anyone know of a general-purpose portable implementation? :D

Code:

#include <alloca.h>   // declares alloca on glibc/BSD; MSVC provides _alloca in <malloc.h>
#include <stdlib.h>
#include <time.h>

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

using namespace boost::posix_time;

int random_string_size()
{
    return ( (rand() % 1023) +1 );
}

int random_vector_size()
{
    return ( (rand() % 31) +1);
}

void alloca_test()
{
    int vec_sz = random_vector_size();

    void ** vec = (void **) alloca(vec_sz * sizeof(void *));    

    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = alloca(random_string_size());     
    }
}

void malloc_test()
{
    int vec_sz = random_vector_size();

    void ** vec = (void **) malloc(vec_sz * sizeof(void *));    

    for(int i = 0 ; i < vec_sz ; i++)
    {
        vec[i] = malloc(random_string_size());     
    }

    for(int i = 0 ; i < vec_sz ; i++)
    {
        free(vec[i]); 
    }

    free(vec);
}

int main()
{
    srand( time(NULL) );
    ptime now;
    ptime after; 

    int test_repeat = 100; 
    int times = 100000;


    time_duration alloc_total;
    for(int ii=0; ii < test_repeat; ++ii)
    { 

        now = microsec_clock::local_time();
        for(int i =0 ; i < times ; ++i)
        {
            alloca_test();    
        }
        after = microsec_clock::local_time();

        alloc_total += after -now;
    }

    std::cout << "alloca_time: " << alloc_total/test_repeat << std::endl;

    time_duration malloc_total;
    for(int ii=0; ii < test_repeat; ++ii)
    {
        now = microsec_clock::local_time();
        for(int i =0 ; i < times ; ++i)
        {
            malloc_test();
        }
        after = microsec_clock::local_time();
        malloc_total += after-now;
    }

    std::cout << "malloc_time: " << malloc_total/test_repeat << std::endl;
}

output:

hassan@hassan-desktop:~/test$ ./a.out 
alloca_time: 00:00:00.056302
malloc_time: 00:00:00.260059
hassan@hassan-desktop:~/test$ ./a.out 
alloca_time: 00:00:00.056229
malloc_time: 00:00:00.256374
hassan@hassan-desktop:~/test$ ./a.out 
alloca_time: 00:00:00.056119
malloc_time: 00:00:00.265731

--Edit: Results on home machine, clang, and google perftools--

G++ without any optimization flags
alloca_time: 00:00:00.025785
malloc_time: 00:00:00.106345


G++ -O3
alloca_time: 00:00:00.021838
malloc_time: 00:00:00.111039


Clang no flags
alloca_time: 00:00:00.025503
malloc_time: 00:00:00.104551

Clang -O3 (alloca becomes magically faster)
alloca_time: 00:00:00.013028
malloc_time: 00:00:00.101729

g++ -O3 perftools
alloca_time: 00:00:00.021137
malloc_time: 00:00:00.043913

clang++ -O3 perftools (The sweet spot)
alloca_time: 00:00:00.013969
malloc_time: 00:00:00.044468

Solution

Well, first of all, the fact that there is a lot of virtual memory doesn't mean your process will be allowed to fill it. On *nix there are stack size limits, whereas the heap is a lot more forgiving.

If you're only going to be allocating a few hundred or a few thousand bytes, sure, go ahead. Anything beyond that depends on the limits (ulimit) in place on any given system, and that's just a recipe for disaster.

See also: Why is the use of alloca() not considered good practice?

On my development box at work (Gentoo) the default stack size limit is 8192 KB. That's not very big, and if alloca overflows the stack, the behavior is undefined.
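As a rough illustration of the limit mentioned above, the soft stack limit can be read at run time on POSIX systems with getrlimit and RLIMIT_STACK. This is only a sketch: the helper name soft_stack_limit is made up for this example, and the value applies to the main thread's stack (threads created later get their size from their thread attributes).

```cpp
#include <sys/resource.h>  // getrlimit, RLIMIT_STACK (POSIX only)

// Hypothetical helper, not part of the original answer.
// Returns true and stores the soft stack limit in bytes.
// Returns false if the limit is unlimited or the query fails.
bool soft_stack_limit(unsigned long long& bytes)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return false;                    // query failed
    if (rl.rlim_cur == RLIM_INFINITY)
        return false;                    // no finite soft limit is set
    bytes = static_cast<unsigned long long>(rl.rlim_cur);
    return true;
}
```

A caller could compare a planned alloca size against a small fraction of this value, bearing in mind that frames already on the stack consume part of the limit.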

Other answers

I think you need to be a little careful in understanding what alloca actually is. Unlike malloc, which goes to the heap and searches through buckets and linked lists of buffers, alloca simply takes your stack pointer register (ESP on x86) and moves it to create a "hole" on your thread's stack where you can store whatever you want. That's why it's so fast: just one (or a few) assembly instructions.

So, as others pointed out, it's not the virtual memory you need to worry about but the size reserved for the stack. Although others limit themselves to a "few hundred bytes", larger sizes can work as long as you know your application and are careful about it: we've allocated up to 256 KB without any problems (the default stack size, at least for Visual Studio, is 1 MB, and you can always increase it if you need to).

Also, you really can't use alloca as a general-purpose allocator (i.e. by wrapping it inside another function), because whatever memory alloca allocates is gone when the stack frame of the function that called it is popped (i.e. when that function returns).
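To make the lifetime rule concrete, here is a minimal sketch in which alloca is called in the same frame that uses the memory, and a helper only writes into a buffer its caller owns. The names to_upper_into, equals_upper and kMaxScratch are invented for this example, the 1024-byte cap is an arbitrary assumption, and <alloca.h> assumes a glibc or BSD style platform.

```cpp
#include <alloca.h>   // glibc/BSD; on MSVC the equivalent is _alloca in <malloc.h>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Assumed cap on stack scratch space; larger inputs go to the heap.
const std::size_t kMaxScratch = 1024;

// WRONG (comment only): returning alloca memory from a wrapper dangles,
// because the wrapper's frame is popped on return.
//   char* make_buffer(std::size_t n) { return (char*)alloca(n); }

// Helper that only writes into memory owned by its caller.
// dst must have room for n + 1 bytes.
static void to_upper_into(char* dst, const char* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(src[i])));
    dst[n] = '\0';
}

// alloca is called here, in the frame that uses the scratch buffer, so the
// memory stays valid until this function returns.
bool equals_upper(const char* s, const char* expected_upper)
{
    const std::size_t n = std::strlen(s);
    if (n >= kMaxScratch) {
        std::string heap_copy(n + 1, '\0');   // too large for the stack
        to_upper_into(&heap_copy[0], s, n);
        return std::strcmp(heap_copy.c_str(), expected_upper) == 0;
    }
    char* scratch = static_cast<char*>(alloca(n + 1));
    to_upper_into(scratch, s, n);
    return std::strcmp(scratch, expected_upper) == 0;
}
```

The size check before alloca is the important design choice: the stack path is only taken when the request is known to be small.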

I've also seen some people say that alloca is not completely cross-platform compatible, but if you are writing a specific application for a specific platform and you have the option of using alloca, sometimes it's the best option you have, as long as you understand the implications of increasing stack usage.

Firstly, alloca memory is very hard to control. It's untyped and dies at the earliest opportunity, which makes it not very helpful. In addition, alloca has an unfortunate side effect: regular stack variables may now have to be addressed through dynamic offsets instead of constants, which can affect performance even in basic operations that access them, and which consumes register or stack space to store those offsets. This means the real cost of using alloca isn't captured by just the time the alloca call itself takes. Stack memory is also very limited compared to heap memory: on Windows the default stack size is 1 MB (8 MB is a common default on Linux), whereas the heap can be nearly the entire user address space. Finally, whatever data you want to return ultimately has to be on the heap, so you may as well use the heap as working space too.
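One way to use the heap as working space while keeping allocation cheap is a bump ("arena") allocator: reserve one block up front, then hand out pieces by advancing an offset. The class below is a hypothetical sketch, not something from the original answer. It assumes alignments no larger than alignof(std::max_align_t) and makes no attempt at thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bump allocator: one heap block reserved up front, then each
// allocation is an offset increment, similar in cost to moving the stack pointer.
class Arena {
public:
    explicit Arena(std::size_t capacity) : storage_(capacity), used_(0) {}

    // Returns `size` bytes whose offset is aligned to `align`, or nullptr if
    // the arena is exhausted. `align` must be a power of two no larger than
    // alignof(std::max_align_t).
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t))
    {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > storage_.size() || size > storage_.size() - start)
            return nullptr;
        used_ = start + size;
        return storage_.data() + start;
    }

    // Releases everything at once, e.g. after a message has been processed.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> storage_;  // the single heap block
    std::size_t used_;                    // bytes handed out so far
};
```

An arena like this could be created once per worker and reset after each message, so the steady-state path makes no malloc or free calls.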

One point that has not been made, as far as I can see, is that the stack is often contiguous, while the heap is not. It's not generally true that the stack is as likely to run out of memory as the heap.

In C++, it's very common to see object instances declared as locals, which is sort of like an alloca but of structured memory rather than a block of N bytes - maybe you can think of this as a homage to your main point, which is that greater use of stack-based memory is a good idea. I'd sooner do that (declare an object instance as an RAII local) than use malloc (or alloca) in a C++ program. All those free calls to make exception-safe...

This generally assumes that the scope of the object is confined to this function and its called functions. If that's not the case then using stack-based memory is usually not a good idea anyway.
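A small sketch of the RAII-local idea: the type below keeps its bytes inside the object itself, so declaring it as a local uses stack storage that is released automatically, even if a later statement throws. SmallText, kSmallTextCapacity and total_length are hypothetical names for this example, and the 63-character capacity is an arbitrary assumption.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Assumed capacity for the example; anything longer is rejected.
const std::size_t kSmallTextCapacity = 63;

// Hypothetical fixed-capacity text field that lives entirely inside the object.
class SmallText {
public:
    explicit SmallText(const char* s) : len_(std::strlen(s))
    {
        if (len_ > kSmallTextCapacity)
            throw std::length_error("SmallText: input too long");
        std::memcpy(buf_, s, len_ + 1);  // copy including the terminator
    }

    std::size_t size() const { return len_; }
    const char* c_str() const { return buf_; }

private:
    std::size_t len_;
    char buf_[kSmallTextCapacity + 1];
};

std::size_t total_length(const char* a, const char* b)
{
    SmallText first(a);    // stack storage, no malloc
    SmallText second(b);   // if this throws, `first` is still cleaned up
    return first.size() + second.size();
}
```

Nothing here needs a matching free call, which is the exception-safety point the answer makes.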

The Windows stack does not grow: its reserved size is set at link time, but the pages within this size are only committed as needed. See http://msdn.microsoft.com/en-us/library/ms686774%28v=vs.85%29.asp. As the default reserved size is 1 MB, you could easily exceed this when using alloca().

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow