Question

I am talking about single-threaded, general-purpose memory allocation/deallocation from a global 'heap', as every C programmer knows it in the form of malloc()/free().

I can't recall the actual title of a paper about a much-better-but-alas-somewhat-restricted-in-use-case allocator, but I think I have read the same judgement on several occasions. The community opinion about, e.g., a balanced tree that takes O(log n) for n = size of managed memory is that it is too slow. I can see the problem if we are talking about some high-frequency looping construct, although I think that pattern is rather rare. For the occasional allocation of an object that is afterwards used in an O(n) algorithm (n = size of the object in memory cells), the impact is rather minor, isn't it? Nevertheless, the public opinion about such allocators seems to be that they are way too primitive to be used, much less in a realtime environment.


Solution

The "ideal" performance of an algorithm in people's minds is dependent on what the best option is out there. If you can do a "find" operation on a data structure in O(n log N) time, is that fast enough? Maybe it is. However, for many structures you can do a find in O(n), or even O(log n), so people will call your O(n log n) "slow." This isn't because it's actually slow, but because there's faster algorithms that meet their needs.

In general, people have found memory-management algorithms that run faster than O(log n) and meet their expectations. Thus, any O(log n) algorithm is going to be marked as "too slow" by the general populace unless it offers some valuable feature to offset the speed deficit. You won't sell many sports cars with 50 HP motors in a country where sports cars all have 250+ HP, unless you can show some unique value of the 50 HP car. Perhaps it's cheaper, or it's biodegradable.

Most developers using these memory management algorithms do not want to have to think about the cost of the underlying API. If they even have to consider the asymptotic runtime performance of their memory management, then it's too slow for them. They'll reject it.

All of this means that real memory-management libraries tend to have to model performance in a more nuanced way than simple Big-O notation. Big-O is nice, but for the kind of performance they are interested in, it's not good enough. Memory managers have to care about the actual time constants that govern how fast they run, which Big-O simply ignores. Their runtime analyses include things like branch mispredictions, jitter, and non-random read/write patterns in order to eke out a few percent more performance.

So in the end, if given the choice between an O(log n) algorithm and a bleeding-edge tuned algorithm that needs 3 cycles to allocate objects smaller than 64 bytes and 20-40 cycles for larger objects, which one are you going to choose?
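To make the constant-factor point concrete, here is a minimal sketch in C of the kind of design that achieves such numbers: a size-class free-list allocator, where a small request pops a node off a per-class list in O(1) and only an occasional refill touches the general-purpose allocator. All names, class sizes, and chunk counts below are made up for illustration, not taken from any real allocator.

    /* Sketch of a size-class free-list allocator. Small requests are
     * served in O(1) from per-class singly linked lists; anything
     * larger falls back to malloc(). */
    #include <stddef.h>
    #include <stdlib.h>

    #define NUM_CLASSES   8              /* classes: 8, 16, ..., 64 bytes */
    #define CLASS_STEP    8
    #define CHUNK_OBJECTS 128            /* objects carved per refill     */

    typedef union free_node {
        union free_node *next;           /* link while the block is free  */
    } free_node;

    static free_node *free_list[NUM_CLASSES];

    static int class_of(size_t size) {
        return (int)((size + CLASS_STEP - 1) / CLASS_STEP) - 1;
    }

    void *small_alloc(size_t size) {
        if (size == 0 || size > NUM_CLASSES * CLASS_STEP)
            return malloc(size);         /* large/odd requests: punt      */

        int c = class_of(size);
        if (free_list[c] == NULL) {      /* refill: one malloc, many objects */
            size_t obj = (size_t)(c + 1) * CLASS_STEP;
            char *chunk = malloc(obj * CHUNK_OBJECTS);
            if (chunk == NULL)
                return NULL;
            for (int i = 0; i < CHUNK_OBJECTS; i++) {
                free_node *n = (free_node *)(chunk + (size_t)i * obj);
                n->next = free_list[c];
                free_list[c] = n;
            }
        }
        free_node *n = free_list[c];     /* the O(1) fast path: pop       */
        free_list[c] = n->next;
        return n;
    }

    void small_free(void *p, size_t size) {
        if (size == 0 || size > NUM_CLASSES * CLASS_STEP) {
            free(p);
            return;
        }
        free_node *n = p;                /* O(1): push back onto the list */
        n->next = free_list[class_of(size)];
        free_list[class_of(size)] = n;
    }

Note that the speed comes from restricting the problem (fixed size classes, and the caller must supply the size again on free), which is exactly the "better but restricted in use case" trade-off the question mentions.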

OTHER TIPS

"Slow" depends on the application. For some real-time applications (high speed machinery control, DSP, etc.), any non-bounded latency might be too slow, perhaps even leading to a catastrophic failure mode. For these applications, O(1) code is easier to verify as safe. O(logN) is only safe if N is strictly bounded and within requirements, and max(N) might be difficult to determine, verify, or test.

For instance, Objective-C includes reference-counted memory management, yet Apple still recommends against calling any method that can allocate memory inside iOS/macOS real-time audio contexts. You don't want a live-performance music synth to pop $$$ worth of concert-hall speakers.

"I am talking about single-threaded, general-purpose memory allocation/deallocation from a global 'heap', as every C programmer knows it in the form of malloc()/free()."

I think the answer would be quite similar to what is used inside an OS, e.g. the Linux kernel. The issue here is: is it slow? Of course not; O(log n) is fast. Is it always fast, though? It turns out some algorithms are fast for lookup but not so fast for insertion/deletion, and others are faster at insertion/deletion but not so fast for lookup.

For example, both AVL and red-black trees are O(log n). To quote the kernel documentation:

Red-black trees are similar to AVL trees, but provide faster real-time bounded worst case performance for insertion and deletion (at most two rotations and three rotations, respectively, to balance the tree), with slightly slower (but still O(log n)) lookup time.

Where the kernel uses red-black trees, there is a significant amount of insert/delete activity, which might not be the case for all applications.
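As a sketch of that insert/delete-heavy pattern, this is roughly how an allocator could index its free blocks by size using the kernel's red-black tree API. rb_node, rb_entry(), rb_link_node(), and rb_insert_color() are the real interface from include/linux/rbtree.h; struct free_block and the size-keyed policy are hypothetical.

    /* Hypothetical free-block index built on the Linux kernel's
     * red-black tree API (include/linux/rbtree.h). */
    #include <linux/rbtree.h>
    #include <linux/types.h>

    struct free_block {
        struct rb_node node;     /* embedded rbtree linkage        */
        size_t size;             /* key: size of this free block   */
    };

    /* Insert a free block keyed by size: an O(log n) walk down,
     * then rb_insert_color() rebalances with a bounded number of
     * rotations, matching the worst case the docs cite. */
    void free_block_insert(struct rb_root *root, struct free_block *blk)
    {
        struct rb_node **link = &root->rb_node, *parent = NULL;

        while (*link) {
            struct free_block *cur =
                rb_entry(*link, struct free_block, node);
            parent = *link;
            link = blk->size < cur->size ? &(*link)->rb_left
                                         : &(*link)->rb_right;
        }
        rb_link_node(&blk->node, parent, link);
        rb_insert_color(&blk->node, root);
    }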

That should give a good idea of how an algorithm should perform. The pinch of salt: it depends on the application, and the way to find out is by profiling.
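For example, a rough userspace micro-benchmark in that spirit might look like the sketch below. The iteration count, allocation size, and working-set shape are arbitrary; a serious measurement would also vary object lifetimes and fragmentation patterns.

    /* Rough timing of malloc()/free() pairs over a small recycled
     * working set. All numbers are placeholders for illustration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ITERATIONS 1000000L
    #define ALLOC_SIZE 48
    #define SLOTS      64

    int main(void) {
        void *slots[SLOTS] = {0};
        struct timespec start, end;

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < ITERATIONS; i++) {
            size_t s = (size_t)i % SLOTS;  /* recycle a working set */
            free(slots[s]);                /* free(NULL) is a no-op */
            slots[s] = malloc(ALLOC_SIZE);
        }
        clock_gettime(CLOCK_MONOTONIC, &end);

        double ns = (end.tv_sec - start.tv_sec) * 1e9
                  + (end.tv_nsec - start.tv_nsec);
        printf("%.1f ns per alloc/free pair\n", ns / ITERATIONS);

        for (size_t s = 0; s < SLOTS; s++)  /* cleanup */
            free(slots[s]);
        return 0;
    }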

Licensed under: CC-BY-SA with attribution