Improving my redblack tree implementation

Question

The question of whether it is better to allocate one large block (and split it up on your own) versus allocating lots of small items applies to many situations. And there is not a one-size-fits-all answer for it. In general, though, it would probably be a little bit faster to allocate the large block. But the speedup (if any) may not be large. In my experience, doing the single large allocation typically is worth the effort and complexity in a highly concurrent system that makes heavy use of dynamic allocation. If you have a single-threaded application, my guess is that the allocation of each node makes up a very small cost of the insert operation.

Some general thoughts/comments:

Allocating a single large block (and growing it as needed) will generally use less memory overall. A typical general purpose allocator (e.g., malloc/free in C) has overhead with each allocation. So, for example, a small allocation request of 100 bytes might result in using 128 bytes.
In a memory constrained system with lots of memory fragmentation, it might not be possible to allocate a large block of memory and slice it up whereas multiple small allocations might still succeed.
Although allocating a large block reduces contention for synchronization at the allocator level (e.g., in malloc), it is still necessary to provide your own synchronization when grabbing a node from your own managed list/block (assuming a multi-threaded system). But then there likely has to be some synchronization associated with the insert of the node itself, so it could handled in that same operation.

Ultimately, you would need to test it and measure the difference. One simple thing you could do is just write a simple "throw-away" test that allocates the number of nodes you expect to be handling and just time how long it takes (and then possibly time the freeing of them too). This might give you some kind of ballpark estimate of the allocation costs.