Question

Reading through The Apache Modules Book, I come across this claim in part 3.4.3:

"A secondary benefit is that pool allocation is faster than malloc on most platforms!"

An open question, I realize, but.. well, why?


Solution

In addition to Lars' point about locality, pool allocation simply makes a different speed/flexibility tradeoff than malloc does.

malloc has to track each allocated block, be able to free blocks individually, handle free block fragmentation and coalescing, have some strategy for choosing which of several free blocks to allocate from, etc.

This is required for a general purpose allocator, but pools avoid all this complexity (and the associated space & runtime overhead) by making the caller decide some of this behaviour statically.

The Apache example seems to allow only whole-pool deallocation (so it's like an arena allocator). This means it doesn't need to track individual allocation records, their lifetimes, and a fragmented free store. It just needs to keep track of one (or more) big chunks of memory, and update a pointer to the next unused space. Note that even if these big chunks are allocated using malloc, this cost is typically amortized over many pool allocations.

Other pool types may be limited to objects of the same size (or type), which simplifies them still further; they can even keep a LIFO free list at close to zero cost, allowing per-record deallocation.

OTHER TIPS

Pool allocation strategy can lead to better performance in two ways. In applications where the data structure traversal matches the data allocation order, the pooling strategy can expose more regular strides.

For example, consider a search through a linked list of records. Each record consists of a key field, a data field, and a next field pointing to the next record in the list. The key and next fields are accessed consecutively until a match is found; only then is the data field accessed. If the data stored in each record is itself a heap object, and worse, if it varies in size from one record to the next, the spatial locality of the working set suffers: data is unnecessarily fetched from memory, consuming valuable bandwidth.

Also, when variable-length data intervenes between the key and next fields, the strides within a record and across successive records appear irregular. Segregating the different object types therefore improves spatial locality and, with it, overall performance.

Furthermore, because pool allocation may expose more regular strides, a simple prefetching strategy can anticipate future references and shorten their memory access latency.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow