How does jemalloc work? What are the benefits?

https://stackoverflow.com/questions/1624726

06-07-2019

Question

Firefox 3 came with a new allocator: jemalloc.

I have heard in several places that this new allocator is better. The top Google results don't give any further information, though, and I am interested in how exactly it works.

Solution

jemalloc first appeared for FreeBSD, the brainchild of one "Jason Evans", hence the "je". I would ridicule him for being egotistical had I not once written an operating system called paxos :-)

See this PDF for full details. It's a white paper describing in detail how the algorithms work.

The main benefit is scalability in multi-processor and multi-threaded systems achieved, in part, by using multiple arenas (the chunks of raw memory from which allocations are made).

In single-threaded situations, there is no real benefit to multiple arenas so a single arena is used.

However, in multi-threaded situations, many arenas are created (four times as many arenas as there are processors), and threads are assigned to these arenas in a round-robin fashion.

This means that lock contention can be reduced since, while multiple threads may call malloc or free concurrently, they'll only contend if they share the same arena. Two threads with different arenas will not affect each other.
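The round-robin assignment described above can be sketched roughly as follows. This is a toy illustration, not jemalloc's actual code: `NCPUS`, `arena_t`, `arena_for_this_thread` and `arena_charge` are hypothetical names, the processor count is hard-coded, and the "allocation" is only a byte counter. It also skips the single-arena special case for single-threaded programs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCPUS   2               /* assumed processor count (hypothetical) */
#define NARENAS (4 * NCPUS)     /* four arenas per processor, as described above */

typedef struct {
    pthread_mutex_t lock;       /* guards only this arena */
    size_t          allocated;  /* toy bookkeeping: bytes charged to this arena */
} arena_t;

static arena_t         arenas[NARENAS];
static int             arenas_ready = 0;
static size_t          next_arena   = 0;
static pthread_mutex_t assign_lock  = PTHREAD_MUTEX_INITIALIZER;

/* Each thread remembers its arena, so assignment happens once per thread. */
static _Thread_local arena_t *my_arena = NULL;

/* Return the calling thread's arena, picking one round-robin on first use. */
static arena_t *arena_for_this_thread(void)
{
    if (my_arena == NULL) {
        pthread_mutex_lock(&assign_lock);
        if (!arenas_ready) {
            /* Lazily initialise every arena the first time any thread asks. */
            for (size_t i = 0; i < NARENAS; i++) {
                pthread_mutex_init(&arenas[i].lock, NULL);
                arenas[i].allocated = 0;
            }
            arenas_ready = 1;
        }
        my_arena = &arenas[next_arena];
        next_arena = (next_arena + 1) % NARENAS;
        pthread_mutex_unlock(&assign_lock);
    }
    return my_arena;
}

/* Charge `size` bytes to the caller's arena and return that arena's index.
 * Only threads that share this arena ever wait on this lock. */
static size_t arena_charge(size_t size)
{
    arena_t *a = arena_for_this_thread();
    size_t idx = (size_t)(a - arenas);
    assert(idx < NARENAS);
    pthread_mutex_lock(&a->lock);
    a->allocated += size;
    pthread_mutex_unlock(&a->lock);
    return idx;
}
```

The global `assign_lock` is taken only once per thread, when its arena is chosen. After that, every call touches just the per-arena lock, which is where the reduced contention comes from.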

In addition, jemalloc tries to optimise for cache locality since the act of fetching data from RAM is much slower than using data already in the CPU caches (no different in concept to the difference between fast fetching from RAM versus slow fetching from disk). To that end, it first tries to minimise memory usage overall since that is more likely to ensure the application's entire working set is in cache.

And, where that can't be achieved, it tries to ensure that allocations are contiguous, since memory allocated together tends to be used together.
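To make the contiguity idea concrete, here is a minimal bump-pointer sketch in which back-to-back requests land next to each other in memory. It is an illustration under my own assumptions, not how jemalloc is implemented: `region_t`, `region_alloc`, `REGION_SIZE` and `ALIGNMENT` are hypothetical names, and there is no `free`.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_SIZE 4096        /* assumed size of one contiguous region */
#define ALIGNMENT   16          /* assumed alignment for every allocation */

typedef struct {
    _Alignas(ALIGNMENT) unsigned char bytes[REGION_SIZE];
    size_t used;                /* offset of the next free byte */
} region_t;

/* Hand out the next `size` bytes (rounded up to ALIGNMENT) from the region.
 * Returns NULL for a zero-size request or when the region is exhausted. */
static void *region_alloc(region_t *r, size_t size)
{
    assert(r != NULL);
    size_t rounded = (size + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
    if (size == 0 || rounded < size || rounded > REGION_SIZE - r->used)
        return NULL;
    void *p = r->bytes + r->used;   /* directly after the previous allocation */
    r->used += rounded;
    return p;
}
```

Two objects allocated one after the other end up adjacent, so they are likely to share cache lines or at least neighbouring ones. That is the property the paragraph above is describing.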

According to the white paper, these strategies give performance similar to the best current algorithms for single-threaded use, while offering improvements for multi-threaded use.

OTHER TIPS

There is one interesting source: the C source itself: http://mxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c

A short summary at the beginning of the file roughly describes how it works. A more in-depth analysis of the algorithm is missing, though.

As for what benefits jemalloc brought to Mozilla, per http://blog.pavlov.net/2008/03/11/firefox-3-memory-usage/ (also the first Google result for mozilla+jemalloc):

[...]concluded that jemalloc gave us the smallest amount of fragmentation after running for a long period of time. [...] Our automated tests on Windows Vista showed a 22% drop in memory usage when we turned jemalloc on.

Aerospike implemented jemalloc back in a private branch in 2013. In 2014, it was incorporated into Aerospike 3.3. Psi Mankoski just wrote about Aerospike's implementation, plus when and how to effectively use jemalloc, for High Scalability.

jemalloc really helped Aerospike take advantage of modern multithreaded, multi-CPU, multi-core computer architectures. There are also some very important debugging capabilities built into jemalloc for managing arenas. These let Psi tell, for instance, a true memory leak apart from the effects of memory fragmentation. Psi also discusses how the thread cache and per-thread allocation provided an overall speed improvement.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow