Question

malloc is not guaranteed to return 0'ed memory. The conventional wisdom goes further: the contents of the memory malloc returns are supposedly non-deterministic, e.g. openssl used them for extra randomness.

However, as far as I know, malloc is built on top of brk/sbrk, which do "return" 0'ed memory. I can see why the contents of what malloc returns may be non-0, e.g. from previously free'd memory, but why would they be non-deterministic in "normal" single-threaded software?

  1. Is the conventional wisdom really true (assuming the same binary and libraries)?
  2. If so, why?

Edit: Several people answered explaining why the memory can be non-0, which I already explained in the question above. What I'm asking is why a program using the contents of what malloc returns may be non-deterministic, i.e. why it could have different behavior every time it's run (assuming the same binary and libraries). Non-deterministic behavior is not implied by non-0 contents. To put it differently: why could the memory have different contents every time the binary is run?


Solution

I think that the assumption that it is non-deterministic is plain wrong, particularly as you ask about a non-threaded context. (In a threaded context, scheduling vagaries could introduce some non-determinism.)

Just try it out. Create a sequential, deterministic application that

  • does a whole bunch of allocations
  • fills the memory with some pattern, e.g. the value of a counter
  • frees every second one of these allocations
  • allocates the same amount again
  • runs through these new allocations and records the value of the first byte of each in a file (as textual numbers, one per line)

Run this program twice and record the results in two different files. My expectation is that these files will be identical.
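
Here is a minimal sketch of such a test (the block size, count, and output file name are arbitrary choices; reading indeterminate values is formally undefined behavior, so this only observes what one particular implementation happens to do):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define COUNT 1000
    #define SIZE  64

    int main(int argc, char **argv) {
        const char *path = (argc > 1) ? argv[1] : "malloc_trace.txt";
        unsigned char *blocks[COUNT];
        unsigned char *fresh[COUNT / 2];

        /* A whole bunch of allocations, filled with a counter pattern. */
        for (int i = 0; i < COUNT; i++) {
            blocks[i] = malloc(SIZE);
            if (!blocks[i]) return 1;
            memset(blocks[i], i & 0xFF, SIZE);
        }

        /* Free every second allocation. */
        for (int i = 0; i < COUNT; i += 2) {
            free(blocks[i]);
            blocks[i] = NULL;
        }

        /* Newly allocate the same amount, without initializing it. */
        for (int i = 0; i < COUNT / 2; i++) {
            fresh[i] = malloc(SIZE);
            if (!fresh[i]) return 1;
        }

        /* Record the first byte of each new allocation, one number per line. */
        FILE *out = fopen(path, "w");
        if (!out) return 1;
        for (int i = 0; i < COUNT / 2; i++)
            fprintf(out, "%d\n", fresh[i][0]);
        fclose(out);
        return 0;
    }

Run the binary twice with two different output file names and diff the results; on a typical single-threaded glibc setup I would expect them to match.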

OTHER TIPS

Malloc does not guarantee unpredictability... it just doesn't guarantee predictability.

E.g. consider that

    void *malloc(size_t size) { return NULL; }

is a valid implementation of malloc: one that simply reports every allocation as a failure.

The initial values of memory returned by malloc are unspecified, which means that the specifications of the C and C++ languages put no restrictions on what values can be handed back. This makes the language easier to implement on a variety of platforms. While it might be true that on Linux malloc is implemented with brk and sbrk and the memory should be zeroed (I'm not even sure that this is necessarily true, by the way), on other platforms, perhaps an embedded platform, there's no reason that this would have to be the case. For example, an embedded device might not want to zero the memory, since doing so costs CPU cycles and thus power and time.

Also, in the interest of efficiency, the memory allocator could recycle blocks that had previously been freed without zeroing them out first. This means that even if the memory coming from the OS is initially zeroed out, the memory from malloc needn't be.

The conventional wisdom that the values are nondeterministic is probably a good one because it forces you to realize that any memory you get back might have garbage data in it that could crash your program. That said, you should not assume that the values are truly random. You should, however, realize that the values handed back are not magically going to be what you want. You are responsible for setting them up correctly. Assuming the values are truly random is a Really Bad Idea, since there is nothing at all to suggest that they would be.

If you want memory that is guaranteed to be zeroed out, use calloc instead.
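
A quick sketch of the difference (the element count is arbitrary):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* malloc: contents are indeterminate -- they could be anything. */
        int *a = malloc(10 * sizeof *a);

        /* calloc: contents are guaranteed to be zero. */
        int *b = calloc(10, sizeof *b);

        if (!a || !b) return 1;

        printf("b[0] is guaranteed to be 0: %d\n", b[0]);
        /* Reading a[0] before writing to it is undefined behavior. */

        free(a);
        free(b);
        return 0;
    }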

Hope this helps!

malloc is defined on many systems that can be programmed in C/C++, including many non-UNIX systems and many systems that lack an operating system altogether. Requiring malloc to zero out the memory would go against C's philosophy of spending as few CPU cycles as possible.

The standard provides a zeroing allocator, calloc, that can be used if you need the memory zeroed out. But in cases where you are planning to initialize the memory yourself as soon as you get it, the CPU cycles spent making sure the block is zeroed out are a waste; the C standard aims to avoid this waste as much as possible, often at the expense of predictability.

Memory returned by malloc is not zeroed (or rather, is not guaranteed to be zeroed) because it does not need to be. There is no security risk in reusing uninitialized memory pulled from your own process's address space or page pool. You already know it's there, and you already know the contents. There is also no practical issue with the contents, because you're going to overwrite them anyway.

Incidentally, the memory malloc obtains from the operating system is zeroed when it is first handed to your process, because the kernel cannot afford the risk of giving one process data that another process owned previously. Therefore, when the OS faults in a new page, it only ever provides one that has been zeroed. However, this is entirely unrelated to malloc.
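
That kernel-level zeroing is easy to observe directly with an anonymous mapping (a Linux/POSIX sketch; MAP_ANONYMOUS pages are documented to be zero-filled):

    #define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 4096;
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Count non-zero bytes in the freshly mapped page. */
        size_t nonzero = 0;
        for (size_t i = 0; i < len; i++)
            if (p[i] != 0)
                nonzero++;

        printf("non-zero bytes in a fresh page: %zu\n", nonzero);  /* prints 0 */
        munmap(p, len);
        return 0;
    }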

(Slightly off-topic: the Debian/OpenSSL security incident you alluded to had a few more implications than just using uninitialized memory for randomness. A packager who was not familiar with the inner workings of the code, and did not know the precise implications, patched out a couple of places that Valgrind had reported, presumably with good intent but to disastrous effect. Among these was the "randomness from uninitialized memory", but it was by far not the most severe one.)

Even in "normal" single-threaded programs, memory is freed and reallocated many times. Malloc will return to you memory that you had used before.

Even single-threaded code may do malloc then free then malloc and get back previously used, non-zero memory.
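
A small sketch of that malloc/free/malloc pattern (whether the second allocation actually reuses the first block, and what its bytes look like, is entirely up to the allocator; reading them is formally undefined behavior, so treat the output as an observation, not a guarantee):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void) {
        char *p = malloc(64);
        if (!p) return 1;
        strcpy(p, "old contents");
        printf("first block:  %p\n", (void *)p);
        free(p);

        /* The allocator is free to hand back the very same block... */
        char *q = malloc(64);
        if (!q) return 1;
        printf("second block: %p, first byte: %d\n",
               (void *)q, (int)(unsigned char)q[0]);
        /* ...and if it does, q may still contain bytes written through p
           (possibly partly overwritten by the allocator's own bookkeeping). */

        free(q);
        return 0;
    }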

There is no guarantee that brk/sbrk return 0ed-out data; this is an implementation detail. It is generally a good idea for an OS to do that to reduce the possibility that sensitive information from one process finds its way into another process, but nothing in the specification says that it will be the case.

The fact that malloc is implemented on top of brk/sbrk is likewise implementation-dependent, and can even vary based on the size of the allocation; for example, large allocations on Linux have traditionally been served by mmap instead (historically by mapping /dev/zero, nowadays by anonymous mappings).
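
One observable consequence, sketched below: on glibc, allocations above the default mmap threshold (128 KiB) are typically served by a fresh anonymous mapping, which the kernel zero-fills, so the block usually comes back all-zero. None of this is guaranteed by any standard, and the threshold is a glibc tunable, not a portable constant.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* 1 MiB is well above glibc's default mmap threshold. */
        size_t len = 1 << 20;
        unsigned char *p = malloc(len);
        if (!p) return 1;

        size_t nonzero = 0;
        for (size_t i = 0; i < len; i++)
            if (p[i] != 0)
                nonzero++;

        /* On a typical glibc/Linux system this usually prints 0, because the
           block came from a freshly mapped, kernel-zeroed region -- but no
           standard promises it, and reading indeterminate values is formally
           undefined behavior anyway. */
        printf("non-zero bytes: %zu\n", nonzero);

        free(p);
        return 0;
    }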

Basically, you can neither rely on malloc()ed regions containing garbage nor on their being all-0, and no program should assume one thing or the other about them.

The simplest way I can think of putting the answer is like this:

If I am looking for wall space to paint a mural, I don't care whether it is white or covered with old graffiti, since I'm going to prime it and paint over it. I only care whether I have enough square footage to accommodate the picture, and I care that I'm not painting over an area that belongs to someone else.

That is how malloc thinks. Zeroing memory every time it is freed or handed out would be wasted computational effort, just as re-priming the wall every time you finish a painting would be.

There is a whole ecosystem of programs living in a computer's memory, and you cannot control the order in which mallocs and frees happen.

Imagine that the first time you run your application and call malloc(), it gives you an address with some garbage in it. Then your program shuts down and your OS marks that area as free. Another program takes it with another malloc(), writes a lot of stuff, and then exits. When you run your program again, malloc() might give you the same address, but now there's different garbage there, left by the previous program.

I don't actually know the implementation of malloc() on any particular system, and I don't know whether it implements any kind of security measure (like randomizing the returned address), but I don't think so.

It is very deterministic.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow