Question

In linux, is calloc exactly the same as malloc + memset or does this depend on the exact linux/kernel version?

I am particularly interested in the question of whether you can calloc more RAM than you physically have (as you can certainly malloc more RAM than you physically have, you just can't write to it). In other words, does calloc always actually write to the memory you have been allocated as the specs suggest it should.

Was it helpful?

Solution

Of course, that depends on the implementation, but on a modern day Linux, you probably can. Easiest way is to try it, but I'm saying this based on the following logic.

You can malloc more than the memory you have (physical + virtual) because the kernel delays allocation of your memory until you actually use it. I believe that's to increase the chances of your program not failing due to memory limits, but that's not the question.

calloc is the same as malloc but zero initializes the memory. When you ask Linux for a page of memory, Linux already zero-initializes it. So if calloc can tell that the memory it asked for was just requested from the kernel, it doesn't actually have to zero initialize it! Since it doesn't, there is no access to that memory and therefore it should be able to request more memory than there actually is.

As mentioned in the comments this answer provides a very good explanation.

OTHER TIPS

Whether calloc needs to write to the memory depends on whether it got the allocation from heap pages that are already assigned to the process, or it had to request more memory be assigned to the process by the kernel (using a system call such as sbrk() or mmap()). When the kernel assigns new memory to a process, it always zeroes it first (typically using a VM optimization, so it doesn't actually have to write to the page). But if it's reusing memory that was assigned previously, it has to use memset() to zero it.

It is not mentioned in the cited duplicate or here. Linux uses virtual memory and can allocate more memory that physically available in the system. A naive implementation of calloc() that simply does a malloc() plus memset() in user space will touch every page.

As Linux typically allocates in 4k chunks, all of the calloc() blocks are the same and initially read as zero. That is the same 4k chunk of memory can be mapped read only and the entire calloc() space in only taking up approximately size/4k * pointer_size + 4k. As the program writes to the calloc() space, a page fault happens and Linux will allocate a new page (4k) and resume the program.

This is called copy-on-write or COW for short. malloc() will generally behave the same way. For small sizes, the 'C' library will use binning and share 4k pages with other small sized allocation.

So, there are typically two layers involved.

  1. Linux kernel's process memory management.
  2. glibc heap management.

If the memory size requested is large and requires new memory allocated to the process, then most of the above applies (via Linux's process memory management). However, if the memory requested is small, then it will be like a malloc() plus memset(). In the large allocation size, the memset() is damaging as it touches the memory and the kernel thinks it needs a new page to allocate.

You can't malloc(3) more ram than the kernel gives the process doing the malloc(3)-ing. malloc(3) returns NULL if you can't allocate the amount of memory you want to allocate. In addition, malloc(3) and memset(3) are defined by your c library (libc.so) and not your kernel. The Linux kernel defines mmap(2) and other low-level memory allocation functions, not the *alloc(3) family (excluding kalloc()).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top