Question

Apologies if this is a stupid question, but it's been kinda bothering me for a long time.

I'd like to know some details on how the memory manager knows what memory is in use.

Imagine a one-chip microcomputer with 1024B of RAM - not much to spare.

Now you allocate 100 ints - each int is 4 bytes, and each pointer is 4 bytes too (yes, an 8-bit single-chip machine will most likely have smaller pointers, but bear with me).

So you've just used 800 B of RAM for 100 ints? And it's worse than that - the allocation system must somehow take note of where memory is malloc'd and where it's free. An extra 200 bytes or so? Or some per-block marker bits?


  • If this is true, why is C favoured over assembler so often?

  • Is this really how it works? So super inefficient?

  • (Or am I having a totally incorrect idea about it?)

Solution

It may surprise younger developers to learn that greying old ones like myself used to write in C on systems with 1 or 2k of RAM.

In systems this size, dynamic memory allocation would have been a luxury we could not afford. It's not just the pointer overhead of managing the free store, but also free-store fragmentation, which makes allocation inefficient and, with no virtual memory to fall back on, very probably leads to a fatal out-of-memory condition.

So we used static memory allocation (i.e. global variables), kept very tight control over the depth of function call nesting, and even tighter control over nested interrupt handling.

When writing on these systems, I didn't even link the standard library. I wrote my own C startup routines and provided custom minimal I/O routines.

One program I wrote on a 2k RAM system used the lower part of RAM as the data area and the upper part as the stack. In the final cut, I proved that maximum stack usage reached so far down in memory that it came within 1 byte of the last variable in the data area.

Ah, the good old days...

EDIT:

To answer your question specifically, the original K&R free-store manager added a header block to the beginning of each block of memory allocated via malloc.

The header block looked something like this:

union header {
    struct {
        union header *ptr;   /* next block on the free list */
        unsigned      size;  /* size of this block, in header-sized units */
    } s;
};

Here ptr is the address of the next free block and size is the size of the allocated memory, measured in header-sized units. The malloc function would actually return the address &header + sizeof(header), i.e. the first byte past the header. The free function would subtract the size of the header from the pointer you passed in, recovering the header so it can re-link the block into the free list.
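To make that arithmetic concrete, here is a minimal sketch using the union header above (the helper names user_block and block_header are made up for illustration; the real K&R code does this pointer arithmetic inline in malloc and free):

/* Given a header, the address malloc() hands to the caller:
   the first byte just past the header. */
static void *user_block(union header *h)
{
    return (void *)(h + 1);              /* &header + sizeof(header) */
}

/* Given a user pointer, the header free() must recover:
   step back over the header to re-link the block. */
static union header *block_header(void *p)
{
    return (union header *)p - 1;
}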

OTHER TIPS

There are several approaches you can take here:

  • as you describe, malloc() one memory block for every int. Completely inefficient, so rule that one out.
  • malloc() an array of 100 ints. That needs 100*sizeof(int) + 1*sizeof(int*) in total, plus whatever malloc() needs internally. Much better.
  • statically allocate the array. Here you need just 100*sizeof(int).
  • allocate the array on the stack. That needs the same amount, but only for the duration of the current function call.

Which of these you need depends on how long you need the memory and on other criteria; the sketch below shows the last three side by side.
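A minimal sketch of those three options (N and the function names are made up; the comments give the cost of each):

#include <stdlib.h>

#define N 100

static int static_table[N];   /* statically allocated: exactly N * sizeof(int),
                                 reserved for the whole run of the program */

void use_stack(void)
{
    int stack_table[N];       /* on the stack: same size, but only occupied
                                 while this function is executing */
    stack_table[0] = 42;
}

void use_heap(void)
{
    /* on the heap: N * sizeof(int), plus the pointer itself, plus
       whatever bookkeeping malloc() adds per block */
    int *heap_table = malloc(N * sizeof *heap_table);
    if (heap_table != NULL) {
        heap_table[0] = 42;
        free(heap_table);
    }
}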

If you have that little RAM, it is even questionable whether using malloc() is worthwhile at all. It can be an option if several blocks of code each need a lot of RAM, but not at the same time.

How the memory addresses are tracked also depends on the method (see the sketch after this list):

  • for malloc(), you have to store the pointer somewhere you won't lose it.
  • for an array on the stack, it is addressed relative to the current frame pointer. The code sets up the frame and thus knows the offset, so the address normally doesn't need to be stored anywhere.
  • for an array in the data segment, the compiler and linker know the address and statically place it wherever it is needed.
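The difference is easy to see in a small program that prints where each kind of array ends up (the names are made up; the exact addresses will vary by platform):

#include <stdio.h>
#include <stdlib.h>

static int in_data_segment[100];   /* address fixed by the linker */

int main(void)
{
    int on_stack[100];             /* addressed as frame pointer minus an offset */
    int *on_heap = malloc(100 * sizeof *on_heap);   /* address exists only in
                                                       the variable on_heap */

    printf("data segment: %p\n", (void *)in_data_segment);
    printf("stack:        %p\n", (void *)on_stack);
    printf("heap:         %p\n", (void *)on_heap);

    free(on_heap);
    return 0;
}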

If this is true, why is C favoured over assembler so often?

You're simplifying the problem too much. C or assembler - it doesn't matter, you still need to manage the memory chunks. The main issue is fragmentation, not the management overhead itself. In a system like the one you described, you would probably just allocate the memory and never release it, so there is no need to track what's free - whatever is below the watermark is free, and that's it.
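A minimal sketch of such a watermark ("bump") allocator, assuming a fixed 1024-byte pool (all names here are made up):

#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024

static union {
    uint8_t bytes[POOL_SIZE];
    long    align;               /* force suitable alignment for the pool */
} pool;

static size_t watermark;         /* everything below this offset is in use */

/* Allocate n bytes by bumping the watermark up; nothing is ever freed. */
static void *bump_alloc(size_t n)
{
    /* round up so every allocation stays aligned */
    n = (n + sizeof(long) - 1) & ~(sizeof(long) - 1);
    if (POOL_SIZE - watermark < n)
        return NULL;             /* out of memory - the fatal condition */
    void *p = &pool.bytes[watermark];
    watermark += n;
    return p;
}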

Is this really how it works? So super inefficient?

There are many algorithms for this problem, but if you're simplifying - yes, that's basically it. In reality it's a much more complicated problem, and there are far more sophisticated systems for servicing memory, dealing with fragmentation, garbage collection (at the OS level), and so on.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow