Question


This is NOT about whether primitives go to the stack or the heap; it's about where they get stored in actual physical RAM.


Take a simple example:

int a = 5;

I know 5 gets stored into a memory block.

My area of interest is where the variable 'a' itself gets stored.

Related sub-questions: Where does the association happen between 'a' and the memory block that contains the primitive value 5? Is another memory block created to hold 'a'? But that would make 'a' seem like a pointer to an object, when it's a primitive type involved here.


The solution

To expound on "Do Java primitives go on the Stack or the Heap?":

Let's say you have a function foo():

void foo() {
   int a = 5;
   System.out.println(a);
}

Then, when the compiler compiles that function, it'll create bytecode instructions that leave 4 bytes of room on the stack whenever that function is called. The name 'a' is only useful to you - the compiler just creates a spot for it, remembers where that spot is, and everywhere it wants to use the value of 'a' it inserts references to the memory location it reserved for that value.
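That bookkeeping can be sketched as a toy symbol table (all names here - SymbolTable, declare, lookup - are invented for illustration; a real compiler's version is far more involved):

```java
import java.util.HashMap;
import java.util.Map;

// Toy "compiler" symbol table: each declared name gets the next free stack
// offset, and every later use of the name is replaced by that offset.
public class SymbolTable {
    private final Map<String, Integer> offsets = new HashMap<>();
    private int next = 0;

    int declare(String name, int sizeInBytes) {
        offsets.put(name, next);       // remember where 'name' lives...
        int addr = next;
        next += sizeInBytes;
        return addr;
    }

    int lookup(String name) {          // ...and emit that offset, not the name
        return offsets.get(name);
    }

    public static void main(String[] args) {
        SymbolTable t = new SymbolTable();
        t.declare("a", 4);             // int a -> offset 0
        t.declare("b", 4);             // int b -> offset 4
        System.out.println(t.lookup("b")); // generated code only ever sees "4"
    }
}
```

After this pass, the generated code contains only offsets like 0 and 4; the names 'a' and 'b' survive, at most, in debug information.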

If you're not sure how the stack works, it works like this: every program has at least one thread, and every thread has exactly one stack. The stack is a contiguous block of memory (that can also grow if needed). Initially the stack is empty, until the first function in your program is called. Then, when your function is called, it allocates room on the stack for itself - for all of its local variables, its return value, and so on.

When your function main calls another function foo, here's one example of what could happen (there are a couple of simplifying white lies here):

  • main wants to pass parameters to foo. It pushes those values onto the top of the stack in such a way that foo will know exactly where they will be put (main and foo will pass parameters in a consistent way).
  • main pushes the address of where program execution should return to after foo is done. This increments the stack pointer.
  • main calls foo.
  • When foo starts, it sees that the stack is currently at address X.
  • foo wants to allocate 3 int variables on the stack, so it needs 12 bytes.
  • foo will use X + 0 for the first int, X + 4 for the second int, X + 8 for the third.
    • The compiler can compute this at compile time, and it can rely on the value of the stack pointer register (ESP on x86 systems), so the assembly code it emits does things like "store 0 in the address ESP + 0", "store 1 in the address ESP + 4", etc.
  • The parameters that main pushed on the stack before calling foo can also be accessed by foo by computing some offset from the stack pointer.
    • foo knows how many parameters it takes (say 3) so it knows that, say, X - 8 is the first one, X - 12 is the second one, and X - 16 is the third one.
  • So now that foo has room on the stack to do its work, it does so and finishes.
  • Right before main called foo, main wrote its return address on the stack before incrementing the stack pointer.
  • foo looks up the address to return to - say that address is stored at ESP - 4 - foo looks at that spot on the stack, finds the return address there, and jumps to the return address.
  • Now the rest of the code in main continues to run and we've made a full round trip.
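The round trip above can be sketched as a toy simulation in plain Java (CallStackSim and its exact layout are invented for illustration - a real stack is raw memory driven by CPU instructions, and real stacks usually grow downward):

```java
// Toy model of the call sequence above: a "stack" of ints and a stack pointer.
// All names here (CallStackSim, callFoo, ...) are invented for illustration.
public class CallStackSim {
    static int[] stack = new int[64]; // the thread's stack memory
    static int sp = 0;                // stack pointer (grows up in this toy)

    // main pushes parameters, then "calls" foo
    static int callFoo(int p1, int p2) {
        stack[sp++] = p1;      // main pushes parameters...
        stack[sp++] = p2;
        stack[sp++] = 0xCAFE;  // ...then the return address (a fake one here)
        return foo();
    }

    static int foo() {
        int base = sp;              // foo sees the stack at address X
        sp += 1;                    // reserve room for one local int
        stack[base] = stack[base - 3] + stack[base - 2]; // local = p1 + p2
        int result = stack[base];
        sp = base - 3;              // pop local, return address, parameters
        return result;              // "jump" back via the saved return address
    }

    public static void main(String[] args) {
        System.out.println(callFoo(2, 3)); // prints 5
    }
}
```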

Note that each time a function is called, it can do whatever it wants with the memory pointed to by the current stack pointer and everything after it. Each time a function makes room on the stack for itself, it increments the stack pointer before calling other functions to make sure that everybody knows where they can use the stack for themselves.

I know this explanation blurs the line between x86 and java a little bit, but I hope it helps to illustrate how the hardware actually works.

Now, this only covers 'the stack'. The stack exists for each thread in the program and captures the state of the chain of function calls between each function running on that thread. However, a program can have several threads, and so each thread has its own independent stack.

What happens when two function calls want to deal with the same piece of memory, regardless of what thread they're on or where they are in the stack?

This is where the heap comes in. Typically (but not always) one program has exactly one heap. The heap is called a heap because, well, it's just a big ol heap of memory.

To use memory in the heap, you have to call allocation routines - routines that find unused space and give it to you, and routines that let you return space you allocated but are no longer using. The memory allocator gets big pages of memory from the operating system, and then hands out individual little bits to whatever needs it. It keeps track of what the OS has given to it, and out of that, what it has given out to the rest of the program. When the program asks for heap memory, it looks for the smallest chunk of memory that it has available that fits the need, marks that chunk as being allocated, and hands it back to the rest of the program. If it doesn't have any more free chunks, it could ask the operating system for more pages of memory and allocate out of there (up until some limit).
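A minimal sketch of such an allocator, assuming a single page and the best-fit strategy described above (ToyHeap, alloc, and free are invented names; real allocators also coalesce adjacent free chunks, keep size classes, and much more):

```java
import java.util.ArrayList;
import java.util.List;

// Toy best-fit allocator over one fixed "page" of memory.
public class ToyHeap {
    static class Chunk {
        int addr, size; boolean used;
        Chunk(int a, int s) { addr = a; size = s; }
    }

    private final List<Chunk> chunks = new ArrayList<>();

    ToyHeap(int pageSize) { chunks.add(new Chunk(0, pageSize)); } // one big page from the OS

    // Find the smallest free chunk that fits, split it, mark it allocated.
    int alloc(int size) {
        Chunk best = null;
        for (Chunk c : chunks)
            if (!c.used && c.size >= size && (best == null || c.size < best.size))
                best = c;
        if (best == null) return -1;          // would ask the OS for more pages here
        if (best.size > size) {               // split off the unused remainder
            chunks.add(new Chunk(best.addr + size, best.size - size));
            best.size = size;
        }
        best.used = true;
        return best.addr;
    }

    // Return a chunk to the free list (coalescing omitted for brevity).
    void free(int addr) {
        for (Chunk c : chunks) if (c.addr == addr) c.used = false;
    }

    public static void main(String[] args) {
        ToyHeap heap = new ToyHeap(64);
        System.out.println(heap.alloc(16)); // first allocation starts at address 0
    }
}
```

Allocating 16 bytes from a fresh 64-byte page returns address 0; after freeing it, a 4-byte request reuses that chunk, because it is the smallest free chunk that fits.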

In languages like C, those memory allocation routines I mentioned are usually called malloc() to ask for memory and free() to return it.

Java, on the other hand, doesn't have explicit memory management like C does; instead it has a garbage collector - you allocate whatever memory you want, and when you're done you just stop using it. The Java runtime environment keeps track of what memory you've allocated, scans your program to find allocations you're no longer using, and automatically deallocates those chunks.

So now that we know that memory is allocated on the heap or the stack, what happens when I create a private variable in a class?

public class Test {
     private int balance;
     ...
}

Where does that memory come from? The answer is the heap. You have some code that creates a new Test object - Test myTest = new Test(). Using the Java new operator causes a new instance of Test to be allocated on the heap. Your variable myTest stores the address of that allocation. balance is then just some offset from that address - probably 0, actually.
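A runnable sketch of that (the constructor, deposit, and getBalance are invented additions to the Test class above): two variables holding the same address see the same balance field, because the object itself lives on the heap.

```java
public class Test {
    private int balance; // stored at some offset inside the heap allocation

    public Test(int balance) { this.balance = balance; }
    public int getBalance() { return balance; }
    public void deposit(int amount) { balance += amount; }

    public static void main(String[] args) {
        Test myTest = new Test(100);  // 'new' allocates on the heap; myTest holds the address
        Test alias = myTest;          // copies the address, not the object
        alias.deposit(50);            // mutates the one shared allocation
        System.out.println(myTest.getBalance()); // prints 150 - both names refer to it
    }
}
```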

The answer, at the very bottom, is all just... accounting.

...

The white lies I spoke about? Let's address a few of those.

  • Java is first a computer model - when you compile your program to bytecode, you're compiling to a completely made-up computer architecture that doesn't have registers or assembly instructions like any common CPU - Java, and .NET, and a few others, use a stack-based virtual machine instead of a register-based machine (like x86 processors). The reason is that stack-based processors are easier to reason about, so it's easier to build tools that manipulate that code, which is especially important for building tools that compile that code to machine code that will actually run on common processors.

  • The stack pointer for a given thread typically starts at some very high address and then grows down, instead of up, at least on most x86 computers. That said, since that's a machine detail, it's not actually Java's problem to worry about (Java has its own made-up machine model; it's the Just-In-Time compiler's job to translate that to your actual CPU).

  • I mentioned briefly how parameters are passed between functions, saying things like "parameter A is stored at ESP - 8, parameter B is stored at ESP - 12", etc. This is generally called the "calling convention", and there are more than a few of them. On x86-32, registers are sparse, and so many calling conventions pass all parameters on the stack. This has some tradeoffs, particularly that accessing those parameters might mean a trip to RAM (though caching might mitigate that). x86-64 has many more named registers, which means the most common calling conventions pass the first few parameters in registers, which presumably improves speed. Additionally, since the Java JIT is the only thing generating machine code for the entire process (native calls excepted), it can choose to pass parameters using any convention it wants.

  • I mentioned that when you declare a variable in some function, the memory for that variable comes from the stack - that's not always true; it's really up to the whims of the environment's runtime to decide where to get that memory from. In C#/.NET's case, the memory for that variable can come from the heap if the variable is used as part of a closure - this is called "heap promotion". Most languages deal with closures by creating hidden classes: the method-local variables involved in closures are rewritten to be members of some hidden class, and when that method is invoked it instead allocates a new instance of that class on the heap and stores its address on the stack; all references to the originally-local variable then occur through that heap reference.
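Java makes you do part of that rewrite by hand, since lambdas only capture effectively-final values - but the effect is the same: a captured local's storage must come from the heap so it can outlive the method's stack frame. A sketch (ClosureDemo and makeCounter are invented names):

```java
import java.util.function.IntSupplier;

// The local 'count' must outlive makeCounter's stack frame, so its storage is
// a heap-allocated box. This hand-written box is what C#'s compiler generates
// automatically as a hidden class ("heap promotion").
public class ClosureDemo {
    static IntSupplier makeCounter() {
        int[] count = {0};         // the "promoted" local lives on the heap
        return () -> ++count[0];   // the lambda object keeps a reference to it
    }

    public static void main(String[] args) {
        IntSupplier next = makeCounter(); // makeCounter's frame is long gone...
        next.getAsInt();
        System.out.println(next.getAsInt()); // ...but count is still alive: prints 2
    }
}
```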

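To get a feel for the stack-based virtual machine mentioned in the first point, here is a toy stack-machine evaluator (the instruction names are invented, loosely modeled on JVM bytecodes like iconst and iadd, but this is not real bytecode):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack machine: instructions are strings like "push 5" or "add".
// Operands come from the stack, not from named registers.
public class StackMachine {
    public static int run(String... program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String insn : program) {
            if (insn.startsWith("push "))
                stack.push(Integer.parseInt(insn.substring(5)));
            else if (insn.equals("add"))
                stack.push(stack.pop() + stack.pop());
            else if (insn.equals("mul"))
                stack.push(stack.pop() * stack.pop());
        }
        return stack.pop(); // result of the expression
    }

    public static void main(String[] args) {
        // (2 + 3) * 4 becomes a flat postfix program - no registers needed:
        System.out.println(run("push 2", "push 3", "add", "push 4", "mul")); // prints 20
    }
}
```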
Other tips

I think I got your point - you're not asking whether data is stored on the heap or the stack! I had the same puzzle about this!

The question you asked is closely related to the programming language and to how the operating system deals with processes and variables.

It is very interesting, because when I was at university studying C and C++, I ran into the same question. After reading some assembly code compiled by GCC, I gained a little understanding of this. Let's discuss it - if there's any problem, please comment and let me learn more about it.

In my opinion, the variable name is not stored - only the variable's value is. In assembly code there are no real variable names (aside from labels used as shorthand); every so-called variable is just an offset from the stack or the heap.
I think that is a hint: since assembly deals with variable names this way, other languages probably use the same strategy.
They just store the offset of the real place holding the data.
Let us make an example: say the variable named a is placed at address @1000, and the type of a is integer. Then in memory:

addr  type     value
@1000 int      5    

where @1000 is the offset at which the real data is stored.

As you can see, the data is put at that real offset.
In my understanding of a process, every variable is replaced by the address of that variable at the beginning of the process, which means the CPU only deals with addresses already allocated in memory.
Let us review this procedure again. Say you have defined:

int a = 5; print(a);

After compilation, the program is transformed into another format (entirely from my imagination):

stack:0-4  int  5
print stack:0-4

While the process is actually executing, I think the memory will be like this:

@2000 4  5     //allocate 4 bytes at @2000, and put 5 into it
print @2000 4  //read 4 bytes from @2000, then print

Since the process's memory is already allocated, @2000 is the offset standing in for the variable name: the name is replaced by just a memory address, the data 5 is read from that address, and then the print command executes.
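That execution picture can be sketched as a toy in Java (AddressDemo and the address 2000 are arbitrary illustration choices):

```java
// Toy rendering of the example above: after "compilation", the name a is gone;
// execution only touches the address (index 2000 here) where 5 was stored.
public class AddressDemo {
    public static int[] memory = new int[4096]; // pretend process memory

    public static void main(String[] args) {
        int addrOfA = 2000;          // the compiler replaced 'a' with this offset
        memory[addrOfA] = 5;         // int a = 5;
        System.out.println(memory[addrOfA]); // print(a); reads from the address
    }
}
```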

RETHINK

After completing this write-up, I find it may be rather hard for other people to picture; we can discuss it if there is any problem, or any mistake I have made.

Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow