JVM, the constant pool, the heap and the addresses

Question 1

The bytecode interpreted by the JVM is not necessarily the same bytecode that is written in the .class file. Many JVMs perform so-called bytecode rewriting at different stages of execution.

HotSpot JVM does this too. When a class is linked, HotSpot rewrites ldc bytecodes referring to String entries in the constant pool into the JVM-specific fast_aldc bytecode, which refers to objects (i.e. java.lang.String instances) in the CP cache. When such a fast_aldc bytecode is executed for the first time, the JVM resolves the constant pool entry, looks up or creates the interned String in the Java heap and populates the CP cache with the reference to this String. On subsequent executions of the same bytecode, the JVM gets the reference directly from the CP cache and pushes it onto the operand stack.
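To make the lazy resolution concrete, here is a toy Java model of the slow path (first execution) and the fast path (cached reference). It is not HotSpot source code: the class CpCacheModel, its fastAldc method and the array standing in for the CP cache are hypothetical. It interns the String because string literals must be interned (JLS 3.10.5).

```java
/**
 * Hypothetical toy model of lazy String resolution through a CP cache.
 * Not HotSpot code; names and data layout are illustrative only.
 */
public class CpCacheModel {
    // Text of String constant pool entries (index -> text); null = not a String entry.
    private final String[] utf8Entries;
    // Resolved java.lang.String references, filled in on first use.
    private final String[] resolved;
    // How many times the slow path actually ran (for demonstration).
    private int resolutions = 0;

    public CpCacheModel(String[] utf8Entries) {
        this.utf8Entries = utf8Entries;
        this.resolved = new String[utf8Entries.length];
    }

    /** Models executing fast_aldc for the given constant pool index. */
    public String fastAldc(int cpIndex) {
        String ref = resolved[cpIndex];
        if (ref == null) {
            // Slow path: obtain the canonical (interned) String and cache it.
            ref = new String(utf8Entries[cpIndex]).intern();
            resolved[cpIndex] = ref;
            resolutions++;
        }
        // Fast path: this cached reference is what gets pushed onto the operand stack.
        return ref;
    }

    public int resolutions() {
        return resolutions;
    }

    public static void main(String[] args) {
        CpCacheModel cp = new CpCacheModel(new String[] {null, "hello", "world"});
        String a = cp.fastAldc(1);
        String b = cp.fastAldc(1);
        System.out.println(a == b);           // true: same cached reference
        System.out.println(cp.resolutions()); // 1: resolved only once
        System.out.println(a == "hello");     // true: same object as the literal
    }
}
```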

After the interpretation of an ldc bytecode (or its rewritten form), the top of the stack contains a valid reference to an object in the Java heap. The new bytecode produces exactly the same kind of reference, so the interpreter has no need to distinguish between them.
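This can be observed from plain Java. The sketch assumes javac compiles the literal to ldc and new String(...) to a new/dup/ldc/invokespecial sequence, which you can confirm with javap -c; RefDemo is just an illustrative name.

```java
public class RefDemo {
    public static void main(String[] args) {
        String fromLdc = "jvm";             // loaded with ldc: reference to the interned String
        String fromNew = new String("jvm"); // created with new: a fresh heap object

        // Both are ordinary java.lang.String references.
        System.out.println(fromLdc.getClass() == fromNew.getClass()); // true
        // They point to different objects in the heap...
        System.out.println(fromLdc == fromNew);                       // false
        // ...but interning the new one yields the object that ldc pushed.
        System.out.println(fromNew.intern() == fromLdc);              // true
    }
}
```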

That's how the interpreter works. Of course, once a method has been JIT-compiled, there are no more bytecodes, constant pool references, etc. All of these are just abstractions, just a model.

Question 2

First off, the entire bytecode format is just an abstraction provided by the VM. It does not necessarily resemble the actual representation of the code or memory at runtime.

Second, the constant pool is a table of up to 65,534 entries addressed by 16-bit indexes (index 0 is never valid). Since loading a category 1 constant (such as an int, float, String or class reference) from a small index is such a common task, there is a special shorthand instruction for it: ldc.

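The ldc instruction uses a single-byte index, so it can only reach the first 255 entries (indexes 1 to 255). To load entries beyond that, you need the two-byte form, ldc_w. Category 2 constants (long and double) have no short form and always use ldc2_w. The situation is similar to other shorthand instructions, such as aload_3 vs aload 3 vs wide aload 3, which all load local variable 3 but with increasingly long encodings.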
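The choice between short and long encodings can be sketched as a tiny hypothetical assembler helper. The opcode values come from the JVM specification; the class and method names are made up for illustration. This version always picks the shortest legal form, although ldc_w, aload 3 and wide aload 3 remain valid alternatives for small indexes.

```java
/** Hypothetical assembler helper: picks the shortest encoding for a load. */
public class LoadEncoder {
    // Opcode values from the JVM specification, chapter 6.
    static final int LDC = 0x12, LDC_W = 0x13, LDC2_W = 0x14;
    static final int ALOAD = 0x19, ALOAD_0 = 0x2a, WIDE = 0xc4;

    /** Encodes a constant load; category2 is true for long/double constants. */
    static byte[] encodeLdc(int cpIndex, boolean category2) {
        if (cpIndex < 1 || cpIndex > 65534) {
            throw new IllegalArgumentException("invalid constant pool index: " + cpIndex);
        }
        if (category2) {
            // long and double always use a two-byte index.
            return new byte[] {(byte) LDC2_W, (byte) (cpIndex >> 8), (byte) cpIndex};
        }
        if (cpIndex <= 0xFF) {
            // Shorthand: one unsigned index byte.
            return new byte[] {(byte) LDC, (byte) cpIndex};
        }
        return new byte[] {(byte) LDC_W, (byte) (cpIndex >> 8), (byte) cpIndex};
    }

    /** Encodes loading a reference from a local variable slot. */
    static byte[] encodeAload(int slot) {
        if (slot < 0 || slot > 0xFFFF) {
            throw new IllegalArgumentException("invalid local variable slot: " + slot);
        }
        if (slot <= 3) {
            return new byte[] {(byte) (ALOAD_0 + slot)};          // aload_0 .. aload_3
        }
        if (slot <= 0xFF) {
            return new byte[] {(byte) ALOAD, (byte) slot};        // aload <u1>
        }
        return new byte[] {(byte) WIDE, (byte) ALOAD, (byte) (slot >> 8), (byte) slot}; // wide aload <u2>
    }

    /** Formats bytes as unsigned hex, e.g. "12 05". */
    static String hex(byte[] code) {
        StringBuilder sb = new StringBuilder();
        for (byte b : code) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeLdc(5, false)));   // 12 05
        System.out.println(hex(encodeLdc(300, false))); // 13 01 2c
        System.out.println(hex(encodeLdc(7, true)));    // 14 00 07
        System.out.println(hex(encodeAload(3)));        // 2d
        System.out.println(hex(encodeAload(4)));        // 19 04
        System.out.println(hex(encodeAload(300)));      // c4 19 01 2c
    }
}
```

The same logical operation grows from one to four bytes as the index outgrows each form, which is exactly why the shorthand variants exist.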

And again, that's all an abstraction. In practice the VM converts the constant pool into a more convenient internal format and may embed direct pointers to the constants' runtime locations into compiled code. But that's just one possible implementation.