boolean size not defined in Java: why?

https://softwareengineering.stackexchange.com/questions/363286

25-01-2021

Problem

I see that the size of boolean is not defined. Below are two statements I found about Java primitive data sizes:

not precisely defined

A further explanation says:

boolean represents one bit of information, but its "size" isn't something that's precisely defined.

The question that came to my mind was: why can't a boolean in Java be represented with 1 bit (or 1 byte, if a byte is the minimum representation)?

But I see it has already been answered at https://stackoverflow.com/questions/1907318/why-is-javas-boolean-primitive-size-not-defined, where it says:

the JVM uses a 32-bit stack cell, used to hold local variables, method arguments, and expression values. Primitives that are smaller than 1 cell are padded out, primitives larger than 32 bits (long and double) take 2 cells

Does this mean that the byte/char/short primitive data types also take 32 bits, even though their sizes are defined as 8/16/16 bits?

Also, can we say that a boolean will be 32 bits on a 32-bit CPU and 64 bits on a 64-bit CPU?

Solution

TL;DR The only thing that's sure is that boolean occupies at least one bit. Everything else depends on the JVM implementation.

The Java Language Specification doesn't define sizes, only value ranges (see The Language Spec). So, it's not only the boolean size that's undefined at this level. And boolean has two possible values: false and true.

The Virtual Machine Specification tells us that boolean variables are treated like int, with the values 0 (false) and 1 (true). Only arrays of boolean have specific support: they are read and written with the byte-array instructions baload and bastore. So at the Virtual Machine level, a boolean variable occupies the same amount of space as an int, meaning one stack cell: at least 4 bytes, typically 4 bytes on 32-bit Java and 8 bytes on 64-bit.
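A minimal sketch for checking this yourself, assuming a standard JDK with javac and javap on the path: compile the class below and run javap -c BooleanBytecode. With javac, negate typically compiles to int instructions only (iload_0, ifne, iconst_1/iconst_0, ireturn), and firstElement to a baload; the exact listing can vary between compiler versions. The class and method names are illustrative.

```java
// Compile:  javac BooleanBytecode.java
// Inspect:  javap -c BooleanBytecode
public class BooleanBytecode {

    // The boolean parameter lives in one local-variable slot and is loaded with iload,
    // the same instruction used for int; there is no boolean-specific load instruction.
    static boolean negate(boolean b) {
        return !b;
    }

    // Elements of a boolean[] are read with baload, the byte-array load instruction.
    static boolean firstElement(boolean[] flags) {
        return flags[0];
    }

    public static void main(String[] args) {
        System.out.println(negate(true));                        // false
        System.out.println(firstElement(new boolean[] {true}));  // true
    }
}
```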

Finally, there's the HotSpot engine, which compiles JVM bytecode into optimized CPU-specific machine code, and I bet that in many cases it is able to deduce the limited value range of an int-masked boolean from the context and use a smaller size.

Other tips

There are a number of concepts to tease apart:

the Java programming language itself, which is a textual programming language,
the Java Virtual Machine bytecode and class file format, which is a compiled binary encoding of the original Java source code and is used as an interchange format to store, load, and share Java object code,
a particular Java Virtual Machine implementation, which could be an interpreter but is more often JIT-based, and
JIT generated machine code that runs directly on the hardware processor.

Java, the programming language, doesn't define a concept of size for primitive types because, unlike C/C++, it has no sizeof operator: sizes are not observable via language constructs, so the language doesn't need to define them.
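The standard library does publish SIZE constants on most wrapper classes (Byte.SIZE, Integer.SIZE, and so on), but these give the bit width of each type's value representation, which follows from its range, not the storage a variable occupies at run time. java.lang.Boolean declares no such constant at all. A small reflective check, where PrimitiveWidths and sizeConstant are hypothetical names:

```java
import java.lang.reflect.Field;

public class PrimitiveWidths {

    // Returns the wrapper class's public static SIZE constant, or -1 if it declares none.
    static int sizeConstant(Class<?> wrapper) {
        try {
            Field f = wrapper.getField("SIZE");
            return f.getInt(null); // static field, so no instance is needed
        } catch (NoSuchFieldException | IllegalAccessException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        Class<?>[] wrappers = {
            Byte.class, Short.class, Character.class, Integer.class,
            Long.class, Float.class, Double.class, Boolean.class
        };
        for (Class<?> w : wrappers) {
            System.out.println(w.getSimpleName() + ".SIZE = " + sizeConstant(w));
        }
    }
}
```

Running it prints 8, 16, 16, 32, 64, 32, 64 for the numeric wrappers and -1 for Boolean, since writing Boolean.SIZE in source would not compile.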

As @Ralf points out, the Java language does define the ranges of the primitive types, which are very relevant to the programmer because these ranges can be observed via constructs within the language.
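A minimal sketch of ranges being observable from ordinary code: the MIN_VALUE/MAX_VALUE constants can be printed, and stepping a byte past Byte.MAX_VALUE wraps around to Byte.MIN_VALUE. PrimitiveRanges and incrementMaxByte are illustrative names.

```java
public class PrimitiveRanges {

    // b++ on a byte is b = (byte) (b + 1), so 127 + 1 wraps around to -128.
    static byte incrementMaxByte() {
        byte b = Byte.MAX_VALUE;
        b++;
        return b;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " .. " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " .. " + Short.MAX_VALUE);
        System.out.println("char:  " + (int) Character.MIN_VALUE + " .. " + (int) Character.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " .. " + Integer.MAX_VALUE);
        System.out.println("Byte.MAX_VALUE + 1 as byte: " + incrementMaxByte());
        // boolean has exactly two values and no numeric range at all.
        System.out.println("boolean: " + false + ", " + true);
    }
}
```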

The Java platform does define an instrumentation capability (java.lang.instrument.Instrumentation.getObjectSize) that allows inquiry into the size of an object, but (1) it requires an instrumentation agent, (2) it provides only an implementation-specific approximation, and (3) it does not apply to primitive types or local variables.

the JVM uses a 32-bit stack cell, used to hold local variables, method arguments, and expression values. Primitives that are smaller than 1 cell are padded out, primitives larger than 32 bits (long and double) take 2 cells

The padding quote speaks to details of the JVM class file format, which serves as an interchange mechanism (as distinct from the Java language and from any JVM implementation). Though what it says holds for the abstract machine and JVM bytecode, it does not necessarily hold for JIT-compiled machine code.

The padding quote also restricts itself to local variables, parameters, and expressions, which are typically stack-allocated (like auto, or automatic, variables in C/C++), and does not discuss objects or arrays.

The actual size of such automatic variables is almost never an issue (e.g. for performance or for space).

In part, this is because the underlying hardware CPUs more naturally work with wider sizes (like 32 or 64 bits) than with single bits. Even 8- or 16-bit operations are generally no faster than 32-bit ones, and 8-bit handling sometimes requires an extra instruction or two to fit the hardware instruction set's wider registers.
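Java mirrors this at the language level: arithmetic on byte, short, and char operands is carried out in int (binary numeric promotion), and narrowing the result back requires an explicit cast, which javac compiles to iadd followed by an i2b instruction. A minimal sketch, with NumericPromotion and addBytes as illustrative names:

```java
public class NumericPromotion {

    // Both byte operands are promoted to int before the addition,
    // so the sum must be cast back to byte (only the low 8 bits are kept).
    static byte addBytes(byte a, byte b) {
        return (byte) (a + b);
    }

    public static void main(String[] args) {
        byte a = 100, b = 100;
        int wide = a + b;              // computed as an int: 200
        byte narrow = addBytes(a, b);  // 200 does not fit in a byte: -56
        System.out.println(wide + " " + narrow);
        // byte c = a + b;  // does not compile: possible lossy conversion from int to byte
    }
}
```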

Another reason is the limited use of local variables: they are used directly by code and only by code, and thus are not really subject to scaling issues, especially compared to objects and arrays, which are used by data structures of potentially any scale.

(We might consider recursion a form of scaling of local variables, so larger local variables in recursive routines risk stack overflow sooner.)

However, object sizes can matter a lot when there are many instances, and array element sizes can matter when an array has many elements.
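A rough sketch of why element size matters at scale, comparing one million flags stored as a boolean[] with java.util.BitSet, which packs flags into a long[] at one bit each. It assumes a HotSpot-style JVM that stores each boolean[] element in one byte (the JVM specification notes that Oracle's JVM uses 8 bits per boolean array element) and ignores array headers and padding; FlagStorage and its methods are hypothetical names.

```java
import java.util.BitSet;

public class FlagStorage {

    // Assumption: one byte per boolean[] element; array header and padding ignored.
    static long approxBooleanArrayBytes(int n) {
        return n;
    }

    // BitSet stores bits in a long[]; size() reports the bits of space actually allocated.
    static long approxBitSetBytes(int n) {
        return new BitSet(n).size() / 8L;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("boolean[] ~ " + approxBooleanArrayBytes(n) + " bytes");
        System.out.println("BitSet    ~ " + approxBitSetBytes(n) + " bytes");

        BitSet flags = new BitSet(n);
        flags.set(42);
        System.out.println(flags.get(42) + " " + flags.get(43)); // true false
    }
}
```

Under these assumptions the boolean[] needs about 1,000,000 bytes and the BitSet about 125,000; the price is a shift and a mask on every access.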

Does this mean that the byte/char/short primitive data types also take 32 bits, even though their sizes are defined as 8/16/16 bits?

For locals, maybe or maybe not, depending on the JIT.

For objects, within the JVM bytecode and class file mechanism, fields are accessed directly by symbolic reference (via the getfield and putfield instructions), and no notion of "cells" is given, whereas there is one for local variables and parameters.
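A small illustration using standard reflection and java.lang.invoke.MethodType: a boolean field keeps its declared type in the class file, recorded with the field descriptor Z (short is S), whereas a boolean local variable is, as far as the bytecode instructions are concerned, just an int-sized slot. The nested Settings class and the descriptorOf helper are hypothetical, written only for this sketch.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Field;

public class FieldTypes {

    // Hypothetical class used only for this illustration.
    static class Settings {
        boolean verbose;
        short retries;
    }

    // Class-file descriptor of a type: methodType(type) describes "() -> type",
    // so the part after ')' in "()Z" is the type's own descriptor.
    static String descriptorOf(Class<?> type) {
        String m = MethodType.methodType(type).toMethodDescriptorString();
        return m.substring(m.indexOf(')') + 1);
    }

    public static void main(String[] args) throws NoSuchFieldException {
        for (String name : new String[] {"verbose", "retries"}) {
            Field f = Settings.class.getDeclaredField(name);
            System.out.println(name + ": " + f.getType() + " (descriptor " + descriptorOf(f.getType()) + ")");
        }
    }
}
```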

A JVM implementation (including its JIT) has the flexibility to rearrange field order in its internal layout (e.g. at the machine code level), so two 16-bit fields can occupy the same 32-bit word even if they were not declared adjacently in the source code; this reduces the overhead of padding required to maintain alignment. Any such alignment, padding, and field placement are very much JVM implementation-specific rather than concerns of the JVM interchange format. In theory, the JIT could pack booleans down to one bit per element in an array, or pack 8 individual boolean fields into a single byte in an object. That most implementations don't is a JVM implementation choice.
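The packing described above can be done by hand when space matters. This is a minimal sketch, not what any particular JVM does: a hypothetical PackedFlags class stores eight boolean flags in a single byte and uses shifts and masks to read and write them, trading a few extra instructions per access for space.

```java
public class PackedFlags {

    // Eight flags packed into one byte: bit i holds flag i.
    private byte bits;

    boolean get(int index) {
        checkIndex(index);
        return (bits & (1 << index)) != 0;
    }

    void set(int index, boolean value) {
        checkIndex(index);
        if (value) {
            bits |= 1 << index;     // compound assignment narrows back to byte implicitly
        } else {
            bits &= ~(1 << index);
        }
    }

    private static void checkIndex(int index) {
        if (index < 0 || index >= 8) {
            throw new IndexOutOfBoundsException("flag index " + index);
        }
    }

    public static void main(String[] args) {
        PackedFlags flags = new PackedFlags();
        flags.set(0, true);
        flags.set(7, true);
        flags.set(0, false);
        System.out.println(flags.get(0) + " " + flags.get(7)); // false true
    }
}
```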

License: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange