How is the "empty string" sequence represented under the hood in Java?

https://stackoverflow.com/questions/23658503

java
jvm

22-07-2023

Question

Throughout my career I've often seen calls like this:

if ("".equals(foo)) { /* do stuff */ }

How is the empty string represented as data at the lower levels of Java?

Specifically, by "lower levels of Java" I mean the actual contents of memory, or whatever C/C++ construct is used to represent the "" sequence, rather than high-level implementations in Java.

I had previously checked the Java Language Specification, which led me to this. The "empty string" isn't given much more definition than that, which is what prompted the head-scratching.

I then ran javap on various classes to try to tease out an answer from the bytecode, but how the machine deals with the sequence "" was not any clearer. Having ruled out both bytecode and Java code, I posted the question here, hoping that someone could shed some light on the issue from a lower-level perspective.

Solution

There's no such thing as "the empty string character". A character is always a UTF-16 code unit, and there's no "empty" code unit. There's "an empty string" which is represented exactly the same way as any other string:

A char[] reference
An index into that char[]
A length

In this case, the length would be 0. The char[] reference could potentially be a reference to an empty char array, which could potentially be shared between all instances of String which have a length of 0.
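The observable side of this can be checked from plain Java. The following is a minimal sketch, not a view into the JVM's memory: the class name EmptyStringDemo and the helper isEmptyString are made up for illustration, and the internal fields of String may differ between JDK versions. It only shows that an empty string is an ordinary String object whose length is 0.

```java
class EmptyStringDemo {
    // Hypothetical helper: true only for a non-null, zero-length string.
    // Calling equals on the literal avoids a NullPointerException when s is null.
    static boolean isEmptyString(String s) {
        return "".equals(s);
    }

    public static void main(String[] args) {
        String empty = "";

        System.out.println(empty.length());              // 0
        System.out.println(empty.toCharArray().length);  // 0
        System.out.println(empty.isEmpty());             // true
        System.out.println(isEmptyString(null));         // false

        // String literals are interned, so every "" literal is the same object.
        System.out.println(empty == "");                 // true

        // new String(...) always creates a distinct object, even for length 0.
        String fresh = new String("");
        System.out.println(fresh == empty);              // false
        System.out.println(fresh.equals(empty));         // true
    }
}
```

This is also why the "".equals(foo) idiom from the question is popular: it behaves sensibly even when foo is null.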

(Code such as substring could be implemented by detecting 0-length requests and always returning the same reference to an empty string, but I'm not aware of implementations doing that.)
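Whether a given runtime hands back a shared empty string from substring is an implementation detail, so the sketch below deliberately does not assume an answer for the identity checks; it just prints what the JVM it runs on does. The class name SubstringIdentityDemo and the helper emptySlice are hypothetical.

```java
class SubstringIdentityDemo {
    // Hypothetical helper: asks for a zero-length substring at the end of s.
    static String emptySlice(String s) {
        return s.substring(s.length());
    }

    public static void main(String[] args) {
        String a = emptySlice("hello");
        String b = emptySlice("world");

        // Guaranteed: both results are empty and equal to "".
        System.out.println(a.isEmpty() && b.isEmpty());  // true
        System.out.println(a.equals(""));                // true

        // Not guaranteed either way: depends on the String implementation.
        System.out.println("same object as each other: " + (a == b));
        System.out.println("same object as the literal: " + (a == ""));

        // Guaranteed: intern() returns the canonical instance, which is the literal.
        System.out.println(a.intern() == "");            // true
    }
}
```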

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow