Question

I am profiling the garbage behavior of java.lang.String and it looks like each time you instantiate a string for the first time inside any class it always generate garbage. Would anyone know why?

public abstract class AbstractTest {

    protected static String SERGIO = "sergio";

    private String someValue;

    public void init() {
        this.someValue = new String(SERGIO);
    }
}

public class Test extends AbstractTest {

    private static String JULIA = "julia";

    private Runtime runtime = Runtime.getRuntime();
    private String anotherValue;
    private String yetAnother;

    private void gc() throws InterruptedException {
        System.gc();
        Thread.sleep(100);
    }

    private long usedMemory() {
        return runtime.maxMemory() - runtime.freeMemory();
    }

    public void test() throws Exception {
        gc();
        this.anotherValue = new String(SERGIO); // a bunch of garbage is created!
        long usedMemory = usedMemory();
        gc();
        long usedMemoryAfterGC = usedMemory();
        System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
        gc();
        this.yetAnother = new String(JULIA); // no more garbage
        usedMemory = usedMemory();
        gc();
        usedMemoryAfterGC = usedMemory();
        System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
    }

    public static void main(String[] args) throws Exception {
        Test t = new Test();
        t.test();
    }

Output:

Collected: 704336
Collected: 0

That 's fine. First time it creates garbage, then subsequent instantiations produce no garbage.

What is weird is when you force a string creation in the superclass, it still creates garbage in the subclass the first time you instantiate a String there:

public void test() throws Exception {
    gc();
    init(); // creates a String in the superclass
    gc();
    this.yetAnother = new String(JULIA);
    long usedMemory = usedMemory();
    gc();
    long usedMemoryAfterGC = usedMemory();
    System.out.println("Collected: " + (usedMemory - usedMemoryAfterGC));
}

Output:

Collected: 348648

Any idea why?

(By the way I am running this on MAC and JDK 1.6.0_37)

EDIT1: I changed the code slightly to make it clear that string internalization is not the culprit here, at least it does not look like it is.

EDIT2: If you change String to Object throughout the code, you get the same garbage so I guess it has to do with how object allocation through new happens in Java. The first time you allocate an object in a class you get the garbage. The second time you don't. Weird it is per class.

EDIT3: I wrote a blog article where I talk about how to force a GC to profile your applications for garbage creation, like I am doing in the code above.

Was it helpful?

Solution 2

After reading Peter's answer here it is clear that TLAB is the culprit. If you use the option -XX:-UseTLAB to disable TLAB the problem goes away. From what I understood from here, it looks like with TLAB a thread allocates a big chunk of memory initially to avoid race conditions later. I was able to prove that TLAB was the culprit by setting a bigger size for it with the -XX:TLABSize=64m and see this amount being allocated.

OTHER TIPS

A literal String inside a class will be "interned" on first reference (if not before). Interning will generally involve discarding the original version of the String and using the interned version, and a few more objects may be created and discarded in the process.

(And, of course, there's really no reliable way to detect how much garbage is being generated in any single operation, without special internal hooks, so your measurements are suspect at best.)

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top