Comparison of String and StringBuilder manipulation in terms of memory usage

Question 1

The key is the expandCapacity function:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

By resizing the underlying array by a factor of 2 every time a resize is needed, the amortized time needed to append 1 character is minimized.

Wikipedia has a good explanation:

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8.

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.

Dynamic arrays are a common example when teaching amortized analysis.

Now for your particular example it would not make a difference, but you will see the effects when appending lots of characters:

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

Output:

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

Also, I counted the number of times the Arrays.copyOf() method is called inside extendCapacity() for the benchmark: It's 49 times on the first iteration, but only 15 times on the second and third iterations. The output is as follows:

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms

Question 2

It is only more efficient if you are looping to create your strings. If you have a loop that does:

String[] strings = { "a", "b", "c", "d" };
String result = "";
for( String s : strings) {
    result += s;
}

The StringBuilder version will generate less objects and cause less GC:

String[] strings = { "a", "b", "c", "d" };
StringBuilder builder = new StringBuilder();
for( String s : strings) {
    builder.append(s);
}

While the first one will cause one object to be sent for GC on every loop run, the second one won't.

Eventually, since the string builder array grows by doubling it's size, you won't have many allocations happening.

Question 3

Manipulation is more than just concatenating. Imagine you want to insert a character in the middle of a String. How would you do that, since Strings are immutable? You would have to create a new String. With StringBuilder you can insert(int offset, c)

See StringBuilder javadoc

You have method like

delete(int start, int end)
// Removes the characters in a substring of this sequence.

replace(int start, int end, String str)
// Replaces the characters in a substring of this sequence with characters in the specified String.

reverse()
// Causes this character sequence to be replaced by the reverse of the sequence.

insert(int dstOffset, CharSequence s)
// Inserts the specified CharSequence into this sequence.

Question 4

How is the usage of StringBuilder more efficient than normal String class for string manipulations?

It's vastly more efficient when you're performing many manipulations in a loop. Consider any string conversion or replacement function that needs to iterate over individual characters, such as this one to escape <, >, &, ", ' characters for XML or HTML:

public static String xmlEscape(String s) {
    StringBuilder sb = new StringBuilder(
        (int)Math.min(Integer.MAX_VALUE, s.length() * 5L / 4));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '<') sb.append("&lt;");
        else if (c == '>') sb.append("&gt;");
        else if (c == '&') sb.append("&amp;");
        else if (c == '"') sb.append("&quot;");
        else if (c == '\'') sb.append("&#039;");
        else sb.append(c);
    }
    return sb.toString();
}

The StringBuilder array is initially sized with a capacity a bit bigger than the input string in order to accommodate the original text plus likely replacements. The output text accumulates in that pre-allocated buffer and it most likely won't need any additional memory allocation during the loop.

If the above function accumulated the output in a String instead of a StringBuilder, it would duplicate the entire output again every single time a single character was processed, degrading it to quadratic (i.e., awful!) performance.

To the second question:

Is the statement given in SCJP Guide is wrong?

To be blunt, yes. It's extremely misleading to say that there will be "abandoned String objects in the String pool". As far as I know, the terminology "String pool" refers only to the intern pool such as used by the String.intern() method. The only time Strings are automatically put in the intern pool is when the ClassLoader loads a class and loads String literal constants from the source code into memory.

Manipulating String objects at run time certainly does not put extra objects in the intern pool (unless you call .intern() on purpose).

What the SCJP Guide should say is:

String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the heap.

Abandoned objects on the heap aren't the biggest problem though, because the garbage collector will swiftly eat them. The real reason to use StringBuilders when doing multiple manipulations is to avoid unnecessary copying of the characters in the first place. That makes a massive difference to the performance as shown in @jmiserez' benchmark.