Pergunta

According to SCJP Study Guide by KathySierra:

The java.lang.StringBuffer and java.lang.StringBuilder classes should be used when you have to make modifications to strings of characters. As we discussed, String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the String pool

To clear out this, I have gone through the code of String class and StringBuilder source here.

The simplfied code of String looks like this:

public final class String(){
     private final char [] value; //Final helps in immutability, I guess.
     public String (String original){
         value = original.value;
      }
}

And StringBuilder's simplified version look like this:

public final class StringBuilder{
    char [] value;
    public StringBuilder(String str) {
        value = new Char[(str.length() + 16)]; // 16 here is implementation dependent.
    append(str);
}

public StringBuilder append(String str){

            //Add 'str' characters in value array if its size allows,
        else
            // Create new array of size newCapacity and copy contents of 'value' in that.
            //value = Arrays.copyOf(value, newCapacity);// here old array object is lost.

        return this;
    }
}

So let's say we have a case as under:

Using String class:

String s1 = "abc"; // Creates one object on String pool.
s1 = s1+"def"; // Creates two objects - "def " on String pool
// and "abcdef" on the heap.

If I use StringBuilder, the code becomes:

 StringBuilder s1 = StringBuilder("abc");

 // Creates one String object "abc " on String pool.
 // And one StringBuilder object "abc" on the heap.
 s1.append("def");
 // Creates one string object "def" on String pool.
 // And changes the char [] inside StringBuilder to hold "def" also.

In StringBuilder s2 = s1.append("def"); there are equal chances that the char array holding the string will be replaced by a new char array. The old array is reference less now and will be garbage collected.

My Query is:

Using simple String concatenation and StringBuilder append() method, the number of String objects that go on to the String pool is same.

And according to the code listed above, StringBuilder does make use of bigger char arrays in the first place while the String object contains a char array of the same length as the string it is holding.

  1. How is the usage of StringBuilder more efficient than normal String class for string manipulations?
  2. And is the statement given in SCJP Guide wrong?
Foi útil?

Solução

The key is the expandCapacity function:

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2; //important part here
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

By resizing the underlying array by a factor of 2 every time a resize is needed, the amortized time needed to append 1 character is minimized.

Wikipedia has a good explanation:

As n elements are inserted, the capacities form a geometric progression. Expanding the array by any constant proportion ensures that inserting n elements takes O(n) time overall, meaning that each insertion takes amortized constant time. The value of this proportion a leads to a time-space tradeoff: the average time per insertion operation is about a/(a−1), while the number of wasted cells is bounded above by (a−1)n. The choice of a depends on the library or application: some textbooks use a = 2, but Java's ArrayList implementation uses a = 3/2 and the C implementation of Python's list data structure uses a = 9/8.

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity. This threshold must be strictly smaller than 1/a in order to support mixed sequences of insertions and removals with amortized constant cost.

Dynamic arrays are a common example when teaching amortized analysis.

Now for your particular example it would not make a difference, but you will see the effects when appending lots of characters:

public static void main(String[] args){
    int numAppends = 200000;
    int numRepetitions = 3;
    long[] time1 = new long[numRepetitions];
    long[] time2 = new long[numRepetitions];
    long now;
    for (int k = 0; k < numRepetitions; k++){
        String s = "";
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            s = s + "a";
        }
        time1[k] = (System.nanoTime() - now) / 1000000;
        StringBuilder sb = new StringBuilder();
        now = System.nanoTime();
        for(int i = 0; i < numAppends ; i++) {
            sb.append("a");     
        }
        time2[k] = (System.nanoTime() - now) / 1000000;
        System.out.println("Rep "+k+", time1: "+time1[k]+ " ms, time2: " + time2[k] + " ms");
    }
}

Output:

Rep 0, time1: 13509 ms, time2: 7 ms
Rep 1, time1: 13086 ms, time2: 1 ms
Rep 2, time1: 13167 ms, time2: 1 ms

Also, I counted the number of times the Arrays.copyOf() method is called inside extendCapacity() for the benchmark: It's 49 times on the first iteration, but only 15 times on the second and third iterations. The output is as follows:

newCapacity: 34
newCapacity: 70
newCapacity: 142
newCapacity: 286
newCapacity: 574
newCapacity: 1150
newCapacity: 2302
newCapacity: 4606
newCapacity: 9214
newCapacity: 18430
newCapacity: 36862
newCapacity: 73726
newCapacity: 147454
newCapacity: 294910
newCapacity: 42
Rep 2, time1: 12955 ms, time2: 134 ms

Outras dicas

It is only more efficient if you are looping to create your strings. If you have a loop that does:

String[] strings = { "a", "b", "c", "d" };
String result = "";
for( String s : strings) {
    result += s;
}

The StringBuilder version will generate less objects and cause less GC:

String[] strings = { "a", "b", "c", "d" };
StringBuilder builder = new StringBuilder();
for( String s : strings) {
    builder.append(s);
}

While the first one will cause one object to be sent for GC on every loop run, the second one won't.

Eventually, since the string builder array grows by doubling it's size, you won't have many allocations happening.

Manipulation is more than just concatenating. Imagine you want to insert a character in the middle of a String. How would you do that, since Strings are immutable? You would have to create a new String. With StringBuilder you can insert(int offset, c)

See StringBuilder javadoc

You have method like

delete(int start, int end)
// Removes the characters in a substring of this sequence.

replace(int start, int end, String str)
// Replaces the characters in a substring of this sequence with characters in the specified String.

reverse()
// Causes this character sequence to be replaced by the reverse of the sequence.

insert(int dstOffset, CharSequence s)
// Inserts the specified CharSequence into this sequence.

How is the usage of StringBuilder more efficient than normal String class for string manipulations?

It's vastly more efficient when you're performing many manipulations in a loop. Consider any string conversion or replacement function that needs to iterate over individual characters, such as this one to escape <, >, &, ", ' characters for XML or HTML:

public static String xmlEscape(String s) {
    StringBuilder sb = new StringBuilder(
        (int)Math.min(Integer.MAX_VALUE, s.length() * 5L / 4));
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '<') sb.append("&lt;");
        else if (c == '>') sb.append("&gt;");
        else if (c == '&') sb.append("&amp;");
        else if (c == '"') sb.append("&quot;");
        else if (c == '\'') sb.append("&#039;");
        else sb.append(c);
    }
    return sb.toString();
}

The StringBuilder array is initially sized with a capacity a bit bigger than the input string in order to accommodate the original text plus likely replacements. The output text accumulates in that pre-allocated buffer and it most likely won't need any additional memory allocation during the loop.

If the above function accumulated the output in a String instead of a StringBuilder, it would duplicate the entire output again every single time a single character was processed, degrading it to quadratic (i.e., awful!) performance.

To the second question:

Is the statement given in SCJP Guide is wrong?

To be blunt, yes. It's extremely misleading to say that there will be "abandoned String objects in the String pool". As far as I know, the terminology "String pool" refers only to the intern pool such as used by the String.intern() method. The only time Strings are automatically put in the intern pool is when the ClassLoader loads a class and loads String literal constants from the source code into memory.

Manipulating String objects at run time certainly does not put extra objects in the intern pool (unless you call .intern() on purpose).

What the SCJP Guide should say is:

String objects are immutable, so if you choose to do a lot of manipulations with String objects, you will end up with a lot of abandoned String objects in the heap.

Abandoned objects on the heap aren't the biggest problem though, because the garbage collector will swiftly eat them. The real reason to use StringBuilders when doing multiple manipulations is to avoid unnecessary copying of the characters in the first place. That makes a massive difference to the performance as shown in @jmiserez' benchmark.

Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top