Question

Does string immutability work by statement, or by strings within a statement?

For example, I understand that the following code will allocate two strings on the heap.

string s = "hello ";
s += "world!";

"hello" will remain on the heap until garbage collected; and s now references "hello world!" on the heap. However, how many strings does the following line allocate on the heap...1 or 2? Also, is there a tool/way to verify the results?

string s = "goodbye " + "cruel world!";

Solution

The compiler has special treatment for concatenation of string literals: it folds them into a single constant at compile time, which is why the second example is only ever one string. And "interning" means that even if you run this line 20000 times, there is still only 1 string.

Re testing the results... the easiest way (in this case) is probably to look at the compiled IL in Reflector:

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
        [0] string s)
    L_0000: ldstr "goodbye cruel world!"
    L_0005: stloc.0 
    L_0006: ldloc.0 
    L_0007: call void [mscorlib]System.Console::WriteLine(string)
    L_000c: ret 
}

As you can see (ldstr), the compiler has done this for you already.

OTHER TIPS

Literal strings are interned. This means that "hello " does not reside on the heap but in the data segment [see comment] of the program (and is thus not eligible for garbage collection). The same goes for "world". As for "hello world", that may also be interned, if the compiler is smart enough.

"goodbye cruel world" will be interned, since concatenation of string literals is handled by the compiler.


Edit: I'm not sure about the data segment statement, please see this question for more information.

Actually, probably 3: a const string for "goodbye ", a const string for "cruel world!", and then a new string for the result.

You can find out for sure by looking at the generated code. It depends on the compiler (and, in fact, on the language, which isn't obvious here), but with g++ you can read the generated assembly by using the -S flag (check the man page).

Don't trust what you "know" about strings. You might look through the source code for the implementation of string. For instance, your example:

string s = "goodbye " + "cruel world!";

In Java, this would allocate a single string. Java plays some pretty cute tricks and would be hard to outsmart, so just never optimize until you need to!
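One way to check the Java claim is a reference comparison. This is a minimal sketch, not a definitive test: the class and method names are made up for illustration, and it relies on the Java rule that constant string expressions are folded by the compiler and interned.

```java
// Sketch: checks whether a concatenation of two literals ends up as the
// same String object as the equivalent single literal.
// ConstantFolding and foldedAtCompileTime are hypothetical names.
class ConstantFolding {
    static boolean foldedAtCompileTime() {
        String s = "goodbye " + "cruel world!"; // constant expression
        // Reference comparison on purpose: true only if both sides
        // refer to the same interned object.
        return s == "goodbye cruel world!";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // prints "true"
    }
}
```

Disassembling the compiled class with javap -c should also show the already-joined text being loaded as a single constant, much like ldstr in the IL listing.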

Currently, however, as far as I know, using this:

String s = "";
for (int i = 0; i < 1000; i++)
    s += " ";

to create a 1000-space string still tends to be extremely inefficient.

Appending in a loop is pretty bad, but otherwise it's probably as efficient as StringBuilder.
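To make the loop case concrete, here is a small sketch that builds the same string both ways. The class and method names are hypothetical, and it only shows the two shapes of code. It does not measure anything.

```java
// Sketch: two ways to build a string of n spaces.
// SpaceBuilder and its method names are hypothetical.
class SpaceBuilder {
    // Repeated +=: every iteration produces a new String that copies
    // all characters accumulated so far.
    static String withConcatenation(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += " ";
        }
        return s;
    }

    // One growable buffer, converted to a String once at the end.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Both approaches yield the same text.
        System.out.println(withConcatenation(1000).equals(withBuilder(1000)));
    }
}
```

The results are identical. The difference is in how many intermediate strings are created along the way, which is what the garbage collector then has to clean up.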

Be careful here, because the compiler can make some very different optimizations when the string values are known at compile time. If the strings you're using aren't known until runtime (pulled from a config file, database, or user input) you'll see some very different IL.
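That answer is about C# and IL, but the same compile-time versus run-time distinction can be sketched in Java, the language of the loop snippet earlier in this thread. The names below are hypothetical. The assumption is that a method parameter is not a compile-time constant, so the concatenation has to happen at run time.

```java
// Sketch: a concatenation whose left operand is only known at run time.
// RuntimeConcat and join are hypothetical names.
class RuntimeConcat {
    // The parameter is not a compile-time constant, so the compiler
    // cannot fold this expression into a single literal.
    static String join(String prefix) {
        return prefix + "cruel world!";
    }

    public static void main(String[] args) {
        String literal = "goodbye cruel world!";
        String built = join("goodbye ");

        System.out.println(built.equals(literal));     // same characters: true
        System.out.println(built == literal);          // same object: false
        System.out.println(built.intern() == literal); // pooled copy: true
    }
}
```

The run-time result has the same characters as the literal but is a separate object, and only an explicit intern() call maps it back to the pooled one.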

If you're just going to do one or two string concatenations I wouldn't worry about it.

However, if you have a lot of concatenations, or you have a loop, then you definitely want to take precautions. In the Java world, that means you use StringBuffer instead of concatenating strings.

If it's not just in one line, the concatenation of two strings may be accomplished by making the first string into a StringBuffer, doing the concatenation, and returning the result string.

Creating the StringBuffer yourself may seem like overkill, but that's what is going to happen anyway.
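Written out by hand, the three steps described above look roughly like this. It is only a sketch with hypothetical names, and what a particular compiler actually generates for + may differ.

```java
// Sketch: concatenating two strings through an explicit StringBuffer.
// ExplicitBuffer and concat are hypothetical names.
class ExplicitBuffer {
    static String concat(String first, String second) {
        StringBuffer buffer = new StringBuffer(first); // start from the first string
        buffer.append(second);                         // do the concatenation
        return buffer.toString();                      // return the result string
    }

    public static void main(String[] args) {
        System.out.println(concat("goodbye ", "cruel world!"));
    }
}
```

StringBuilder offers the same append and toString methods without synchronization, so it is the usual choice when the buffer is not shared between threads.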

By all means, don't prematurely optimise, but don't discount how poorly string concatenation can perform. It's not the object creation, but the GC work that it causes.

There is a lab on (ASP.NET Escalation Engineer) Tess Ferrandez's blog that shows a (rather extreme, granted) example of how string concatenation can bring a server to its knees.

If the compiler is "intelligent", it will only be one string with "goodbye cruel world!"

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow