Question

Why is it that they decided to make string immutable in Java and .NET (and some other languages)? Why didn't they make it mutable?

Was it helpful?

Solution

According to Effective Java, chapter 4, page 73, 2nd edition:

"There are many good reasons for this: Immutable classes are easier to design, implement, and use than mutable classes. They are less prone to error and are more secure.

[...]

"Immutable objects are simple. An immutable object can be in exactly one state, the state in which it was created. If you make sure that all constructors establish class invariants, then it is guaranteed that these invariants will remain true for all time, with no effort on your part.

[...]

Immutable objects are inherently thread-safe; they require no synchronization. They cannot be corrupted by multiple threads accessing them concurrently. This is far and away the easiest approach to achieving thread safety. In fact, no thread can ever observe any effect of another thread on an immutable object. Therefore, immutable objects can be shared freely

[...]

Other small points from the same chapter:

Not only can you share immutable objects, but you can share their internals.

[...]

Immutable objects make great building blocks for other objects, whether mutable or immutable.

[...]

The only real disadvantage of immutable classes is that they require a separate object for each distinct value.

OTHER TIPS

There are at least two reasons.

First - security http://www.javafaq.nu/java-article1060.html

The main reason why String made immutable was security. Look at this example: We have a file open method with login check. We pass a String to this method to process authentication which is necessary before the call will be passed to OS. If String was mutable it was possible somehow to modify its content after the authentication check before OS gets request from program then it is possible to request any file. So if you have a right to open text file in user directory but then on the fly when somehow you manage to change the file name you can request to open "passwd" file or any other. Then a file can be modified and it will be possible to login directly to OS.

Second - Memory efficiency http://hikrish.blogspot.com/2006/07/why-string-class-is-immutable.html

JVM internally maintains the "String Pool". To achive the memory efficiency, JVM will refer the String object from pool. It will not create the new String objects. So, whenever you create a new string literal, JVM will check in the pool whether it already exists or not. If already present in the pool, just give the reference to the same object or create the new object in the pool. There will be many references point to the same String objects, if someone changes the value, it will affect all the references. So, sun decided to make it immutable.

Actually, the reasons string are immutable in java doesn't have much to do with security. The two main reasons are the following:

Thead Safety:

Strings are extremely widely used type of object. It is therefore more or less guaranteed to be used in a multi-threaded environment. Strings are immutable to make sure that it is safe to share strings among threads. Having an immutable strings ensures that when passing strings from thread A to another thread B, thread B cannot unexpectedly modify thread A's string.

Not only does this help simplify the already pretty complicated task of multi-threaded programming, but it also helps with performance of multi-threaded applications. Access to mutable objects must somehow be synchronized when they can be accessed from multiple threads, to make sure that one thread doesn't attempt to read the value of your object while it is being modified by another thread. Proper synchronization is both hard to do correctly for the programmer, and expensive at runtime. Immutable objects cannot be modified and therefore do not need synchronization.

Performance:

While String interning has been mentioned, it only represents a small gain in memory efficiency for Java programs. Only string literals are interned. This means that only the strings which are the same in your source code will share the same String Object. If your program dynamically creates string that are the same, they will be represented in different objects.

More importantly, immutable strings allow them to share their internal data. For many string operations, this means that the underlying array of characters does not need to be copied. For example, say you want to take the five first characters of String. In Java, you would calls myString.substring(0,5). In this case, what the substring() method does is simply to create a new String object that shares myString's underlying char[] but who knows that it starts at index 0 and ends at index 5 of that char[]. To put this in graphical form, you would end up with the following:

 |               myString                  |
 v                                         v
"The quick brown fox jumps over the lazy dog"   <-- shared char[]
 ^   ^
 |   |  myString.substring(0,5)

This makes this kind of operations extremely cheap, and O(1) since the operation neither depends on the length of the original string, nor on the length of the substring we need to extract. This behavior also has some memory benefits, since many strings can share their underlying char[].

Thread safety and performance. If a string cannot be modified it is safe and quick to pass a reference around among multiple threads. If strings were mutable, you would always have to copy all of the bytes of the string to a new instance, or provide synchronization. A typical application will read a string 100 times for every time that string needs to be modified. See wikipedia on immutability.

One should really ask, "why should X be mutable?" It's better to default to immutability, because of the benefits already mentioned by Princess Fluff. It should be an exception that something is mutable.

Unfortunately most of the current programming languages default to mutability, but hopefully in the future the default is more on immutablity (see A Wish List for the Next Mainstream Programming Language).

One factor is that, if strings were mutable, objects storing strings would have to be careful to store copies, lest their internal data change without notice. Given that strings are a fairly primitive type like numbers, it is nice when one can treat them as if they were passed by value, even if they are passed by reference (which also helps to save on memory).

Wow! I Can't believe the misinformation here. Strings being immutable have nothing with security. If someone already has access to the objects in a running application (which would have to be assumed if you are trying to guard against someone 'hacking' a String in your app), they would certainly be a plenty of other opportunities available for hacking.

It's a quite novel idea that the immutability of String is addressing threading issues. Hmmm ... I have an object that is being changed by two different threads. How do I resolve this? synchronize access to the object? Naawww ... let's not let anyone change the object at all -- that'll fix all of our messy concurrency issues! In fact, let's make all objects immutable, and then we can removed the synchonized contruct from the Java language.

The real reason (pointed out by others above) is memory optimization. It is quite common in any application for the same string literal to be used repeatedly. It is so common, in fact, that decades ago, many compilers made the optimization of storing only a single instance of a string literal. The drawback of this optimization is that runtime code that modifies a string literal introduces a problem because it is modifying the instance for all other code that shares it. For example, it would be not good for a function somewhere in an application to change the string literal "dog" to "cat". A printf("dog") would result in "cat" being written to stdout. For that reason, there needed to be a way of guarding against code that attempts to change string literals (i. e., make them immutable). Some compilers (with support from the OS) would accomplish this by placing string literal into a special readonly memory segment that would cause a memory fault if a write attempt was made.

In Java this is known as interning. The Java compiler here is just following an standard memory optimization done by compilers for decades. And to address the same issue of these string literals being modified at runtime, Java simply makes the String class immutable (i. e, gives you no setters that would allow you to change the String content). Strings would not have to be immutable if interning of string literals did not occur.

String is not a primitive type, yet you normally want to use it with value semantics, i.e. like a value.

A value is something you can trust won't change behind your back. If you write: String str = someExpr(); You don't want it to change unless YOU do something with str.

String as an Object has naturally pointer semantics, to get value semantics as well it needs to be immutable.

I know this is a bump, but... Are they really immutable? Consider the following.

public static unsafe void MutableReplaceIndex(string s, char c, int i)
{
    fixed (char* ptr = s)
    {
        *((char*)(ptr + i)) = c;
    }
}

...

string s = "abc";
MutableReplaceIndex(s, '1', 0);
MutableReplaceIndex(s, '2', 1);
MutableReplaceIndex(s, '3', 2);
Console.WriteLine(s); // Prints 1 2 3

You could even make it an extension method.

public static class Extensions
{
    public static unsafe void MutableReplaceIndex(this string s, char c, int i)
    {
        fixed (char* ptr = s)
        {
            *((char*)(ptr + i)) = c;
        }
    }
}

Which makes the following work

s.MutableReplaceIndex('1', 0);
s.MutableReplaceIndex('2', 1);
s.MutableReplaceIndex('3', 2);

Conclusion: They're in an immutable state which is known by the compiler. Of couse the above only applies to .NET strings as Java doesn't have pointers. However a string can be entirely mutable using pointers in C#. It's not how pointers are intended to be used, has practical usage or is safely used; it's however possible, thus bending the whole "mutable" rule. You can normally not modify an index directly of a string and this is the only way. There is a way that this could be prevented by disallowing pointer instances of strings or making a copy when a string is pointed to, but neither is done, which makes strings in C# not entirely immutable.

For most purposes, a "string" is (used/treated as/thought of/assumed to be) a meaningful atomic unit, just like a number.

Asking why the individual characters of a string are not mutable is therefore like asking why the individual bits of an integer are not mutable.

You should know why. Just think about it.

I hate to say it, but unfortunately we're debating this because our language sucks, and we're trying to using a single word, string, to describe a complex, contextually situated concept or class of object.

We perform calculations and comparisons with "strings" similar to how we do with numbers. If strings (or integers) were mutable, we'd have to write special code to lock their values into immutable local forms in order to perform any kind of calculation reliably. Therefore, it is best to think of a string like a numeric identifier, but instead of being 16, 32, or 64 bits long, it could be hundreds of bits long.

When someone says "string", we all think of different things. Those who think of it simply as a set of characters, with no particular purpose in mind, will of course be appalled that someone just decided that they should not be able to manipulate those characters. But the "string" class isn't just an array of characters. It's a STRING, not a char[]. There are some basic assumptions about the concept we refer to as a "string", and it generally can be described as meaningful, atomic unit of coded data like a number. When people talk about "manipulating strings", perhaps they're really talking about manipulating characters to build strings, and a StringBuilder is great for that. Just think a bit about what the word "string" truly means.

Consider for a moment what it would be like if strings were mutable. The following API function could be tricked into returning information for a different user if the mutable username string is intentionally or unintentionally modified by another thread while this function is using it:

string GetPersonalInfo( string username, string password )
{
    string stored_password = DBQuery.GetPasswordFor( username );
    if (password == stored_password)
    {
        //another thread modifies the mutable 'username' string
        return DBQuery.GetPersonalInfoFor( username );
    }
}

Security isn't just about 'access control', it's also about 'safety' and 'guaranteeing correctness'. If a method can't be easily written and depended upon to perform a simple calculation or comparison reliably, then it's not safe to call it, but it would be safe to call into question the programming language itself.

Immutability is not so closely tied to security. For that, at least in .NET, you get the SecureString class.

It's a trade off. Strings go into the string pool and when you create multiple identical strings they share the same memory. The designers figured this memory saving technique would work well for the common case, since programs tend to grind over the same strings a lot.

The downside is that concatenations make a lot of extra strings that are only transitional and just become garbage, actually harming memory performance. You have StringBuffer and StringBuilder (in Java, StringBuilder is also in .NET) to use to preserve memory in these cases.

The decision to have string mutable in C++ causes a lot of problems, see this excellent article by Kelvin Henney about Mad COW Disease.

COW = Copy On Write.

Strings in Java are not truly immutable, you can change their value's using reflection and or class loading. You should not be depending on that property for security. For examples see: Magic Trick In Java

Immutability is good. See Effective Java. If you had to copy a String every time you passed it around, then that would be a lot of error-prone code. You also have confusion as to which modifications affect which references. In the same way that Integer has to be immutable to behave like int, Strings have to behave as immutable to act like primitives. In C++ passing strings by value does this without explicit mention in the source code.

There is an exception for nearly almost every rule:

using System;
using System.Runtime.InteropServices;

namespace Guess
{
    class Program
    {
        static void Main(string[] args)
        {
            const string str = "ABC";

            Console.WriteLine(str);
            Console.WriteLine(str.GetHashCode());

            var handle = GCHandle.Alloc(str, GCHandleType.Pinned);

            try
            {
                Marshal.WriteInt16(handle.AddrOfPinnedObject(), 4, 'Z');

                Console.WriteLine(str);
                Console.WriteLine(str.GetHashCode());
            }
            finally
            {
                handle.Free();
            }
        }
    }
}

It's largely for security reasons. It's much harder to secure a system if you can't trust that your strings are tamperproof.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top