Mutable vs immutable objects

https://stackoverflow.com/questions/214714

03-07-2019

Question

I'm trying to get my head around mutable vs immutable objects. Using mutable objects gets a lot of bad press (e.g. returning an array of strings from a method) but I'm having trouble understanding what the negative impacts are of this. What are the best practices around using mutable objects? Should you avoid them whenever possible?

Solution

Well, there are a couple of aspects to this. Number one, mutable objects without reference-identity can cause bugs at odd times. For example, consider a Person bean with a value-based equals method:

Map<Person, String> map = ...
Person p = new Person();
map.put(p, "Hey, there!");

p.setName("Daniel");
map.get(p);       // => null

The Person instance gets "lost" in the map when used as a key because its hashCode and equality were based upon mutable values. Those values changed outside the map, so the hash under which the entry was stored became obsolete. Theorists like to harp on this point, but in practice I haven't found it to be too much of an issue.
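A minimal runnable sketch of the snippet above, assuming the map is a HashMap and that Person derives equals and hashCode from its name field. The Person bean and the LostKeyDemo class are hypothetical and written only for this illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical Person bean: equals/hashCode depend on a mutable field.
final class Person {
    private String name;

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && Objects.equals(name, ((Person) o).name);
    }

    @Override
    public int hashCode() { return Objects.hashCode(name); }
}

final class LostKeyDemo {
    // Returns what the map reports for the key after the key was mutated.
    static String lookupAfterMutation() {
        Map<Person, String> map = new HashMap<>();
        Person p = new Person();
        map.put(p, "Hey, there!");  // stored under the hash of a null name
        p.setName("Daniel");        // hashCode changes; the entry is not rehashed
        return map.get(p);          // looked up under the new hash, so not found
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation());
    }
}
```

The entry is still in the map (its size is still 1); it just cannot be reached through the mutated key.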

Another aspect is the logical "reasonability" of your code. This is a hard term to define, encompassing everything from readability to flow. Generally, you should be able to look at a piece of code and easily understand what it does. But more important than that, you should be able to convince yourself that it does what it does correctly. When objects can change independently across different code "domains", it sometimes becomes difficult to keep track of what is where and why ("spooky action at a distance"). This is a more difficult concept to exemplify, but it's something that is often faced in larger, more complex architectures.

Finally, mutable objects are killer in concurrent situations. Whenever you access a mutable object from separate threads, you have to deal with locking. This reduces throughput and makes your code dramatically more difficult to maintain. A sufficiently complicated system blows this problem so far out of proportion that it becomes nearly impossible to maintain (even for concurrency experts).

Immutable objects (and more particularly, immutable collections) avoid all of these problems. Once you get your mind around how they work, your code will develop into something which is easier to read, easier to maintain and less likely to fail in odd and unpredictable ways. Immutable objects are even easier to test, due not only to their easy mockability, but also the code patterns they tend to enforce. In short, they're good practice all around!
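As a rough illustration of what such an object looks like in Java, here is a small sketch. The Money class and its methods are made up for this example. All fields are final, there are no setters, and every "modification" returns a new instance, so the same instance can be read from several threads without locking:

```java
// Hypothetical immutable value type: all fields final, no setters.
final class Money {
    private final long cents;
    private final String currency;

    Money(long cents, String currency) {
        this.cents = cents;
        this.currency = currency;
    }

    long cents() { return cents; }
    String currency() { return currency; }

    // "Modification" returns a new instance; the receiver is never changed.
    Money plus(long moreCents) {
        return new Money(cents + moreCents, currency);
    }
}

final class MoneyDemo {
    public static void main(String[] args) throws InterruptedException {
        Money shared = new Money(1_000, "EUR");
        // Both threads read the same instance without any locking.
        Thread a = new Thread(() -> System.out.println(shared.plus(50).cents()));
        Thread b = new Thread(() -> System.out.println(shared.plus(70).cents()));
        a.start();
        b.start();
        a.join();
        b.join();
        // The shared instance itself is untouched by either thread.
        System.out.println(shared.cents());
    }
}
```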

With that said, I'm hardly a zealot in this matter. Some problems just don't model nicely when everything is immutable. But I do think that you should try to push as much of your code in that direction as possible, assuming of course that you're using a language which makes this a tenable option (C/C++ makes this very difficult, as does Java). In short: the advantages depend somewhat on your problem, but I would tend to prefer immutability.

OTHER TIPS

Immutable Objects vs. Immutable Collections

One of the finer points in the debate over mutable vs. immutable objects is the possibility of extending the concept of immutability to collections. An immutable object is an object that often represents a single logical structure of data (for example an immutable string). When you have a reference to an immutable object, the contents of the object will not change.

An immutable collection is a collection that never changes.

When I perform an operation on a mutable collection, then I change the collection in place, and all entities that have references to the collection will see the change.

When I perform an operation on an immutable collection, a reference is returned to a new collection reflecting the change. All entities that have references to previous versions of the collection will not see the change.

Clever implementations do not necessarily need to copy (clone) the entire collection in order to provide that immutability. The simplest example is the stack implemented as a singly linked list and the push/pop operations. You can reuse all of the nodes from the previous collection in the new collection, adding only a single node for the push, and cloning no nodes for the pop. The push_tail operation on a singly linked list, on the other hand, is not so simple or efficient.
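The stack case can be sketched in a few lines of Java. PStack is a hypothetical name, and this is only one possible way to write it, not a reference implementation:

```java
import java.util.NoSuchElementException;

// Minimal persistent (immutable) stack backed by a singly linked list.
final class PStack<T> {
    private static final PStack<?> EMPTY = new PStack<>(null, null);

    private final T head;
    private final PStack<T> tail;

    private PStack(T head, PStack<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> PStack<T> empty() { return (PStack<T>) EMPTY; }

    boolean isEmpty() { return this == EMPTY; }

    // push allocates exactly one node and reuses the whole old stack as its tail.
    PStack<T> push(T value) { return new PStack<>(value, this); }

    // pop allocates nothing: it simply returns the existing tail.
    PStack<T> pop() {
        if (isEmpty()) throw new NoSuchElementException("pop on empty stack");
        return tail;
    }

    T peek() {
        if (isEmpty()) throw new NoSuchElementException("peek on empty stack");
        return head;
    }
}
```

Anyone still holding the stack from before a push keeps seeing exactly the elements it had, because no existing node is ever modified.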

Immutable vs. Mutable variables/references

Some functional languages take the concept of immutability to object references themselves, allowing only a single reference assignment.

In Erlang this is true for all "variables". I can only assign objects to a reference once. If I were to operate on a collection, I would not be able to reassign the new collection to the old reference (variable name).
Scala also builds this into the language, with all references being declared with var or val: vals allow only a single assignment and promote a functional style, while vars allow a more C-like or Java-like program structure.
The var/val declaration is required, while many traditional languages use optional modifiers such as final in Java and const in C.

Ease of Development vs. Performance

Almost always the reason to use an immutable object is to promote side effect free programming and simple reasoning about the code (especially in a highly concurrent/parallel environment). You don't have to worry about the underlying data being changed by another entity if the object is immutable.

The main drawback is performance. Here is a write-up on a simple test I did in Java comparing some immutable vs. mutable objects in a toy problem.

The performance issues are moot in many applications, but not all, which is why many large numerical packages, such as NumPy's array class in Python, allow in-place updates of large arrays. This is important for application areas that make use of large matrix and vector operations. These large data-parallel and computationally intensive problems achieve a great speed-up by operating in place.
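To make the trade-off concrete, here is a small Java sketch (not NumPy) of the two styles for scaling a vector. The ScaleDemo class and its method names are invented for this illustration, and no timing claims are attached to it. The copying version allocates a fresh array on every call, while the in-place version reuses the caller's storage:

```java
final class ScaleDemo {
    // Immutable style: the input is left alone and a new array is allocated.
    static double[] scaled(double[] v, double k) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] * k;
        }
        return out;
    }

    // In-place style: no allocation, but every holder of v sees the change.
    static void scaleInPlace(double[] v, double k) {
        for (int i = 0; i < v.length; i++) {
            v[i] *= k;
        }
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0, 3.0};
        double[] w = scaled(v, 2.0);   // v is unchanged, w is a new array
        scaleInPlace(v, 2.0);          // v itself is overwritten
        System.out.println(w[2] + " " + v[2]);
    }
}
```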

Check this blog post: http://www.yegor256.com/2014/06/09/objects-should-be-immutable.html. It explains why immutable objects are better than mutable. In short:

immutable objects are simpler to construct, test, and use
truly immutable objects are always thread-safe
they help to avoid temporal coupling
their usage is side-effect free (no defensive copies)
identity mutability problem is avoided
they always have failure atomicity
they are much easier to cache

Immutable objects are a very powerful concept. They take away a lot of the burden of trying to keep objects/variables consistent for all clients.

You can use them for low level, non-polymorphic objects - like a CPoint class - that are used mostly with value semantics.

Or you can use them for high level, polymorphic interfaces - like an IFunction representing a mathematical function - that is used exclusively with object semantics.

Greatest advantage: immutability + object semantics + smart pointers make object ownership a non-issue; all clients of the object have their own private copy by default. Implicitly this also means deterministic behavior in the presence of concurrency.

Disadvantage: when used with objects containing lots of data, memory consumption can become an issue. A solution to this could be to keep operations on an object symbolic and do lazy evaluation. However, this can then lead to chains of symbolic calculations that may negatively influence performance if the interface is not designed to accommodate symbolic operations. Something to definitely avoid in this case is returning huge chunks of memory from a method. In combination with chained symbolic operations this could lead to massive memory consumption and performance degradation.

So immutable objects are definitely my primary way of thinking about object oriented design, but they are not a dogma. They solve a lot of problems for clients of objects, but also create many, especially for the implementers.

You should specify what language you're talking about. For low-level languages like C or C++, I prefer to use mutable objects to conserve space and reduce memory churn. In higher-level languages, immutable objects make it easier to reason about the behavior of the code (especially multi-threaded code) because there's no "spooky action at a distance".

A mutable object is simply an object that can be modified after it's created/instantiated, vs an immutable object that cannot be modified (see the Wikipedia page on the subject). An example of this in a programming language is Python's lists and tuples. Lists can be modified (e.g., new items can be added after the list is created) whereas tuples cannot.

I don't really think there's a clear-cut answer as to which one is better for all situations. They both have their places.

If a class type is mutable, a variable of that class type can have a number of different meanings. For example, suppose an object foo has a field int[] arr, and it holds a reference to an int[3] holding the numbers {5, 7, 9}. Even though the type of the field is known, there are at least four different things it can represent:

A potentially-shared reference, all of whose holders care only that it encapsulates the values 5, 7, and 9. If foo wants arr to encapsulate different values, it must replace it with a different array that contains the desired values. If one wants to make a copy of foo, one may give the copy either a reference to arr or a new array holding the values {5, 7, 9}, whichever is more convenient.
The only reference, anywhere in the universe, to an array which encapsulates a set of three storage locations which at the moment hold the values 5, 7, and 9; if foo wants it to encapsulate the values 5, 8, and 9, it may either change the second item in that array or create a new array holding the values 5, 8, and 9 and abandon the old one. Note that if one wanted to make a copy of foo, one must in the copy replace arr with a reference to a new array in order for foo.arr to remain the only reference to that array anywhere in the universe.
A reference to an array which is owned by some other object that has exposed it to foo for some reason (e.g. perhaps it wants foo to store some data there). In this scenario, arr doesn't encapsulate the contents of the array, but rather its identity. Because replacing arr with a reference to a new array would totally change its meaning, a copy of foo should hold a reference to the same array.
A reference to an array of which foo is the sole owner, but to which references are held by some other object for some reason (e.g. foo wants the other object to store data there--the flipside of the previous case). In this scenario, arr encapsulates both the identity of the array and its contents. Replacing arr with a reference to a new array would totally change its meaning, but having a clone's arr refer to foo.arr would violate the assumption that foo is the sole owner. There is thus no way to copy foo.

In theory, int[] should be a nice simple well-defined type, but it has four very different meanings. By contrast, a reference to an immutable object (e.g. String) generally only has one meaning. Much of the "power" of immutable objects stems from that fact.
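Two of these meanings can be sketched in Java to show why the correct copy operation differs even though the field type is the same. OwnsArray and BorrowsArray are hypothetical classes invented for this illustration: the first treats arr as the only reference to its storage (the second case in the list), the second treats arr as the identity of an array owned elsewhere (the third case).

```java
// Second case: the holder owns the only reference, so it may mutate the
// array in place, and a copy must be given its own array.
final class OwnsArray {
    private final int[] arr;

    OwnsArray(int... values) { this.arr = values.clone(); }

    void set(int i, int v) { arr[i] = v; }
    int get(int i) { return arr[i]; }

    // The constructor clones, so the copy never shares storage with this.
    OwnsArray copy() { return new OwnsArray(arr); }
}

// Third case: the array belongs to someone else and the field stands for
// its identity, so a copy must refer to the very same array.
final class BorrowsArray {
    private final int[] arr;

    BorrowsArray(int[] shared) { this.arr = shared; }

    void write(int i, int v) { arr[i] = v; }

    // Sharing the reference is the correct copy here.
    BorrowsArray copy() { return new BorrowsArray(arr); }
}
```

Nothing in the type int[] tells the compiler, or the next maintainer, which of these two copy methods is the right one; that knowledge lives only in comments and convention.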

If you return a reference to an array or to a string (in a language where strings are mutable), the outside world can modify that object's content, which effectively makes it a mutable (modifiable) object.
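In Java, where String itself is immutable but arrays are not, the problem and two common ways around it might look like the following sketch. The Roster class and its method names are hypothetical, and List.of assumes Java 9 or later:

```java
import java.util.List;

final class Roster {
    private final String[] names;

    // Clone on the way in so the caller's array is not shared either.
    Roster(String... names) { this.names = names.clone(); }

    // Leaky: hands out the internal array, so callers can overwrite entries.
    String[] namesLeaky() { return names; }

    // Defensive copy: callers get their own array to do with as they please.
    String[] namesCopy() { return names.clone(); }

    // Unmodifiable list: List.of copies the elements and rejects mutation
    // (it also rejects null elements).
    List<String> namesList() { return List.of(names); }
}
```

The defensive copy costs an allocation per call, which is the price paid for callers not being able to reach the internal state.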

Immutable means can't be changed, and mutable means you can change.

Objects are different from primitives in Java. Primitives are built-in types (boolean, int, etc.) and objects (classes) are user-created types.

Primitives and objects can be mutable or immutable when defined as member variables within the implementation of a class.

A lot of people think that primitives and object variables with a final modifier in front of them are immutable; however, this isn't exactly true. final prevents the variable from being reassigned, but an object it refers to can still be modified. So final doesn't quite mean immutable for variables. See the example at
http://www.siteconsortium.com/h/D0000F.php.
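The difference between final and immutable can be shown with a short sketch (not taken from the linked page; FinalDemo is a hypothetical class). The final field can never point at a different list, yet the list it points to is freely modified:

```java
import java.util.ArrayList;
import java.util.List;

final class FinalDemo {
    // final fixes the reference, not the contents of the list it refers to.
    private final List<String> tags = new ArrayList<>();

    List<String> tags() { return tags; }

    // Returns the size of the list after mutating it through the final field.
    static int demo() {
        FinalDemo d = new FinalDemo();
        d.tags().add("changed");          // allowed: the object itself is mutated
        // d.tags = new ArrayList<>();    // would not compile: tags is final
        return d.tags().size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```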

Mutable instances behave as if they are passed by reference.

Immutable instances behave as if they are passed by value.

An abstract example: suppose there is a file named txtfile on my HDD. When you ask me for txtfile, I can return it in one of two ways:

Create a shortcut to txtfile and pass the shortcut to you, or
Make a copy of txtfile and pass the copy to you.

In the first mode, the returned txtfile is a mutable file, because when you make changes through the shortcut, you change the original file too. The advantage of this mode is that each returned shortcut requires less memory (in RAM or on the HDD); the disadvantage is that everyone (not only me, the owner) is able to modify the file's content.

In the second mode, the returned txtfile is an immutable file, because changes made to the received file do not affect the original file. The advantage of this mode is that only I (the owner) can modify the original file; the disadvantage is that each returned copy requires memory (in RAM or on the HDD).

Licensed under: CC-BY-SA with attribution
