Why are mutable structs “evil”?

https://stackoverflow.com/questions/441309

22-07-2019
|

Question

Following the discussions here on SO I already read several times the remark that mutable structs are “evil” (like in the answer to this question).

What's the actual problem with mutability and structs in C#?

Solution

Structs are value types which means they are copied when they are passed around.

So if you change a copy you are changing only that copy, not the original and not any other copies which might be around.

If your struct is immutable then all automatic copies resulting from being passed by value will be the same.

If you want to change it you have to consciously do it by creating a new instance of the struct with the modified data. (not a copy)

OTHER TIPS

Where to start ;-p

Eric Lippert's blog is always good for a quote:

This is yet another reason why mutable value types are evil. Try to always make value types immutable.

First, you tend to lose changes quite easily... for example, getting things out of a list:

Foo foo = list[0];
foo.Name = "abc";

what did that change? Nothing useful...

The same with properties:

myObj.SomeProperty.Size = 22; // the compiler spots this one

forcing you to do:

Bar bar = myObj.SomeProperty;
bar.Size = 22;
myObj.SomeProperty = bar;

less critically, there is a size issue; mutable objects tend to have multiple properties; yet if you have a struct with two ints, a string, a DateTime and a bool, you can very quickly burn through a lot of memory. With a class, multiple callers can share a reference to the same instance (references are small).

I wouldn't say evil but mutability is often a sign of overeagerness on the part of the programmer to provide a maximum of functionality. In reality, this is often not needed and that, in turn, makes the interface smaller, easier to use and harder to use wrong (= more robust).

One example of this is read/write and write/write conflicts in race conditions. These simply can't occur in immutable structures, since a write is not a valid operation.

Also, I claim that mutability is almost never actually needed, the programmer just thinks that it might be in the future. For example, it simply doesn't make sense to change a date. Rather, create a new date based off the old one. This is a cheap operation, so performance is not a consideration.

Mutable structs are not evil.

They are absolutely necessary in high performance circumstances. For example when cache lines and or garbage collection become a bottleneck.

I would not call the use of a immutable struct in these perfectly valid use-cases "evil".

I can see the point that C#'s syntax does not help to distinguish the access of a member of a value type or of a reference type, so I am all for preferring immutable structs, that enforce immutability, over mutable structs.

However, instead of simply labelling immutable structs as "evil", I would advise to embrace the language and advocate more helpful and constructive rule of thumbs.

For example: "structs are value types, which are copied by default. you need a reference if you don't want to copy them" or "try to work with readonly structs first".

Structs with public mutable fields or properties are not evil.

Struct methods (as distinct from property setters) which mutate "this" are somewhat evil, only because .net doesn't provide a means of distinguishing them from methods which do not. Struct methods that do not mutate "this" should be invokable even on read-only structs without any need for defensive copying. Methods which do mutate "this" should not be invokable at all on read-only structs. Since .net doesn't want to forbid struct methods that don't modify "this" from being invoked on read-only structs, but doesn't want to allow read-only structs to be mutated, it defensively copies structs in read-only contexts, arguably getting the worst of both worlds.

Despite the problems with the handling of self-mutating methods in read-only contexts, however, mutable structs often offer semantics far superior to mutable class types. Consider the following three method signatures:

struct PointyStruct {public int x,y,z;};
class PointyClass {public int x,y,z;};

void Method1(PointyStruct foo);
void Method2(ref PointyStruct foo);
void Method3(PointyClass foo);

For each method, answer the following questions:

Assuming the method doesn't use any "unsafe" code, might it modify foo?
If no outside references to 'foo' exist before the method is called, could an outside reference exist after?

_Answers:

Question 1:
Method1(): no (clear intent)
Method2(): yes (clear intent)
Method3(): yes (uncertain intent)
Question 2:
Method1(): no
Method2(): no (unless unsafe)
Method3(): yes

Method1 can't modify foo, and never gets a reference. Method2 gets a short-lived reference to foo, which it can use modify the fields of foo any number of times, in any order, until it returns, but it can't persist that reference. Before Method2 returns, unless it uses unsafe code, any and all copies that might have been made of its 'foo' reference will have disappeared. Method3, unlike Method2, gets a promiscuously-sharable reference to foo, and there's no telling what it might do with it. It might not change foo at all, it might change foo and then return, or it might give a reference to foo to another thread which might mutate it in some arbitrary way at some arbitrary future time. The only way to limit what Method3 might do to a mutable class object passed into it would be to encapsulate the mutable object into a read-only wrapper, which is ugly and cumbersome.

Arrays of structures offer wonderful semantics. Given RectArray[500] of type Rectangle, it's clear and obvious how to e.g. copy element 123 to element 456 and then some time later set the width of element 123 to 555, without disturbing element 456. "RectArray[432] = RectArray[321]; ...; RectArray[123].Width = 555;". Knowing that Rectangle is a struct with an integer field called Width will tell one all one needs to know about the above statements.

Now suppose RectClass was a class with the same fields as Rectangle and one wanted to do the same operations on a RectClassArray[500] of type RectClass. Perhaps the array is supposed to hold 500 pre-initialized immutable references to mutable RectClass objects. in that case, the proper code would be something like "RectClassArray[321].SetBounds(RectClassArray[456]); ...; RectClassArray[321].X = 555;". Perhaps the array is assumed to hold instances that aren't going to change, so the proper code would be more like "RectClassArray[321] = RectClassArray[456]; ...; RectClassArray[321] = New RectClass(RectClassArray[321]); RectClassArray[321].X = 555;" To know what one is supposed to do, one would have to know a lot more both about RectClass (e.g. does it support a copy constructor, a copy-from method, etc.) and the intended usage of the array. Nowhere near as clean as using a struct.

To be sure, there is unfortunately no nice way for any container class other than an array to offer the clean semantics of a struct array. The best one could do, if one wanted a collection to be indexed with e.g. a string, would probably be to offer a generic "ActOnItem" method which would accept a string for the index, a generic parameter, and a delegate which would be passed by reference both the generic parameter and the collection item. That would allow nearly the same semantics as struct arrays, but unless the vb.net and C# people can be pursuaded to offer a nice syntax, the code is going to be clunky-looking even if it is reasonably performance (passing a generic parameter would allow for use of a static delegate and would avoid any need to create any temporary class instances).

Personally, I'm peeved at the hatred Eric Lippert et al. spew regarding mutable value types. They offer much cleaner semantics than the promiscuous reference types that are used all over the place. Despite some of the limitations with .net's support for value types, there are many cases where mutable value types are a better fit than any other kind of entity.

Value types basically represents immutable concepts. Fx, it makes no sense to have a mathematical value such as an integer, vector etc. and then be able to modify it. That would be like redefining the meaning of a value. Instead of changing a value type, it makes more sense to assign another unique value. Think about the fact that value types are compared by comparing all the values of its properties. The point is that if the properties are the same then it is the same universal representation of that value.

As Konrad mentions it doesn't make sense to change a date either, as the value represents that unique point in time and not an instance of a time object which has any state or context-dependency.

Hopes this makes any sense to you. It is more about the concept you try to capture with value types than practical details, to be sure.

There are another couple corner cases that could lead to unpredictible behavior from programmers point of view. Here couple of them.

Immutable value types and readonly fields



// Simple mutable structure. 
// Method IncrementI mutates current state.
struct Mutable
{
    public Mutable(int i) : this() 
    {
        I = i;
    }

    public void IncrementI() { I++; }

    public int I {get; private set;}
}

// Simple class that contains Mutable structure
// as readonly field
class SomeClass 
{
    public readonly Mutable mutable = new Mutable(5);
}

// Simple class that contains Mutable structure
// as ordinary (non-readonly) field
class AnotherClass 
{
    public Mutable mutable = new Mutable(5);
}

class Program
{
    void Main()
    {
        // Case 1. Mutable readonly field
        var someClass = new SomeClass();
        someClass.mutable.IncrementI();
        // still 5, not 6, because SomeClass.mutable field is readonly
        // and compiler creates temporary copy every time when you trying to
        // access this field
        Console.WriteLine(someClass.mutable.I);

        // Case 2. Mutable ordinary field
        var anotherClass = new AnotherClass();
        anotherClass.mutable.IncrementI();

        //Prints 6, because AnotherClass.mutable field is not readonly
        Console.WriteLine(anotherClass.mutable.I);
    }
}

Mutable value types and array

Suppose have an array of our Mutable struct and we're calling IncrementI method for the first element of that array. What behavior are you expecting from this call? Should it change array's value or only a copy?

Mutable[] arrayOfMutables = new Mutable[1];
arrayOfMutables[0] = new Mutable(5);

// Now we actually accessing reference to the first element
// without making any additional copy
arrayOfMutables[0].IncrementI();

//Prints 6!!
Console.WriteLine(arrayOfMutables[0].I);

// Every array implements IList<T> interface
IList<Mutable> listOfMutables = arrayOfMutables;

// But accessing values through this interface lead
// to different behavior: IList indexer returns a copy
// instead of an managed reference
listOfMutables[0].IncrementI(); // Should change I to 7

// Nope! we still have 6, because previous line of code
// mutate a copy instead of a list value
Console.WriteLine(listOfMutables[0].I);

So, mutable structs are not evil as long as you and the rest of the team clearly understand what you are doing. But there are too many corner cases when program behavior would be different from expected one, that could lead to subtle hard to produce and hard to understand errors.

If you have ever programmed in a language like C/C++, structs are fine to use as mutable. Just pass them with ref, around and there is nothing that can go wrong. The only problem I find are the restrictions of the C# compiler and that, in some cases, I am unable to force the stupid thing to use a reference to the struct, instead of a Copy(like when a struct is part of a C# class).

So, mutable structs are not evil, C# has made them evil. I use mutable structs in C++ all the time and they are very convenient and intuitive. In contrast, C# has made me to completely abandon structs as members of classes because of the way they handle objects. Their convenience has cost us ours.

Imagine you have an array of 1,000,000 structs. Each struct representing an equity with stuff like bid_price, offer_price (perhaps decimals) and so on, this is created by C#/VB.

Imagine that array is created in a block of memory allocated in the unmanaged heap so that some other native code thread is able to concurrently access the array (perhaps some high-perf code doing math).

Imagine the C#/VB code is listening to a market feed of price changes, that code may have to access some element of the array (for whichever security) and then modify some price field(s).

Imagine this is being done tens or even hundreds of thousands of times per second.

Well lets face facts, in this case we really do want these structs to be mutable, they need to be because they are being shared by some other native code so creating copies isn't gonna help; they need to be because making a copy of some 120 byte struct at these rates is lunacy, especially when an update may actually impact just a byte or two.

Hugo

If you stick to what structs are intended for (in C#, Visual Basic 6, Pascal/Delphi, C++ struct type (or classes) when they are not used as pointers), you will find that a structure is not more than a compound variable. This means: you will treat them as a packed set of variables, under a common name (a record variable you reference members from).

I know that would confuse a lot of people deeply used to OOP, but that's not enough reason to say such things are inherently evil, if used correctly. Some structures are inmutable as they intend (this is the case of Python's namedtuple), but it is another paradigm to consider.

Yes: structs involve a lot of memory, but it will not be precisely more memory by doing:

point.x = point.x + 1

compared to:

point = Point(point.x + 1, point.y)

The memory consumption will be at least the same, or even more in the inmutable case (although that case would be temporary, for the current stack, depending on the language).

But, finally, structures are structures, not objects. In POO, the main property of an object is their identity, which most of the times is not more than its memory address. Struct stands for data structure (not a proper object, and so they don't have identity anyhow), and data can be modified. In other languages, record (instead of struct, as is the case for Pascal) is the word and holds the same purpose: just a data record variable, intended to be read from files, modified, and dumped into files (that is the main use and, in many languages, you can even define data alignment in the record, while that's not necessarily the case for properly called Objects).

Want a good example? Structs are used to read files easily. Python has this library because, since it is object-oriented and has no support for structs, it had to implement it in another way, which is somewhat ugly. Languages implementing structs have that feature... built-in. Try reading a bitmap header with an appropriate struct in languages like Pascal or C. It will be easy (if the struct is properly built and aligned; in Pascal you would not use a record-based access but functions to read arbitrary binary data). So, for files and direct (local) memory access, structs are better than objects. As for today, we're used to JSON and XML, and so we forget the use of binary files (and as a side effect, the use of structs). But yes: they exist, and have a purpose.

They are not evil. Just use them for the right purpose.

If you think in terms of hammers, you will want to treat screws as nails, to find screws are harder to plunge in the wall, and it will be screws' fault, and they will be the evil ones.

When something can be mutated, it gains a sense of identity.

struct Person {
    public string name; // mutable
    public Point position = new Point(0, 0); // mutable

    public Person(string name, Point position) { ... }
}

Person eric = new Person("Eric Lippert", new Point(4, 2));

Because Person is mutable, it's more natural to think about changing Eric's position than cloning Eric, moving the clone, and destroying the original. Both operations would succeed in changing the contents of eric.position, but one is more intuitive than the other. Likewise, it's more intuitive to pass Eric around (as a reference) for methods to modify him. Giving a method a clone of Eric is almost always going to be surprising. Anyone wanting to mutate Person must remember to ask for a reference to Person or they'll be doing the wrong thing.

If you make the type immutable, the problem goes away; if I can't modify eric, it makes no difference to me whether I receive eric or a clone of eric. More generally, a type is safe to pass by value if all of its observable state is held in members that are either:

immutable
reference types
safe to pass by value

If those conditions are met then a mutable value type behaves like a reference type because a shallow copy will still allow the receiver to modify the original data.

The intuitiveness of an immutable Person depends on what you're trying to do though. If Person just represents a set of data about a person, there's nothing unintuitive about it; Person variables truly represent abstract values, not objects. (In that case, it'd probably be more appropriate to rename it to PersonData.) If Person is actually modeling a person itself, the idea of constantly creating and moving clones is silly even if you've avoided the pitfall of thinking you're modifying the original. In that case it'd probably be more natural to simply make Person a reference type (that is, a class.)

Granted, as functional programming has taught us there are benefits to making everything immutable (no one can secretly hold on to a reference to eric and mutate him), but since that's not idiomatic in OOP it's still going to be unintuitive to anyone else working with your code.

It doesn’t have anything to do with structs (and not with C#, either) but in Java you might get problems with mutable objects when they are e.g. keys in a hash map. If you change them after adding them to a map and it changes its hash code, evil things might happen.

Personally when I look at code the following looks pretty clunky to me:

data.value.set ( data.value.get () + 1 ) ;

rather than simply

data.value++ ; or data.value = data.value + 1 ;

Data encapsulation is useful when passing a class around and you want to ensure the value is modified in a controlled fashion. However when you have public set and get functions that do little more than set the value to what ever is passed in, how is this an improvement over simply passing a public data structure around?

When I create a private structure inside a class, I created that structure to organize a set of variables into one group. I want to be able to modify that structure within the class scope, not get copies of that structure and create new instances.

To me this prevents a valid use of structures being used to organize public variables, if I wanted access control I'd use a class.

There are several issues with Mr. Eric Lippert's example. It is contrived to illustrate the point that structs are copied and how that could be a problem if you are not careful. Looking at the example I see it as a result of a bad programming habit and not really a problem with either struct or the class.

A struct is supposed to have only public members and should not require any encapsulation. If it does then it really should be a type/class. You really do not need two constructs to say the same thing.
If you have class enclosing a struct, you would call a method in the class to mutate the member struct. This is what I would do as a good programming habit.

A proper implementation would be as follows.

struct Mutable {
public int x;
}

class Test {
    private Mutable m = new Mutable();
    public int mutate()
    { 
        m.x = m.x + 1;
        return m.x;
    }
  }
  static void Main(string[] args) {
        Test t = new Test();
        System.Console.WriteLine(t.mutate());
        System.Console.WriteLine(t.mutate());
        System.Console.WriteLine(t.mutate());
    }

It looks like it is an issue with programming habit as opposed to an issue with struct itself. Structs are supposed to be mutable, that is the idea and intent.

The result of the changes voila behaves as expected:

1 2 3 Press any key to continue . . .

There are many advantages and disadvantages to mutable data. The million-dollar disadvantage is aliasing. If the same value is being used in multiple places, and one of them changes it, then it will appear to have magically changed to the other places that are using it. This is related to, but not identical with, race conditions.

The million-dollar advantage is modularity, sometimes. Mutable state can allow you to hide changing information from code that doesn't need to know about it.

The Art of the Interpreter goes into these trade offs in some detail, and gives some examples.

I don't believe they're evil if used correctly. I wouldn't put it in my production code, but I would for something like structured unit testing mocks, where the lifespan of a struct is relatively small.

Using the Eric example, perhaps you want to create a second instance of that Eric, but make adjustments, as that's the nature of your test (ie duplication, then modifying). It doesn't matter what happens with the first instance of Eric if we're just using Eric2 for the remainder of the test script, unless you're planning on using him as a test comparison.

This would be mostly useful for testing or modifying legacy code that shallow defines a particular object (the point of structs), but by having an immutable struct, this prevents it's usage annoyingly.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow