Value/reference type, object and semantics

https://softwareengineering.stackexchange.com/questions/408540

09-03-2021

Question

What are the relationships between these pairs of concepts:

For the first pair of concepts, it seems to me that an object of value type is an element (data or procedure), and an object of reference type is the location (absolute or relative) of an element.

Questions:

Is an object of value type a value object?
Is an object of reference type a reference object?
Does an object of value type have value semantics?
Does an object of reference type have reference semantics?

Solution

TL;DR

Note. — The meaning of "value semantics" in the context of programming has drifted, as evidenced by the definitions provided in the appendix. What follows is my attempt to make sense of it all.

Value semantics is instance independence.
Value-semantic types are types with value semantics.
Reference semantics is instance interdependence.
Reference-semantic types are types with reference semantics.
Value type variables hold instances.
Reference type variables hold references to instances.
Objects are instances of classes.
Value objects have value based equality.
Value objects should have value semantics.
Value objects can be instances of either value types or reference types.
Reference objects have identity based equality.
Reference objects can be instances of either value types or reference types.
Entities are reference objects that have value.
Services are reference objects that do not have value.
Value semantics can be achieved by:
- Complete copy. Which can be achieved by:
  - Shallow copy of value types without reference type fields.
  - Deep copy.
- Deep copy of mutable fields and shallow copy of immutable fields.
- Immutable types.
- Copy-on-write implementation.
Reference semantics is achieved by… not having value semantics.

Spaces in Memory

Information is stored in spaces in memory, where it can be reused. There are three kinds of spaces in memory:

Stack (e.g. arguments, local variables).
Heap (e.g. globals, static fields).
Relative (e.g. array elements, instance fields).

There can be multiple spaces in memory of each kind. For example, a subroutine with multiple arguments has one space in memory for each of them.

A language/runtime/platform may or may not have any of these. For example, some do not have a stack. Some do not have arrays or composite types. And some do not have a heap. However, they will all have at least heap or stack.

We will not be talking about named constants, literals, immediate values, or the distinction between l-values and r-values.

Variables

In most languages we give names to spaces in memory. This makes it easier to use them. We call these named spaces in memory “variables”.

Going forward, we will refer to the information stored in the space in memory named by a variable as the contents of the variable.

It is also worth noting that the names of the variables may or may not exist at runtime (e.g. reflection). If they do, their static type information may or may not exist at runtime (e.g. type erasure).

Furthermore, the position in memory of a named variable may change.

Note. — What I call contents here, other authors call value. I'm not calling it value because I'm using Lakos' definition of value. However, I would agree that the contents of a variable is a value: a physical value, while the value that Lakos talks about is a ~~platonic~~ logical value.

Types and Instances

A type is a set of memory layouts. We will refer to each of the possible memory layouts of a given type that actually exists in memory as an instance. Instances may overlap in memory.

These memory layouts define the contents of the variables that hold said instances. See “Value Types and Reference Types” below.

Variables and Types

In a dynamically typed language, the contents of the variables can be of any type.

On the other hand, in statically typed languages, the variables have a type, and this type specifies the possible contents of the variable.

Note. — Some statically typed languages support typing a variable as dynamic. That is, the type of the variable is “look into the contents of the variable to figure out the type”.

Primitive Types and Composite Types

Composite types are types constructed out of other types. Primitive types are not.

Do not confuse primitive types with built-in types, that is, the set of types provided by a language. Plenty of languages currently provide built-in composite types. Primitive types, instead, are those that are indivisible within the constraints of the language.

Equality

Considering the instances of a type, we may or may not care about a concept of equality for these instances. That is, equality may or may not be part of the specification/requirements for the type.

We only care about equality when the type has a concept of “value”.

Values

For types that have a concept of value, the value is derived from the contents of the instances. Or rather, I should say that the contents represent the value.

However, the contents are not the value: equality of instances does not imply equal representation in memory, because there could be multiple representations in memory for the same value. Some types therefore require canonicalization/normalization before comparison (e.g. strings, dates, decimal floating point numbers).

This is also how we can say that values stored in different types have the same value, i.e. are equal (e.g. 5 stored in a short integer vs 5 stored in a long integer).

When dealing with composite types, we would talk about salient attributes.

From the book Large-Scale C++ Volume I: Process and Architecture by John S. Lakos:

A salient attribute of a value-semantic type is one of its (typically observable) attributes that contributes to the overall value of the object itself.

We will get to “value-semantic type” later.

Only salient attributes are considered part of the value of a type, and which attributes are salient is decided by the specification/requirements for that type, not by the representation in memory.
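As a small illustration (a sketch, not taken from Lakos' book), std::vector in C++ has a capacity that is part of its representation but not of its value, so it is not a salient attribute. The helper names below are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: true when two vectors that differ only in capacity
// (a non-salient attribute) still compare equal.
bool equal_despite_capacity() {
    std::vector<int> a{1, 2, 3};
    std::vector<int> b{1, 2, 3};
    b.reserve(100);  // changes the representation in memory, not the value
    assert(b.capacity() >= 100);

    // std::vector's operator== looks only at size and elements.
    return a == b;
}

// Hypothetical helper: the same value stored in two different types.
bool equal_across_types() {
    short s = 5;
    long l = 5;
    return s == l;  // both operands are converted to a common type first
}
```

Both helpers return true: the capacity and the width of the integer are details of the representation, while equality is decided by the value.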

References

References are variables whose contents refer to an instance, instead of being an instance. That is, the contents hold the position in memory where an instance is found, instead of containing the instance directly.

What I define above would be pointers in C++. We are not talking about the C++ distinction of pointers and references.

In some platforms there is a garbage collector that may move instances around. When this happens, the garbage collector also has to update the references to them.

Due to composition, we may have instances that have references.

Copy and Move

Since each variable has a space in memory, when we assign a variable to another (assuming their types are compatible) we need to copy the contents. See “Types of Copy” below.

If the types of the variables are not compatible, a conversion is necessary. One special case is when assigning to a reference.

In some cases, we know that a variable will cease to exist. For example, a local variable when returning from a subroutine goes out of scope. If we are returning the local variable, and assigning the returned value to another variable, the compiler may opt to not copy it, but move it instead. Moving here means changing the space in memory named by the variable.

Since a move happens only when a variable is ceasing to exist, we do not have to worry about it.

Pass by Reference and Pass by Value

A parameter of a subroutine is a variable. When we call the subroutine, the parameters are assigned. If the types of the parameters are references, then we are passing instances by reference. Otherwise, we are passing them by value. And yes, that is a copy.
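A minimal C++ sketch of the difference, using a pointer as the reference (in the sense defined in “References” above). The function names are made up for the example.

```cpp
#include <cassert>

// By value: n is a new variable holding a copy of the argument.
void increment_copy(int n) { ++n; }

// By reference: n holds the position in memory of the caller's variable.
void increment_through(int* n) { ++*n; }

// Hypothetical demo: returns the final value of x.
int pass_demo() {
    int x = 1;
    increment_copy(x);      // only the copy changes
    assert(x == 1);
    increment_through(&x);  // the caller's variable changes
    return x;               // 2
}
```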

Types of Copy

A shallow copy limits itself to copying the contents of a variable. On the other hand, a deep copy would follow references and copy them too. That is, a deep copy is recursive with respect to references.

Please note that these are not the only options when it comes to copying instances. We will come back to that.

For contents that do not include references, a shallow copy is a complete copy. However, for contents that include references, a deep copy is necessary to get a complete copy.

By complete copy we will understand a copy of the whole memory layout of an instance. If we do not copy the whole, then it is an incomplete copy. If the memory layout does not have references and exists only in the contents of the variable, then a shallow copy is a complete copy. Otherwise, a shallow copy is an incomplete copy.

A shallow copy is the default.
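To make the distinction concrete, here is a small C++ sketch. Node, shallow_copy, deep_copy and copy_demo are hypothetical names; the struct holds a raw pointer only to keep the reference visible.

```cpp
#include <cassert>

struct Node {
    int* data;  // reference to an int that lives outside the struct
};

// Shallow copy: copies the contents of the variable, i.e. the pointer itself.
Node shallow_copy(const Node& n) { return Node{n.data}; }

// Deep copy: follows the reference and copies the pointee too.
Node deep_copy(const Node& n) { return Node{new int{*n.data}}; }

// Hypothetical demo: returns original's value * 10 + deep copy's value.
int copy_demo() {
    Node original{new int{1}};
    Node shallow = shallow_copy(original);
    Node deep = deep_copy(original);

    *shallow.data = 2;  // also changes *original.data: it is the same int
    assert(*original.data == 2);
    *deep.data = 3;     // independent of original

    int result = *original.data * 10 + *deep.data;  // 2 * 10 + 3
    delete deep.data;
    delete original.data;  // shallow.data aliases this, so it is not deleted again
    return result;
}
```

The shallow copy is an incomplete copy here: writing through it is visible in the original. The deep copy is a complete copy.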

Note. — The contents of a variable could be a handle to a resource. It could be an external resource, such as a handle to a window object or a key to a row in a database table. It could also be an internal resource, such as an index into an array (see Entity-Component-System). These are not references as defined above, however they can be considered as such (we could say a pointer is a physical reference, while a handle is a logical reference). If the referenced resources are not copied, they may provide a means for instances to affect each other. See "Rule of Three" below. You may also be interested in RAII. My personal opinion is that we should not try to achieve value semantics with types that include handles to external resources, and if we were to, it would require copying those resources too.

Value Types and Reference Types

We find in the C# language reference:

A variable of a value type contains an instance of the type. This differs from a variable of a reference type, which contains a reference to an instance of the type.

Reference types are types such that variables of that type are references to the instance. That is, the memory layout for reference types specifies that the variables hold a reference to the instance.

In C++, only pointers and references are reference types. However, we find plenty of reference types in other languages. For example, Java and .NET classes are reference types. C# structs, by the way, are value types.

On the other hand, value types are types such that variables of that type are not references. In other words, the contents of the variable is the instance.

Do not confuse value types and reference types with value-semantic types and reference-semantic types. Also do not confuse value types with primitive types.

Now, since variables of reference types are references and a shallow copy is the default, the assignment of reference types results in an incomplete copy… unless the default is overridden.

For value types, the assignment results in a complete copy if and only if they are not composite types that include references. See also Can structs contain fields of reference types (C#).

Value-Semantic Types and Reference-Semantic Types

A value-semantic type is a type such that copy provides instance independence. That is, the result of the copy should not be usable to mutate the original. Emphasis on copy. This is not about making a reference.

This matches Alexis Gallagher’s Mutation game.

There are two simple ways to accomplish this:

Provide a complete copy. As we saw earlier, we can have a complete copy with a value type that includes no reference type fields, or by overriding the default copy with a deep copy.
Make the type immutable. With an immutable type, a shallow copy will provide instance independence regardless of whether or not the instance includes references… The reason is that you cannot mutate the original anyway. Which also means that it is OK for immutable instances to share memory.

However, in general, you must provide a copy that copies every part of the instance which is not immutable. If the type is immutable, then a shallow copy is sufficient. If the type has no immutable parts (and it is a reference type or a value type which includes references), then you must provide a deep copy. If some parts are immutable and some are not, then you can achieve value semantics by doing a deep copy of the mutable parts (and a shallow copy of the immutable parts, sharing them). Which, by the way, is neither a shallow copy nor a deep copy, but a mixture.
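The mixture could look something like the following C++ sketch. Document and its members are hypothetical; the immutable part is modelled as a shared pointer to a const string, and the mutable part as an owned vector.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

class Document {
public:
    Document(std::string title, std::vector<int> pages)
        : title_(std::make_shared<std::string>(std::move(title))),
          pages_(std::make_unique<std::vector<int>>(std::move(pages))) {}

    // Copy: shallow copy of the immutable part, deep copy of the mutable part.
    Document(const Document& other)
        : title_(other.title_),
          pages_(std::make_unique<std::vector<int>>(*other.pages_)) {}

    Document& operator=(const Document& other) {
        title_ = other.title_;
        *pages_ = *other.pages_;
        return *this;
    }

    void add_page(int p) { pages_->push_back(p); }
    std::size_t page_count() const { return pages_->size(); }
    bool shares_title_with(const Document& o) const { return title_ == o.title_; }

private:
    std::shared_ptr<const std::string> title_;  // immutable: safe to share
    std::unique_ptr<std::vector<int>> pages_;   // mutable: must be copied
};

// Hypothetical demo: true when the copy is independent yet shares the title.
bool mixed_copy_demo() {
    Document a{"notes", {1, 2}};
    Document b = a;
    b.add_page(3);  // does not affect a
    assert(a.page_count() == 2);
    return b.page_count() == 3 && a.shares_title_with(b);
}
```

Mutating b never shows up in a, even though both instances point at the same title in memory.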

Note. — Bjarne Stroustrup only considers deep and shallow copy when defining value semantics in Programming: Principles and Practice Using C++.

If we have a reference type which only contains a field of an immutable reference type, then it is sufficient to copy that reference. There is no need to copy the immutable instance. Then, we implement mutation operations by swapping that reference with a new one. This is copy-on-write.
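A minimal copy-on-write sketch in C++, assuming a hypothetical CowText class. For simplicity it always allocates a new string on mutation; a real implementation would usually skip that when the storage is not shared.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class CowText {
public:
    explicit CowText(std::string s)
        : data_(std::make_shared<std::string>(std::move(s))) {}

    // The default copy only copies the shared_ptr. That is cheap, and it is
    // safe because the pointee cannot be modified through data_.

    void append(const std::string& suffix) {
        // Mutation builds a new immutable string and swaps the reference.
        data_ = std::make_shared<std::string>(*data_ + suffix);
    }

    const std::string& str() const { return *data_; }

private:
    std::shared_ptr<const std::string> data_;
};

// Hypothetical demo: true when mutating the copy leaves the original alone.
bool cow_demo() {
    CowText a{"abc"};
    CowText b = a;    // shares storage with a
    b.append("def");  // b now points at a new string
    assert(a.str() == "abc");
    return b.str() == "abcdef";
}
```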

Value Objects

From the book Domain-Driven Design: Tackling Complexity in the Heart of Software by Eric Evans (who coined the term “value object”):

Does an object represent something with continuity and identity—something that is tracked through different states or even across different implementations? Or is it an attribute that describes the state of something else? This is the basic distinction between an ENTITY and a VALUE OBJECT.

Evans also had the concern of value semantics:

We don’t care which instance we have of a VALUE OBJECT. This lack of constraints gives us design freedom we can use to simplify the design or optimize performance. This involves making choices about copying, sharing, and immutability.

We see the same definition, and the same concern for value semantics echoed by other authors.

From the book Patterns of Enterprise Application Architecture by Martin Fowler et al.:

The key difference between reference and value objects lies in how they deal with equality. A reference object uses identity as the basis for equality […]. A Value Object bases its notion of equality on field values within the class. Thus, two date objects may be the same if their day, month, and year values are the same. […] Most languages have no special facility for value objects. For value objects to work properly in these cases it’s a very good idea to make them immutable—that is, once created none of their fields change. The reason for this is to avoid aliasing bugs. An aliasing bug occurs when two objects share the same value object and one of the owners changes the values in it.
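Fowler's date example can be sketched in C++ as follows. Date, same_value and same_identity are hypothetical names; identity is approximated here by the address of the instance.

```cpp
#include <cassert>

struct Date {
    int year;
    int month;
    int day;
};

// Value-based equality: compares the fields.
bool same_value(const Date& a, const Date& b) {
    return a.year == b.year && a.month == b.month && a.day == b.day;
}

// Identity-based equality: compares positions in memory.
bool same_identity(const Date& a, const Date& b) { return &a == &b; }

// Hypothetical demo: two distinct instances with the same value.
bool equality_demo() {
    Date d1{2021, 3, 9};
    Date d2{2021, 3, 9};
    assert(!same_identity(d1, d2));  // two different instances
    return same_value(d1, d2) && same_identity(d1, d1);
}
```

A value object would use something like same_value as its equality, while a reference object would use something like same_identity.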

Bonus Chatter

Rule of three

This is particular to C++.

Let us say we want value semantics, and we have a value type that has no reference type fields. For this, the default shallow copy is sufficient.

Now, let us say we add a reference type field to our type. Our shallow copy then results in two instances with fields pointing to the same instance of the reference type.

To avoid the shallow copy we need to override the assignment operator and implement a deep copy. However, if we are not assigning to an existing variable but initializing a new one, the assignment operator does not get called; the copy constructor is called instead (and again, the default is a shallow copy). Thus, we need to override the copy constructor too.

We run into a similar problem with the default destructor. It will not follow references. That is, it will not do a deep destruction, which would mean leaking the instance that the reference type field points to. Thus, we also need to override the default destructor.

Thus, we want to override the assignment operator, the copy constructor and the destructor. This is not possible in most languages.
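Put together, the three overrides could look like this sketch. Box is a hypothetical type that owns a single heap-allocated int through a raw pointer.

```cpp
#include <cassert>

class Box {
public:
    explicit Box(int v) : value_(new int{v}) {}

    // Copy constructor: deep copy when initializing a new variable.
    Box(const Box& other) : value_(new int{*other.value_}) {}

    // Assignment operator: deep copy into an existing variable.
    Box& operator=(const Box& other) {
        if (this != &other) {
            *value_ = *other.value_;
        }
        return *this;
    }

    // Destructor: releases the pointee as well.
    ~Box() { delete value_; }

    int get() const { return *value_; }
    void set(int v) { *value_ = v; }

private:
    int* value_;
};

// Hypothetical demo: returns a * 100 + b * 10 + c after mutating the copies.
int rule_of_three_demo() {
    Box a{1};
    Box b = a;  // calls the copy constructor
    Box c{0};
    c = a;      // calls the assignment operator
    b.set(2);
    c.set(3);
    assert(a.get() == 1);  // a is unaffected by changes to its copies
    return a.get() * 100 + b.get() * 10 + c.get();
}
```

With all three in place, Box has value semantics: the result is 123, because each instance owns its own int.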

On References and Value Semantics

We should not require the concept of references or pointers to define value semantics. Languages that do not have these concepts can still have value semantics.

There is another concept related to value objects we need to talk about: data transfer objects. DTOs are meant to cross boundaries. They might be going to another process, even to another machine. They may not. When crossing these boundaries references do not work. And thus, DTOs must avoid references.

DTOs should have no behavior, and have value semantics.

DTOs are often confused with value objects. Martin Fowler:

You usually can’t send the domain object itself, because it’s tied in a Web of fine-grained local inter-object references. So you take all the data that the client needs and bundle it in a particular object for the transfer—hence the term Data Transfer Object. (Many people in the enterprise Java community use the term value object for this, but this causes a clash with other meanings of the term Value Object).

Objects

If you go back to the definition of object (according to Grady Booch), you will find that objects have identity (and state and behavior, which could be none). However, we are ignoring this definition, instead we are saying that objects are instances of classes.

Plus, I would argue that the name value object is influenced by the fact that Evans was working in Java, and thus could not define custom value types. To reiterate, Value Objects in Java are of reference types.

Thread Safety

Another argument for value semantics is thread safety.

Please note that if we pass references, even const references, to instances that another thread could modify behind the scenes, we will run into trouble. Thus, any reference must be to an immutable type or a thread safe type.

Your Questions

is an object of value type a value object?

Value objects can be of value types or reference types.

is an object of reference type a reference object?

Instances of reference types would be reference objects, unless they override equality.

does an object of value type have value semantics?

If it does not have reference type fields, or if it overrides the default copy to provide value semantics.

does an object of reference type have reference semantics?

If it is not immutable and does not override the default copy to provide value semantics.

Appendix: Definitions of "Value Semantics", a timeline

1998

This template version of List includes a generic iterator and value semantics to store generic data. Value semantics means that List stores instantiated objects, not pointers to objects. During insertion operations, List stores copies of data values instead of storing pointers. Although containers with value semantics allow applications to manage small objects and build-in types easily, many applications cannot tolerate the overhead of copying objects.

– Paul Anderson, Gail Anderson – Navigating C++ and Object-oriented Design

2004

STL containers are value semantic. When a task object is added to an STL container, the task object's allocator and copy constructor are called to clone the original. Similarly, when a task object is removed from an STL container, the task object's deallocator is called to delete the copy. The value semantics may be a performance concern, especially if producers and consumers frequently add tasks to and remove tasks from a queue.

– Ted Yuan – A C++ Producer-Consumer Concurrency Template Library

2004

ValueSemantics for objects-by-value are preserved by copying values between objects. ValueSemantics for objects-by-reference are preserved by using a CopyOnWrite mechanism. I had always thought that the story ended there. Are ValueObjects simply objects that preserve ValueSemantics or is there something more to them?

– PhilGoodwin – Value Objects Can Be Mutable

2014

Types that provide shallow copy (like pointers and references) are said to have pointer semantics or reference semantics (they copy addresses). Types that provide deep copy (like string and vector) are said to have value semantics (they copy the values pointed to). From a user perspective, types with value semantics behave as if no pointers were involved – just values that can be copied. One way of thinking of types with values semantics is that they “work just like integers” as far as copying is concerned.

– Bjarne Stroustrup – Programming: Principles and Practice Using C++

2015

it's (…) possible for a type to be value semantic provided that it keeps one very important property true which is if two objects of the given type have the same value today and we apply in the same salient operation (by salient I mean an operation that is intended to approximate the Platonic type that lives outside of the process that we're using as our model) then after that operation is applied to both objects they will again have the same value or they never did and that is a key property of value semantics.

Another way to say this would be if two objects have the same value then there does not exist a distinguishing sequence of salient operations that will cause them to no longer have the same value.

– John Lakos – An interview with John Lakos

2016

Value semantics amounts to being a guarantee of the independence of the value of variables.

And independence doesn’t mean structural things. What we’re talking about is can one thing affect another. So a type has value semantics if the only way to modify a variable’s value, a variable that has the value semantic type, is through the variable itself. If the only way to modify a variable’s values is through the variable itself, it’s a variable with semantic type.

(…)

The type is value semantic if it’s immune from side effects produced by other things. Not if it’s guaranteed not to perpetrate side effects on other things.

– Alexis Gallagher – Value SEMANTICS (not value types!)

Other suggestions

These concepts are very closely related and all speak about the same thing.

The most abstract and general of them is the semantics:

Value semantics means that only the value, the content of the object, matters. The unique identity of an object is not relevant. In everyday life a date has value semantics: you can copy the value of that date into 10 documents and, wherever you find the date, it is always the same date.
Reference semantics means, on the contrary, that the value doesn’t tell it all. Each object has a unique identity and a history. It’s the same object whatever value it takes. A typical example in everyday life is a person. A person can change their name, address, or job: it’s still the same person.

I purposely used everyday life examples that are not related to programming.

Now if you apply this concept to OOP you get the concept of value and reference objects. Take the example of integer objects. In most languages, integers are value objects: if two integer objects have the same value, they are considered equal, even if they are two different objects. You can also have a reference object: here you don’t care about the value, but only about the object itself. If you change a reference object, the new value is instantly known everywhere the reference is used. In C++ you have value objects, but you can always make a reference object by using a pointer or a reference to an object.

Finally, reference type vs. value type is the specialisation of the semantics by applying it to types. It’s a concept that is only relevant for typed languages. In C#, for example, a class is a reference type and a struct is a value type. This means that every object created with the type has the semantics of the type.

It is also worth mentioning that value and reference are relevant in non-OOP languages too, in the context of parameter passing.

In looking at these terms, we must appreciate that these terms are overloaded, and at various levels of abstraction.

We have broad and common sense usages of these terms, and then we have some of these terms defined by DDD (Domain Driven Design), and then also by various programming languages.

The definitions by programming languages are specific and precise, each for their individual language. For example, Java has primitive types which are value types, exhibit value semantics, etc. It traditionally has not had user-defined value types, but substitutes the use of immutable object types, such as the String class. However, all objects have a location that can be observed (by pointer equality comparison), including strings and other immutable types.

DDD defines Value Objects as objects without identity, but doesn't define Value Types.

C# offers user-defined value types, but these can be mutated, and you can observe the location of them.

(C++ is a whole other can of worms, with its own terms & rules.)

Reference types give us references to objects rather than "reference objects" per se, which is just terminology, I think.

I would agree with your other thoughts, though, modulo the context you're working in as per the above.

Yes, value types give value objects
Yes, value objects have value semantics
Yes, reference types have reference semantics, which implies mutability, the need to manage object lifetime (perhaps by reference counting), the need for synchronization, etc.

After discussing @Christophe’s and @Theraot’s excellent answers, and drawing inspiration from Bjarne Stroustrup’s and Phil Goodwin’s definitions, I finally came to the following set of definitions, that are close but slightly different (more general) from the previous authors’:

Value/reference semantics. — An independency/dependency relation between objects.
Value/reference type. — A type that provides copies in value/reference semantics relation.
Value/reference object. — An instance of a value/reference type.

Sufficient conditions for value semantics:

the objects are deep copies of one another, or
the objects do not hold references and are shallow copies of one another, or
the objects do not hold references to mutable objects, hold references to immutable objects and are shallow copies of one another, or
the objects hold references to mutable objects, hold references to immutable objects and are mutable-deep and immutable-shallow copies of one another.

Sufficient condition for reference semantics:

the objects hold references to mutable objects and are shallow copies of one another.

Feel free to give your feedback in comments.

Playground

Memory layout resulting from assignment in C++:

int i{3};              // i:3
int j{i};              // i:3 j:3 (copy of i: j)

int* p{&i};            // i:3 p:&i (alias of i: *p)
int* q{p};             // i:3 p:&i q:&i (copy of p: q, alias of i: *q)
int* r{new int{*p}};   // i:3 p:&i *r:3 r:_ (copy of i: *r)

int** s{&p};           // i:3 p:&i s:&p (alias of p: *s)
int** t{s};            // i:3 p:&i s:&p t:&p (copy of s: t, alias of p: *t)
int** u{new int*{*s}}; // i:3 p:&i s:&p *u:&i u:_ (copy of p: *u, alias of i: **u)

Here i and j are in value semantics relation, p and q are in reference semantics relation, p and r are in value semantics relation, s and t are in reference semantics relation, and s and u are in reference semantics relation.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange