Default pass-by-reference semantics in C++ [closed]

https://stackoverflow.com/questions/146271

02-07-2019

Question

EDIT: This question is more about language engineering than about C++ itself. I used C++ as an example to show what I wanted, mostly because I use it daily. I didn't want to know how it works in C++, but to open a discussion on how it could be done.

That's not the way it works right now; that's the way I wish it could be done. It would certainly break C compatibility, but that's what I think extern "C" is all about.

I mean that in every function or method you declare right now, you have to explicitly write that the object will be passed by reference, by adding the reference operator to the parameter. I wish that every non-POD type were automatically passed by reference, because I use that a lot: in fact for every object that is more than 32 bits in size, and that's almost every class of mine.

Here is how it works right now, assuming a, b and c are classes:

class example {
    public:
        int just_use_a(const a &object);
        int use_and_mess_with_b(b &object);
        void do_nothing_on_c(c object);
};

And here is what I wish for:

class example {
    public:
        int just_use_a(const a object);
        int use_and_mess_with_b(b object);
        extern "C" void do_nothing_on_c(c object);
};

do_nothing_on_c() would then behave just as it does today.

That would be interesting, at least for me: it feels much clearer, and if you know that every non-POD parameter is passed by reference, I believe you would make no more mistakes than if you had to declare it explicitly.

Another argument for this change, from someone coming from C: the reference operator looks to me like a way to get a variable's address, which is how I used to get pointers. It is the same operator with different semantics in different contexts; doesn't that feel a little bit wrong to you too?

Solution

I guess you're missing the point of C++ and its semantics. C++ is right to pass (almost) everything by value, because that is the way it's done in C. Always. And not only in C, as I'll show you below...

Parameter Semantics in C

In C, everything is passed by value. "primitives" and "PODs" are passed by copying their value. Modify them in your function, and the original won't be modified. Still, the cost of copying some PODs could be non-trivial.
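A minimal sketch of this by-value rule, written in C++ to match the rest of this page (it would be nearly identical in C). `Point`, `move_right` and `demo_by_value` are hypothetical names chosen for illustration:

```cpp
#include <cassert>

// A plain-old-data struct (hypothetical, for illustration only).
struct Point { int x; int y; };

// Receives a *copy* of the caller's Point: changes stay local.
void move_right(Point p) {
    p.x += 10;  // modifies the copy only
}

int demo_by_value() {
    Point origin = {0, 0};
    move_right(origin);
    assert(origin.x == 0);  // the caller's object was never touched
    return origin.x;
}
```

The whole struct is copied on each call, which is exactly the "non-trivial cost" mentioned above when the struct is large.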

When you use the pointer notation (the *), you're not passing by reference: you're passing a copy of the address. That is more or less the same thing, with one subtle difference:

#include <stdlib.h>

typedef struct { int value ; } P ;

/* p is a pointer to P */
void doSomethingElse(P * p)
{
   p->value = 32 ;
   p = malloc(sizeof(P)) ; /* Don't bother with the leak */
   p->value = 42 ;
}

void doSomething()
{
   P * p = malloc(sizeof(P)) ;
   p->value = 25 ;

   doSomethingElse(p) ;

   int i = p->value ;
   /* Value of p->value? 25? 32? 42? */
}

The final value of p->value is 32, because p was passed by copying the value of the address: the original p was not modified (and the newly allocated block was leaked).

Parameter Semantics in Java and C#

It may surprise some, but in Java everything is passed by value, too (object variables hold references, and it is those references that get copied). The C example above would give exactly the same results in Java. This is almost what you want, but you would not be able to pass primitives "by reference/pointer" as easily as in C.

In C#, they added the "ref" keyword, which works more or less like a C++ reference. The point is that in C# you have to write it both in the function declaration and at each and every call site. Again, I guess this is not what you want.

Parameter Semantics in C++

In C++, almost everything is passed by copying the value. When you use nothing but the type of the symbol, you copy the symbol (as in C). This is why, when you use the *, you pass a copy of the address of the symbol.

But when you use the &, you can consider that you are passing the real object itself (be it a struct, an int, a pointer, whatever): the reference.

It is easy to mistake it for syntactic sugar (i.e., behind the scenes, it works like a pointer, and the generated code is the same as for a pointer). But...

The truth is that a reference is more than syntactic sugar.

Unlike pointers, it lets you manipulate the object as if it were a local object on the stack (with . rather than ->).
Unlike pointers, when combined with the const keyword, it allows implicit conversion from one type to another (mainly through constructors).
Unlike pointers, a reference is not supposed to be NULL/invalid.
Unlike passing by copy, you don't waste time copying the object.
Unlike passing by copy, you can use it as an [out] parameter.
Unlike passing by copy, you can use the full range of OOP in C++ (i.e. you can pass a derived object to a function expecting an interface without slicing it).

So references have the best of both worlds.
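A short sketch of two of the points above. All names here (`length_of`, `Shape`, `Circle`, `name_by_ref`, `name_by_value`, `demo_references`) are hypothetical. It shows a const reference accepting a string literal through std::string's converting constructor, and a reference preserving virtual dispatch where a by-value parameter slices the object:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// const reference + implicit conversion: length_of("hello") builds a
// temporary std::string and binds the reference to it.
// With a non-const std::string& parameter, that call would not compile.
std::size_t length_of(const std::string& s) { return s.size(); }

struct Shape {
    virtual ~Shape() = default;
    virtual std::string name() const { return "shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "circle"; }
};

// Receives the caller's object itself: virtual dispatch still works.
std::string name_by_ref(const Shape& s) { return s.name(); }

// Receives a copy of only the Shape part ("slicing"): the Circle is lost.
std::string name_by_value(Shape s) { return s.name(); }

void demo_references() {
    assert(length_of("hello") == 5);
    Circle c;
    assert(name_by_ref(c) == "circle");
    assert(name_by_value(c) == "shape");
}
```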

Let's see the C example, but with a C++ variation on the doSomethingElse function:

#include <cstdlib>

struct P { int value ; } ;

// p is a reference to a pointer to P
void doSomethingElse(P * & p)
{
   p->value = 32 ;
   p = static_cast<P *>(std::malloc(sizeof(P))) ; // Don't bother with the leak
   p->value = 42 ;
}

void doSomething()
{
   P * p = static_cast<P *>(std::malloc(sizeof(P))) ;
   p->value = 25 ;

   doSomethingElse(p) ;

   int i = p->value ;
   // Value of p->value? 25? 32? 42?
}

The result is 42: the old object was leaked and replaced by the new one. Unlike in the C code, we're not passing a copy of the pointer but a reference to the pointer, that is, the pointer itself.

When working with C++, the above example must be crystal clear. If it is not, then you're missing something.
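For comparison, the `P * &` parameter can be emulated with a pointer to a pointer, which is how it would be written in C. In this sketch (the function names are hypothetical), both versions reseat the caller's pointer; unlike the example above, they free the old block instead of leaking it, and for brevity they do not check malloc for failure:

```cpp
#include <cassert>
#include <cstdlib>

struct P { int value; };

// Reference to a pointer: assigning to p reseats the caller's pointer.
void reseat_by_ref(P*& p) {
    std::free(p);  // release the old block instead of leaking it
    p = static_cast<P*>(std::malloc(sizeof(P)));
    p->value = 42;
}

// C-style equivalent: a pointer to the caller's pointer.
void reseat_by_ptr(P** pp) {
    std::free(*pp);
    *pp = static_cast<P*>(std::malloc(sizeof(P)));
    (*pp)->value = 42;
}

int demo_reseat() {
    P* a = static_cast<P*>(std::malloc(sizeof(P)));
    a->value = 25;
    reseat_by_ref(a);   // no & needed at the call site

    P* b = static_cast<P*>(std::malloc(sizeof(P)));
    b->value = 25;
    reseat_by_ptr(&b);  // & needed at the call site

    assert(a->value == 42 && b->value == 42);
    const int sum = a->value + b->value;
    std::free(a);
    std::free(b);
    return sum;
}
```

The generated code is likely similar for both, but only the pointer version makes the reseating visible at the call site.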

Conclusion

C++ passes by copy/value because that is the way everything works, be it in C, in C# or in Java (even in JavaScript... :-p ...). And like C#, C++ has a reference operator/keyword, as a bonus.

Now, as far as I understand it, you are perhaps doing what I half-jokingly call C+, that is, C with some limited C++ features.

Perhaps your solution is to use typedefs (though it will enrage your C++ colleagues to see the code polluted by useless typedefs...), but doing so will only obscure the fact that you are really missing C++ there.

As said in another post, you should change your mindset from C development (or whatever) to C++ development, or perhaps move to another language. But do not keep programming the C way with C++ features, because by consciously ignoring or obscuring the power of the idioms you use, you'll produce suboptimal code.

Note: And do not pass anything other than primitives by copy. You'll cripple your function's OO capabilities, and in C++, this is not what you want.

Edit

The question was somewhat modified (see https://stackoverflow.com/revisions/146271/list). I have left my original answer as is, and I answer the new questions below.

What do you think about default pass-by-reference semantics in C++?

Like you said, it would break compatibility, and you would have different passing conventions for primitives (i.e. built-in types, which would still be passed by copy) and for structs/objects (which would be passed by reference). You would have to add another operator meaning "pass-by-value" (extern "C" is quite awful for that, and it already means something quite different: C language linkage). No, I really like the way it is done today in C++.

[...] the reference operator seems to me a way to get the variable address, that's the way I used for getting pointers. I mean, it is the same operator but with different semantic on different contexts, doesn't that feel a little bit wrong for you too?

Yes and no. Operator >> changed its semantics when used with C++ streams, too. Likewise, you can use operator += on std::string to replace strcat. I guess operator & was chosen because of its meaning as the "opposite of pointer", and because they did not want to use yet another symbol (ASCII is limited, and the scope operator :: as well as the pointer operator -> show that few other symbols are usable). But if & bothers you, && will really unnerve you, as C++0x added && to declare rvalue references (a kind of super-reference...). I've yet to digest it myself...
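For the curious: in C++11 (the standard C++0x became), `&&` in a declaration declares an rvalue reference, which binds to temporaries and to objects explicitly cast with `std::move`. A minimal overload-resolution sketch, with a hypothetical function name `which`:

```cpp
#include <cassert>
#include <string>
#include <utility>

// Chosen for lvalues (named objects): the argument is only read.
std::string which(const std::string&) { return "lvalue"; }

// Chosen for rvalues: temporaries, or objects passed through std::move.
std::string which(std::string&&) { return "rvalue"; }

void demo_rvalue_refs() {
    std::string s = "text";
    assert(which(s) == "lvalue");
    assert(which(std::string("tmp")) == "rvalue");
    assert(which(std::move(s)) == "rvalue");  // std::move is only a cast
}
```

An rvalue-reference overload can then "steal" the temporary's resources instead of copying them, which is what move semantics builds on.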

OTHER TIPS

A compiler option that totally changes the meaning of a section of code sounds like a really bad idea to me. Either get used to the C++ syntax or find a different language.

I'd rather not abuse references any more by making every (non-qualified) parameter a reference.

The main reason references were added to C++ was to support operator overloading; if you want "pass-by-reference" semantics, C had a perfectly reasonable way of doing it: pointers.

Using pointers makes clear your intention of changing the value of the pointed object, and it is possible to see this by just looking at the function call, you don't have to look at the function declaration to see if it's using a reference.
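A small sketch of that call-site difference (the function names are hypothetical):

```cpp
#include <cassert>

// Pointer parameter: the caller must write &x, so the call site
// shows that x may be modified.
void increment_ptr(int* n) { ++*n; }

// Reference parameter: the call looks exactly like a by-value call.
void increment_ref(int& n) { ++n; }

int demo_call_sites() {
    int x = 0;
    increment_ptr(&x);  // the modification is visible here...
    increment_ref(x);   // ...but not here, without reading the declaration
    assert(x == 2);
    return x;
}
```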

Also, see

I do want to change the argument, should I use a pointer or should I use a reference? I don't know a strong logical reason. If passing ``not an object'' (e.g. a null pointer) is acceptable, using a pointer makes sense. My personal style is to use a pointer when I want to modify an object because in some contexts that makes it easier to spot that a modification is possible.

from the same FAQ.

Yeah, I'm of the opinion that that's a pretty confusing overload.

This is what Microsoft has to say about the situation:

Do not confuse reference declarations with use of the address-of operator. When & identifier is preceded by a type, such as int or char, then identifier is declared as a reference to the type. When & identifier is not preceded by a type, the usage is that of the address-of operator.
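A minimal sketch of the two meanings of & described in that quote (`demo_ampersand` is a hypothetical name):

```cpp
#include <cassert>

int demo_ampersand() {
    int x = 1;
    int& r = x;   // '&' after a type: r is declared as a reference to x
    int* p = &x;  // '&' before an expression: the address-of operator
    r = 2;        // writes to x through the reference
    *p += 1;      // writes to x through the pointer
    assert(&r == p);  // both designate the same object
    return x;
}
```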

I'm not really great at C or C++, but I get bigger headaches sorting out the various uses of * and & in both languages than I do coding in assembler.

The best advice is to make a habit of thinking about what you really want to happen. Passing by reference is nice when you don't have a copy constructor (or don't want to use it), and it's cheaper for large objects. However, mutations to the parameter are then visible outside the function. You could instead pass by const reference: then there are no mutations, but you cannot make local modifications. Pass by const value for cheap objects that should be read-only in the function, and by non-const value when you want a copy that you can modify locally.

Each permutation (by-value/by-reference and const/non-const) has important differences that are definitely not equivalent.
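A sketch of the four permutations, assuming std::string as the "large" type and int as the cheap one; all function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// By value: a private copy the function may modify locally.
std::string with_suffix(std::string s) { s += "!"; return s; }

// By const value: a private copy that cannot be modified (cheap types).
int doubled(const int n) { return n * 2; }

// By reference: changes are visible to the caller.
void append_suffix(std::string& s) { s += "!"; }

// By const reference: no copy, and no changes allowed.
std::size_t length_of(const std::string& s) { return s.size(); }

void demo_permutations() {
    std::string word = "hi";
    assert(with_suffix(word) == "hi!" && word == "hi");  // caller unchanged
    assert(doubled(21) == 42);
    append_suffix(word);
    assert(word == "hi!");                               // caller changed
    assert(length_of(word) == 3);
}
```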

When you pass by value, you are copying data to the stack. If you have a copy constructor defined for the struct or class you are passing, it gets executed (the copy constructor, not operator=, since the parameter is being initialized, not assigned). There is no compiler directive I am aware of that would wash away the implicit confusion the proposed change would inherently cause.
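To check which special member function runs, here is a sketch with a hypothetical `Tracked` type that counts its copies and assignments. Initializing a by-value parameter invokes the copy constructor; binding a const reference copies nothing:

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or assigned.
struct Tracked {
    static int copies;
    static int assignments;
    Tracked() = default;
    Tracked(const Tracked&) { ++copies; }
    Tracked& operator=(const Tracked&) { ++assignments; return *this; }
};
int Tracked::copies = 0;
int Tracked::assignments = 0;

void take_by_value(Tracked) {}
void take_by_const_ref(const Tracked&) {}

void demo_copies() {
    const int copies_before = Tracked::copies;
    Tracked t;
    take_by_value(t);      // initializes the parameter: copy constructor
    take_by_const_ref(t);  // binds a reference: no copy at all
    assert(Tracked::copies == copies_before + 1);
    assert(Tracked::assignments == 0);
}
```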

A common best practice is to pass values by const reference, not just by reference. This ensures that the value cannot be changed in the called function. This is one element of a const-correct codebase.

A fully const-correct codebase goes even further, adding const to the end of member-function prototypes. Consider:

void Foo::PrintStats( void ) const {
   /* Cannot modify Foo member variables */
}

void Foo::ChangeStats( void ) {
   /* Can modify foo member variables */
}

If you pass a Foo object to a function as a const reference, you can call PrintStats() on it, but the compiler rejects a call to ChangeStats():

void ManipulateFoo( const Foo &foo )
{
    foo.PrintStats();  // Works
    foo.ChangeStats(); // Oops; compile error
}

I honestly think that this whole passing by value/passing by reference idea in C++ is misleading. Everything is pass by value. You have three cases:

1. Where you pass a local copy of a variable:
```
void myFunct(int cantChangeMyValue) {
    cantChangeMyValue = 10; // changes only the local copy
}
```

2. Where you pass a local copy of a pointer to a variable:
```
void myFunct(int* cantChangeMyAddress) {
    *cantChangeMyAddress = 10;
}
```

3. Where you pass a 'reference', but through compiler magic it's just as if you passed a pointer and dereferenced it every time:
```
void myFunct(int & hereBeMagic) {
    hereBeMagic = 10; // same as 2, without the dereference
}
```

I personally find it much less confusing to remember that everything is pass by value. In some cases, that value might be an address, which allows you to change things outside the function.

What you are suggesting would not allow the programmer to do number 1. I personally think it would be a bad idea to take away that option. One major plus of C/C++ is having fine-grained memory management. Making everything pass by reference is simply trying to make C++ more like Java.
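The aliasing behind case 3 can be observed by comparing addresses. A minimal sketch, with hypothetical function names, contrasting case 1 and case 3:

```cpp
#include <cassert>

// Case 1: the parameter is a separate copy of the caller's int.
bool same_object_by_value(int copy, const int* original) {
    return &copy == original;
}

// Case 3: the parameter is another name for the caller's int.
bool same_object_by_ref(int& alias, const int* original) {
    return &alias == original;
}

void demo_aliasing() {
    int x = 5;
    assert(!same_object_by_value(x, &x));  // a distinct object
    assert(same_object_by_ref(x, &x));     // the very same object
}
```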

Something is not clear. When you write:

int b(b &param);

What did you intend for the second 'b'? Did you forget to introduce a type? Did you mean to write it differently from the first 'b'? Don't you think it's clearer to write something like:

class B{/*something...*/};
int b(B& param);

From now on, I assume that you mean what I wrote.

Now, your question is "don't you think it would be better if the compiler treated every pass-by-value of a non-POD as pass-by-reference?". The first problem is that it would break your contract. I suppose you mean pass-by-CONST-reference, and not just by reference.

Your question is then reduced to this one: "do you know of any compiler directive that can optimize a function call by value?"

The answer now is "I don't know".

I think that C++ becomes very messy if you start to mix all the kinds of available parameters, with their const variations.

It rapidly gets out of hand to track all the copy constructor calls, all the overloaded dereference operators, and so on.
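To illustrate the contract problem mentioned above, here is a sketch of a function that legitimately uses its by-value parameter as scratch space; `to_upper_alias` (a hypothetical name, like the others) simulates what the same body would do if the parameter were silently turned into a non-const reference:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Today's contract: text is a private copy, so using it as scratch is safe.
std::string to_upper_copy(std::string text) {
    for (char& c : text)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return text;
}

// The same body, if the parameter were silently passed by reference.
std::string to_upper_alias(std::string& text) {
    for (char& c : text)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return text;
}

void demo_contract() {
    std::string name = "abc";
    assert(to_upper_copy(name) == "ABC" && name == "abc");   // caller untouched
    assert(to_upper_alias(name) == "ABC" && name == "ABC");  // caller clobbered
}
```

With a const reference instead, this body would not even compile, because it modifies text.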

Licensed under: CC-BY-SA with attribution
