Can someone explain to me what the reasoning behind passing by “value” and not by “reference” in Java is?

https://stackoverflow.com/questions/514478

21-08-2019

Question

I'm fairly new to Java (been writing other stuff for many years) and unless I'm missing something (and I'm happy to be wrong here) the following is a fatal flaw...

String foo = new String();
thisDoesntWork(foo);
System.out.println(foo);//this prints nothing

public static void thisDoesntWork(String foo){
   foo = "howdy";
}

Now, I'm well aware of the (fairly poorly worded) concept that in Java everything is passed by "value" and not "reference", but String is an object and has all sorts of bells and whistles, so one would expect that, unlike with an int, a user would be able to operate on the thing that's passed into the method (and not be stuck with the value set by the overloaded =).

Can someone explain to me what the reasoning behind this design choice was? As I said, I'm not looking to be right here, and perhaps I'm missing something obvious?

Solution

This rant explains it better than I could ever even try to:

In Java, primitives are passed by value. However, Objects are not passed by reference. A correct statement would be Object references are passed by value.
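A minimal sketch of what "object references are passed by value" means in practice. The class and method names are hypothetical, and StringBuilder is used instead of String only because it is mutable, which makes the difference visible:

```java
// Hypothetical demo class; the names are illustrative and not from the original post.
class PassByValueDemo {

    // Reassigning the parameter only changes the method's local copy of the reference.
    static void reassign(StringBuilder sb) {
        sb = new StringBuilder("howdy");
    }

    // Mutating through the parameter changes the one object both references point to.
    static void mutate(StringBuilder sb) {
        sb.append("howdy");
    }

    public static void main(String[] args) {
        StringBuilder foo = new StringBuilder();

        reassign(foo);
        System.out.println("after reassign: [" + foo + "]"); // after reassign: []

        mutate(foo);
        System.out.println("after mutate: [" + foo + "]");   // after mutate: [howdy]
    }
}
```

The caller's variable never changes which object it refers to; only the object's contents can be changed from inside the method.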

OTHER TIPS

When you pass "foo", you're passing the reference to "foo" as a value to thisDoesntWork(). That means that when you assign to "foo" inside your method, you are merely pointing a local variable (foo) at your new string; the caller's variable is unaffected.

Another thing to keep in mind when thinking about how strings behave in Java is that strings are immutable. It works the same way in C#, and for some good reasons:

Security: Nobody can jam data into your string and cause a buffer overflow error if nobody can modify it!
Speed: If you can be sure that your strings are immutable, you know their size never changes and you never have to move the data structure in memory when you manipulate it. You (the language designer) also don't have to worry about implementing the String as a slow linked list. This cuts both ways, though. Appending strings just using the + operator can be expensive memory-wise, and you will have to use a StringBuilder object to do this in a high-performance, memory-efficient way.
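As a rough illustration of the trade-off just described, the sketch below builds the same string twice, once with += in a loop and once with a StringBuilder. The class and method names are made up for this example, and no performance numbers are claimed:

```java
// Hypothetical demo class; only String and StringBuilder come from the JDK.
class ConcatDemo {

    // Each iteration produces a new String, because the existing one cannot be changed.
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // StringBuilder appends into one growable buffer and builds the String once at the end.
    static String joinWithBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] parts = {"how", "dy"};
        System.out.println(joinWithPlus(parts));
        System.out.println(joinWithBuilder(parts));
    }
}
```

Both methods return the same text; the difference is only in how many intermediate objects are created along the way.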

Now onto your bigger question. Why are objects passed this way? Well, if Java passed your string as what you'd traditionally call "by value", it would have to actually copy the entire string before passing it to your function. That's quite slow. If it passed the string by reference and let you change it (like C does), you'd have the problems I just listed.

Since my original answer was "Why it happened" and not "Why was the language designed so it happened," I'll give this another go.

To simplify things, I'll get rid of the method call and show what is happening in another way.

String a = "hello";
String b = a;
b = "howdy";

System.out.print(a); // prints hello

To get the last statement to print "howdy", b would have to point to the same "hole" in memory that a points to (a pointer). That is what pass by reference would give you. There are a couple of reasons Java decided not to go in this direction:

Pointers are Confusing: The designers of Java tried to remove some of the more confusing things about other languages. Pointers are one of the most misunderstood and improperly used constructs of C/C++, along with operator overloading.
Pointers are Security Risks: Pointers cause many security problems when misused. A malicious program assigns something to that part of memory, then what you thought was your object is actually someone else's. (Java already got rid of the biggest security problem, buffer overflows, with checked arrays.)
Abstraction Leakage: When you start dealing with "what's in memory and where" exactly, your abstraction becomes less of an abstraction. While abstraction leakage almost certainly creeps into a language, the designers didn't want to bake it in directly.
Objects Are All You Care About: In Java, what you deal with is the object, not the space the object occupies. Adding pointers would make the space an object occupies important, though.

You could emulate what you want by creating a "Hole" object. You could even use generics to make it type safe. For example:

public class Hole<T> {
   private T objectInHole;

   public void putInHole(T object) {
      this.objectInHole = object;
   }
   public T getOutOfHole() {
      return objectInHole;
   }

   public String toString() {
      return objectInHole.toString();
   }
   // ... equals, hashCode, etc.
}


Hole<String> foo = new Hole<String>();
foo.putInHole(new String());
System.out.println(foo); //this prints nothing
thisWorks(foo);
System.out.println(foo);//this prints howdy

public static void thisWorks(Hole<String> foo){
   foo.putInHole("howdy");
}
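If you would rather not write your own holder, the JDK's java.util.concurrent.atomic.AtomicReference can play a similar role. This is only a sketch of that alternative, not what the Hole class above does internally; the class name HolderDemo is made up, and AtomicReference also carries thread-safety semantics you may not need:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; AtomicReference is the only JDK type used here.
class HolderDemo {

    // The method replaces the contents of the holder, not the holder reference itself.
    static void thisWorks(AtomicReference<String> foo) {
        foo.set("howdy");
    }

    public static void main(String[] args) {
        AtomicReference<String> foo = new AtomicReference<>("");
        thisWorks(foo);
        System.out.println(foo.get()); // howdy
    }
}
```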

Your question as asked doesn't really have to do with passing by value, passing by reference, or the fact that strings are immutable (as others have stated).

Inside the method, you actually create a local variable (I'll call that one "localFoo") that points to the same reference as your original variable ("originalFoo").

When you assign "howdy" to localFoo, you don't change where originalFoo is pointing.

If you did something like:

String a = "";
String b = a;
b = "howdy";

Would you expect:

System.out.print(a)

to print out "howdy"? It prints out "".

You can't change what originalFoo points to by changing what localFoo points to. You can, however, modify the object that both point to (provided it is mutable). For example,

List<Object> foo = new ArrayList<>();
System.out.println(foo.size()); // this prints 0

thisDoesntWork(foo);
System.out.println(foo.size()); // this prints 1

public static void thisDoesntWork(List<Object> foo) {
    foo.add(new Object());
}

In Java, all arguments are passed by value, even objects. Every argument passed to a method is actually a copy of the original value. In the case of your string example, the original pointer (it's actually a reference, but to avoid confusion I'll use a different word) is copied into a new variable, which becomes the parameter to the method.

It would be a pain if everything were passed by reference: one would need to make private copies all over the place. Using immutability for value types tends to make programs simpler and easier to scale.

Some benefits include:
- No need to make defensive copies.
- Thread safety: no need to worry about locking just in case someone else wants to change the object.
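To make the "defensive copies" point concrete, here is a small sketch with a hypothetical Roster class. It has to copy the mutable list it is given, but it can store the String as-is because a String can never be changed by whoever else holds a reference to it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; Roster is not a real library class.
class DefensiveCopyDemo {

    static final class Roster {
        private final String title;       // immutable, so sharing the reference is safe
        private final List<String> names; // mutable, so we keep our own copy

        Roster(String title, List<String> names) {
            this.title = title;
            this.names = new ArrayList<>(names); // defensive copy
        }

        String title() { return title; }
        int size() { return names.size(); }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Ann");
        Roster roster = new Roster("Team", names);

        names.add("Bob"); // the caller keeps mutating its own list
        System.out.println(roster.size()); // 1
    }
}
```

Without the copy in the constructor, the caller's later add would silently change the roster as well.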

The problem is you are instantiating a Java reference type. Then you pass that reference to a static method, and reassign the locally scoped parameter.

It has nothing to do with immutability. Exactly the same thing would have happened for a mutable reference type.

If we would make a rough C and assembler analogy:

void Main()
{ 
     // stack memory address of message is 0x8001.  memory address of Hello is 0x0001.  
     string message = "Hello"; 
     // assembly equivalent of: message = "Hello";
     // [0x8001] = 0x0001

     // message's stack memory address
     printf("%d", &message); // 0x8001

     printf("%d", message); // memory pointed to of message(0x8001): 0x0001
     PassStringByValue(message); // pass the pointer pointed to of message.  0x0001, not 0x8001
     printf("%d", message); // memory pointed to of message(0x8001): 0x0001.  still the same

     // message's stack memory address doesn't change
     printf("%d", &message); // 0x8001
}

void PassStringByValue(string foo)
{
    printf("%d", &foo); // &foo contains foo's *stack* address (0x4001)

    // foo(0x4001) contains the memory pointed to of message, 0x0001
    printf("%d", foo);  // 0x0001
    // World is in memory address 0x0002
    foo = "World";  // on foo's memory address (0x4001), change the memory it pointed to, 0x0002
    // assembly equivalent of: foo = "World":
    // [0x4001] = 0x0002

    // print the new memory pointed by foo
    printf("%d", foo); // 0x0002

    // Conclusion: Not in any way 0x8001 was involved in this function.  Hence you cannot change the Main's message value.
    // foo = "World"  is same as [0x4001] = 0x0002

}

void Main()
{
     // stack memory address of message is 0x8001.  memory address of Hello is 0x0001.  
     string message = "Hello"; 
     // assembly equivalent of: message = "Hello";
     // [0x8001] = 0x0001

     // message's stack memory address
     printf("%d", &message); // 0x8001

     printf("%d", message); // memory pointed to of message(0x8001): 0x0001
     PassStringByRef(ref message); // pass the stack memory address of message.  0x8001, not 0x0001
     printf("%d", message); // memory pointed to of message(0x8001): 0x0002. was changed

     // message's stack memory address doesn't change
     printf("%d", &message); // 0x8001
}


void PassStringByRef(ref string foo)
{
    printf("%d", &foo); // &foo contains foo's *stack* address (0x4001)

    // foo(0x4001) contains the address of message(0x8001)
    printf("%d", foo);  // 0x8001
    // World is in memory address 0x0002
    foo = "World"; // on message's memory address (0x8001), change the memory it pointed to, 0x0002
    // assembly equivalent of: foo = "World":
    // [0x8001] = 0x0002;


    // print the new memory pointed to of message
    printf("%d", foo); // 0x0002

    // Conclusion: 0x8001 was involved in this function.  Hence you can change the Main's message value.
    // foo = "World"  is same as [0x8001] = 0x0002

}

One possible reason why everything is passed by value in Java is that its designers wanted to simplify the language and have everything done in an OOP manner.

They would rather have you design an integer swapper using objects than provide first-class support for by-reference passing. The same goes for delegates (Gosling feels icky about pointers to functions; he would rather cram that functionality into objects) and enums.

They over-simplify the language (everything is an object) to the detriment of first-class support for most language constructs; passing by reference, delegates, enums, and properties come to mind.
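For what it's worth, an "integer swapper using objects" might look roughly like the sketch below. IntHolder is a hypothetical helper written for this example, not a standard library class:

```java
// Hypothetical demo class showing a swap done through mutable holder objects.
class SwapDemo {

    // Minimal mutable wrapper around an int.
    static final class IntHolder {
        int value;

        IntHolder(int value) {
            this.value = value;
        }
    }

    // Swaps the contents of the two holders; the holder references themselves are untouched.
    static void swap(IntHolder a, IntHolder b) {
        int tmp = a.value;
        a.value = b.value;
        b.value = tmp;
    }

    public static void main(String[] args) {
        IntHolder x = new IntHolder(1);
        IntHolder y = new IntHolder(2);
        swap(x, y);
        System.out.println(x.value + " " + y.value); // 2 1
    }
}
```

A swap(int a, int b) on plain ints could not work, since the method would only exchange its own copies.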

Are you sure it prints null? I think it will just be blank, since you provided an empty String when you initialized the foo variable.

The assignment to foo in the thisDoesntWork method does not change the reference held by the foo variable defined in the class, so the foo in System.out.println(foo) still points to the old empty string object.

Dave, you have to forgive me (well, I guess you don't "have to", but I'd rather you did) but that explanation is not overly convincing. The Security gains are fairly minimal since anyone who needs to change the value of the string will find a way to do it with some ugly workaround. And speed?! You yourself (quite correctly) assert that the whole business with the + is extremely expensive.

The rest of you guys, please understand that I GET how it works, I'm asking WHY it works that way... please stop explaining the difference between the methodologies.

(and I honestly am not looking for any sort of fight here, btw, I just don't see how this was a rational decision).

@Axelle

Mate, do you really know the difference between passing by value and by reference?

In Java even references are passed by value. When you pass a reference to an object, the second variable gets a copy of the reference pointer. That's why the second variable can be changed without affecting the first.

It is because the method creates a local variable of its own. An easy way around it (which I'm pretty sure would work) is to make foo a field and assign to the field instead of the parameter. Note that this cannot be used in a static method, so the methods below are instance methods:

private String foo = new String();

public void run() {
   thisWorks("howdy");
   System.out.println(foo); // this prints howdy
}

public void thisWorks(String foo) {
   this.foo = foo; // copies the parameter's reference into the field
}

If you think of an object as just the fields in the object, then objects are passed by reference in Java, because a method can modify the fields of a parameter and a caller can observe the modification. However, if you also think of an object as its identity, then objects are passed by value, because a method can't change the identity of a parameter in a way that the caller can observe. So I would say Java is pass-by-value.

This is because inside "thisDoesntWork", you are effectively overwriting the local value of foo. If you want to pass by reference in this way, you can always encapsulate the String inside another object, say in an array.

class Test {

    public static void main(String[] args) {
        String [] fooArray = new String[1];
        fooArray[0] = new String("foo");

        System.out.println("main: " + fooArray[0]);
        thisWorks(fooArray);
        System.out.println("main: " + fooArray[0]);
    }

    public static void thisWorks(String [] foo){
        System.out.println("thisWorks: " + foo[0]);
        foo[0] = "howdy";
        System.out.println("thisWorks: " + foo[0]);
    }
}

Results in the following output:

main: foo
thisWorks: foo
thisWorks: howdy
main: howdy

Reference typed arguments are passed as references to objects themselves (not references to other variables that refer to objects). You can call methods on the object that has been passed. However, in your code sample:

public static void thisDoesntWork(String foo){
    foo = "howdy";
}

you are only storing a reference to the string "howdy" in a variable that is local to the method. That local variable (foo) was initialized to the value of the caller's foo when the method was called, but has no reference to the caller's variable itself. After initialization:

caller     data     method
------    ------    ------
(foo) -->   ""   <-- (foo)

After the assignment in your method:

caller     data     method
------    ------    ------
(foo) -->   ""
          "howdy" <-- (foo)

There is another issue here: String instances are immutable (by design, for security), so you can't modify their value.

If you really want your method to provide an initial value for your string (or at any time in its life, for that matter), then have your method return a String value which you assign to the caller's variable at the point of the call. Something like this, for example:

String foo = thisWorks();
System.out.println(foo);//this prints the value assigned to foo in initialization 

public static String thisWorks(){
    return "howdy";
}

Go do the really big tutorial on Sun's website.

You seem not to understand the different scopes variables can have. "foo" is local to your method. Nothing outside of that method can change what "foo" points to. The "foo" referred to outside your method is a completely different variable; it's a static field on your enclosing class.

Scoping is especially important, as you don't want everything to be visible to everything else in your system.

Licensed under: CC-BY-SA with attribution
