Question

In Java there are no virtual, new, or override keywords for method definitions, so how a method behaves is easy to understand: if DerivedClass extends BaseClass and has a method with the same name and signature as BaseClass, then overriding takes place through run-time polymorphism (provided the method is not static).

BaseClass bcdc = new DerivedClass(); 
bcdc.doSomething(); // will invoke DerivedClass's doSomething method.

Now, coming to C#, there can be so much confusion, and it is hard to understand how new or virtual + override (or new combined with virtual) works.

I'm not able to understand why in the world I would add a method to my DerivedClass with the same name and signature as BaseClass and define a new behaviour, and yet at run time, through polymorphism, the BaseClass method gets invoked! (which is not overriding, but logically it should be).
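
For example, a minimal C# sketch of what I mean (names and bodies are only illustrative):

BaseClass bcdc = new DerivedClass();
bcdc.doSomething();   // prints "Base" -- the BaseClass method is invoked

DerivedClass dc = new DerivedClass();
dc.doSomething();     // prints "Derived" -- only a DerivedClass reference sees the new method

class BaseClass
{
    public void doSomething() => System.Console.WriteLine("Base");
}

class DerivedClass : BaseClass
{
    // 'new' hides the base method; it does not override it
    public new void doSomething() => System.Console.WriteLine("Derived");
}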

In the case of virtual + override the logical behaviour is correct, but the programmer has to decide, at coding time, which methods users should be given permission to override. This has its own pros and cons (let's not go there now).

So why does C# leave so much room for illogical reasoning and confusion? May I reframe my question as: in which real-world context should I use virtual + override instead of new, and new instead of virtual + override?


After some very good answers, especially Omar's, I get that the C# designers put more stress on programmers thinking before they create a method, which is good and catches some rookie mistakes that happen in Java.

Now I have a question in mind. In Java, if I had code like

Vehicle vehicle = new Car();
vehicle.accelerate();

and later I make a new class SpaceShip derived from Vehicle, then if I want to change every Car to a SpaceShip object, I just have to change a single line of code:

Vehicle vehicle = new SpaceShip();
vehicle.accelerate();

This will not break any of my logic at any point in the code.

But in C#, if SpaceShip does not override the Vehicle class's accelerate and uses new instead, then the logic of my code will be broken. Isn't that a disadvantage?


Solution

Since you asked why C# did it this way, it's best to ask the C# creators. Anders Hejlsberg, the lead architect for C#, answered in an interview why they chose not to go with virtual by default (as in Java); pertinent snippets are below.

Keep in mind that Java has virtual by default, with the final keyword to mark a method as non-virtual. That's still two concepts to learn, but many folks do not know about the final keyword or don't use it proactively. C# forces one to use virtual and new/override to make those decisions consciously.

There are several reasons. One is performance. We can observe that as people write code in Java, they forget to mark their methods final. Therefore, those methods are virtual. Because they're virtual, they don't perform as well. There's just performance overhead associated with being a virtual method. That's one issue.

A more important issue is versioning. There are two schools of thought about virtual methods. The academic school of thought says, "Everything should be virtual, because I might want to override it someday." The pragmatic school of thought, which comes from building real applications that run in the real world, says, "We've got to be real careful about what we make virtual."

When we make something virtual in a platform, we're making an awful lot of promises about how it evolves in the future. For a non-virtual method, we promise that when you call this method, x and y will happen. When we publish a virtual method in an API, we not only promise that when you call this method, x and y will happen. We also promise that when you override this method, we will call it in this particular sequence with regard to these other ones and the state will be in this and that invariant.

Every time you say virtual in an API, you are creating a call back hook. As an OS or API framework designer, you've got to be real careful about that. You don't want users overriding and hooking at any arbitrary point in an API, because you cannot necessarily make those promises. And people may not fully understand the promises they are making when they make something virtual.

The interview has more discussion about how developers think about class inheritance design, and how that led to their decision.

Now to the following question:

I'm not able to understand why in the world I would add a method to my DerivedClass with the same name and signature as BaseClass and define a new behaviour, and yet at run time, through polymorphism, the BaseClass method gets invoked! (which is not overriding, but logically it should be).

This would be when a derived class wants to declare that it does not abide by the contract of the base class, but has a method with the same name. (For anyone who doesn't know the difference between new and override in C#, see this MSDN page).

A very practical scenario is this:

  • You created an API, which has a class called Vehicle.
  • I started using your API and derived Vehicle.
  • Your Vehicle class did not have any method PerformEngineCheck().
  • In my Car class, I add a method PerformEngineCheck().
  • You released a new version of your API and added a PerformEngineCheck().
  • I cannot rename my method because my clients are dependent on my API, and it would break them.
  • So when I recompile against your new API, C# warns me of this issue, e.g.

    If the base PerformEngineCheck() was not virtual:

    app2.cs(15,17): warning CS0108: 'Car.PerformEngineCheck()' hides inherited member 'Vehicle.PerformEngineCheck()'.
    Use the new keyword if hiding was intended.
    

    And if the base PerformEngineCheck() was virtual:

    app2.cs(15,17): warning CS0114: 'Car.PerformEngineCheck()' hides inherited member 'Vehicle.PerformEngineCheck()'.
    To make the current member override that implementation, add the override keyword. Otherwise add the new keyword.
    
  • Now, I must explicitly make a decision whether my class is actually extending the base class's contract, or if it is a different contract that happens to have the same name.

  • By making it new, I do not break my clients if the functionality of the base method was different from the derived method. Any code that referenced Vehicle will not see Car.PerformEngineCheck() called, but code that had a reference to Car will continue to see the same functionality that I had offered in PerformEngineCheck() (sketched just after this list).
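
A rough sketch of that situation (the method bodies are only placeholders):

Vehicle v = new Car();
v.PerformEngineCheck();   // calls Vehicle.PerformEngineCheck() -- the API's own clients are unaffected

Car c = new Car();
c.PerformEngineCheck();   // calls Car.PerformEngineCheck() -- my clients keep the behaviour they rely on

// Your API, newer version: the base class now ships its own PerformEngineCheck().
public class Vehicle
{
    public void PerformEngineCheck() => System.Console.WriteLine("engine check added by the API");
}

// My class, written against the older version, already had a method with that name.
public class Car : Vehicle
{
    // 'new' declares a separate contract that merely shares the name
    public new void PerformEngineCheck() => System.Console.WriteLine("my original engine check");
}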

A similar example is when another method in the base class might call PerformEngineCheck() (especially in the newer version): how does one prevent it from calling the PerformEngineCheck() of the derived class? In Java, that decision would rest with the base class, but it does not know anything about the derived class. In C#, that decision rests both with the base class (via the virtual keyword) and with the derived class (via the new and override keywords).
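
Here is a sketch of that internal-call case, reusing the hypothetical Vehicle/Car names (RunDiagnostics is made up): because the base method is non-virtual and Car only hides it, the base class's own call never lands in the derived class.

new Car().RunDiagnostics();   // prints "Vehicle's check"
// had the base method been virtual and Car used override, this would print "Car's check"

public class Vehicle
{
    // added in the newer API version; calls PerformEngineCheck() internally
    public void RunDiagnostics() => PerformEngineCheck();

    public void PerformEngineCheck() => System.Console.WriteLine("Vehicle's check");
}

public class Car : Vehicle
{
    // hidden, not overridden: RunDiagnostics() will never call into this
    public new void PerformEngineCheck() => System.Console.WriteLine("Car's check");
}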

Of course, the warnings that the compiler emits also provide a useful tool for programmers to avoid making mistakes unexpectedly (i.e. either overriding or providing new functionality without realizing it).

Like Anders said, real world forces us into such issues which, if we were to start from scratch, we would never want to get into.

EDIT: Added an example of where new would have to be used for ensuring interface compatibility.

EDIT: While going through the comments, I also came across a write-up by Eric Lippert (then one of the members of C# design committee) on other example scenarios (mentioned by Brian).


PART 2: Based on updated question

But in C#, if SpaceShip does not override the Vehicle class's accelerate and uses new instead, then the logic of my code will be broken. Isn't that a disadvantage?

Who decides whether SpaceShip is actually overriding Vehicle.accelerate() or whether it's different? It has to be the SpaceShip developer. So if the SpaceShip developer decides that they are not keeping the contract of the base class, then your call to Vehicle.accelerate() should not go to SpaceShip.accelerate(), or should it? That is when they will mark it as new. However, if they decide that it does indeed keep the contract, then they will in fact mark it override. In either case, your code will behave correctly by calling the correct method based on the contract. How can your code decide whether SpaceShip.accelerate() is actually overriding Vehicle.accelerate() or whether it is a name collision? (See my example above.)

However, in the case of implicit overriding (as in Java), even if SpaceShip.accelerate() did not keep the contract of Vehicle.accelerate(), the method call would still go to SpaceShip.accelerate().
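
To illustrate, a rough C# sketch of both choices side by side (HoverBike is a made-up sibling class, added only so the two outcomes can be shown together):

Vehicle a = new SpaceShip();
a.accelerate();   // runs SpaceShip.accelerate() -- the derived author kept the contract, so your one-line change just works

Vehicle b = new HoverBike();
b.accelerate();   // runs Vehicle.accelerate() -- the derived author explicitly opted out of the contract

public class Vehicle
{
    public virtual void accelerate() => System.Console.WriteLine("Vehicle accelerates");
}

// keeps the base contract
public class SpaceShip : Vehicle
{
    public override void accelerate() => System.Console.WriteLine("SpaceShip accelerates");
}

// a different contract that merely shares the name
public class HoverBike : Vehicle
{
    public new void accelerate() => System.Console.WriteLine("HoverBike accelerates, in its own unrelated sense");
}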

OTHER TIPS

It was done because it's the correct thing to do. The fact is that allowing all methods to be overridden is wrong; it leads to the fragile base class problem, where you have no way of telling if a change to the base class will break subclasses. Therefore you must either blacklist the methods that shouldn't be overridden or whitelist the ones that are allowed to be overridden. Of the two, whitelisting is not only safer (since you can't create a fragile base class accidentally), it also requires less work since you should avoid inheritance in favor of composition.

As Robert Harvey said, it's all in what you're used to. I find Java's lack of this flexibility odd.

That said, why have this in the first place? For the same reason that C# has public, internal (also "nothing"), protected, protected internal, and private, but Java has just public, protected, nothing, and private. It provides finer-grained control over the behavior of what you're coding, at the expense of having more terms and keywords to keep track of.

In the case of new vs. virtual + override, it goes something like this (a sketch follows the list):

  • If you want to force subclasses to implement the method, use abstract, and override in the subclass.
  • If you want to provide functionality but allow the subclass to replace it, use virtual, and override in the subclass.
  • If you want to provide functionality which subclasses should never need to override, don't use anything.
    • If you then have a special case subclass which does need to behave differently, use new in the subclass.
    • If you want to ensure that no subclass can change the behavior at all, mark the class itself sealed so it cannot be derived from (on a method, sealed is only valid together with override, to cut off further overriding).
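
A compact sketch of those options (names are only illustrative):

public abstract class Base
{
    // force subclasses to implement it
    public abstract void MustImplement();

    // provide functionality, but allow subclasses to replace it
    public virtual void CanReplace() => System.Console.WriteLine("Base.CanReplace");

    // provide functionality that subclasses shouldn't need to override
    public void NotMeantToChange() => System.Console.WriteLine("Base.NotMeantToChange");
}

public class Derived : Base
{
    public override void MustImplement() => System.Console.WriteLine("Derived.MustImplement");

    // overrides, and 'sealed' stops anything below Derived from overriding it again
    public sealed override void CanReplace() => System.Console.WriteLine("Derived.CanReplace");

    // special case: hides the base method; only visible through a Derived-typed reference
    public new void NotMeantToChange() => System.Console.WriteLine("Derived.NotMeantToChange");
}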

For a real-world example: A project I worked on processed ecommerce orders from many different sources. There was a base OrderProcessor which had most of the logic, with certain abstract/virtual methods for each source's child class to override. This worked fine, up until we got a new source which had a completely different way of processing orders, such that we had to replace a core function. We had two choices at this point: 1) Add virtual to the base method, and override in the child; or 2) Add new to the child.

While either one could work, the first would make it very easy to override that particular method again in the future. It'd show up in auto-complete, for example. This was an exceptional case, however, so we chose to use new instead. That preserved the standard of "this method doesn't need to be overridden", while allowing for the special case where it did. It's a semantic difference which makes life easier.

Do note, however, that there is a behavior difference associated with this, not just a semantic difference. See this article for details. However, I've never run into a situation where I needed to take advantage of this behavior.

The design of Java is such that given any reference to an object, a call to a particular method name with particular parameter types, if it is allowed at all, will always invoke the same method. It's possible that implicit parameter-type conversions may be affected by the type of a reference, but once all such conversions have been resolved, the type of the reference is irrelevant.

This simplifies the runtime, but can cause some unfortunate problems. Suppose GrafBase does not implement void DrawParallelogram(int x1, int y1, int x2, int y2, int x3, int y3), but GrafDerived implements it as a public method which draws a parallelogram whose computed fourth point is opposite the first. Suppose further that a later version of GrafBase implements a public method with the same signature, but whose computed fourth point is opposite the second. Clients which expect a GrafBase but receive a reference to a GrafDerived will expect DrawParallelogram to compute the fourth point in the fashion of the new GrafBase method, but clients who had been using GrafDerived.DrawParallelogram before the base method was changed will expect the behavior which GrafDerived originally implemented.

In Java, there would be no way for the author of GrafDerived to make that class coexist with clients that use the new GrafBase.DrawParallelogram method (and may be unaware that GrafDerived even exists) without breaking compatibility with existing client code that used GrafDerived.DrawParallelogram before GrafBase defined it. Since DrawParallelogram can't tell what kind of client is invoking it, it must behave identically when invoked by both kinds of client code. Since the two kinds of client code have different expectations as to how it should behave, there's no way GrafDerived can avoid violating the legitimate expectations of at least one of them (i.e. breaking legitimate client code).

In C#, if GrafDerived is not recompiled, the runtime will assume that code which invokes the DrawParallelogram method on references of type GrafDerived expects the behavior GrafDerived.DrawParallelogram() had when it was last compiled, but code which invokes the method on references of type GrafBase expects GrafBase.DrawParallelogram (the behavior that was added). If GrafDerived is later recompiled against the enhanced GrafBase, the compiler will squawk until the programmer specifies either that his method was intended to be a valid replacement for the inherited GrafBase member, or that its behavior needs to be tied to references of type GrafDerived but should not replace the behavior seen through references of type GrafBase.
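
A sketch of that collision in C# (the two fourth-point formulas are only there to make the differing contracts concrete):

public class GrafBase
{
    // added in the later version: the fourth point is opposite the SECOND point
    public void DrawParallelogram(int x1, int y1, int x2, int y2, int x3, int y3)
        => System.Console.WriteLine($"GrafBase: fourth point ({x1 + x3 - x2}, {y1 + y3 - y2})");
}

public class GrafDerived : GrafBase
{
    // existed first: the fourth point is opposite the FIRST point.
    // 'new' preserves this behaviour for GrafDerived references without
    // hijacking calls made through GrafBase references.
    public new void DrawParallelogram(int x1, int y1, int x2, int y2, int x3, int y3)
        => System.Console.WriteLine($"GrafDerived: fourth point ({x2 + x3 - x1}, {y2 + y3 - y1})");
}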

One might reasonably argue that having a method of GrafDerived do something different from a member of GrafBase which has the same signature would indicate bad design, and as such shouldn't be supported. Unfortunately, since the author of a base type has no way of knowing what methods might be added to derived types, nor vice versa, the situation where base-class and derived-class clients have different expectations for like-named methods is essentially unavoidable unless nobody's allowed to add any name which someone else might also add. The question is not whether such a name duplication should happen, but rather how to minimize the harm when it does.

Standard situation:

You are the owner of a base class that is used by multiple projects. You want to make a change to said base class that will break between one and countless derived classes, which live in projects providing real-world value (a framework provides value at best at one remove; no Real Human Being wants a Framework, they want the thing running on the Framework). Good luck telling the busy owners of the derived classes, "well, you have to change, you shouldn't have overridden that method," without gaining a rep as "Framework: Delayer of Projects and Causer of Bugs" among the people who have to approve decisions.

Especially as, by not declaring it non-overridable, you've implicitly declared they were okay to do the thing that now prevents your change.

And if you don't have a significant number of derived classes providing real world value by overriding your base class, why is it a base class in the first place? Hope is a powerful motivator, but also a very good way to end up with unreferenced code.

End result: Your framework base class code becomes incredibly fragile and static, and you can't really make the necessary changes to stay current/efficient. Alternatively, your framework gets a rep for instability (derived classes keep breaking) and people won't use it at all, since the main reason to use a framework is to make coding faster and more reliable.

Simply put, you cannot ask busy project owners to delay their project in order to fix bugs that you are introducing, and expect anything better than a "go away" unless you're providing significant benefits to them, even if the original "fault" was theirs, which is at best arguable.

Better to not let them do the wrong thing in the first place, which is where "non-virtual by default" comes in. And when someone comes to you with a very clear reason why they need this particular method to be overridable, and why it should be safe, you can "unlock" it without risking breaking anyone else's code.

Defaulting to non-virtual assumes that the base class developer is perfect. In my experience, developers are not perfect. If the developer of a base class cannot imagine a use case where a method could be overridden, or forgets to add virtual, then I cannot take advantage of polymorphism when extending the base class without modifying it. In the real world, modifying the base class is often not an option.

In C# the base class developer does not trust the subclass developer. In Java the subclass developer does not trust the base class developer. The subclass developer is responsible for the subclass and should (imho) be given the power to extend the base class as they see fit (barring explicit denial, and in Java they can even get this wrong).

It's a fundamental property of the language definition. It isn't right or wrong, it is what it is and cannot change.

Licensed under: CC-BY-SA with attribution