Question

I have been reading about the drawbacks of having null instead of (for example) Maybe. After reading this article, I am convinced that it would be much better to use Maybe (or something similar). However, I am surprised to see that all the "well-known" imperative or object-oriented programming languages still use null (which allows unchecked access to types that can represent a 'nothing' value), and that Maybe is mostly used in functional programming languages.

As an example, look at the following C# code:

void doSomething(string username)
{
    // Check that username is not null
    // Do something
}

Something smells bad here... Why should we be checking whether the argument is null? Shouldn't we be able to assume that every variable contains a reference to an object? As you can see, the problem is that, by definition, almost all variables can contain a null reference. What if we could decide which variables are "nullable" and which are not? That would save us a lot of effort while debugging and looking for a NullReferenceException.

Imagine that, by default, no type could contain a null reference. Instead, you would state explicitly that a variable can contain a null reference, and only if you really need it. That is the idea behind Maybe. If you have a function that fails in some cases (for example, division by zero), you could return a Maybe<int>, stating explicitly that the result may be an int, but also nothing! This is one reason to prefer Maybe over null. If you are interested in more examples, then I suggest reading this article.
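
To make the idea concrete in the same language, here is a minimal, self-contained sketch that uses C#'s built-in nullable value type int? to play the role of Maybe<int> (the Divide and Demo names are purely illustrative):

using System;

class Demo
{
    // int? (Nullable<int>) plays the role of Maybe<int> here:
    // the signature itself says the result may be absent.
    static int? Divide(int numerator, int denominator)
    {
        if (denominator == 0)
            return null;                 // "Nothing": no valid result
        return numerator / denominator;  // "Just(value)"
    }

    static void Main()
    {
        int? result = Divide(10, 0);
        Console.WriteLine(result.HasValue ? result.Value.ToString() : "no result");
    }
}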

The facts are that, despite the disadvantages of making most types nullable by default, most of the OO programming languages actually do it. That is why I wonder about:

  • What kind of arguments are there for implementing null in your programming language instead of Maybe? Are there reasons at all, or is it just "historical baggage"?

Please ensure you understand the difference between null and Maybe before answering this question.


Solution

I believe it is primarily historical baggage.

The most prominent and oldest languages with null are C and C++. But there, null does make sense. Pointers are still quite a numerical and low-level concept. And as someone else said, in the mindset of C and C++ programmers, having to state explicitly that a pointer can be null doesn't make sense.

Second in line comes Java. Considering that Java's developers were trying to stay close to C++, so that the transition from C++ to Java would be simpler, they probably didn't want to mess with such a core concept of the language. Also, implementing explicit null would require much more effort, because you have to check that a non-null reference is actually set properly after initialization.

All the other languages follow Java. They usually copy the way C++ or Java does it, and considering how central the concept of implicitly nullable reference types has become, it is really hard to design a language that uses explicit null.

OTHER TIPS

Actually, null is a great idea. Given a pointer, we want to designate that this pointer does not reference a valid value. So we take one memory location, declare it invalid, and stick to that convention (a convention sometimes enforced with segfaults). Now whenever I have a pointer I can check if it contains Nothing (ptr == null) or Some(value) (ptr != null, value = *ptr). I want you to understand that this is equivalent to a Maybe type.
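
In C# terms, the same reading of a plain null check looks roughly like this (the Lookup method is hypothetical, just something that may return null):

using System;

class NullAsMaybe
{
    // Hypothetical lookup that may return null, i.e. "contain Nothing".
    static string Lookup(bool found) => found ? "value" : null;

    static void Main()
    {
        string s = Lookup(false);

        if (s == null)
            Console.WriteLine("Nothing");         // ptr == null
        else
            Console.WriteLine("Some(" + s + ")"); // ptr != null, value = *ptr
    }
}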

The problems with this are:

  1. In many languages the type system does not assist here in guaranteeing a non-null reference.

    This is historical baggage, as many mainstream imperative or OOP languages have only had incremental advances in their type systems when compared to predecessors. Small changes have the advantage that new languages are easier to learn. C# is a mainstream language that has introduced language-level tools to better handle nulls.

  2. API designers might return null on failure, but never a reference to the actual thing itself on success; the thing is returned directly rather than behind another level of indirection. This flattening of one pointer level makes it impossible to use null as a valid value.

    This is just laziness on the designer's side and can't be helped without enforcing proper nesting with a proper type system. Some people might also try to justify this with performance considerations, or with the existence of optional checks (a collection might return null or the item itself, but also provide a contains method).

  3. In Haskell there is a neat view of the Maybe type as a monad. This makes it easier to compose transformations on the contained value.

    On the other hand, low-level languages like C barely treat arrays as a separate type, so I'm not sure what we're expecting. In OOP languages with parameterized polymorphism, a runtime-checked Maybe type is rather trivial to implement.
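
As a minimal sketch of that last point, a runtime-checked Maybe<T> in C# can look like this (the Just, Nothing, and Map names are conventional choices, not an existing BCL API):

using System;

// Minimal, runtime-checked Maybe<T>: the absence check happens at runtime,
// not in the type system.
public sealed class Maybe<T>
{
    private readonly T value;
    public bool HasValue { get; }

    private Maybe(T value, bool hasValue)
    {
        this.value = value;
        HasValue = hasValue;
    }

    public static Maybe<T> Just(T value) => new Maybe<T>(value, true);
    public static Maybe<T> Nothing { get; } = new Maybe<T>(default(T), false);

    // Accessing the value of an empty Maybe fails loudly instead of silently.
    public T Value => HasValue
        ? value
        : throw new InvalidOperationException("Dereferenced an empty Maybe.");

    // Compose a transformation on the contained value, if there is one.
    public Maybe<TResult> Map<TResult>(Func<T, TResult> f) =>
        HasValue ? Maybe<TResult>.Just(f(value)) : Maybe<TResult>.Nothing;
}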

My understanding is that null was a necessary construct in order to abstract programming languages out of assembly.¹ Programmers needed the ability to indicate that a pointer or register value was not a valid value and null became the common term for that meaning.

Reinforcing the point that null is just a convention to represent a concept, the actual value used for null has varied, and can still vary, across programming languages and platforms.

If you were designing a new language and wanted to avoid null but use maybe instead, then I would encourage a more descriptive term such as not a valid value or navv to indicate the intent. But the name of that non-value is a separate concept from whether you should allow non-values to even exist in your language.

Before you can decide on either of those two points, you need to define what maybe would mean for your system. You may find it's just a rename of null's meaning of not a valid value, or you may find it has a different semantic for your language.

Likewise, the decision on whether to check accesses against null (or against your non-value) is another design decision for your language.

To provide a bit of history, C had an implicit assumption that programmers understood what they were attempting to do when manipulating memory. As it was a superior abstraction to assembly and the imperative languages that preceded it, I would venture that the thought of safeguarding the programmer from an incorrect reference hadn't crossed their minds.

I believe that some compilers or their additional tooling can provide a measure of checking against invalid pointer access. So others have noted this potential issue and taken measures to protect against it.

Whether or not you should allow it depends upon what you want your language to accomplish and what degree of responsibility you want to push to users of your language. It also depends upon your ability to craft a compiler to restrict that type of behavior.

So to answer your questions:

  1. "What kind of arguments…" - Well, it depends upon what you want the language to do. If you want to simulate bare-metal access then you may want to allow it.

  2. "is it just historical baggage?" Perhaps, perhaps not. null certainly had / has meaning for a number of languages and helps drive the expression of those languages. Historical precedent may have affected more recent languages and their allowing of null but it's a bit much to wave your hands and declare the concept useless historical baggage.


¹ See this Wikipedia article, although credit is given to Hoare for null values and object-oriented languages. I believe the imperative languages progressed along a different family tree than Algol did.

If you look at the examples in the article you cited, most of the time using Maybe doesn't shorten the code. It doesn't obviate the need to check for Nothing. The only difference is it reminds you to do so via the type system.

Note, I say "remind," not force. Programmers are lazy. If a programmer is convinced a value can't possibly be Nothing, they're going to dereference the Maybe without checking it, just like they dereference a null pointer now. The end result is that you convert a null pointer exception into a "dereferenced empty maybe" exception.

The same principle of human nature applies in other areas where programming languages try to force programmers to do something. For example, the Java designers tried to force people to handle most exceptions, which resulted in a lot of boilerplate that either silently ignores or blindly propagates exceptions.

What makes Maybe nice is when a lot of decisions are made via pattern matching and polymorphism instead of explicit checks. For example, you could create separate functions processData(Some<T>) and processData(Nothing<T>), which you can't do with null. Your error handling automatically moves into a separate function, which is very desirable in functional programming, where functions are passed around and evaluated lazily rather than always being called in a top-down manner. In OOP, the preferred way to decouple your error handling code is with exceptions.
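
A rough C# sketch of that idea, assuming hypothetical Some<T> and Nothing<T> subclasses of an abstract Maybe<T> (all names here are illustrative, not an existing API):

using System;

public abstract class Maybe<T> { }

public sealed class Some<T> : Maybe<T>
{
    public T Value { get; }
    public Some(T value) => Value = value;
}

public sealed class Nothing<T> : Maybe<T> { }

static class Processor
{
    // Each case gets its own overload, so the "nothing" handling lives in a
    // separate function instead of an inline null check.
    static void ProcessData<T>(Some<T> data) => Console.WriteLine("got " + data.Value);
    static void ProcessData<T>(Nothing<T> data) => Console.WriteLine("nothing to do");

    // Dispatch via pattern matching; overload resolution alone cannot pick the
    // right method from a variable typed as Maybe<T>.
    public static void Process<T>(Maybe<T> data)
    {
        switch (data)
        {
            case Some<T> some: ProcessData(some); break;
            case Nothing<T> none: ProcessData(none); break;
        }
    }

    public static void Main()
    {
        Process(new Some<int>(42));   // "got 42"
        Process(new Nothing<int>());  // "nothing to do"
    }
}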

Maybe is a very functional way of thinking about a problem: there is a thing, and it may or may not have a value that is defined. In an object-oriented sense, however, we replace that idea of a thing (whether it has a value or not) with an object. Clearly, an object has a value. If it doesn't, we say the object is null, but what we really mean is that there isn't any object at all. The reference we have to the object points to nothing.

Translating Maybe into an OO concept does nothing novel; in fact, it just makes for more cluttered code. You still have to have a null reference for the value of the Maybe<T>. You still have to do null checks (in fact, you have to do a lot more null checks, cluttering your code), even if they are now called "maybe checks". Sure, you'll write more robust code, as the author claims, but I'd argue that it is only the case because you've made the language far more abstract and obtuse, requiring a level of work that is unnecessary in most cases. I'd rather take a NullReferenceException once in a while than deal with spaghetti code doing a Maybe check every time I access a new variable.

The concept of null can easily be traced back to C but that's not where the problem lies.

My everyday language of choice is C#, and I would keep null, with one difference. C# has two kinds of types, value types and reference types. Value types can never be null, but there are times I'd like to be able to express that no value is perfectly fine. To do this, C# uses nullable value types, so int would be the value type and int? the nullable version. This is how I think reference types should work as well.
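
A small illustration of that split in today's C# (NullableDemo is just an illustrative name):

using System;

class NullableDemo
{
    static void Main()
    {
        int count = 0;           // value type: can never be null
        int? maybeCount = null;  // Nullable<int>: absence is explicit in the type

        // count = null;         // would not even compile
        Console.WriteLine(maybeCount ?? -1);

        // Since C# 8, nullable reference types let you express the same split for
        // references: string is assumed non-null, string? may be null (enforced as
        // warnings rather than hard guarantees).
    }
}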

Also see: Null reference may not be a mistake:

Null references are helpful and sometimes indispensable (consider how much trouble if you may or may not return a string in C++). The mistake really is not in the existence of the null pointers, but in how the type system handles them. Unfortunately most languages (C++, Java, C#) don’t handle them correctly.

I think this is because functional programming is much more concerned with types, especially types that combine other types (tuples, functions as first-class types, monads, etc.), than object-oriented programming is (or at least initially was).

Modern versions of the programming languages I think you're talking about (C++, C#, Java) are all based on languages that didn't have any form of generic programming (C, C# 1.0, Java 1). Without that, you can still bake some kind of difference between nullable and non-nullable objects into the language (like C++ references, which can't be null, but are also limited), but it's much less natural.

I think the fundamental reason is that relatively few null checks are required to make a program "safe" against data corruption. If a program tries to use the contents of an array element or other storage location which is supposed to have been written with a valid reference but wasn't, the best-case outcome is for an exception to be thrown. Ideally, the exception will indicate exactly where the problem occurred, but what matters is that some kind of exception gets thrown before the null reference gets stored somewhere that could cause data corruption. Unless a method stores an object without trying to use it in some fashion first, an attempt to use an object will--in and of itself--constitute a "null check" of sorts.

If one wants to ensure that a null reference which appears where it shouldn't will cause a particular exception other than NullReferenceException, it will often be necessary to include null checks all over the place. On the other hand, merely ensuring that some exception will occur before a null reference can cause "damage" beyond any that has already been done will often require relatively few tests--testing would generally only be required in cases where an object would store a reference without trying to use it, and either the null reference would overwrite a valid one, or it would cause other code to misinterpret other aspects of program state. Such situations exist, but aren't all that common; most accidental null references will get caught very quickly whether one checks for them or not.

"Maybe," as written, is a higher level construct than null. Using more words to define it, Maybe is, "a pointer to either a thing, or a pointer to nothing, but the compiler has not yet been given enough information to determine which one." This forces you to explicitly check each value constantly, unless you build a compiler specification that is smart enough to keep up with the code you write.

You can make an implementation of Maybe with a language that has nulls easily. C++ has one in the form of boost::optional<T>. Making the equivalent of null with Maybe is very difficult. In particular, if I have a Maybe<Just<T>>, I cannot assign it to null (because such a concept does not exist), while a T** in a language with null is very easy to assign to null. This forces one to use Maybe<Maybe<T>>, which is totally valid, but will force you to do many more checks to use that object.
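
A small C# sketch of that flattening problem, using a real BCL type whose stored values may themselves be null (the names are illustrative):

using System;
using System.Collections.Generic;

class FlatteningDemo
{
    static void Main()
    {
        // A cache in which a stored value may itself legitimately be null.
        var cache = new Dictionary<string, string> { ["known"] = null };

        // Flattened, null-based lookup: "key missing" and "stored null" collapse
        // into the same answer, which is exactly the problem described above.
        string flat = cache.TryGetValue("known", out var v) ? v : null;
        Console.WriteLine(flat == null);  // True, but we can't tell why

        // A nested Maybe<Maybe<string>> would keep the two cases apart:
        //   Nothing           -> key not present
        //   Just(Nothing)     -> key present, value absent
        //   Just(Just(value)) -> key present, value present
    }
}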

Some functional languages use Maybe because null requires either undefined behavior or exception handling, neither of which is an easy concept to map into functional language syntaxes. Maybe fills the role much better in such functional situations, but in procedural languages, null is king. It's not a matter of right and wrong, just a matter of what makes it easier to tell the computer to do what you want it to do.

Licensed under: CC-BY-SA with attribution