Question

Tl;dr: Should we return null and lose track of where an error originated, or throw exceptions and handle them appropriately?

A few years ago I found this article: http://stackify.com/golden-rule-programming/

It says:

If it can be null, it will be null

This kind of thinking leads to defensive programming throughout the application: tons of null checks everywhere. In theory you should never get a NullReferenceException, and it should be more efficient than throwing an exception everywhere a null is encountered.

On the other hand, if we return null in all layers of the application, we cannot know for sure where the null value originated.

Throwing exceptions is the opposite: we know the origin of the error for sure.

A null that gets returned everywhere could come from the data layer or from the database; it could even be business logic trying to make sense of data and failing to do so... Basically, it could be anything.

Therefore, I see two (or three) camps:

  1. If it can be null, it should be null
  2. Throw exceptions wherever you encounter a null value that is not anticipated
  3. Mix the first two

Offhand, I think option 3, mixing the first two, leads to a less predictable system and therefore a slower pace of development, which is not something we want, right?

Option 1 leads to not knowing where the null came from, but it is a more efficient way to handle a big load (billions of requests).

Option 2 is not as efficient performance-wise, but it tells you more when diagnosing errors, just by looking at the logs or debugging.

The question itself is conceptual, and the answer I would like to get should be based on experience working with large-scale applications. Really looking forward to what you think :)

Solution

I think you are missing several other options. For example, you could use a language which doesn't have null. Haskell, Ruby, Smalltalk, and many others simply don't have null references, so the problem doesn't even arise.

Another option would be to use a language which does have null but has explicit null tracking that allows you to statically ensure that you can never have null references. Spec# does that, for example.

A variant of that option is to use a language which doesn't have null tracking, but use an external static analyzer, maybe aided by annotations. Code Contracts.NET (an offshoot of Spec#) does that, for example.

Yet another option is to encapsulate the notion of nullness into a type. One of the problems with null references is that they are really overloaded. What does getting a null mean? Some things are meant to fail, e.g. parsers, or looking up a key in a dictionary. It is normal for a parser to be handed invalid input, and it is normal not to find a key in a dictionary; that's not an error and not an exceptional situation.

For this latter case, you can use an Option type. An Option is basically kind-of like a collection that is either empty or has exactly one element. (This analogy is actually pretty profound as we shall see shortly.) In the case of the dictionary example, a Dictionary<K, V> would have a method Option<V> get(K key) which returns either a "filled" Option with the value, if the key exists, or an "empty" Option if the key doesn't exist. The important thing is that the type system can ensure that you have to deal with both cases in your code.
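
For illustration, here is a minimal sketch in Scala, whose built-in Map.get already has exactly this shape:

```scala
val capitals: Map[String, String] = Map("France" -> "Paris", "Japan" -> "Tokyo")

// get returns Option[String]: Some(value) if the key exists, None if it doesn't.
val hit: Option[String]  = capitals.get("Japan")   // Some("Tokyo")
val miss: Option[String] = capitals.get("Mordor")  // None

// The type system forces you to deal with both cases, e.g. via pattern matching.
capitals.get("Japan") match {
  case Some(city) => println(s"Found: $city")
  case None       => println("No such key")
}
```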

Except that you often actually don't have to deal with both cases. That's where the analogy to collections comes in: an Option is isomorphic to a list / an array / an iterator / a stream / an IEnumerable / some other kind of collection which is either empty or holds exactly one value. This means that Option can implement whatever collections interfaces your language has, can be a member of the collections framework and can be used with all the collections methods. For example, if you want to transform the value, then you don't need to actually get the value out of the Option. All collections frameworks have a method to apply a transformation function to every element of the collection (map in most languages, collect in Smalltalk, Objective-C and Ruby, select in SQL and .NET, transform in C++). What does map do when you call it on an empty collection? It returns an empty collection. And on a non-empty collection it will return another collection with each element (or in this case the only element) transformed. You don't even have to check whether or not your Option is empty, you can just call map on it, and it will do the right thing.
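
A short sketch of that idea in Scala: calling map on an Option does the right thing whether or not a value is present:

```scala
val hit: Option[String]  = Some("tokyo")
val miss: Option[String] = None

// map applies the function only if there is a value; no emptiness check needed.
hit.map(_.toUpperCase)   // Some("TOKYO")
miss.map(_.toUpperCase)  // None
```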

You can chain multiple maps. You can use flatMap (SelectMany in .NET) to chain the use of Options. You can use fold (aka reduce, aggregate, accumulate) to get the value out or alternatively supply a default. And so on.
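
Sketched in Scala, with lookupUser and lookupAddress as hypothetical functions invented for the example:

```scala
case class Address(city: String)
case class User(name: String, address: Option[Address])

// Hypothetical lookups, invented for illustration; each one may come up empty.
def lookupUser(id: Int): Option[User] =
  if (id == 42) Some(User("Ada", Some(Address("London")))) else None

def lookupAddress(user: User): Option[Address] = user.address

// flatMap chains the Option-returning calls; getOrElse supplies a default
// (a fold would work just as well) at the very end of the chain.
val city: String =
  lookupUser(42)
    .flatMap(lookupAddress)
    .map(_.city)
    .getOrElse("unknown")   // "London"
```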

Apart from being a collection, Option is also a monad, which makes it easy to work with in languages that have special syntactic support for monadic operations (e.g. Haskell's do-notation, Scala's for-comprehensions, C#'s and VB.NET's LINQ query comprehensions).
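
In Scala, for instance, such a chain can be written as a for-comprehension, which is just syntactic sugar for flatMap and map:

```scala
val maybeName: Option[String] = Some("Ada")
val maybeCity: Option[String] = Some("London")

// Desugars to maybeName.flatMap(name => maybeCity.map(city => ...)).
// If either input were None, the whole result would be None.
val greeting: Option[String] =
  for {
    name <- maybeName
    city <- maybeCity
  } yield s"$name lives in $city"   // Some("Ada lives in London")
```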

In Haskell, it is called Maybe a and has two data constructors called Just a and Nothing. (If you are a Java person, read Maybe<T> and Just<T>, and think of data constructors as final subclasses.) In Scala, it is called Option[T] and has two subclasses, Some[T] and None. Even Java has one; it is called Optional<T>.
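
The whole shape of the type fits in a few lines; here is a home-grown Scala sketch (the real library versions are more elaborate, but the essence is the same):

```scala
// A home-grown Option, just to show the shape of the type.
sealed trait Opt[+A] {
  def map[B](f: A => B): Opt[B] = this match {
    case Filled(a) => Filled(f(a))
    case Empty     => Empty
  }
}
final case class Filled[A](value: A) extends Opt[A]  // plays the role of Just / Some
case object Empty extends Opt[Nothing]               // plays the role of Nothing / None
```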

Obviously, there is a difference between a language like Haskell, which doesn't have null in the first place, and a language like Java, where Optional<T> is an afterthought. In the first case, nulls cannot happen, period. In the second case, I would treat null like any other kind of invalid data: intercept it at the outer boundary of your system, and either throw an exception or sanitize it into an Optional<T>. (The really nasty thing about a language like Java is of course that a field of type Optional<T> may itself be null.)

In Scala, for example, null mainly exists for compatibility reasons. Pretty much the only way you can end up with a null from the Scala language itself is with a declared but uninitialized mutable field. However, mutable fields are rarely used in Scala and are very unidiomatic. No method in the Scala library will ever return null, and idiomatic Scala code will never return null. Usually, the only way to end up with null is by calling Java code, and that is typically handled the way I described above: you have a Scala wrapper for the Java code which handles all potential nulls, and the rest of the code uses that Scala abstraction and doesn't bother with null at all.
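
The usual tool for that boundary in Scala is the Option.apply factory, which intercepts the null exactly once, at the edge:

```scala
// System.getenv is a plain Java API: it returns null for an unset variable.
val fromJava: String = System.getenv("SOME_UNSET_VARIABLE")

// Option(...) turns null into None and anything else into Some(...),
// so the null is handled once, at the Java/Scala boundary.
val safe: Option[String] = Option(fromJava)

// From here on, the rest of the code never sees a null.
println(safe.getOrElse("(not set)"))
```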

Yet another option, which is however not applicable in all cases, is to use a special "empty" object to denote the absence of data instead of null: a dummy user account for a User field, an empty string for a String field, an empty list for a List field, and so on.
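
A sketch of that Null Object idea in Scala (Account and GuestAccount are invented example names):

```scala
case class Account(name: String, canEdit: Boolean)

// A designated "empty" account instead of null; an invented example value.
val GuestAccount = Account(name = "guest", canEdit = false)

def findAccount(id: Int): Account =
  if (id == 1) Account("Ada", canEdit = true)
  else GuestAccount   // absence of data, but still a perfectly usable object

// Callers never need a null check.
println(findAccount(99).name)   // "guest"
```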

OTHER TIPS

You are mixing things up.

If it can be null, it will be null

That means that your code needs to deal gracefully with any value/object that can be null.

Three camps? Not know where it came from? Wrong.

There is one camp: test for null for any nullable type and deal with it gracefully. If null is not valid, you throw an exception, and yes, you know exactly where the exception came from, because the exception has a source. You don't pass an invalid null on to the next layer and lose track of where it came from.

At the data layer, if null is valid but means 0, you just assign 0 to a non-nullable type and are done with it. Once the value is in a non-nullable type, you no longer need to test for null.
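
Sketched in Scala for consistency with the examples above (the answer speaks in .NET terms, but the idea carries over): convert the nullable value to a non-nullable one once, at the data layer:

```scala
// A java.lang.Integer from e.g. a JDBC result set may be null.
def quantityFromDb(raw: java.lang.Integer): Int =
  Option(raw).map(_.intValue).getOrElse(0)   // here null is valid and means 0

quantityFromDb(5)      // 5
quantityFromDb(null)   // 0
```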

Reference types are typically nullable and need to be tested every time.

In most if not all environments, a value type such as decimal cannot be null, so you don't need to test it for null.

One can look at state from three points:

  1. The state-space the underlying machine provides.
  2. The state-space the language provides.
  3. The state-space the program-logic allows.

In "safe" languages, the first one can generally be disregarded. Though it's good for things like boolean values that are neither true nor false.

Haskell, for example, allows you to restrict most variables' state-space to their program-logic space, while many other languages, like Java, make just about everything nullable, even when it logically isn't.

Thus you must handle it appropriately in each case.
Luckily, in "safe" languages you don't have to write most null checks manually; the language throws an exception for you if you just use the value.
