Taking strong, static typing to an extreme? [duplicate]

https://softwareengineering.stackexchange.com/questions/284946

08-10-2020

Question

It is common in strong, static typing to use different types even for variables with simple, primitive types to ease static analysis and indicate intent to the programmer. A color and a point in 3D space might both be represented by an array of 3 floats, but given different type names. In C, a lot of common types are simply typedef'd ints.
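Here is a minimal Java sketch of the color/point example. The class names Color and Point3D and the method distanceTo are hypothetical, and the three floats are stored as fields rather than as an array:

```java
// Same representation (three floats), different types: the compiler keeps them apart.
final class Color {
    final float r, g, b;

    Color(float r, float g, float b) {
        this.r = r;
        this.g = g;
        this.b = b;
    }
}

final class Point3D {
    final float x, y, z;

    Point3D(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    // Accepts only a Point3D; passing a Color here is a compile-time error.
    float distanceTo(Point3D other) {
        float dx = x - other.x, dy = y - other.y, dz = z - other.z;
        return (float) Math.sqrt(dx * dx + dy * dy + dz * dz);
    }
}
```

Both classes hold the same data, but a call like `origin.distanceTo(someColor)` is rejected at compile time.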

I'm wondering: just how far is it practical to take this? If you're writing a function that takes a float between 0 and 1 (perhaps it represents a probability), do you create a separate type? What about a function that must take a non-zero integer? Would it be reasonable to create a type nonZeroInt, not for the purposes of encapsulation, but simply for type safety?
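A minimal Java sketch of the two types the question mentions might look like this. NonZeroInt, Probability, and the SafeMath helper are hypothetical names. Validation happens once, in the constructor, so every function that accepts these types can rely on the precondition:

```java
// Hypothetical wrapper types for the two preconditions mentioned in the question.
final class NonZeroInt {
    private final int value;

    NonZeroInt(int value) {
        if (value == 0) {
            throw new IllegalArgumentException("value must be non-zero");
        }
        this.value = value;
    }

    int get() { return value; }
}

final class Probability {
    private final double value;

    Probability(double value) {
        // The negated form also rejects NaN, which fails every comparison.
        if (!(value >= 0.0 && value <= 1.0)) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.value = value;
    }

    double get() { return value; }
}

final class SafeMath {
    // The divisor's type already rules out division by zero.
    static int divide(int dividend, NonZeroInt divisor) {
        return dividend / divisor.get();
    }

    // The complement of a probability is always another valid probability.
    static Probability not(Probability p) {
        return new Probability(1.0 - p.get());
    }
}
```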

You can take this arbitrarily far, encoding all preconditions of functions into the type system. For example, you could define a "primeNumber" type for the input to a function that only takes a prime. A function which must take two integers with no common factors could be changed to take one argument of type "coprimePair."
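For instance, a hypothetical CoprimePair could check its precondition with Euclid's algorithm when it is constructed. This is a sketch, not a library type:

```java
// Hypothetical type: an (a, b) pair whose greatest common divisor is 1.
final class CoprimePair {
    private final long a;
    private final long b;

    CoprimePair(long a, long b) {
        if (gcd(a, b) != 1) {
            throw new IllegalArgumentException(a + " and " + b + " are not coprime");
        }
        this.a = a;
        this.b = b;
    }

    long first() { return a; }

    long second() { return b; }

    // Euclid's algorithm; gcd(0, 0) is 0, so (0, 0) is rejected.
    static long gcd(long x, long y) {
        x = Math.abs(x);
        y = Math.abs(y);
        while (y != 0) {
            long t = x % y;
            x = y;
            y = t;
        }
        return x;
    }
}
```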

The two hypothetical functions above would presumably be part of a library that includes functions to generate prime numbers and coprime pairs, and return them with the appropriate type. If I wanted to call a function in that library that requires a prime number, and I didn't get it from the included prime-generating function, I'd have to explicitly cast an int, essentially forcing me to do a reality check and ask myself, "Am I certain that this variable will always be prime?"
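The library described above might look roughly like this in Java, assuming a hypothetical Prime class. Java has no user-defined casts, so the "explicit cast" is modeled here as a checked factory method, Prime.of. An unchecked variant that merely trusts the caller would also fit the description, at the cost of the runtime check:

```java
// Hypothetical library type: a Prime can only come from the generator or from
// an explicit, checked conversion that plays the role of the "cast".
final class Prime {
    private final int value;

    private Prime(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }

    // Generator: the smallest prime strictly greater than n.
    static Prime nextAfter(int n) {
        if (n >= Integer.MAX_VALUE) {
            // 2^31 - 1 is itself prime, so no larger prime fits in an int.
            throw new ArithmeticException("no int prime greater than " + n);
        }
        int candidate = Math.max(n + 1, 2);
        while (!isPrime(candidate)) {
            candidate++;
        }
        return new Prime(candidate);
    }

    // Explicit conversion: the caller must vouch for the value, and it is checked.
    static Prime of(int n) {
        if (!isPrime(n)) {
            throw new IllegalArgumentException(n + " is not prime");
        }
        return new Prime(n);
    }

    // Trial division up to the square root.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}
```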

My specific question is this: How far is this philosophy ever successfully taken in practice? Are there programming languages, or well-respected programming texts, that encourage a philosophy that says: every function that cannot accommodate the entire range of any existing type as its input, and return a meaningful result for all possible inputs, should instead define a new type?

Solution

The comments by Jack and tp1 (which should actually have been answers) already explain how this is implemented in functional languages.

My answer adds a point about non-functional languages, especially one that is quite popular in the industry: C#.

In languages such as C# or Java, it is indeed common practice to create types when you need to constrain your values a little. If the only thing you need is a positive integer, you'll end up with a PositiveInt class, but in general, those wrappers will reflect the business logic more closely (ProductPrice, RebatePercentage, etc.), again with validation inside (a product price should be greater than zero, a rebate percentage shouldn't accept a value greater than 100, etc.).
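As a sketch of such a business-level wrapper, here is a hypothetical ProductPrice in Java. It enforces "greater than zero" once, at construction, and uses BigDecimal to avoid binary floating-point rounding in money amounts:

```java
import java.math.BigDecimal;

// Hypothetical business-level wrapper: a price must be strictly positive.
final class ProductPrice {
    private final BigDecimal amount;

    ProductPrice(BigDecimal amount) {
        if (amount == null || amount.signum() <= 0) {
            throw new IllegalArgumentException("a product price must be greater than zero");
        }
        this.amount = amount;
    }

    BigDecimal amount() {
        return amount;
    }

    // The sum of two positive prices is positive, so the result is a valid ProductPrice.
    ProductPrice plus(ProductPrice other) {
        return new ProductPrice(amount.add(other.amount));
    }
}
```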

The benefits, the drawbacks, and the risk of going too far with such types were recently discussed here.

Once you have your wrapper, you can start adding logic to it, especially validation. Since the wrapper hides the value it wraps, the validation logic will be located in the constructor, and can be as basic as:

// Reject anything outside (0, 100], including zero and negative values.
if (value <= 0 || value > 100)
{
    throw new ArgumentOutOfRangeException(nameof(value), "The allowed range is (0..100].");
}
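A Java counterpart of the constructor check above might look like this. The class name RebatePercentage comes from the examples in this answer, while the applyTo method is a hypothetical illustration of logic that can rely on the validated value:

```java
// Validation lives in the constructor, so every instance holds a value in (0, 100].
final class RebatePercentage {
    private final double value;

    RebatePercentage(double value) {
        if (!(value > 0 && value <= 100)) { // the negated form also rejects NaN
            throw new IllegalArgumentException("The allowed range is (0..100].");
        }
        this.value = value;
    }

    double value() {
        return value;
    }

    // The percentage is known to be in range, so it is not re-checked here.
    double applyTo(double price) {
        return price * (1 - value / 100);
    }
}
```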

Another possibility in C# is to use Code Contracts (System.Diagnostics.Contracts), which provide several benefits:

Static checking catches misuse of those wrappers before you hit the error at runtime,
Since code contracts are also enforced at runtime, you can be sure the contract will not be violated even if static checking was disabled (or its warnings were dismissed by the developers),
With invariants checking the wrapped value, there is no way to assign an invalid value (with the limitation that the checks are performed when a method starts or ends, and not at every step during the execution of the method),
Visual Studio integration makes the code self-documenting, by providing hints about the contracts to the callers.
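Java has no built-in equivalent of Code Contracts, but the invariant idea can be approximated by hand. This sketch uses a hypothetical BoundedCounter (it is not Code Contracts itself) that checks its invariant at the end of each method, so, as the third point says, the value may be temporarily invalid in the middle of a method:

```java
// Hand-rolled analogue of an object invariant, checked when a method finishes.
final class BoundedCounter {
    private final int max;
    private int count;

    BoundedCounter(int max) {
        this.max = max;
        this.count = 0;
        checkInvariant(); // also rejects a negative max
    }

    int count() {
        return count;
    }

    void add(int n) {
        count += n;       // may be temporarily invalid here...
        checkInvariant(); // ...but is checked before the method returns
    }

    // On failure the object is left holding the invalid value.
    private void checkInvariant() {
        if (count < 0 || count > max) {
            throw new IllegalStateException("invariant violated: 0 <= count <= " + max);
        }
    }
}
```

A real implementation would also have to decide whether to roll the change back when the check fails.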

Is it successfully taken in practice? Well, there are thousands of methods within the .NET Framework itself which contain code contracts. For business code, it mostly comes down to how critical the code is. If the consequences of a failure are expensive, code contracts can be very attractive. On the other hand, code contracts have a substantial cost in terms of developer time (ensuring that all contracts work well on all but tiny projects requires a lot of time), and may not be a good idea for projects that don't need this level of reliability.

This also answers your other question:

I'm wondering: just how far is it practical to take this?

This is a strictness vs. flexibility question.

If you need to develop very fast and accept the risk of runtime errors, you'll pick a weakly-typed language. The benefit of writing less code (for example, print(123)) outweighs the risk of problems which may be more or less difficult to debug (for example, 123 + "4": would it result in 127 or "1234"?).

In this case, types either won't exist or will be managed implicitly by the language/framework. While you could still do range validation (for instance, to sanitize user input), it would look odd to do such validation anywhere other than at the interfaces with the outside world.
If you are working on life-critical software, you'll use formal proofs, which take a huge amount of time but can establish that the code has no errors with respect to its specification.

In this case, chances are types will have strict validation: range and precision for numeric values, length and allowed characters for strings, etc.
In between, you'll pick the approach that best matches your needs. For most software, code contracts would be overkill. But for most software, having basic checking within a type wrapper (such as Percentage) may be a good idea. Without going to the extreme of creating classes for everything (as discussed in the link already provided above), a good compromise could be to have a few generic types such as Range<T> or LengthLimitedString, which are not that difficult to implement in languages such as C# or Java.
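Generic helpers of that kind can be quite small. A possible Java sketch of Range<T> and LengthLimitedString follows; the method names contains, require, and get are assumptions, not an established API:

```java
// Hypothetical generic helpers in the spirit of Range<T> and LengthLimitedString.
final class Range<T extends Comparable<T>> {
    private final T min;
    private final T max;

    Range(T min, T max) {
        if (min.compareTo(max) > 0) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.min = min;
        this.max = max;
    }

    // Inclusive on both ends.
    boolean contains(T value) {
        return min.compareTo(value) <= 0 && value.compareTo(max) <= 0;
    }

    // Returns the value unchanged if it is inside [min, max], otherwise throws.
    T require(T value) {
        if (!contains(value)) {
            throw new IllegalArgumentException(value + " is outside [" + min + ", " + max + "]");
        }
        return value;
    }
}

final class LengthLimitedString {
    private final String value;

    // length() counts UTF-16 code units, a simplification for non-BMP text.
    LengthLimitedString(String value, int maxLength) {
        if (value.length() > maxLength) {
            throw new IllegalArgumentException("string longer than " + maxLength);
        }
        this.value = value;
    }

    String get() {
        return value;
    }
}
```

For example, `new Range<>(0, 100).require(42)` returns 42, while `require(101)` throws.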

OTHER TIPS

I used to write real-world software this way in Ada, for military aircraft. We used separate types not only for ranges, but for units and array indices. For example, a distance in meters was a different type than a distance in feet. The index into a 100-element array had a different type than an index into a 10-element array.
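The units idea translates to Java as separate wrapper classes. In this sketch, Meters and Feet are hypothetical names; 0.3048 m per foot is the exact definition of the international foot. Conversions must be explicit, so adding feet to meters does not compile:

```java
// Hypothetical unit types: meters and feet cannot be mixed by accident.
final class Meters {
    final double value;

    Meters(double value) {
        this.value = value;
    }

    Meters plus(Meters other) {
        return new Meters(value + other.value);
    }

    Feet toFeet() {
        return new Feet(value / Feet.METERS_PER_FOOT);
    }
}

final class Feet {
    static final double METERS_PER_FOOT = 0.3048; // exact by definition

    final double value;

    Feet(double value) {
        this.value = value;
    }

    Meters toMeters() {
        return new Meters(value * METERS_PER_FOOT);
    }
}
```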

First, the good. Quite a few bugs were indeed caught at compile time. Runtime bugs were usually caught closer to the source. For example, an out-of-bounds array index would be caught at the point the index was first assigned, rather than when you tried to use it to access the array. There were also some speed advantages, because the runtime didn't have to check things that the type system had already checked statically.
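Catching an out-of-range index where it is first created can be sketched in Java with a hypothetical Index100 type for a 100-element array. Unlike the Ada setup described above, the JVM still performs its own array bounds check inside get and set, so this only shows where the error surfaces, not the speed advantage:

```java
// Hypothetical index type: an invalid index fails where it is created.
final class Index100 {
    static final int SIZE = 100;
    final int value;

    Index100(int value) {
        if (value < 0 || value >= SIZE) {
            throw new IndexOutOfBoundsException("index " + value + " not in [0, " + SIZE + ")");
        }
        this.value = value;
    }
}

final class Buffer100 {
    private final double[] data = new double[Index100.SIZE];

    // Every Index100 is already in range, so no bounds error can originate here.
    double get(Index100 i) {
        return data[i.value];
    }

    void set(Index100 i, double v) {
        data[i.value] = v;
    }
}
```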

There were a number of drawbacks as well:

Large volume of compile errors, often with weird messages. This is the side effect of catching bugs at compile time. You feel like you're always fighting the compiler. On the other hand, it often worked the first time when you fixed all the compiler errors.
A ton of type annotations.
Heavy use of generics. I was some two years into my C++ job when I first had to look up the syntax to create my own templated class. I did the Ada equivalent every day.
A robust coding standard and peer review process were needed to prevent programmers from taking the less safe shortcuts.
It was difficult to get new programmers up to speed who were used to looser types.

For something safety critical, it was definitely worth it. However, I think I would prefer a language with type inference and strong programmer control over implicits, like Scala, where you could get a lot of the safety with less typing.

Here's how it's done in Haskell:

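data Prime = Prime Int deriving Show
-- Trial division up to sqrt n (isPrime was not defined in the original snippet).
isPrime :: Int -> Bool
isPrime n = n > 1 && all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])
-- Smart constructor: the checked way to obtain a Prime.
primeFromInt :: Int -> Maybe Prime
primeFromInt a | isPrime a = Just (Prime a)
primeFromInt _ = Nothing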

Note that this still allows anyone to claim that an int is prime:

Prime 10

Even though 10 is not prime, the compiler simply cannot detect this problem. If instead you write it the other way:

primeFromInt 10

This nicely rejects the 10 and gives us Nothing.
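In Haskell, the usual way to close the `Prime 10` loophole is to export the type but not its data constructor from the defining module (for example, a hypothetical `module Primes (Prime, primeFromInt) where`), so code outside the module has to go through primeFromInt. The Java analogue is a private constructor plus a factory returning Optional, which plays the role of Maybe. The class and method names here are illustrative:

```java
import java.util.Optional;

// The constructor is private, so outside code can only get a Prime via fromInt.
final class Prime {
    private final int value;

    private Prime(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }

    // Counterpart of primeFromInt :: Int -> Maybe Prime.
    static Optional<Prime> fromInt(int n) {
        return isPrime(n) ? Optional.of(new Prime(n)) : Optional.empty();
    }

    // Trial division up to the square root.
    private static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }
}
```

Writing `new Prime(10)` outside the class does not compile, which is exactly the guarantee the Haskell `Prime 10` example lacks.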

Licensed under: CC-BY-SA with attribution
