Question

[Disclaimer: this question is subjective, but I would prefer getting answers backed by facts and/or reflections]

I think everyone knows about the Robustness Principle, usually summed up by Postel's Law:

Be conservative in what you send; be liberal in what you accept.

I would agree that for the design of a widespread communication protocol this may make sense (with the goal of allowing easy extension); however, I have always thought that its application to HTML/CSS was a total failure: each browser implements its own silent tweak detection and behavior, making it nearly impossible to obtain consistent rendering across multiple browsers.

I do notice, though, that the RFC for the TCP protocol deems "silent failure" acceptable unless otherwise specified... which is an interesting behavior, to say the least.

There are other examples of this principle being applied throughout the software trade that regularly pop up because they have bitten developers; off the top of my head:

  • JavaScript semicolon insertion
  • C's silent built-in conversions (which would not be so bad if they did not truncate...)

and there are tools to help implement such "smart" behavior.

However, I find that this approach, while it may be helpful when dealing with non-technical users or when helping users recover from errors, has some drawbacks when applied to the design of library/class interfaces:

  • it is somewhat subjective whether the algorithm guesses "right", and thus it may go against the Principle of Least Astonishment
  • it makes the implementation more difficult, and thus introduces more chances for bugs (a violation of YAGNI?)
  • it makes the behavior more susceptible to change, as any modification of the "guess" routine may break old programs, nearly excluding refactoring possibilities... from the start!

And this is what led me to the following question:

When designing an interface (library, class, message), do you lean toward the robustness principle or not?

I myself tend to be quite strict, using extensive input validation on my interfaces, and I was wondering if I was perhaps too strict.


Solution

I would say robustness when it doesn't introduce ambiguities.

For example: when parsing a comma-separated list, whether or not there's a space before or after the comma doesn't change the semantic meaning.
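A minimal Python sketch of that kind of harmless leniency (the function name is just for illustration):

```python
def parse_csv_field(raw: str) -> list[str]:
    # "a,b,c" and " a , b ,c " mean the same thing, so
    # whitespace around the separators is simply stripped.
    return [item.strip() for item in raw.split(",")]

assert parse_csv_field("a,b,c") == parse_csv_field(" a , b ,c ")
```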

When parsing a string GUID, it should accept any of the common formats (with or without dashes, with or without surrounding curly braces).
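Along the same lines, a hedged sketch of such a parser in Python (parse_guid is a hypothetical name; the standard uuid module already tolerates several of these spellings on its own):

```python
import re
import uuid

def parse_guid(text: str) -> uuid.UUID:
    # Normalize the common spellings: optional surrounding braces
    # and optional dashes all denote the same underlying value.
    cleaned = text.strip().removeprefix("{").removesuffix("}").replace("-", "")
    if not re.fullmatch(r"[0-9a-fA-F]{32}", cleaned):
        raise ValueError(f"unrecognized GUID format: {text!r}")
    return uuid.UUID(hex=cleaned)

# All of these parse to the same value:
# parse_guid("0f8fad5b-d9cb-469f-a165-70867728950e")
# parse_guid("{0f8fad5b-d9cb-469f-a165-70867728950e}")
# parse_guid("0f8fad5bd9cb469fa16570867728950e")
```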

Most programming languages are robust in their whitespace handling, at least everywhere that it doesn't affect the meaning of the code. Even in Python, where whitespace is significant, it's still flexible inside a list or dictionary declaration.
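For instance, indentation and line breaks stop mattering once you're inside brackets:

```python
# Outside brackets, Python's indentation is significant;
# inside a literal, the layout is free-form:
matrix = [
    [1, 2, 3],
    [4,   5,   6],
]
config = {"host": "localhost",
          "port": 8080}
```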

I definitely agree that if something can be interpreted multiple ways, or if it's not 100% clear what was meant, then too much robustness can end up being a pain; but there's plenty of room for robustness without ambiguity.

OTHER TIPS

Definitely not. Techniques such as defensive programming obscure bugs, making their appearance rarer and more random, which makes them harder to detect and, in turn, more difficult to isolate.

The vastly under-rated Writing Solid Code was tremendous in repeatedly emphasizing the need for, and the techniques of, making bugs as difficult as possible to introduce or hide. Through the application of its principles, such as "Eliminate random behavior. Force bugs to be reproducible." and "Always look for, and eliminate, flaws in your interfaces.", developers will vastly improve the quality of their software by eliminating the ambiguity and uncontrolled side effects that are responsible for a large quantity of bugs.

Overapplication of robustness leads to you guessing what the user wanted, which is fine right up until you get it wrong. It also requires the completely misguided faith that your customers won't abuse your trust and create random gibberish that just happens to work, but that you won't be able to support in version 2.

Overapplication of correctness leads to you denying your customers the right to make minor errors, which is fine right up until they complain that their stuff works fine on your competitor's product, and tell you what you can do with your 5,000-page standard that still has the word "DRAFT" scrawled on the cover in crayon, that at least 3 experts claim is fundamentally flawed, and that 200 more honest experts say they don't fully understand.

My personal solution has always been deprecation. You support them, but tell them they're doing it wrong and, if possible, show them the easiest path to correctness. That way, when you turn the bug-feature off 10 years down the line, you at least have the paper trail to say "we warned you this might happen."
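A sketch of what that paper trail can look like in Python (the function and parameter names here are invented for illustration): the legacy spelling keeps working, but every use of it is flagged.

```python
import warnings

def fetch(url, timeout=None, connect_timeout=None):
    # 'connect_timeout' stands in for the legacy misfeature:
    # still honored, but callers are pointed at the supported path.
    if connect_timeout is not None:
        warnings.warn(
            "'connect_timeout' is deprecated; use 'timeout' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if timeout is None:
            timeout = connect_timeout
    # ... perform the request using 'timeout' ...
    return url, timeout
```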

Unfortunately, the so-called "robustness principle" doesn't lead to robustness. Take HTML as an example. Much trouble, many tears, and much wasted time and energy could have been avoided if browsers had strictly parsed HTML from the beginning instead of trying to guess the meaning of malformed content.

The browser should simply have displayed an error message instead of trying to fix it under the covers. That would have forced all the bunglers to fix their mess.

I divide interfaces into several groups (add more if you like):

  1. those that are under your control, which should be strict (typically classes)
  2. library APIs, which should also be on the strict side, though extra validation is advised
  3. public interfaces, which must handle every kind of abuse that comes in (typically protocols, user inputs, etc.). Here robustness on input really pays off; you can't expect everyone to fix their stuff (see the sketch below). And remember: to the user, it will be your fault if the application doesn't work, not the fault of the party who sent some ill-formatted crap.

Output must always be strict.
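A small Python sketch of group 3 combined with strict output (the flag vocabulary here is an assumption, not any particular standard):

```python
def parse_flag(raw: str) -> bool:
    # Public input (group 3): tolerate the usual spellings...
    value = raw.strip().lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    # ...but never guess silently when the input is genuinely ambiguous.
    raise ValueError(f"unrecognized flag value: {raw!r}")

def emit_flag(flag: bool) -> str:
    # Output is strict: exactly one canonical spelling ever goes out.
    return "true" if flag else "false"
```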

I think that HTML and the World Wide Web have provided a wide-scale real-world test of the Robustness Principle and shown it to be a massive failure. It's directly responsible for the confusing mess of competing HTML almost-standards that makes life miserable for Web developers (and their users) and gets worse with every new Internet Explorer release.

We've known since the 1950s how to validate code properly: run it through a strict parser, and if something isn't syntactically correct, throw an error and abort. Do not pass Go, do not collect $200, and for the love of all that is binary, do not let some computer program attempt to read the coder's mind when he makes a mistake!

HTML and JavaScript have shown us exactly what happens when those principles are ignored. Best course of action is to learn from their mistakes and not repeat them.

As a counterpoint to Mason's example, my experience with the Session Initiation Protocol was that while different stacks would interpret the relevant RFCs differently (and I suspect this happens with every standard ever written), being (moderately) liberal in what you accept means that you can actually make calls between two devices. Because these devices are usually physical things, as opposed to pieces of software on a desktop, you simply have to be liberal in what you accept, or your phone won't be able to call another phone of a particular make. That doesn't make your phone look good!

But if you're writing a library, you probably don't have the problem of multiple parties interpreting a common standard in mutually incompatible ways. In that case, I'd say be strict in what you accept, because it removes ambiguities.

The Jargon File also has a horror story on "guessing" a user's intent.

You're right: the rule applies to protocols, not programming. If you make a typo while programming, you'll get an error as soon as you compile (or run, if you're one of those dynamic types). There's nothing to be gained by letting the computer guess for you. Unlike the common folk, we are engineers and capable of saying exactly what we mean. ;)

So, when designing an API, I would say don't follow the Robustness Principle. If the developer makes a mistake, they should find out about it right away. Of course, if your API uses data from an outside source, like a file, you should be lenient. The user of your library should find out about their own mistakes, but not anyone else's.
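A minimal sketch of that split in Python (load_scores and its one-number-per-line file format are invented for illustration):

```python
def load_scores(path: str, *, limit: int) -> list[int]:
    # Strict with the developer: a bad 'limit' is the caller's
    # bug, and they should hear about it immediately.
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    scores = []
    with open(path) as fh:
        for line in fh:
            # Lenient with outside data: blank or malformed lines
            # are someone else's mistake, not the caller's.
            line = line.strip()
            if not line:
                continue
            try:
                scores.append(int(line))
            except ValueError:
                continue
    return scores[:limit]
```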

As an aside, I would guess that "silent failure" is allowed in the TCP protocol because otherwise, if people were throwing malformed packets at you, you would be bombarded with error messages. That's simple DoS protection right there.

IMO, robustness is one side of a design trade-off, not a "prefer" principle. As many have pointed out, nothing stinks like blowing four hours trying to figure out where your JS went wrong, only to discover that the real problem was that only one browser did the proper thing with XHTML Strict: it let the page go to pieces when some portion of the served HTML was a complete disaster.

On the other hand, who wants to look up the documentation for a method that takes 20 arguments and insists they be in exactly the right order, with empty or null placeholders for the ones you want to skip? The equally awful "robust" way to deal with that method would be to check every arg, try to guess which one was which based on relative positions and types, and then fail silently or try to "make do" with meaningless args.

Or you can bake flexibility into the process by passing an object literal/dictionary/key-value-pair list and handling the existence of each arg as you get to it. For a very minor perf trade-off, that's a have-your-cake-and-eat-it-too scenario.
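In Python terms, that flexibility is what keyword arguments give you (draw_chart and its options are made up for the example):

```python
def draw_chart(data, *, title=None, x_label=None, y_label=None,
               legend=False, grid=False):
    # Each option is named; callers pass only what they need,
    # in any order, and everything else keeps its default.
    print(f"chart {title!r}: {len(data)} points, legend={legend}")

# No placeholder Nones and no memorized positional order:
draw_chart([1, 2, 3], title="Sales", legend=True)
```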

Overloading args in intelligent and interface-consistent ways is a smart way to be robust about things. So is baking redundancy into a system where it's assumed that packets will regularly fail to be delivered in a massively complicated network owned and run by everybody, in an emerging field of technology with a wide variety of potential means of transmission.

Tolerating abject failure, however, especially within a system you control, is never a good trade-off. For instance, I had to take a breather to avoid throwing a hissy fit in another question about putting JS at the top or bottom of the page. Several people insisted that it was better to put JS at the top, because then if the page failed to load completely, you would still potentially have some functionality. Half-working pages are worse than complete busts. At best, they result in more visitors to your site rightly assuming you're incompetent before you find out about the problem than if the busted page were simply bounced to an error page upon failing its own validation check, followed by an automated e-mail to somebody who can do something about it. Would you feel comfortable handing your credit card info over to a site that was half-busted all the time?

Attempting to deliver 2010 functionality on a 1999 browser, when you could just deliver a lower-tech page, is another example of a foolhardy design trade-off. The opportunities blown and the money I've seen wasted on developer time spent on bug-ridden workarounds, just to get rounded corners on an element hovering above a !@#$ing gradient background, have completely blown me away. And for what? To deliver higher-tech pages that perform poorly to proven technophobes while limiting your choices on higher-end browsers.

For it to be the right choice, the choice to handle input in a robust manner should make life easier on both sides of the problem, in both the short and the long term, IMO.

Never fail silently. Apart from that, trying to guess what the user of an API/library wanted does not sound like a bad idea. I would not follow it, though: having strict requirements can expose bugs in the calling code and/or misinterpretations of your API/library.

Furthermore, as has already been pointed out, it depends on how hard it is to actually guess what the user expected. If it's very easy, then you have two cases:

  1. Your library should be designed a bit differently (rename some function or split it in two), so that the user can expect what you actually provide.
  2. If you believe that your library is designed properly, with clear and straightforward naming, then you can try to infer what the user intended.

In any case where it is not 100% obvious and deterministic that one input should be converted to another, you should not do the conversion, for a number of reasons already mentioned (breaking compatibility on refactoring, least astonishment of users).

When dealing with an end user, trying to fix or guess their input is very welcome; they are expected to enter invalid information, and that case is completely unexceptional. Another developer, though, is not a simple non-technical user. They have the expertise to understand an error, and the error can have significance for them or be beneficial to them. Thus, I agree with you on designing strict APIs, while, of course, strictness is accompanied by clarity and simplicity.

I would recommend reading this question of mine, which covers a similar case.

Licensed under: CC-BY-SA with attribution