I'm writing some code that I've decoupled off into a module of its own, and even though I'm most likely the only person who will use it, I'm trying to think as if I might not be. The functions in this module perform operations on objects in an array that's passed in, and right now, I'm doing a bunch of safety checks to avoid exceptions caused by getting bad data – specifically, accounting for the possibility of an undefined/null value for that array, or one of the objects in the array, or one of the properties on one of the objects. In other words, quite a lot of checking. However, in the application where this module is used, I've already checked for all of these things before sending the data to be processed by this module – because null values in this data will cause problems elsewhere, as well.

Now that everything is working smoothly, I'm tempted to remove some of the safety checks in my module for the sake of efficiency, since I'm duplicating work. However, if this module were to be used by someone else, I would think that it'd be better to make it as bulletproof as possible – so if bad data is passed in from their end, my module will be able to handle it without being responsible for any crashes. But then I thought that a hypothetical person using my module might be in the same position I'm in – confident the data they're passing in is good, because they've already checked it.

To me, the obvious solution is to have two versions of certain key functions in the module: a 'careful' version if you don't trust the data, and a 'dangerous' version if you do trust it and just want to maximise performance. Is this something that's done? Assuming good documentation, would it be a good idea? And if it were a good idea, would it be better to differentiate between the two on a per-function basis, e.g. processSafely(data) and liveDangerously(data) – or via separate namespaces within the module, e.g. myModule.safe.process(data) and myModule.reckless.process(data)?

EDIT: The answers so far have been valuable, but I thought I should add (without veering too far into Stack Overflow territory) that the specific module I'm talking about is designed to accept a (potentially large) array of blog articles as objects (as they would come from a parser of a fairly standard format), extract tags out of said articles and count them, and return an array containing each individual tag with its count. In other words, as with all tasks involving parsing of files/objects supposedly conforming to a given format, there's sort of a lot that can go wrong, but I (or the user) will probably have to account for that elsewhere anyway (e.g. before rendering the articles to a view or whatever). Also, all of the checks I'm talking about are O(n) – not just one or two preliminary if statements or type coercions.

Solution

In my experience, it is really valuable to have a function check all its preconditions even when you “know” everything will be all right. Anything we let the computer check for us is something we can't get wrong by accident. There are techniques for implementing this efficiently (e.g. assertions that can be turned off for production builds), but I feel those tend to trade a substantial amount of safety for what is often little gain.
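
To make the "assertions that can be turned off" idea concrete, here is a minimal TypeScript sketch. The names checksEnabled, setChecksEnabled, precondition and firstTag are hypothetical and only for illustration; a real project would more likely drive the switch from a build flag. The check is passed as a function so that an expensive O(n) check costs nothing when checks are off.

```typescript
// Hypothetical global switch (an assumption for this sketch).
let checksEnabled = true;

function setChecksEnabled(enabled: boolean): void {
  checksEnabled = enabled;
}

// Runs the check only while checks are enabled, and throws if it fails.
function precondition(check: () => boolean, message: string): void {
  if (checksEnabled && !check()) {
    throw new Error("Precondition failed: " + message);
  }
}

// Example consumer: requires a non-empty array.
function firstTag(tags: string[]): string {
  precondition(() => tags.length > 0, "tags must not be empty");
  return tags[0];
}
```

With checks switched off, firstTag([]) no longer throws and quietly hands back undefined, which is exactly the safety being traded away.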

Instead, we can try to build water-tight abstractions where a constraint is checked once at runtime, and the type system of the language proves the constraint will never be invalidated.

As an example, I might have a function that operates on email addresses: sendEmail(String to, String subject, String plainTextMessage). This function has to check that the String to is a legal email address, and that check is repeated unnecessarily whenever you send multiple emails to the same address. Instead, we can define a class EmailAddress that checks in the constructor EmailAddress(String maybeAddress) whether the string is a legal address. If not, it throws. If so, the object is initialized with that address. The check is done once, at construction; afterwards you can use the object without additional checks.

However, we will need external access to the underlying string. It is important that this access is read-only. If the underlying type is mutable or if your language does not support const references, it can make sense to proxy the safe methods to the object rather than exposing it. After doing this, we might end up with sendEmail(EmailAddress to, EmailSubject subject, String body).
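
A rough TypeScript sketch of the EmailAddress idea, under some loud assumptions: the regular expression is a deliberately simplistic stand-in for real address validation, the subject stays a plain string rather than a separate EmailSubject type, and sendEmail only builds the message text instead of actually sending anything.

```typescript
class EmailAddress {
  // Private and read-only: nothing outside the class can change it
  // after the constructor has validated it.
  private readonly address: string;

  constructor(maybeAddress: string) {
    // Simplistic placeholder check (assumption), not real validation.
    if (!/^[^@\s]+@[^@\s]+$/.test(maybeAddress)) {
      throw new Error("Not a legal email address: " + maybeAddress);
    }
    this.address = maybeAddress;
  }

  // Read-only access to the underlying string.
  toString(): string {
    return this.address;
  }
}

// No re-validation here: holding an EmailAddress means the check passed.
function sendEmail(to: EmailAddress, subject: string, body: string): string {
  return "To: " + to.toString() + "\nSubject: " + subject + "\n\n" + body;
}
```

Sending many emails to the same address now costs one validation in total, because the same EmailAddress object can be reused for every call.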

This technique is especially applicable in statically typed languages with good encapsulation mechanisms where defining a new type is easy to do. I routinely use this technique in C++ and occasionally in Java. In dynamic languages, it is less useful since there are no static type checks. However, the consuming function now only has to do a runtime type check, which is usually less expensive than the full precondition check.
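
Applied to the tag-counting module from the question, the same idea might look like the sketch below. Everything here is hypothetical: the Article shape (an object with a tags array of strings), the ValidatedArticles wrapper and the countTags function are my assumptions, not the asker's actual code. The O(n) validation happens once in the constructor, and the consumer is left with a single instanceof check.

```typescript
interface Article {
  tags: string[];
}

class ValidatedArticles {
  readonly items: ReadonlyArray<Article>;

  // Full O(n) validation, done exactly once.
  constructor(input: unknown) {
    if (!Array.isArray(input)) {
      throw new Error("expected an array of articles");
    }
    const items: Article[] = [];
    for (let i = 0; i < input.length; i++) {
      const article = input[i];
      const tags = article === null || article === undefined ? undefined : article.tags;
      if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
        throw new Error("article " + i + " has no valid tags array");
      }
      // Copy the tags so later changes by the caller cannot invalidate them.
      items.push({ tags: tags.slice() });
    }
    this.items = items;
  }
}

function countTags(articles: ValidatedArticles): { tag: string; count: number }[] {
  // Cheap runtime type check instead of re-validating every article.
  if (!(articles instanceof ValidatedArticles)) {
    throw new TypeError("countTags expects ValidatedArticles");
  }
  // Prototype-free object, so tags like "constructor" are counted correctly.
  const counts: Record<string, number> = Object.create(null);
  for (const article of articles.items) {
    for (const tag of article.tags) {
      counts[tag] = (counts[tag] || 0) + 1;
    }
  }
  return Object.keys(counts).map((tag) => ({ tag: tag, count: counts[tag] }));
}
```

A caller who has already checked the data still pays for one validation pass when constructing ValidatedArticles, but every function in the module after that can skip it.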

Other tips

When possible, the best way to handle this is to encode your preconditions in the type system. Then ensure that the only way to have an instance of your precondition-asserting type is via a function which forces the caller to handle failure, such as one returning an Option/Optional/Maybe type.

The benefit to this is that code which violates your method's preconditions cannot compile.

For example, in Scala:

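final class PosInt private (val value: Int)
object PosInt {
  def toPosInt(n: Int): Option[PosInt] = if (n > 0) Some(new PosInt(n)) else None
}
// Find the square root of a positive number
def sqrt(n: PosInt): Float = math.sqrt(n.value).toFloat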

It's not possible to call this sqrt with a negative number because you can't get a PosInt for a negative number. Java 8 has a similar Optional type, and so on.

There are a few possibilities:

1) Soldier on, parse everything you can and never complain. Ignore all those 404s, bad HTML, and nulls. This is a terrible option. Lots of work for you, and the client has no idea that the results may not be what they think they should be. If, for example, blogs plugging Bernie have more errors than blogs praising Hillary, your results will be biased.

A great quote from a strong critique of PHP on why you shouldn't soldier on: "When faced with either doing something nonsensical or aborting with an error, it will do something nonsensical."

2) Throw a useful exception on the first bad input. Easier to code, more useful to the client. But still annoying: the client has to find and fix bad inputs one at a time.

3) Soldier on, but ignore "bad blogs". However, keep a List of all the bad blogs (and ideally a description, e.g. "link has a 404") and return that to the client. They get feedback that there were errors and can decide if their results are "good enough". This is also something you can unit test by including some known bad inputs.
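
Option 3 could be sketched in TypeScript roughly as follows. The article shape (an object with a tags array of strings), the function name countTagsWithReport and the report fields are all assumptions made for illustration, not anything from the asker's module.

```typescript
interface TagCount { tag: string; count: number; }
interface Rejected { index: number; reason: string; }
interface TagReport { counts: TagCount[]; rejected: Rejected[]; }

function countTagsWithReport(articles: unknown[]): TagReport {
  // Prototype-free object, so tags like "constructor" are counted correctly.
  const totals: Record<string, number> = Object.create(null);
  const rejected: Rejected[] = [];

  articles.forEach((article, index) => {
    if (article === null || typeof article !== "object") {
      rejected.push({ index: index, reason: "article is not an object" });
      return;
    }
    const tags = (article as { tags?: unknown }).tags;
    if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
      rejected.push({ index: index, reason: "tags is not an array of strings" });
      return;
    }
    // Only articles that passed every check contribute to the counts.
    for (const tag of tags) {
      totals[tag] = (totals[tag] || 0) + 1;
    }
  });

  const counts = Object.keys(totals).map((tag) => ({ tag: tag, count: totals[tag] }));
  return { counts: counts, rejected: rejected };
}
```

The caller gets both the counts and the list of skipped articles, and a unit test can feed in a few known-bad inputs and assert on the rejected list.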

Licensed under: CC-BY-SA with attribution