Should I validate API output?

https://softwareengineering.stackexchange.com/questions/278698

web-api

08-10-2020

Question

I'm working on a Web API to provide data to a third party per the specification they provided.

The process for each API call is essentially:
1. Extract data as XML
2. Deserialize data to DTO (POCO)
3. Return DTO as response content (which is then serialized to JSON via Web API 2 content negotiation)
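The three steps above can be sketched roughly as follows. This is a minimal Python sketch, not the asker's .NET code. The `Order`/`LineItem` DTOs, the XML element names and the stubbed `extract_xml` are hypothetical, and `json.dumps` stands in for Web API 2's content negotiation.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict, field
from typing import List


# Hypothetical DTOs standing in for the question's POCOs.
@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: str
    customer: str
    items: List[LineItem] = field(default_factory=list)


def extract_xml() -> str:
    """Step 1 (stubbed): in the real service this XML comes from the database."""
    return (
        "<Order><OrderId>42</OrderId><Customer>ACME</Customer>"
        "<Items><Item><Sku>A-1</Sku><Quantity>3</Quantity></Item></Items></Order>"
    )


def deserialize(xml_text: str) -> Order:
    """Step 2: map the XML elements onto the DTO."""
    root = ET.fromstring(xml_text)
    items = [
        LineItem(sku=item.findtext("Sku"), quantity=int(item.findtext("Quantity")))
        for item in root.iter("Item")
    ]
    return Order(
        order_id=root.findtext("OrderId"),
        customer=root.findtext("Customer"),
        items=items,
    )


def to_response(order: Order) -> str:
    """Step 3: serialize the DTO to JSON (done by content negotiation in Web API)."""
    return json.dumps(asdict(order))
```

Calling `to_response(deserialize(extract_xml()))` yields `{"order_id": "42", "customer": "ACME", "items": [{"sku": "A-1", "quantity": 3}]}`.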

I've been asked by someone on our development team to validate the DTOs.

Aside from ensuring that the responses are in a format that the client can process, which was essentially already accomplished by creating the DTO classes, this seems like a waste of effort to me. I checked Google, but the only output validation anyone's talking about is sanitizing values like credit card numbers and SSNs. I can't remember ever seeing a method validate an object it just created before returning it.

Because the response JSON is really just the extracted XML after having been deserialized and re-serialized, the only way the DTO could be invalid would be if the extracted XML was invalid. So I'd basically be retesting the extractions on every call.

Except for a few cases where the format is specified, I don't actually know what values are considered valid by the client. The best I can really do in most cases is make sure there aren't any blanks. I would essentially be attempting to blindly recreate the client's validation just so we could pre-validate the data before the client validated it anyway. Assuming I managed to get it right, the payoff would just be to shift the initial support burden to our team, because our server would be throwing validation exceptions instead of their client.
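For illustration, the generic "no blanks" check described above might look like the Python sketch below. It walks any dataclass-based DTO tree and reports `None` values and whitespace-only strings. `find_blanks` and the `Contact` DTO are hypothetical. Note that it knows nothing about which values the client actually accepts, which is exactly the limitation the asker describes.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, List


def find_blanks(obj: Any, path: str = "$") -> List[str]:
    """Return the path of every None or whitespace-only string inside a DTO tree."""
    problems: List[str] = []
    if obj is None:
        problems.append(path)
    elif isinstance(obj, str):
        if not obj.strip():
            problems.append(path)
    elif is_dataclass(obj):
        # Recurse into each field of a nested DTO.
        for f in fields(obj):
            problems.extend(find_blanks(getattr(obj, f.name), f"{path}.{f.name}"))
    elif isinstance(obj, list):
        # Recurse into each array element, recording its index.
        for i, element in enumerate(obj):
            problems.extend(find_blanks(element, f"{path}[{i}]"))
    return problems


# Hypothetical DTO used only to exercise the checker.
@dataclass
class Contact:
    name: str
    phones: List[str]
```

For example, `find_blanks(Contact(name=" ", phones=["555-0100", ""]))` returns `["$.name", "$.phones[1]"]`.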

The structure of the DTOs is fairly complicated. If I ignore the fact that they're being created through deserialization, the validation gets complicated fast. Most of the validation issues I would then have to check for (e.g., null array elements, blank values) aren't actually possible in the real implementation: XmlSerializer isn't going to create null array elements, and NOT NULL database fields aren't going to result in missing XML elements. Add 100% unit test coverage, and this becomes a significant amount of effort and complexity.

Is output validation even a practice? I've never seen it done before. It feels overly cautious and preemptive. If it is a practice, is there another term for it that would help me to find more information on the subject?

Solution

"I would essentially be attempting to blindly recreate the client's validation just so we could pre-validate the data before the client validated it anyway."

Having to do it blindly should be a big red flag that you and your clients haven't come to an agreement about what constitutes valid data. If you don't know what valid data looks like, you can't test your code. If you can't test your code, you can't say it works with any level of confidence. Essentially, you're depending on your clients to find your bugs, which isn't good.

"Is output validation even a practice?"

Absolutely. Whether or not it's a common enough practice is open for debate. :-)

Output validation is part of something called design by contract (or DbC), a term coined by Bertrand Meyer in connection with his design of the Eiffel programming language in the 1980s. One of the design principles surrounding DbC dictates that the first step in developing a unit of code is specifying what conditions must be satisfied on entry (preconditions) and exit (postconditions) for execution to be considered successful. These conditions, called the contract, are extremely powerful tools for ensuring that all parties understand what the code is supposed to do and that the code makes good on those promises. Fuller descriptions of design by contract go into more detail on contracts and why you'd want them. Eiffel and a handful of other languages support DbC, or something like it, directly; many others do it with assertions or add-on packages.
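Languages without built-in contracts can approximate them. The following is a minimal Python sketch, assuming a hand-rolled decorator rather than any particular add-on package: `contract` checks a precondition on entry and a postcondition on exit, and raises `ContractViolation` if either fails. Both names are hypothetical, and the date-format rule is an invented example of a client requirement.

```python
from functools import wraps
from typing import Callable


class ContractViolation(Exception):
    """Raised when a precondition or postcondition does not hold."""


def contract(pre: Callable[..., bool], post: Callable[..., bool]):
    """Wrap a function so `pre` is checked on entry and `post` on exit."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise ContractViolation(f"precondition failed for {func.__name__}")
            result = func(*args, **kwargs)
            # The postcondition sees the result as well as the original arguments.
            if not post(result, *args, **kwargs):
                raise ContractViolation(f"postcondition failed for {func.__name__}")
            return result
        return wrapper
    return decorator


# Hypothetical rule: turn an ISO "YYYY-MM-DD" date into the client's "DD/MM/YYYY".
@contract(
    pre=lambda iso: len(iso) == 10 and iso[4] == "-" and iso[7] == "-",
    post=lambda out, iso: len(out) == 10 and out.count("/") == 2,
)
def to_client_date(iso: str) -> str:
    year, month, day = iso.split("-")
    return f"{day}/{month}/{year}"
```

`to_client_date("2020-10-08")` returns `"08/10/2020"`, while `to_client_date("20201008")` fails its precondition. Raising an exception instead of using `assert` keeps the checks active even when Python runs with `-O`, which strips assert statements.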

What you're generating sounds complex enough that you should be using one or more functions to generate each part. As you break the whole into smaller pieces, specifying what's correct for each piece and verifying it becomes a set of smaller, simpler tasks that are less prone to mistakes.

For example, say part A of your output is composed of sub-parts B, C and D. If the functions that generate B, C and D can guarantee that their outputs are correct, validating A becomes a simple matter of checking that the other three actually produced something. If there's no validation in the generation of the sub-parts, it's up to A to check everything, and that can get very complex very quickly.
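The B/C/D decomposition might look like this in Python. Each hypothetical sub-builder (`build_b`, `build_c`, `build_d`) either returns a value that satisfies its own rule or raises `InvalidPart`, so `build_a` only has to confirm that every part was produced. The specific rules are placeholders, not anything from the question.

```python
from typing import Dict


class InvalidPart(ValueError):
    """Raised by a sub-part builder that cannot produce a valid value."""


def build_b(name: str) -> str:
    # B's postcondition: a non-blank, trimmed name.
    if not name.strip():
        raise InvalidPart("B: name is blank")
    return name.strip()


def build_c(quantity: int) -> int:
    # C's postcondition: a positive quantity.
    if quantity <= 0:
        raise InvalidPart("C: quantity must be positive")
    return quantity


def build_d(code: str) -> str:
    # D's postcondition: a three-letter, upper-case code.
    if len(code) != 3 or not code.isalpha():
        raise InvalidPart("D: code must be three letters")
    return code.upper()


def build_a(name: str, quantity: int, code: str) -> Dict[str, object]:
    # A trusts B, C and D to enforce their own rules; its only check is that
    # every sub-part actually produced something.
    parts = {"b": build_b(name), "c": build_c(quantity), "d": build_d(code)}
    if any(value is None for value in parts.values()):
        raise InvalidPart("A: a sub-part produced nothing")
    return parts
```

Here `build_a(" Widget ", 2, "usd")` returns `{"b": "Widget", "c": 2, "d": "USD"}`, and `build_a("Widget", 0, "usd")` raises `InvalidPart` from C before A ever assembles anything.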

If this sounds like a duplication of effort already being done by the client, it shouldn't be. In an ideal world, the class that represents B would have features to ensure that inputs and outputs are valid, and the same implementation would be used at both ends of the transaction. When that's not possible (the hurdle is usually administrative rather than technical), the best that can happen is that both sides do their own validation. Either way, there's nothing wrong with extra validation: you're doing it to ensure that what you produce is correct, and the client does its own version to detect your mistakes.
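One way to picture "the same implementation at both ends" is a DTO whose constructor enforces its invariants, so the producer cannot serialize an invalid instance and the consumer cannot parse one. This Python sketch assumes both sides can share the class; `Shipment` and its rules are hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Shipment:
    """Hypothetical shared DTO, used unchanged by producer and consumer."""
    tracking_number: str
    weight_kg: float

    def __post_init__(self):
        # Invariants run on every construction, whichever side builds the object.
        if not self.tracking_number.strip():
            raise ValueError("tracking_number must not be blank")
        if self.weight_kg <= 0:
            raise ValueError("weight_kg must be positive")

    def to_json(self) -> str:
        # Producer side: only an already-validated instance can be serialized.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "Shipment":
        # Consumer side: parsing goes through the same constructor and checks.
        return cls(**json.loads(text))
```

A round trip such as `Shipment.from_json(Shipment("1Z999", 1.5).to_json())` succeeds, while parsing `{"tracking_number": "", "weight_kg": 1.5}` raises `ValueError` on the consumer side.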

Other tips

This is an interesting question and I am in almost exactly the same situation at the moment.

I've reached the conclusion that it's actually quite necessary, because it's entirely possible that you might rename a field in your DTO. What happens then? You fix up the rest of your code, make everything work, and deploy your new web service, but now your customer complains that your XML is incorrect. Everything works on your end, but you didn't catch the error.

My web service returns JSON, but it's the same principle. I have a unit test that instantiates the WebApi controller, calls the method, retrieves the result as JSON, and makes sure that the result is correct. If you mock and stub your data, you can ensure that the entire result returned is exactly as you expect. Remember that you're not testing the .NET WebApi serialisation code, and you're not testing that the output values are correct; you're testing that the output of your web service is syntactically correct (XML fields are nested properly and named properly).
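A test along these lines might look like the following Python sketch; the answer's actual test targets a .NET Web API controller. `OrdersController`, `StubRepository` and `OrderDto` are hypothetical, and the stub fixes the data so the whole response can be compared exactly. Renaming `order_id` on the DTO would change the JSON key and fail the test before deployment.

```python
import json
import unittest
from dataclasses import dataclass, asdict


@dataclass
class OrderDto:
    order_id: str
    total: float


class StubRepository:
    """Stub standing in for the real data source, so the test controls the data."""
    def load_order(self, order_id: str) -> OrderDto:
        return OrderDto(order_id=order_id, total=9.99)


class OrdersController:
    """Hypothetical controller; in .NET this would be the Web API controller class."""
    def __init__(self, repository):
        self.repository = repository

    def get(self, order_id: str) -> str:
        return json.dumps(asdict(self.repository.load_order(order_id)))


class OrderContractTest(unittest.TestCase):
    def test_response_shape_matches_agreed_format(self):
        body = json.loads(OrdersController(StubRepository()).get("42"))
        # Pin the exact field names the client relies on.
        self.assertEqual(set(body), {"order_id", "total"})
        # With stubbed data, the entire response can be compared exactly.
        self.assertEqual(body, {"order_id": "42", "total": 9.99})
```

The test case runs under the standard `unittest` runner alongside the rest of the suite.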

I would say that with XML it's just as likely (possibly more likely) that you might accidentally change the format without even realising it and break the service for your client(s). So yeah, go for it, validate away.

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange