Question

I have a class that runs as a service and returns html code of a website when supplied with a URL.

The code:

public interface IHtmlDownloader
{
    IWebProxy Proxy { get; set; }
    string UserAgent { get; set; }

    string GetHtml(string url)
}

public class HtmlDownloader : IHtmlDownloader
{
    WebClient _client;

    public IWebProxy Proxy
    {
        get { return _client.Proxy; }
        set { _client.Proxy = value; }
    }

    public string UserAgent
    {
        get
        { return _client.Headers[HttpRequestHeader.UserAgent].ToString(); }
        set
        { _client.Headers[HttpRequestHeader.UserAgent] = value; }
    }

    public HtmlDownloder()
    {
        _client = new WebClient();
    }

    public string GetHtml(string url)
    {
        this.CheckValidity(url);

        var htmlOfWebsite = _client.DownloadString(url);

        return htmlOfWebsite;
    }
}

Now I am not sure how exactly I should validate the URL. I know that it should not be null or whitespace or anything else, but should I validate that it is indeed a properly built URL?

Doing so would result in a method which would perform the validation. I could simply create a public static method inside HtmlDownloader so that the caller can validate the URL, or maybe even not a static one (GetUrl(string url) is after all called after the object has been instantiated), but it would be wrong for at least two reasons:

  1. It would be a validator applicable in so many places, and it should be available to other classes.
  2. I can't contract a static method with an interface.

Furthermore, it shouldn't be made static, but rather a service, so I could unit test HtmlDownloader (service would be injected through the constructor).

All of those ideas are quite dreadful and I am starting to think, that I shouldn't validate anything, but simply create a try/catch block that would catch the proper exception and then throw a UrlNotFoundException with the original exception supplied through InnerException.

Still, it would be nice to check for the validity of the URL. But to whom and how? Should the caller be injected with the proper service? But then, what if the URL is generated higher than the caller and the caller simply passes it on to the HtmlDownloader? Where should I validate the URL? Or is the simple fact that the URL have to passed through a few levels a code smell?

Was it helpful?

Solution

In my opinion the service should validate the URL so that it can return an error describing the misuse. You can't rely on the caller running validation and ensuring it passes you valid parameters. I imagine that's probably taken as read and you weren't suggesting that the thing calling the service defines the validation.

Moving on, validation of generic properties should be handled with a generic mechanism.

Assuming first of all that you're implementing a small number of services:

If your language provides you with the construct then use it directly in this code.

If it's not available I'd say implement one that can be called statically and call that instead - I.E. treat it as if it was a standard language construct.

It's simple and straightforward and only starts to cause a problem if you have a lot of services and a lot of duplication.

In terms of TDD:

Let's say you implemented the static method UrlValidator::validate - either it's a full blown implementation or it wraps up the available language construct to make it nicer to use. That's not important - it's just a static method.

This can be tested exhaustively in its own unit test. You can throw a thousand test cases at it to make sure it behaves exactly as you require.

Then, when you are testing the service HtmlDownloader you then only need to test the validation enough to convince yourself that you are correctly using the UrlValidator. For example, ensure an obviously valid email address passes, an obviously invalid email address fails.

This is kind of similar to what you would do with Dependency Injection except you have an assumption - you test enough to convince yourself that your assumption is correct. With DI you directly test that the assumption is correct.

And fair enough, DI might make this a little easier to test and the test more robust, but it seems like overkill for such a simple issue - particularly if you are only implementing a small number of services.

In my opinion it should be HtmlDownloaders responsiblity to define that it needs a valid URL, and if that's the case it's not just the case of injecting a UrlValidator, but of injecting something that can validate whatever you ask it to.

Which leads me onto to the "if you're implementing a lot of services" case.

In this situation it is probably worthwhile implementing a generic parameter validation package. Something that you can inject into HtmlDownloader, so that it can pass in a configuration and the data received and get back a validation state.

E.g. you might pass in something like (pseudo-code):

{ url = { required = true
        , dataType = ValidationTypes::URL }
}

The implementation of the validation lives outside the service, but the service cleanly defines what needs to be validated, and what the requirements are.

This massively reduces the testing required because you now only need to test that the configuration is correct, rather than testing the effect of it.

I like implementations like this, it's just that it's sometimes overkill for the kind of solution you need.

It a question of scale and required flexibility.

OTHER TIPS

As much as possible try and make your classes as independent as possible during unit testing, keeping cross referencing of functions down to a minimum will help in debugging, after all unit testing is done you could then create standalone utility functions that can handle operations that will be cross referenced by your classes...

Licensed under: CC-BY-SA with attribution
scroll top