Question

I'm new to unit testing so I'd like to get the opinion of some who are a little more clued-in.

I need to write some screen-scraping code shortly. The target system is a web UI, so there'll be copious HTML parsing and similar volatile goodness involved. I'll never be notified of any changes by the target system (e.g. they redesign their site or otherwise change functionality). So I anticipate my code breaking regularly.

So I think my real question is, how much, if any, of my unit testing should worry about or deal with the interface (the website I'm scraping) changing?

I think unit tests or not, I'm going to need to test heavily at runtime since I need to ensure the data I'm consuming is pristine. Even if I ran unit tests prior to every run, the web UI could still change between tests and runtime.

So do I focus on in-code testing and exception handling? Does that mean to draw a line in the sand and exclude this kind of testing from unit tests altogether?

Thanks


Solution

Unit testing should always be designed to have repeatable known results.

Therefore, to unit test a screen-scraper, you should write the test against a known, fixed set of HTML (you may use a mock object to represent this).
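As a rough sketch of what testing against known HTML could look like, the example below feeds a saved snippet to a small parser built on Python's standard `html.parser` module. The markup, the `PriceParser` class and the `extract_prices` function are all made up for illustration; your real page and fields will differ.

```python
import unittest
from html.parser import HTMLParser

# Hypothetical fixture: a saved snapshot of the page being scraped.
FIXTURE_HTML = """
<html><body>
  <span class="price">19.99</span>
  <span class="price">5.00</span>
</body></html>
"""


class PriceParser(HTMLParser):
    """Collects the text of every <span class="price"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_price and text:
            self.prices.append(float(text))


def extract_prices(html):
    """Parse an HTML string and return the prices found in it."""
    parser = PriceParser()
    parser.feed(html)
    return parser.prices


class ExtractPricesTest(unittest.TestCase):
    def test_known_html_gives_known_prices(self):
        # The input never changes, so the expected result never changes either.
        self.assertEqual(extract_prices(FIXTURE_HTML), [19.99, 5.0])

    def test_html_without_prices_gives_empty_list(self):
        self.assertEqual(extract_prices("<p>nothing here</p>"), [])
```

Saved in a module, this can be run with `python -m unittest`. Because the test never touches the live site, it only tells you that the parser still handles the HTML you captured, not that the site still serves that HTML.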

The sort of thing you are talking about doesn't really sound like a scenario for unit-testing to me - if you want to ensure your code runs as robustly as possible, then it is more, as you say, about in-code testing and exception handling.

I would also include some alerting code, so the system makes you aware of any occasion when the HTML does not get parsed as expected.
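One possible shape for that runtime check and alert is sketched below. The record fields (`name`, `price`), the `ScrapeFormatError` exception and the function names are assumptions for the sake of the example; the alert here is just a log call, which you could swap for an email or pager hook.

```python
import logging

logger = logging.getLogger("scraper")


class ScrapeFormatError(Exception):
    """Raised when scraped data does not look the way we expect (hypothetical)."""


def validate_record(record):
    """Return a list of problems found in one scraped record (empty if none)."""
    problems = []
    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("missing or empty name")
    price = record.get("price")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(price, bool) or not isinstance(price, (int, float)) or price < 0:
        problems.append("missing or invalid price")
    return problems


def check_or_alert(record, alert=logger.error):
    """Pass a good record through; alert and raise on a bad one."""
    problems = validate_record(record)
    if problems:
        # alert is called like a logging method: format string plus arguments
        alert("Scraped data failed validation: %s", "; ".join(problems))
        raise ScrapeFormatError("; ".join(problems))
    return record
```

Calling `check_or_alert` on every record at runtime means a site redesign shows up as an alert and a stopped run, rather than as bad data silently flowing downstream.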

OTHER TIPS

You should try to separate your tests as much as possible. Test the data handling with low level tests that execute the actual code (i.e. not via a simulated browser).

In the simulated browser, just make sure that the right things happen when you click on buttons, when you submit forms, and when you follow links.

Never try to test whether the layout is correct.

I think where unit tests might be useful here is on a build server: they will give you an early warning that the code no longer works. You can't write a unit test to prove that screen scraping will still work if the site changes its HTML (because you can't tell what they will change).

You might be able to write a unit test to check that something useful is returned from your efforts.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow