GUI functional acceptance tests, making them less brittle / obstacle to further development

https://softwareengineering.stackexchange.com/questions/291684

09-10-2020

Question

(Background; skip to the bold statement for the crux.)

I'm working through the book "Growing Object-Oriented Software, Guided by Tests" by Freeman & Pryce, while applying it to a project I'm starting. My app is a web app in PHP that manages a user's eBay interactions.

I'm using Behat/Mink to write the end-to-end tests. I'm stuck on the very first test.

In the book, they stress the importance of first writing tests that then guide the writing of the code to make them pass, and that the tests should not get in the way of development, including changes to requirements, refactoring, and so on.

They stress that the first test should exercise the system end-to-end, i.e. from driving the user interface, through the database, to interacting with an external system such as eBay, and that this first feature should be minimal.

But this means that the GUI built for the first test is unlikely to stay as it is for later tests. The first (and subsequent) tests should therefore be as flexible as possible, so that inevitable GUI changes don't break them. Broken tests have at least two major costs: the time spent rewriting them, AND the risk that the rewritten tests fail to catch bugs the originals would have caught. The latter is what really worries me; "The Pragmatic Programmer" emphasizes "Find bugs once".

So that's the abstract problem. Applying it to my project, I was thinking of first writing a slice of the system that accumulates records of purchases made from the user. This would involve setting up notifications on eBay that make a call to my application each time a customer commits to buy something from the user. The functional test would simulate that, and then check via the GUI that the order appears in the list (which I'd implement initially as a simple, single HTML table) visible to the user.
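
A minimal sketch of that first test, written in Python rather than PHP/Behat so that it runs on its own. OrdersApp, handle_purchase_notification and orders_visible_to_user are hypothetical names: the in-memory app stands in for the real application, and a direct method call stands in for eBay's notification. The test states the requirement (a notified purchase shows up for the user), and only one helper knows that orders are rendered as a table.

```python
from html.parser import HTMLParser


class OrdersApp:
    """Hypothetical in-memory stand-in for the PHP app: it records purchase
    notifications and renders them as one plain HTML table."""

    def __init__(self):
        self._orders = []

    def handle_purchase_notification(self, buyer, item):
        # Plays the role of the endpoint that eBay's notification would call.
        self._orders.append((buyer, item))

    def orders_page(self):
        rows = "".join(f"<tr><td>{b}</td><td>{i}</td></tr>" for b, i in self._orders)
        return f"<table id='orders'>{rows}</table>"


class _RowCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1].append(data)


def orders_visible_to_user(app):
    # The only test helper that knows orders are displayed as a table.
    collector = _RowCollector()
    collector.feed(app.orders_page())
    return [tuple(row) for row in collector.rows]


def test_purchase_appears_in_order_list():
    app = OrdersApp()
    app.handle_purchase_notification("alice", "Blue teapot")  # simulated eBay call
    assert ("alice", "Blue teapot") in orders_visible_to_user(app)


test_purchase_appears_in_order_list()
```

When the single table later grows into something richer (payment status, refunds, messages), the assertion in the test can stay as it is; only orders_visible_to_user has to learn the new layout.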

But once the customer commits to buying, they then need to pay for it via PayPal. The records that the app handles then become complex: some customers may never pay at all, some will pay, some will get full refunds, partial refunds or multiple refunds, messages may be exchanged between the user and the customer, the user might want to print postage labels from the interface, etc.

So in all likelihood the GUI will become unrecognisable from the one written to satisfy the first test feature.

How can these two problems best be dealt with?

1. The time involved in adapting the GUI test code
2. The risk that rewritten tests fail to catch bugs in an old feature that the previous tests would have caught

I've come up with some ideas, but it feels like a Pandora's box that has overwhelmed me, and I'm wondering if anyone can point me in the right direction, possibly with reference to other books:

1. Write the Behat tests (Gherkin), which would remain unchanged, but modify the Mink code that interprets those statements each time the GUI changes. This solves very few problems, but at least the Gherkin statements would probably need no modifications.
2. (Extremely vague idea) Make the GUI layer as thin as possible, possibly with several wafer-thin abstraction layers just for the GUI and matching abstractions in the tests, arranged somehow so that GUI changes affect only a small part of the tests. (Honestly, I don't have a clue how this would work.)
3. Freeman/Pryce give a hint: "Make changes in the smallest possible steps". This makes it easier to correlate changes in the requirements/GUI design with changes in the tests. But ultimately, as GUI changes accumulate, I don't see how the risk of ineffective tests wouldn't accumulate too.

Apologies for the long question.

Solution

Don't write GUI tests to match the GUI. Write acceptance tests that match requirements.

You could write your tests like this:

Given I enter "user" into the username field and "pass" into the password field
And I click the "Login" button
When I click the Find User button
And I enter "foo" into the username field
And I click "Search"
Then I should see a table with a user "Johnny Foo" in row "1"
And I should see a table with a user "Manny Foo" in row "2"

Or you could write them like this:

Given I am logged in as "user" with password "pass"
And I look for users called "foo"
Then the search results should show "Johnny Foo" on row "1"
And the search results should show "Manny Foo" on row "2"

The first one completely couples you to the GUI. The second one only couples you to what you need in terms of functionality, stories and requirements.

This way, you write code that becomes reusable. The lines in your tests are functional expressions, and code behind them maps each one to the concrete GUI implementation. If something in your GUI changes, you only have to change one location: the place where you map a phrase like "I look for users called 'foo'" to concrete GUI code.
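
Behat does this mapping with step definitions in PHP context classes that drive Mink. The sketch below shows the same idea in Python instead of PHP; FakeBrowser, StepRegistry and the step functions are hypothetical, written for illustration only, and are not Behat's or Mink's API. Each requirement-level phrase is matched by a pattern, and the function behind it is the one place that knows which fields and buttons are involved.

```python
import re


class FakeBrowser:
    """Hypothetical stand-in for a browser session such as Mink's; it only
    records the low-level GUI actions it is asked to perform."""

    def __init__(self):
        self.actions = []

    def fill(self, field, value):
        self.actions.append(f"fill {field}={value}")

    def click(self, label):
        self.actions.append(f"click {label}")


class StepRegistry:
    """Hand-rolled (not Behat's API) mapping from step text to functions."""

    def __init__(self):
        self._steps = []

    def step(self, pattern):
        def register(func):
            self._steps.append((re.compile(pattern), func))
            return func
        return register

    def run(self, text, browser):
        for regex, func in self._steps:
            match = regex.fullmatch(text)
            if match:
                func(browser, *match.groups())
                return
        raise LookupError(f"no step matches: {text!r}")


steps = StepRegistry()


@steps.step(r'I am logged in as "([^"]*)" with password "([^"]*)"')
def log_in(browser, user, password):
    # The only code that knows which fields and button the login form uses.
    browser.fill("username", user)
    browser.fill("password", password)
    browser.click("Login")


@steps.step(r'I look for users called "([^"]*)"')
def look_for_users(browser, name):
    # If the search screen is redesigned, only this function changes.
    browser.click("Find User")
    browser.fill("username", name)
    browser.click("Search")


browser = FakeBrowser()
steps.run('I am logged in as "user" with password "pass"', browser)
steps.run('I look for users called "foo"', browser)
print(browser.actions)
```

The scenario text never mentions "Find User" or "Search"; those details live only inside look_for_users.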

Case in point: yesterday I switched a page in an application I work on from loading its data with a form submit to fetching and displaying it using JSON and Knockout. That means that, instead of clicking a button and waiting for the submit, the page now shows a loading animation that goes away when the data has loaded. Apply this to the examples above: with the first style you'll need to add that wait as a step in every test; with the second you just add it to the implementation of "I look for users called 'foo'". If you have three tests that look like my first example, no sweat. But what if you have ten? Or a hundred?
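
A minimal sketch of that change, again in Python and with hypothetical names (FakeAjaxSearchPage, wait_until, look_for_users): the fake page keeps a loading spinner visible for a few checks after a search starts, and the wait is added only inside the step implementation. A real Mink-based context would use its browser session's own waiting support rather than this hand-written polling loop.

```python
import time


def wait_until(condition, timeout=2.0, interval=0.01):
    """Poll condition() until it returns True, or raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not condition():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval)


class FakeAjaxSearchPage:
    """Hypothetical page whose search results load asynchronously: a loading
    spinner stays visible for a few checks after a search is started."""

    def __init__(self, users, checks_until_loaded=3):
        self._users = users
        self._checks_until_loaded = checks_until_loaded
        self._pending = 0
        self._query = ""
        self.results = []

    def start_search(self, name):
        self._query = name.lower()
        self._pending = self._checks_until_loaded
        self.results = []

    def spinner_visible(self):
        # Each check lets the fake background request progress by one step.
        if self._pending > 0:
            self._pending -= 1
            if self._pending == 0:
                self.results = [u for u in self._users if self._query in u.lower()]
        return self._pending > 0


def look_for_users(page, name):
    # Implementation behind 'I look for users called "..."'.
    page.start_search(name)
    # Added when the page switched to asynchronous loading; the scenarios
    # that use this step did not have to change.
    wait_until(lambda: not page.spinner_visible())


page = FakeAjaxSearchPage(["Johnny Foo", "Manny Foo", "Alice Bar"])
look_for_users(page, "foo")
print(page.results)  # ['Johnny Foo', 'Manny Foo']
```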

By the way, if you look at the examples for Behat/Mink and the Gherkin DSL it uses, you'll see they are very general, human-readable scenarios. This is no accident: they are a very good model of how you should think when writing tests at this level.

Licensed under: CC-BY-SA with attribution
