Why is white box testing discouraged in OOP?

https://softwareengineering.stackexchange.com/questions/351140

14-01-2021

Question

It seems the general consensus for unit testing classes is to test your object through its public interface only. So if you wanted to test the removeElement method on a LinkedList class you'd need to call addElement, then removeElement, and lastly containsElement to assert the element was removed.

This is brittle because if addElement or containsElement breaks, the test fails even if the implementation of removeElement is correct.

When I test standalone procedures I try to call them in isolation. If I were to test a removeElement procedure I would build up the state of the parameters directly in the test and then assert their state post call is correct.

The only difference between a method and a procedure is that a method is implicitly given the object as a parameter. So since list.removeElement(el) and removeElement(list, el) are functionally the same, why not test them the same way? For example, in the test, create an instance of the LinkedList class, set up its "private" fields, call removeElement, and assert that its fields changed correctly after the call.

This is the ideal unit test because it's about taking input and asserting output for a single unit of functionality. Having to call public methods A, B, C, D, E, and F just to test method G is a borderline integration test, can potentially create false positives (since the data itself is never validated), and makes isolating the failure of the test during maintenance more difficult.

Anecdotally, I've found that black box testing tempts developers to add unnecessary public methods to make their testing "easier", which increases maintenance in the long run.

So my question is why is white box testing discouraged in the OO world when it seems like common sense in the procedural and functional worlds?

EDIT: Is there an OO way of dealing with the gripes I've outlined in my post, specifically those in the latter half, that does not involve adding new public methods and avoids calling public methods other than the one being tested? Consider the dilemma of asserting the "previous" node pointer in my comment.

EDIT #2: Apparently my concept of a class might be different from others'. A class to me is just a (very old) design pattern: "construct", "consume" (e.g. call methods), and "destruct", which is no different than fopen, fread, fwrite, fseek, and fclose in C. Regardless of whether there is an implicit parameter involved, things are stuffed behind a namespace, or you call it private, public, or protected, everything is just data and data transformations at the end of the day. I'm having trouble grasping classes as a unit when a class seems more like a design pattern or even a "container" for the actual units, which are the functions themselves.

Solution

You are mixing up two related, but nevertheless different things:

white box testing
unit testing by using private methods

The reasons for writing or not writing unit tests using only public methods have been discussed numerous times before on this site, for example here or here. I don't think it makes sense to repeat those arguments; if that is your question, you will probably find an answer by following those links.

White box testing, however, does not mean using private members to set up a test. It means designing tests using specific knowledge about the internals of the tested class or component, for example by creating tests to achieve full code coverage and/or branch coverage, and this is typically done using just public members. So white box testing requires knowing the internals of a class, but does not directly make use of access to those internals. This leaves the designer of a component in a position where they can still change the implementation details without worrying too much about the tests.
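To make that distinction concrete, here is a minimal Python sketch. The class and the method names add_element, contains_element, and remove_element are hypothetical stand-ins for the methods in the question. The tests are chosen by reading the implementation (one test per branch of remove_element), yet they touch only public methods.

```python
class LinkedList:
    """Hypothetical singly linked list; each node is a [value, next_node] pair."""

    def __init__(self):
        self._head = None

    def add_element(self, value):
        # Prepends, so the most recently added value becomes the head.
        self._head = [value, self._head]

    def contains_element(self, value):
        node = self._head
        while node is not None:
            if node[0] == value:
                return True
            node = node[1]
        return False

    def remove_element(self, value):
        prev, node = None, self._head
        while node is not None:
            if node[0] == value:
                if prev is None:   # branch 1: the match is the head
                    self._head = node[1]
                else:              # branch 2: the match is a later node
                    prev[1] = node[1]
                return True
            prev, node = node, node[1]
        return False               # branch 3: no match


def make_list(*values):
    lst = LinkedList()
    for v in values:
        lst.add_element(v)
    return lst


def test_remove_head():
    lst = make_list("a", "b")      # "b" is the head because add_element prepends
    assert lst.remove_element("b")
    assert not lst.contains_element("b")
    assert lst.contains_element("a")


def test_remove_non_head():
    lst = make_list("a", "b")
    assert lst.remove_element("a")
    assert not lst.contains_element("a")
    assert lst.contains_element("b")


def test_remove_missing():
    lst = make_list("a")
    assert not lst.remove_element("z")
    assert lst.contains_element("a")


for test in (test_remove_head, test_remove_non_head, test_remove_missing):
    test()
```

Knowing that add_element prepends is internal knowledge used to pick the cases, but if the storage were later changed, these tests would still compile against the same public interface.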

This kind of testing is not discouraged in OOP, quite the opposite. The well-known Test Driven Development (which is popular in OOP as well as in non-OOP) is a form of testing which actually leads to this kind of test: whenever one wants to add a new feature to a function, class, or component, one writes a "red" test first, then adds some new code or changes some existing code to implement the feature. Since the new test now becomes "green", it is obvious that the added or changed code must have been covered by the test.

To your example: if removeElement is a public method of a "list" module, and not a member of a class, I would still recommend the way of testing using only the public interface of that module, just as if it was a class. Your example of a broken addElement or containsElement is contrived (and your idea of "to avoid calling public methods other than the one being tested" is - no offence - misguided). In reality, one would design such a test by

creating a new list
asserting the list does not contain element X
adding an element X to the list
asserting the list now contains element X
removing the element X from the list
asserting the list does not contain element X any more

which is all possible using public methods.

If addElement or containsElement were broken, the above test sequence makes sure the test will reveal this (and does not give a false positive for removeElement).
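The six steps above might look like the following in Python. This is only a sketch: the LinkedList class and its add_element, contains_element, and remove_element methods are assumed here for illustration and mirror the names used in the question.

```python
class _Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Hypothetical minimal list exposing only the three methods under discussion."""

    def __init__(self):
        self._head = None

    def add_element(self, value):
        self._head = _Node(value, self._head)

    def contains_element(self, value):
        node = self._head
        while node is not None:
            if node.value == value:
                return True
            node = node.next
        return False

    def remove_element(self, value):
        prev, node = None, self._head
        while node is not None:
            if node.value == value:
                if prev is None:
                    self._head = node.next
                else:
                    prev.next = node.next
                return
            prev, node = node, node.next


def test_remove_element():
    lst = LinkedList()                     # 1. create a new list
    assert not lst.contains_element("X")   # 2. X is absent to begin with
    lst.add_element("X")                   # 3. add X
    assert lst.contains_element("X")       # 4. X is now present
    lst.remove_element("X")                # 5. remove X
    assert not lst.contains_element("X")   # 6. X is absent again


test_remove_element()
```

Steps 2 and 4 are what guard against the false positive: a contains_element that always returned False would fail at step 4, and an add_element that did nothing would fail there too.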

Of course, there are cases of classes or complex components where using the public interface alone might not be the best approach to create a full test scenario, and where it can be helpful to loosen the encapsulation to some degree, for example, by adding "maintenance hatches" into the code. But I think such cases are exceptional cases, and good test and component design should try to avoid these situations. A simple component like a linked list should not require such measures.

Other tips

I think what's going on here is a difference of perspective: do you view classes or functions as the smallest unit? I'm going to deconstruct my question from the perspective of someone who views classes as the smallest unit:

[The] ideal unit test ... is about taking input and asserting output for a single unit of functionality. Having to call public methods A, B, C, D, E, and F just to test method G is a borderline integration test.

This makes no sense because the class is the smallest unit, therefore calling multiple methods isn't an issue because it tests the unit as a whole.

[It] can potentially create false positives (since the data itself is never validated).

It's not about data confidence; it's about the interface and whether it appears to behave correctly.

Isolating the failure of the test during maintenance is more difficult.

If a test fails then it doesn't matter which method induced the failure because the unit as a whole is broken. Difficulty finding the root cause is accepted because the result fixes the unit as a whole.

If you wanted to test the removeElement method on a LinkedList class you'd need to call addElement, then removeElement, and lastly containsElement to assert the element was removed. This is brittle because if addElement or containsElement broke, then the test will fail even if the implementation of removeElement is correct.

It's not about a method failing; it's about the class as a whole failing. Testing a specific method in isolation doesn't make sense. It's best to test the class as a whole through workflow or requirement tests.

Now I'll state the perspective of someone viewing functions as the smallest unit. I think I described it best in edit #2 (I've modified the wording slightly):

A class is a design pattern: "construct", "consume", and "destruct", which is no different than fopen, fread/fwrite/fseek, and fclose in C. Regardless of whether there is an implicit self parameter, definitions are placed in a namespace, or you call it private, public, or protected, everything is data and data transformations at the end of the day.

From this perspective there is nothing wrong with establishing data inputs directly during testing. If you don't then you end up playing games with the function call sequence to nudge the data (which you weren't supposed to know about) into a state which you can indirectly verify; this sacrifices direct data confidence and test clarity.
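A small Python sketch of what "direct data confidence" could mean in practice. Everything here is hypothetical: Node, build, contains_element, and a remove_element function with a bug seeded on purpose (it forgets to update the successor's prev pointer in a doubly linked list). It is meant only to show the kind of defect a traversal-based check can miss, not to claim such bugs are typical.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


def build(*values):
    """Wire up a doubly linked chain directly and return its head."""
    head = tail = None
    for v in values:
        node = Node(v)
        if tail is None:
            head = node
        else:
            tail.next, node.prev = node, tail
        tail = node
    return head


def contains_element(head, value):
    node = head
    while node is not None:
        if node.value == value:
            return True
        node = node.next
    return False


def remove_element(head, value):
    """Unlink the first node holding value and return the (possibly new) head."""
    node = head
    while node is not None:
        if node.value == value:
            if node.prev is not None:
                node.prev.next = node.next
            # BUG, seeded deliberately for this example:
            # the line `node.next.prev = node.prev` is missing.
            return node.next if node is head else head
        node = node.next
    return head


head = build("a", "b", "c")
head = remove_element(head, "b")

# Indirect check via forward traversal: passes despite the bug.
assert not contains_element(head, "b")

# Direct check on the data: "c" should point back at "a",
# but its prev pointer still references the removed "b" node.
c = head.next
stale_prev_value = c.prev.value
print(stale_prev_value)
```

A test that asserted c.prev is head would fail here and point straight at the missing pointer update. Under the class-as-unit view, the same defect would instead be expected to surface through a public operation that walks the list backwards, if the interface offers one.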

There seems to be a divide over whether engineers should think in terms of data and data transformations or in terms of abstractions, and at what level, during development and testing. This raises the question of why this divide exists. It's an important question because it has a ripple effect on how engineers view, design, and test software. I'm tempted to post a separate question because I believe it's worth asking.

License: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange