Refactoring numerical code for TDD and encapsulation

https://softwareengineering.stackexchange.com/questions/296470

10-10-2020

Question

I am coming to terms with TDD, and with the fact that I need to refactor some code that I'm (re)writing. I am running into what I think is a classic conflict: TDD vs. encapsulation of private methods/data. I need advice on how to structure my code so that everything is testable through its public interface.

I am writing a class that does some heavy computation for performance-critical code. It contains ~10 std::valarrays which, in typical use, hold between 50 and 200 floats each. All of the methods pound on some or all of those arrays, usually by zipping through them in loops. Additionally, there is a fistful of constants common to all of the methods.

Because these methods all use the same data, they really do belong together in a class. (In my first iteration of the code, written in C, my function signatures were horrifying and repetitive.)

The class really only needs one or two public methods, which return a comparatively small and simple array of POD structures. So the input is simple and the output is simple, but there is HEAVY computation on a lot of self-generated data in between.

A simplified illustration might look something like this:

#include <utility>
#include <valarray>

class HighPerformingDoohicky {
 private:
  std::valarray<float> a, b, c, d, e;   // ... (~10 arrays in total)
  std::valarray<float> some_constants;  // all methods depend on these

  void method1();  // pounds on a, b, c
  void method2();  // pounds on c, d, e
  void method3();  // pounds on all of them

 public:
  explicit HighPerformingDoohicky(std::valarray<float> a_few_numbers)
      : some_constants(std::move(a_few_numbers)) {}

  // Loops/iterates calls to methods 1, 2 and 3 to produce the output.
  std::valarray<float> method4();
};

So there is a *LOT* of private data, and any one datum is only a small part of the generated output. Testing only the output methods doesn't help me develop the numerics, and subtle problems could easily go unnoticed behind the public interface.

I am happy to restructure (friend classes, maybe?) and I am not fixated on the solution having a particular form, as long as it is easy to read, testable, correct and efficient.
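One way to read the "friend classes" idea: declare a dedicated test-access class as a friend, so unit tests can call a private numeric step directly and inspect its result, while production callers keep using only the public interface. This is a minimal sketch; DoohickyTestAccess, the trimmed-down members and the placeholder update in method1 are hypothetical, not the asker's code.

```cpp
#include <valarray>

// Hypothetical, trimmed-down version of the class with one private numeric step.
class HighPerformingDoohicky {
 public:
  explicit HighPerformingDoohicky(float k) : k_{k}, a_(1.0f, 4), b_(2.0f, 4) {}

  // Public entry point used by production code.
  float method4() {
    method1();
    return a_.sum();
  }

 private:
  // Placeholder numerics: a <- a + k * b, elementwise.
  void method1() { a_ += k_ * b_; }

  float k_;
  std::valarray<float> a_, b_;

  // Only this test helper may reach the private members.
  friend class DoohickyTestAccess;
};

// Test-only accessor, meant to live in the unit-test sources rather than the library.
class DoohickyTestAccess {
 public:
  static void method1(HighPerformingDoohicky& d) { d.method1(); }
  static const std::valarray<float>& a(const HighPerformingDoohicky& d) { return d.a_; }
};
```

The trade-off is that the production class has to name its test helper; the helper itself can stay in the test sources, so ordinary callers still only see method4().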

Addendum: in case it helps to know more specifically what the calculations are, here is a summary. Six of the variables are the state space of a dynamical system. One of the methods is an Euler integrator that uses the equations of motion (and control) to project a state-space trajectory from initial conditions (the input). Another method computes a set of Lagrange multipliers based on that trajectory. Another computes a gradient vector that depends on the trajectory and the Lagrange multipliers. Method 3 repeatedly calls those functions to solve an optimization problem. All of these things (including the gradient) are valarrays N elements long, with N somewhere between 20 and 200 in normal operation.
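For reference, a forward (explicit) Euler integrator of the kind described advances the state by one step of size $\Delta t$ at a time. The notation here is an assumption, not the asker's: $x_k$ is the state at sample $k$, $u_k$ the control, and $f$ the equations of motion, with the trajectory stored as $N$ samples $x_0, \dots, x_{N-1}$.

```latex
x_{k+1} = x_k + \Delta t \, f(x_k, u_k), \qquad k = 0, 1, \dots, N-2
```

So one trajectory costs $N-1$ evaluations of $f$, and each evaluation is a natural unit to test in isolation.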

Solution

I do not know the internals of your class, but the description in your addendum sounds a lot like several different responsibilities, which could be separated into different classes, each with its own public interface. The fact that some of them operate on the same data does not make it mandatory to put them all into the same class; no one forbids you to design classes that take a reference to the data of another class, if that helps you get things into better shape. The key to making this work is, as always, to get the different abstractions right (and, in C++, to have a clear plan for ownership of the data).
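A minimal sketch of that separation, assuming one plain struct that owns the trajectory data and small components that operate on it by reference. Trajectory, EulerIntegrator and TrajectoryOptimizer are hypothetical names, and the dynamics (a scalar linear system dx/dt = -a*x + u) are a placeholder for the real equations of motion. Each component has its own public interface and can be unit-tested directly, while the optimizer is the single owner of the storage.

```cpp
#include <cstddef>
#include <valarray>

// Plain data: owns the sampled trajectory. No invariants to hide, so nothing to encapsulate.
struct Trajectory {
  explicit Trajectory(std::size_t n) : x(0.0f, n), u(0.0f, n) {}
  std::valarray<float> x;  // state samples
  std::valarray<float> u;  // control samples
};

// One responsibility: forward Euler integration of placeholder dynamics dx/dt = -a*x + u.
class EulerIntegrator {
 public:
  EulerIntegrator(float a, float dt) : a_{a}, dt_{dt} {}

  // Fills traj.x from the initial condition x0 and the controls already in traj.u.
  void integrate(float x0, Trajectory& traj) const {
    if (traj.x.size() == 0) return;
    traj.x[0] = x0;
    for (std::size_t k = 0; k + 1 < traj.x.size(); ++k) {
      traj.x[k + 1] = traj.x[k] + dt_ * (-a_ * traj.x[k] + traj.u[k]);
    }
  }

 private:
  float a_, dt_;
};

// Owns the data and composes the components; production code only needs run().
class TrajectoryOptimizer {
 public:
  TrajectoryOptimizer(std::size_t n, float a, float dt) : traj_(n), integrator_(a, dt) {}

  float run(float x0) {
    integrator_.integrate(x0, traj_);
    // Lagrange multiplier, gradient and update components would be called here.
    return traj_.x.size() ? traj_.x[traj_.x.size() - 1] : x0;  // placeholder output: final state
  }

 private:
  Trajectory traj_;
  EulerIntegrator integrator_;
};
```

Because the integrator only borrows the trajectory, a test can build a small Trajectory by hand, integrate it, and check individual samples without going through the optimizer at all.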

However, if you really think you cannot design this program in a more separated fashion, maybe for the sake of performance, then it may come down to a tradeoff between different design goals: performance vs. the "do not test private functions" mantra. In such a situation, I would not hesitate to question the latter a bit. Sometimes adding "maintenance hatches" to your classes can help: public helper methods that give access to some internals or inner workings purely for testing purposes.
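A "maintenance hatch" can be as small as a public single-step method plus a read-only view of an intermediate array, clearly named as test-only. This is a sketch under assumptions: update_gradient_for_testing(), gradient_for_testing(), solve() and the placeholder update rule are all hypothetical.

```cpp
#include <cstddef>
#include <valarray>

class HighPerformingDoohicky {
 public:
  HighPerformingDoohicky(float k, std::size_t n) : k_{k}, c_(1.0f, n), grad_(0.0f, n) {}

  // Normal interface: run a fixed number of update iterations and return the result.
  std::valarray<float> solve(int iterations) {
    for (int i = 0; i < iterations; ++i) update_gradient();
    return c_;
  }

  // Maintenance hatch: for tests and debugging only, not part of the stable API.
  void update_gradient_for_testing() { update_gradient(); }
  const std::valarray<float>& gradient_for_testing() const { return grad_; }

 private:
  // Placeholder numerics: grad = k * c, then c -= grad.
  void update_gradient() {
    grad_ = k_ * c_;
    c_ -= grad_;
  }

  float k_;
  std::valarray<float> c_, grad_;
};
```

The hatch returns a const reference, so tests can inspect the intermediate gradient without copying it or being able to modify the object's state through it.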

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange