Question

I once failed an algorithmic test at Codility because I tried to find a better solution, and in the end I had nothing.

That made me wonder whether I could use an approach similar to TDD, i.e. whether I could usually develop a solution gradually in a similar fashion.

If I were writing a sorting algorithm, I could move from a standard bubble sort to a two-way bubble sort, but something more advanced like quicksort would be a "quantum leap". At least I would have test data I could easily validate.

Any other tips for such tests? One thing I would do next time is use more methods/functions instead of inner loops. For example, in sorting you often need a swap; if it were a method, I would only need to modify the calling code. I could even implement more advanced solutions as derived classes.

With "Algorithmic" vs "normal" problems I mean problems where Time-complexity is important. So instead of passing more tests as in TDD, you would make it "behave better".

With "similar to TDD" I mean:

  1. Write relatively automatic tests to save time on manual testing per increment.
  2. Incremental development.
  3. Regression testing, ability to detect if code breaks or at least if functionality changes between increments.

I think this should be quite easy to understand if you compare:

  1. Writing a shell sort directly
  2. Jumping from bubble sort to quicksort (a total rewrite)
  3. Moving incrementally from a one-way bubble sort to shell sort (if possible) - as sketched below.
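
As a purely illustrative sketch (the function names are mine, not part of the question): one possible incremental path keeps the exchange idea of bubble sort but compares elements a shrinking gap apart, which is essentially comb sort; shell sort applies the same gap idea to insertion sort. The point is that the original test data keeps validating each increment.

```python
def bubble_sort(items):
    """One-way bubble sort: the starting point."""
    a = list(items)
    for i in range(len(a) - 1):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]  # swap is still inline here
    return a


def gapped_bubble_sort(items, shrink=1.3):
    """Incremental step: the same exchange idea, but elements `gap` apart are
    compared and the gap shrinks each pass (a comb-sort-style generalization).
    With the gap fixed at 1 this degenerates back into bubble sort."""
    a = list(items)
    gap = len(a)
    swapped = True
    while gap > 1 or swapped:
        gap = max(1, int(gap / shrink))
        swapped = False
        for j in range(len(a) - gap):
            if a[j] > a[j + gap]:
                a[j], a[j + gap] = a[j + gap], a[j]
                swapped = True
    return a


# The same test data validates every increment.
data = [5, 1, 4, 2, 8, 0, 2]
assert bubble_sort(data) == sorted(data)
assert gapped_bubble_sort(data) == sorted(data)
```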

Solution

See also Ron Jeffries's attempt to create a Sudoku solver with TDD, which unfortunately didn't work.

Algorithm development requires a significant understanding of algorithm design principles. With those principles it is indeed possible to proceed incrementally, with a plan, as Peter Norvig did.

In fact, for algorithms requiring non-trivial design effort, the effort is almost always incremental in nature. But each "increment", tiny in the eyes of an algorithm designer, looks like a quantum leap (to borrow your phrase) to a person who lacks the same expertise or knowledge of that particular family of algorithms.

This is why a basic education in CS theory and lots of algorithm programming practice are equally important. Knowing that a particular "technique" (a small building block of algorithms) exists goes a long way toward making these incremental quantum leaps.


There are some important differences between incremental progress in algorithms and TDD, though.

One of the differences has been mentioned by JeffO: a test that verifies the correctness of the output data is separate from a test that compares the performance of different implementations of the same algorithm (or of different algorithms vying to give the same solution).

In TDD, one adds a new requirement in the form of a test, and this test should initially fail (red). Then the requirement is satisfied (green). Finally, the code is refactored.

In algorithm development, the requirement usually doesn't change. The correctness verification test is either written first or shortly after a draft (trusted but slow) implementation of the algorithm is completed. This data-correctness test is seldom changed; one does not change it to fail (red) as part of the TDD rite.
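
A minimal sketch of such a correctness test in Python (draft_sort and the random test data are my own illustration, not anything from the original answer): the oracle is the built-in sorted(), and the test stays unchanged while the draft body is replaced by faster increments.

```python
import random


def draft_sort(items):
    """Slow but trusted first draft (selection sort). Later increments replace
    this body; the test below stays the same and acts as a regression test."""
    a = list(items)
    for i in range(len(a)):
        k = min(range(i, len(a)), key=a.__getitem__)
        a[i], a[k] = a[k], a[i]
    return a


def test_correctness(sort_fn, trials=100):
    """Data-correctness test: written once, seldom changed afterwards."""
    for _ in range(trials):
        data = [random.randint(-50, 50) for _ in range(random.randint(0, 30))]
        assert sort_fn(data) == sorted(data)  # built-in sorted() is the oracle


test_correctness(draft_sort)
```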

In this respect, data analysis is distinctly different from algorithm development, because data analysis requirements (both the input sets and the expected outcomes) are defined only loosely, in human terms. Thus the requirements change frequently at a technical level. This rapid change puts data analysis somewhere between algorithm development and general application development: while still algorithm-heavy, the requirements also change "too fast" for the taste of any programmer.

If the requirement changes, it typically calls for a different algorithm.

In algorithm development, changing (tightening) the performance comparison test so that it fails (red) is silly - it does not give you any insight into potential changes to your algorithm that would improve performance.

Therefore, in algorithm development, neither the correctness test nor the performance test is a TDD test. Instead, both are regression tests. Specifically, the correctness regression test prevents you from making changes to the algorithm that break its correctness, and the performance regression test prevents you from making changes that make it run slower.

You can still incorporate TDD as a personal working style, except that the "red - green - refactor" ritual is neither strictly necessary for, nor particularly beneficial to, the thought process of algorithm development.

I would argue that algorithm improvements actually result from making random (not necessarily correct) permutations to the data flow diagrams of the current algorithm, or from mixing and matching them between previously known implementations.


TDD is used when there are multiple requirements that can be added incrementally to your test set.

Alternatively, if your algorithm is data-driven, each piece of test data (test case) can be added incrementally, and TDD becomes useful again. For this reason a "TDD-like" approach of "add new test data - improve the code to handle this data correctly - refactor" also works for open-ended data analytics work, in which the objectives of the algorithm are described in human-centric words and its measure of success is also judged in human-defined terms.
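
A minimal sketch of that data-driven loop, assuming a hypothetical normalize() routine and a growing table of cases; each newly discovered case is appended and the whole table is re-run as a regression suite.

```python
# Each entry is (input, expected output). New cases are appended as they are
# discovered, and the whole table is re-run on every increment.
CASES = [
    ("", ""),
    ("abc", "abc"),
    ("cba", "abc"),
    # ("next tricky input", "expected result"),  # added in the next increment
]


def normalize(text):
    """Hypothetical data-driven routine under development."""
    return "".join(sorted(text))


def run_cases():
    for given, expected in CASES:
        got = normalize(given)
        assert got == expected, f"{given!r}: expected {expected!r}, got {got!r}"


run_cases()
```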

TDD purports to teach a way of working that is less overwhelming than trying to satisfy all (dozens or hundreds of) requirements in a single attempt. In other words, TDD is enabled when you can dictate that certain requirements or stretch goals may be temporarily ignored while you implement early drafts of your solution.

TDD isn't a substitute for computer science. It is a psychological crutch that helps programmers overcome the shock of having to satisfy many requirements at once.

But if you already have one implementation that gives correct results, TDD considers its goal accomplished and the code ready to be handed off (to refactoring, or to another programmer-user). In some sense, it encourages you not to prematurely optimize your code, by objectively giving you a signal that the code is "good enough" (it passes all correctness tests).


In TDD, there is also a focus on "micro-requirements" (or hidden qualities): parameter validation, assertions, exception throwing and handling, and so on. TDD helps ensure the correctness of execution paths that aren't frequently exercised in the normal course of software execution.

Certain types of algorithm code contain these things as well; they are amenable to TDD. But because the general workflow of algorithm development isn't TDD, such tests (on parameter validation, assertions, and exception throwing and handling) tend to be written after the implementation code has already been (at least partially) written.
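
For example, a small after-the-fact test for a validation path might look like the following sketch (clamped_ratio is a hypothetical helper, not anything from the original answer):

```python
def clamped_ratio(numerator, denominator):
    """Hypothetical helper with parameter validation on a rarely-hit path."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return max(0.0, min(1.0, numerator / denominator))


def test_rejects_zero_denominator():
    """Written after the implementation: it exercises an error path that the
    normal flow of the program rarely touches."""
    try:
        clamped_ratio(1, 0)
    except ValueError:
        return  # expected
    raise AssertionError("expected ValueError for a zero denominator")


test_rejects_zero_denominator()
```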

OTHER TIPS

For your problem, you would have two tests:

  1. A test to make sure the algorithm is still correct, e.g. is the output sorted properly?
  2. A performance comparison test: has the performance improved? This can get tricky, so it helps to run the test on the same machine, on the same data, and to limit the use of other resources. A dedicated machine helps (see the timing sketch below).
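
A rough sketch of such a timing comparison in Python. The baseline and candidate here are both stand-ins (I simply time sorted() twice); in practice you would time the old and new implementations on the same machine and data, and wall-clock assertions like this remain somewhat noisy even with a median over repeats.

```python
import statistics
import time


def time_sort(sort_fn, data, repeats=5):
    """Median wall-clock time over a few repeats on identical input; the median
    and the fixed data reduce (but do not remove) noise from other processes."""
    samples = []
    for _ in range(repeats):
        copy = list(data)
        start = time.perf_counter()
        sort_fn(copy)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Hypothetical check: the new implementation should not be slower than the
# recorded baseline by more than a tolerance factor.
data = list(range(10_000, 0, -1))
baseline = time_sort(sorted, data)    # stand-in for the old implementation
candidate = time_sort(sorted, data)   # stand-in for the new implementation
assert candidate <= baseline * 1.5, "performance regression detected"
```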

What to test, and whether to aim for full test coverage, is debatable, but I think the complex parts of your application that need to be fine-tuned (i.e., change a lot) are perfect candidates for automated testing. These parts of the application are usually identified very early, so using a TDD approach with them makes sense.

Your ability to solve complex problems shouldn't be hindered by a reluctance to try different approaches. An automated test should help here: at least you'll know you're not making things worse.

So instead of passing more tests as in TDD, you would make it "behave better".

Sort of.

You can test for run time and for complexity. Run-time tests need to be a little forgiving to allow for contention on system resources, but in many languages you can also override or inject methods and count actual function calls. And if you're confident your current algorithm is suboptimal, you can introduce a test that simply demands that sort() invoke compare() fewer times than the naive implementation does.
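
One possible sketch of that idea in Python, with a counting compare hook injected into both a naive bubble sort and a candidate merge sort (all names are my own illustration; the answer does not prescribe these implementations):

```python
def bubble_sort(items, compare):
    """Naive baseline; every comparison goes through the injected `compare`."""
    a = list(items)
    for i in range(len(a) - 1):
        for j in range(len(a) - 1 - i):
            if compare(a[j], a[j + 1]) > 0:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a


def merge_sort(items, compare):
    """Candidate implementation routed through the same `compare` hook."""
    a = list(items)
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid], compare), merge_sort(a[mid:], compare)
    out = []
    while left and right:
        out.append(left.pop(0) if compare(left[0], right[0]) <= 0 else right.pop(0))
    return out + left + right


def count_compares(sort_fn, data):
    """Counts how often the sort invokes compare(), and checks the result."""
    calls = [0]

    def compare(x, y):
        calls[0] += 1
        return (x > y) - (x < y)

    assert sort_fn(data, compare) == sorted(data)
    return calls[0]


data = list(range(200, 0, -1))
assert count_compares(merge_sort, data) < count_compares(bubble_sort, data)
```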

Or, you can run sort() on two sets of different sizes and ensure the compare() call counts scale according to your target complexity (or thereabouts, if you expect some inconsistency).

And, if you can theoretically prove that a set of size N should require strictly no more than N*log(N) comparisons, it might be reasonable to restrict your already-working sort() to N*log(N) invocations of compare() ...
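
A sketch of such a budget test, using a simple top-down merge sort whose comparison count is known to stay below N*log2(N); the 10% slack and the choice of input sizes are arbitrary illustrations.

```python
import math
import random


def merge_sort(items, compare):
    """Top-down merge sort: uses at most about n*log2(n) comparisons."""
    a = list(items)
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid], compare), merge_sort(a[mid:], compare)
    out = []
    while left and right:
        out.append(left.pop(0) if compare(left[0], right[0]) <= 0 else right.pop(0))
    return out + left + right


def count_compares(sort_fn, data):
    calls = [0]

    def compare(x, y):
        calls[0] += 1
        return (x > y) - (x < y)

    sort_fn(data, compare)
    return calls[0]


for n in (1_000, 2_000, 4_000):
    data = random.sample(range(10 * n), n)
    comparisons = count_compares(merge_sort, data)
    # Budget check with ~10% slack on top of the theoretical n*log2(n) bound.
    assert comparisons <= 1.1 * n * math.log2(n), (n, comparisons)
```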

However ...

Meeting a performance or complexity requirement doesn't ensure that the underlying implementation is {AlgorithmX}. And I'd argue that this is normally OK. It shouldn't matter which algorithm is used, provided your implementation meets your requirements, including any important complexity, performance, or resource requirements.

But if you want to ensure that a particular algorithm is used, you'll need to do more tedious and much more in-depth work, such as:

  • Ensuring that exactly the expected number of calls to compare() and swap() (or whatever) is made
  • Wrapping all major functions/methods and ensuring, with a predictable data set, that the calls occur in exactly the expected order (see the sketch after this list)
  • Examining the working state after each of N steps to ensure it changed exactly as expected
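
A sketch of the call-order idea from the list above, recording every compare()/swap() on a tiny, predictable input so the full trace can be asserted exactly (the recorder and the instrumented bubble sort are my own illustration):

```python
def make_recorder(log):
    """Returns compare/swap wrappers that append every call to `log`."""
    def compare(x, y):
        log.append(("compare", x, y))
        return (x > y) - (x < y)

    def swap(a, i, j):
        log.append(("swap", i, j))
        a[i], a[j] = a[j], a[i]

    return compare, swap


def bubble_sort(a, compare, swap):
    """Implementation under test, with compare/swap injected for observation."""
    for i in range(len(a) - 1):
        for j in range(len(a) - 1 - i):
            if compare(a[j], a[j + 1]) > 0:
                swap(a, j, j + 1)


# With a tiny, predictable input the full call trace can be asserted exactly.
log = []
compare, swap = make_recorder(log)
data = [2, 1]
bubble_sort(data, compare, swap)
assert data == [1, 2]
assert log == [("compare", 2, 1), ("swap", 0, 1)]
```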

But again, if you're trying to ensure that {AlgorithmX} is used in particular, there are probably features of {AlgorithmX} you care about that are more important to test than whether {AlgorithmX} was actually used in the end ...


Also bear in mind that TDD doesn't negate the need for research, thinking, or planning in software development. Nor does it negate the need for brainstorming and experimentation, even if you can't easily assert on your Googling and whiteboarding in your automated test suite.

Licensed under: CC-BY-SA with attribution