How to manage special cases and heuristics

https://stackoverflow.com/questions/958003

12-09-2019

Question

I often have code based on a specific well defined algorithm. This gets well commented and seems proper. For most data sets, the algorithm works great.

But then the edge cases, the special cases, and the heuristics get added to solve particular problems with particular sets of data. As the number of special cases grows, the comments get more and more hazy. I fear going back to this code in a year or so and trying to remember why each particular special case or heuristic was added.

I sometimes wish there was a way to embed or link graphics in the source code, so I could say effectively, "in the graph of this data set, this particular feature here was causing the routine to trigger incorrectly, so that's why this piece of code was added".

What are some best-practices to handle situations like this?

Special cases always seem to be required to handle these unusual or edge cases. How can they be managed to keep the code relatively readable and understandable?

Consider an example dealing with feature recognition from photos (not exactly what I'm working on, but the analogy seems apt). When I find a particular picture for which the general algorithm fails and a special case is needed, I record that information as best I can in a comment (or, as someone suggested below, in a descriptive function name). But what is often missing is a permanent link to the particular data file that exhibits the behavior in question. While my comment should describe the issue, and would probably say "see file foo.jpg for an example of this behavior", this file is never in the source tree and can easily get lost.

In cases like this, do people add data files to the source tree for reference?

Solution

If you have a knowledge base or a wiki for the project, you could add the graph to it and link to that entry in two places: in a comment on the method that handles the edge case (extracted and named as the Martin Fowler quote under OTHER TIPS suggests), and in the source control commit message for the edge case change.

//See description at KB#2312
private object SolveXAndYEdgeCase(object param)
{
   //modify param to solve for edge case
   return param;
}

Commit Message: Solution for X and Y edge case, see description at KB#2312

It is more work, but it documents cases more thoroughly than test cases or comments alone could. Even though one might argue that test cases should be documentation enough, you might not want to store the whole failing data set in them, for instance.

Remember, vague problems lead to vague solutions.

OTHER TIPS

Martin Fowler says in his book Refactoring that when you feel the need to add a comment to your code, you should first see whether you can extract that code into a method and give the method a name that makes the comment unnecessary.

So, as an abstract example, you could create methods like these:

private bool ConditionXAndYHaveOccurred(object param)
{
   // code to check for conditions x and y goes here
   bool result = false; // placeholder until the real check is written
   return result;
}

private object ApplySolutionForEdgeCaseWhenXAndYHappen(object param)
{
   //modify param to solve for edge case
   return param;
}

Then you can write code like this:

if(ConditionXAndYHaveOccurred(myObject))
{
    myObject = ApplySolutionForEdgeCaseWhenXAndYHappen(myObject);
}

Not a hard and fast concrete example, but it would help with readability in a year or two.

Unit testing can help here. Tests that actually simulate the special cases can often serve as documentation of why the code does what it does. This can often be better than just describing the issue in a comment.

Not that this replaces moving the special-case handling into its own functions, or decent comments...

I'm not usually an advocate of test-driven development and similar styles that stress tests too much, but this seems to be a perfect case where a bunch of unit tests can help a lot. Their main purpose here is not even to catch bugs from later changes, but simply to document all the special cases that need to be addressed.

A few good unit tests with comments in them are in themselves the best description of the special cases. Commenting the code itself gets easier too: one can simply point to the unit tests that illustrate the problem being solved at that point in the code.
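As a rough sketch of what such a test could look like: the peak detector, its spike heuristic, and all names below are hypothetical and only stand in for whatever your real algorithm does. A real project would normally use a test framework such as NUnit or xUnit; plain exceptions are used here only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical routine: a general algorithm plus one special case.
public static class PeakDetector
{
    // Returns the indices of samples above the threshold,
    // skipping isolated spikes (see PeakDetectorTests for the reason).
    public static List<int> FindPeaks(double[] samples, double threshold)
    {
        var peaks = new List<int>();
        for (int i = 0; i < samples.Length; i++)
        {
            if (samples[i] <= threshold) continue;
            if (IsIsolatedSpike(samples, i, threshold)) continue; // special case
            peaks.Add(i);
        }
        return peaks;
    }

    // Special case: a sample above the threshold whose neighbours are both
    // at or below it is treated as noise rather than as a peak.
    private static bool IsIsolatedSpike(double[] samples, int i, double threshold)
    {
        bool leftBelow = i == 0 || samples[i - 1] <= threshold;
        bool rightBelow = i == samples.Length - 1 || samples[i + 1] <= threshold;
        return leftBelow && rightBelow;
    }
}

public static class PeakDetectorTests
{
    // Documents why the spike heuristic exists: a single noisy sample
    // used to make the routine trigger incorrectly.
    public static void IsolatedSpikeIsNotReportedAsPeak()
    {
        double[] samples = { 0.1, 0.2, 9.0, 0.2, 0.1 };
        List<int> peaks = PeakDetector.FindPeaks(samples, 1.0);
        if (peaks.Count != 0)
            throw new Exception("An isolated spike was reported as a peak.");
    }

    // Guards the general algorithm: a sustained rise must still be found.
    public static void SustainedPeakIsStillReported()
    {
        double[] samples = { 0.1, 5.0, 6.0, 5.5, 0.1 };
        List<int> peaks = PeakDetector.FindPeaks(samples, 1.0);
        if (peaks.Count != 3 || peaks[0] != 1 || peaks[2] != 3)
            throw new Exception("A sustained peak was not reported.");
    }
}
```

The small inline array plays the role of the failing data set, so the test keeps working even if the original file goes missing, and the comment on the special case only has to name the test.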

About this part of the question:

I sometimes wish there was a way to embed or link graphics in the source code, so I could say effectively, "in the graph of this data set, this particular feature here was causing the routine to trigger incorrectly, so that's why this piece of code was added".

If the "graphic" that you want to embed is a graph, and if you use Doxygen, you can embed dot commands in your comment to generate a graph in the documentation:

/**
If we have a subgraph looking like this:
\dot
digraph g{
A->B;
A->C;
B->C;
}
\enddot
the usual method does not work well and we use this heuristic instead.
*/

Don Knuth invented literate programming to make it easy for your program documentation to include plots, graphs, charts, mathematical equations, and whatever else you need to make it understood. A literate program is a great way to explain why something is the way it is and how it got that way over time. There are many, many literate-programming tools; the "noweb" tool is one of the simplest and is shipped with some Linux distributions.

Without knowing the specific nature of your problem it is not easy to give an answer, but in my own experience, hard-coding the handling of special cases should be avoided. Have you thought about implementing a rules engine or something similar, so that special cases are handled outside your main processing algorithm?
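A very small version of that idea might look like the sketch below. It is not a real rules engine, and every name in it (SpecialCaseRule, SpecialCases.ApplyAll, the two demo rules and their placeholder references) is made up for illustration. The point is only that each special case becomes a data entry with a name, a pointer to its documentation, a condition, and a fix, kept apart from the main algorithm.

```csharp
using System;
using System.Collections.Generic;

// One special case: what it is called, where it is documented,
// when it applies, and how the input is corrected.
public sealed class SpecialCaseRule<T>
{
    public string Name { get; }
    public string Reference { get; }   // e.g. a KB entry or a sample data file
    public Func<T, bool> Applies { get; }
    public Func<T, T> Fix { get; }

    public SpecialCaseRule(string name, string reference,
                           Func<T, bool> applies, Func<T, T> fix)
    {
        Name = name;
        Reference = reference;
        Applies = applies;
        Fix = fix;
    }
}

public static class SpecialCases
{
    // Applies every matching rule in order and records which ones fired,
    // so the main algorithm never needs to know about individual cases.
    public static T ApplyAll<T>(T input, IEnumerable<SpecialCaseRule<T>> rules,
                                List<string> fired)
    {
        T current = input;
        foreach (SpecialCaseRule<T> rule in rules)
        {
            if (rule.Applies(current))
            {
                current = rule.Fix(current);
                fired.Add(rule.Name);
            }
        }
        return current;
    }
}

// Hypothetical rules for a sensor reading expected to lie in [0, 100].
public static class Demo
{
    public static readonly List<SpecialCaseRule<double>> Rules =
        new List<SpecialCaseRule<double>>
        {
            new SpecialCaseRule<double>("NegativeReading", "KB entry (placeholder)",
                v => v < 0, v => 0.0),
            new SpecialCaseRule<double>("SaturatedReading", "KB entry (placeholder)",
                v => v > 100, v => 100.0),
        };
}
```

Calling SpecialCases.ApplyAll(reading, Demo.Rules, fired) before the main algorithm runs leaves the list of fired rule names available for logging, which also helps a year later when you wonder which heuristic changed a result.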

It sounds like you need more thorough documentation than just code comments. That way someone could look up the function in question in the documentation and be presented with an example picture that requires a special case.

Licensed under: CC-BY-SA with attribution
