Question

I have some research code that's a real rat's nest, with code duplication everywhere, and clearly needs to be refactored. However, the code base is evolving as I come up with new variations on the theme and fit them into the codebase. The reason I've put off refactoring so long is because I feel like the minute I spend a few days coming up with good abstractions, seeing what design patterns fit where, etc., I'll want to try out some new unforeseen idea that makes my abstractions completely inadequate. In other words, because of the rate at which the code is evolving, I really have no idea where abstraction lines belong, even though there is no shortage of (approximate) duplication and the general messiness of the code makes adding stuff to it a real pain. What are some general best practices for coping with this kind of situation?

Was it helpful?

Solution

Don't spend so long refactoring!

When you're about make a change in a piece of code, consider refactoring it to make the change easier.

After making the change, refactor again to clean up the damage done by that change.

In both cases, make the refactorings small and do them quickly, and move on.

You don't have to keep your code pristine at all times, but remember that it's easier to go fast if you have well-factored code to work in (and if you have good unit tests, of course).

OTHER TIPS

Test Driven Development:

Red, Green, Refactor. Rinse, repeat.

Since it's one of the steps in every single cycle, you'll notice that's a LOT of usually minor refactoring taking place. That's the way it should be.

Your situation is pretty familiar to me. While doing investigative coding often you have no idea what the "right" abstraction will be, and as you say it can change with every new idea.
Other posters have suggested:

  • Continuous small refactoring, which helps to avoid getting into the rats-nest situation
  • Test-Driven Development, which helps to find good, re-usable abstractions. It's important to note that TDD is less about testing than about doing good designs!

However, for investigative research code there is another strategy: the prototype. This seems to be what you are currently doing: coding as quickly as possible to prove a concept. There's nothing wrong with that, but a prototype should always be throw-away. Tweak it until you have all the necessary input and knowledge, then throw away the code and start over with TDD and continuous refactoring, and all your other "doing the things right" strategies.

Don't keep any of the code. Don't copy-paste anything. Don't refer back to it. Just start over with your new knowledge.

Clean up the code a little bit at a time. Always when you touch a class, try to leave the class cleaner that it was before you touched it ("the boy scout rule"). Refactoring is best done in very small steps, but very often.

Things like renaming some variable, splitting a method etc. take only some seconds or minutes. Large refactorings such as splitting or joining classes, may take an hour or two (and you make it in small steps, so that all tests pass at least every five minutes - otherwise you have entered Refactoring Hell and you should revert to the last known working state). If it takes days or weeks for you to refactor something, then it's not anymore "refactoring" - it's more like rewriting.

An article about this topic: http://blog.objectmentor.com/articles/2007/07/20/whats-your-unit-of-measure

Put it in Distributed SCM like Git at least, that way when you break something refactoring you can reverse time divisibly to find the commit prior to the change, as well as being able to work on changes and commit them in branches without interfering with others work.

Gits Branch merge is great for things like this and you'll know easily if 2 people made incompatible changes in parallel without having to worry about the rest of the code.

For the above reasons, I would also create a seperate branch in the repository just for re factoring code with, and keep it up-dated regularly. This way, not only will others not interfere with your progress, but they can keep an eye on it and see changes in it that will eventually hit the main branch so they can pre-emptively code around those changes.

If you already know where there is duplication, you don't need several days to refactor it away.

Sometimes a rewrite is the only choice. This seems to be the case.

The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by langauge syntax. It supports Java, C#, COBOL, C++, PHP and many other languages.

When it shows a parameterized abstraction of a set of found clones, it is essentially proposing that you refactor the code with that abstraction implemented (as a method, a function, a class, ...).

So running the CloneDR gets a list of potential abstractions to be added to your code, and replacing the clone instances by calls on the abstraction refactors your code thus cleaning it up (somewhat).

Even more remarkably, when it shows the parameter bindings used at each clone site needed to invoke the abstraction, it often shows a bungled clone instance, easily recognized when the bound paramters are conceptually inconsistent. If a parameer is bound to variables named YYYY-MM-DD, and one of them is YY-MM-DD, the "its a 4 digit-year" parameter type looks violated and in this this case there's a broken Y2K remediation. So examining the clone bindings often finds bugs.

This is a very common problem in scientific computing. Some of the most effective ideas for reducing the size and complexity of code require leveraging assumptions, and science demands that you constantly change those assumptions.

All you can do is try to refactor your code as you go, and try not to write yourself into any corners. Also work with good people who understand the value of not making a mess.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top