Question

I work in a medium sized team and I run into these painfully large class files on a regular basis. My first tendency is to go at them with a knife, but that usually just makes matters worse and puts me into a bad state of mind.

For example, imagine you were just given a windows service to work on. Now there is a bug in this service and you need to figure out what the service does before you can have any hope of fixing it. You open the service up and see that someone decided to just use one file for everything. Start method is in there, Stop method, Timers, all the handling and functionality. I am talking thousands of lines of code. Methods under a hundred lines of code are rare.

Now assuming you cannot rewrite the entire class and these god classes are just going to keep popping up, what is the best way to deal with them? Where do you start? What do you try to accomplish first? How do you deal with this kind of thing and not just want to get all stabby.

If you have some strategy just to keep your temper in check, that is welcome as well.

Tips Thus Far:

  1. Establish test coverage
  2. Code folding
  3. Reorganize existing methods
  4. Document behavior as discovered
  5. Aim for incremental improvement

Edit:

Charles Conway recommend a podcast which turned out to be very helpful. link

Michael Feathers (guy in the podcast) begins with the premise that were are too afraid to simply take a project out of source control and just play with it directly and then throw away the changes. I can say that I am guilty of this.

He essentially said to take the item you want to learn more about and just start pulling it apart. Discover it's dependencies and then break them. Follow it through everywhere it goes.

Great Tip Take the large class that is used elsewhere and have it implement an emtpy interface. Then take the code using the class and have it instantiate the interface instead. This will give you a complete list of all the dependencies to that large class in your code.

Was it helpful?

Solution

Ouch! Sounds like the place I use to work.

Take a look at Working effectivly with legacy code. It has some gems on how to deal with atrocious code.

DotNetRocks recently did a show on working with legacy code. There is no magic pill that is going to make it work.

The best advice I've heard is start incrementally wrapping the code in tests.

OTHER TIPS

That reminds me of my current job and when I first joined. They didn't let me re-write anything because I had the same argument, "These classes are so big and poorly written! no one could possibly understand them let alone add new functionality to them."

So the first thing I would do is to make sure there are comprehensive testing behind the areas that you're looking to change. And at least then you will have a chance of changing the code and not having (too many) arguments (hopefully). And by tests, I mean testing the components functionally with integration or acceptance tests and making sure it is 100% covered. If the tests are good, then you should be able to confidently change the code by splitting up the big class into smaller ones, getting rid of duplication etc etc

Even if you cannot refactor the file, try to reorganize it. Move methods/functions so that they are at least organized within the file logically. Then put in lots of comments explaining each section. No, you haven't rewritten the program, but at least now you can read it properly, and the next time you have to work on the file, you'll have lots of comments, written by you (which hopefully means that you will be able to understand them) which will help you deal with the program.

Code Folding can help. If you can move stuff around within the giant class and organize it in a somewhat logical way, then you can put folds around various blocks.

Hide everthing, and you're back to a C paradigm, except with folds rather than separate files.

I've come across this situation as well.

Personally I print out (yeah, it can be a lot of pages) the code first. Then I draw a box around sections of code that are not part of any "main-loop" or are just helper functions and make sure I understand these things first. The reason is they are probably referred to many times within the main body of the class and it's good to know what they do

Second, I identify the main algorithm(s) and decompose them into their parts using a numbering system that alternates between numbers and letters (it's ugly but works well for me). For example you could be looking at part of an algorithm 4 "levels" deep and the numbering would be 1.b.3.e or some other god awful thing. Note that when I say levels, I am not referring directly to control blocks or scope necessarily, but where I have identified steps and sub-steps of an algorithm.

Then it's a matter of just reading and re-reading the algorithm. When you start out it sounds like a lot of time, but I find that doing this develops a natural ability to comprehend a great deal of logic all at once. Also, if you discover an error attributed to this code, having visually broken it down on paper ahead of time helps you "navigate" the code later, since you have a sort of map of it in your head already.

If your bosses don't think you understand something until you have some form of UML describing it, a UML sequence diagram could help here if you pretend the sub-step levels are different "classes" represented horizontally, and start-to-finish is represented vertically from top-to-bottom.

I feel your pain. I tackled something like this once for a hobby project involving processing digital TV data on my computer. A fellow on a hardware forum had written an amazing tool for recording shows, seeing everything that was on, and more. Plus, he had done incredibly vital work of working around bugs in real broadcast signals that were in violation of the standard. He'd done amazing work with thread scheduling to be sure that no matter what, you wouldn't lose those real-time packets: on an old Pentium, he could record four streams simultaneously while also playing Doom and never lose a package. In short, this code incorporated a ton of great knowledge. I was hoping to take some pieces and incorporate them into my own project.

I got the source code. One file, 22,000 lines of C, no abstraction. I spent hours reading it; there was all this great work, but it was all done badly. I was not able to reuse a single line or even a single idea.

I'm not sure what the moral of the story is, but if I had been forced to use this stuff at work, I would have begged permission to chip pieces off it one at a time, build unit tests for each piece, and eventually grow a new, sensible thing out of the pieces. This approach is a bit different than trying to refactor and maintain a large brick in place, but I would rather have left the legacy code untouched and tried to bring up a new system in parallel.

The first thing I would do is write some unit tests to box the current behavior, assuming that there are none already. Then I'd start in the area where I need to make the change and try to get that method cleaned up -- i.e. refactor working code before introducing changes. Use common refactoring techniques to extract and reuse methods from existing long methods to make them more understandable. When you extract a method, look for other places in the code where similar code exists, box that area, and reuse the method you've just extracted.

Look for groups of methods that "hang together" that can be broken out into their own classes. Write some tests for how those classes should work, build the classes using the existing code as a template if need be, then substitute the new classes into the existing code, removing the methods that they replace. Again, using your tests to make sure that you're not breaking anything.

Make enough improvement to the existing code so that you feel you can implement your new feature/fix in a clean way. Then write the tests for the new feature/fix and implement to pass the tests. Don't feel that you have to fix everything the first time. Aim for gradual improvement, but always leave the code better than you found it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top