Question

I frequently write throwaway code (in a research environment) - for example to explore an algorithm or a model for a scientific property or process. Many of these "experiments" are one-off but sometimes I find that I need to use a few later. For example I have just unearthed code for string matching I wrote 7 years ago (stopped because of other priorities) but which is now valuable for a coworker's project. Having looked at it (did I really write such impenetrable code?) I realise there are some things I could have done then to help me when I restarted the "project" ("experiment" is still a better word). The earlier experiment "worked" but I know that at the time I would not have had time to refactor as my priorities lay elsewhere.

What approaches are cost-effective in enabling such work to be dug up and re-used?

EDIT: I have answered my own question (below) because there are issues beyond the actual source itself.

Solution

I disagree with all of the answers saying "write comments". That's being offered as a catch-all for the code itself not being understandable.

Get yourself a copy of Code Complete (Steve McConnell, 2nd edition). If you learn the techniques of writing maintainable code in the first place, it won't take you more time, and you will be able to return to your work later with less trouble.

Which would you prefer:

  • Cryptic code with comments?
  • Mostly OK code without?

I strongly prefer the latter: the OK code stays understandable even where the cryptic code would have been left uncommented, and comments are one more place where the original developer can make mistakes. The code may be buggy, but it is never wrong about what the program actually does.

Once you're comfortable with Code Complete, I'd recommend The Pragmatic Programmer, as it gives slightly higher-level software-development advice.

OTHER TIPS

[Answering own question] There are several other aspects to the problem which haven't been raised and which I would have found useful when revisiting it. Some of these may be "self-evident" but remember this code was pre-SVN and IDEs.

  • Discoverability. It has been difficult actually to find the code. I believe it's in my SourceForge project, but there are so many versions and branches over 7 years that I can't find it. So I would have needed a system that searched code, and I don't think there was one until IDEs appeared.
  • What does it do? The current checkout contains about 13 classes (all in one package as it wasn't easy to refactor at the time). Some are clear (DynamicAligner) but others are opaque (MainBox, named because it extended a Swing Box). There are four main() programs and there are actually about 3 subprojects in the distribution. So it is critical to have an external manifest describing what the components actually are.
  • Instructions on how to run it. When running the program, main() will offer a brief command-line usage (e.g. DynamicAligner file1 file2) but it doesn't say what the contents of the files actually look like. I knew this at the time, of course, but not now. So there should be associated example files in sibling directories. These are more valuable than trying to document file formats.
  • Does it still work? It should be possible to run each example without thinking. The first question will be whether the associated libraries, runtimes, etc. are still relevant and available. One ex-coworker wrote a system which only runs with a particular version of Python. The only answer is to rewrite. So certainly we should avoid any lock-in where possible, and I have trained myself (though not necessarily coworkers) to do this.

So how can I and coworkers avoid problems in the future? I think the first step is that there should be a discipline of creating a "project" (however small) when you create code and that these projects should be under version control. This may sound obvious to some of you, but in some environments (academia, domestic) there is a significant overhead to setting up a project management system. I suspect that the majority of academic code is not under any version control.

Then there is the question of how the projects should be organized. They can't be on SourceForge by default as the code is (a) trivial and (b) not open by default. We need a server where there can be both communal projects and private ones. I would estimate that the effort to set this up and run it is about 0.1 FTE, that is 20 days a year from all parties (installation, training, maintenance). If there are easier options I'd like to know, as this is a large expense in some cases: do I spend my time setting up a server or do I write papers?

The project should try to encourage good discipline. This is really what I was hoping to get from this question. It could include:

  1. A template of required components (manifest, README, log of commits, examples, required libraries, etc.). Not all projects can run under Maven (e.g. FORTRAN ones).
  2. A means of searching a large number (hundreds at least) of small projects for mnemonic strings (I liked the idea of dumping the code in Google Docs, and this may be a fruitful avenue, but it's extra maintenance effort).
  3. Clear naming conventions. These are more valuable than comments. I now regularly have names of the type iterateOverAllXAndDoY. I try to use createX() rather than getX() when the routine actually creates information. I have a bad habit of calling routines process() rather than convertAllBToY().
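As a rough illustration of the "template of required components" idea, here is a minimal sketch of a checker that reports which pieces of the template a project directory is missing. The entry names in `REQUIRED_COMPONENTS` and the function name `missing_components` are assumptions made for this example, not an established standard; adjust them to whatever template your group agrees on.

```python
from pathlib import Path

# Hypothetical template: these file and directory names are assumptions
# chosen for illustration, not a convention taken from any tool.
REQUIRED_COMPONENTS = ["README.txt", "MANIFEST.txt", "examples", "lib"]


def missing_components(project_dir, required=REQUIRED_COMPONENTS):
    """Return the required entries that do not exist in project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).exists()]
```

Running something like this over every small project once in a while is cheap, and it flags the experiments that would be hard to revive before seven years have passed.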

I am aware of but haven't used Git, Mercurial and Google Code. I do not know how much effort these are to set up and how many of my concerns they answer. I would be delighted if there were an IDE plugin which helped create better code (e.g. by flagging a "poor choice of method name").

And whatever the approaches, they have got to come naturally to people who do not naturally have good code discipline, and they have to be worth the effort.

As the excellent answers in your other post indicate, and from my own experience, there is a difficult-to-cross gap between the software used for research and software that has been engineered. In my opinion, Code Complete might help a little, but not much. As an economic question, is it going to be worthwhile to refactor everything for reuse compared to the occasional reward for finding a later use for something? Your balance point may vary.

Here's a practical tip for storing snippets. Instead of full-blown comments, throw in some keywords:

  • "graph isomorphism wrapper"
  • "polymer simulated annealing"
  • "string match feynmann"
  • "equilibrium"

and then put the code somewhere Google-searchable, like a GMail account.
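If the snippets also live in a local folder, the same keyword idea works without any external service. The following is a minimal sketch of a local substitute for the Google-based search, assuming the keywords are written somewhere in each file (for example in a header comment). The function name `find_snippets` and the default `*.py` pattern are assumptions for this example.

```python
from pathlib import Path


def find_snippets(root, keywords, pattern="*.py"):
    """Return sorted paths under root whose text contains every keyword.

    Matching is a plain case-insensitive substring test, so "String Match"
    finds a file whose header says "string match feynmann".
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    for path in Path(root).rglob(pattern):
        if not path.is_file():
            continue
        # Ignore undecodable bytes: old experiments often have odd encodings.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if all(k in text for k in wanted):
            hits.append(path)
    return sorted(hits)
```

A call such as `find_snippets("experiments", ["string match"])` would list every matching file under a (hypothetical) `experiments` directory, however deeply nested.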

Edit: I might add that free Google Sites are really searchable wikis that are a good place to put code, either in the form of attachments or pasted in.

Also, I should say that I am a fan of Code Complete and have given copies to grad students writing software for scientific research for several years. It's a good start, but no silver bullet. I'm writing a paper right now on using open source frameworks to solve scientific data management problems and one of the conclusions is that some software engineering expertise is essential for long-running systems. Many scientific projects should probably budget for this from the beginning.

Comments - describe what you were thinking and why you chose to implement something a certain way including what alternatives you considered. There are probably all sorts of fancy solutions but just commenting your code properly at the time you are writing it seems to work the best.

I would echo what the others have said about commenting the "why" of the code and its intended usage, but I would also add this:

Code as if you were planning on putting this into production even when you're just messing around. Code for:

  • Clarity and readability
  • Adherence to the coding conventions of the time (naming conventions, etc.). Even though such conventions change over time, if you stick to the standards you're more likely to be able to understand the code later.
  • Security (if applicable)
  • Performance (if applicable)

Particularly, I would stress the first point, but the others are important as well. I find that if I use "test code" later on, I tend to just use it if it works, rather than refactoring it.

I've probably missed the point of this whole discussion, I frequently do, but here goes, an invitation for brickbats and downvoting ...

If it's throwaway code, throw it away!

If you don't want to throw it away then follow the good advice above. For me, and I write a fair amount of throwaway code, the question of whether it gets thrown away or put into a reusable state and kept against a rainy day boils down to the economics.

Can I foresee circumstances in which this code will be useful again ? Once in a blue moon, twice a year, every month ?

Will I be able to rewrite this code in less time than it takes to make it reusable? If the answer to this question is No, then how many times will I have to reuse it to make it worthwhile enhancing it now? (Back to the previous question.)

If I do make this code reusable, will I be able to find it again when I next want it ? (Anyone ever had the experience of knowing, with absolute certainty, that somewhere in your code repository there is just the fragment you want, but not having a clue what it was called, nor where to look nor what to grep for ?)
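The economics in the questions above can be put into a back-of-the-envelope calculation. This is a minimal sketch under a strong simplifying assumption that is mine, not the answer's: once the code has been made reusable, each later reuse costs nothing, whereas otherwise every reuse means rewriting from scratch. The function name `break_even_reuses` and the hour figures are hypothetical.

```python
import math


def break_even_reuses(cleanup_hours, rewrite_hours):
    """Smallest number of future reuses at which cleaning up now pays off.

    Assumes (simplification) that reusing cleaned-up code is free and that
    the alternative is a full rewrite costing rewrite_hours each time.
    """
    if rewrite_hours <= 0:
        raise ValueError("rewrite_hours must be positive")
    return math.ceil(cleanup_hours / rewrite_hours)


# Example: 8 hours of clean-up versus 3 hours per rewrite.
reuses_needed = break_even_reuses(8, 3)
```

If you expect fewer reuses than the number this returns, throwing the code away is the cheaper choice under these assumptions.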

Finally, the 3 step approach to making quickly-written code reusable. Stop after whichever of these steps you like:

1) Document the code as a black-box. Inputs, outputs, operation(s). File this document carefully.

2) Write instructions about how to build/interpret/install the code, in case you ever have to port it. File these instructions carefully.

3) Only if worth the effort -- improve the source code quality to make the code maintainable in future. Make sure the sources are in the source control system and findable.

Regards

Mark

I think the most important thing (if you do no refactoring now, it isn't going to happen later) is to comment and document your thought process at the time. It will help make the code less impenetrable and help you find the good bits when needed.

No, No, No, No, No!

Do not write throwaway code even in a research environment. Please!

Currently I'm struggling with just such "throwaway code", namely the BLAST project. It started as a playground but then happened to become somewhat successful. Now it's a neat tool with many concepts implemented, but the code is virtually unmaintainable. But that's not the main point.

The main point is that you do research so that engineers can later benefit from your findings. Having done good scientific work on a general concept and written a tool that proves it successful, you can easily forget that you're not doing it only for a publication and a PhD. You do it for the benefit of mankind. Your code may contain a bunch of "special cases" that were hard to debug, a set of quirks and hacks that do not fit into any conference article. It's especially important to document and comment such things throughout your code.

If a developer decides to implement your concepts in a commercial product, they can study the quirks and hacks in your code, and the implementation will then have fewer bugs than it otherwise might. Everyone then says "Wow, his research on A really is useful!" But if you write "throwaway" code, they say "his concept looks nice on paper, but X tried to implement it and drowned in a bunch of bugs".

(EDIT: taken from comments below) To help future developers of your codebase, you don't need much. First, comment what each function does. Second, make sure that every non-obvious fix of a tricky bug is placed in a separate commit in the revision-control system (with an appropriate comment, of course). That's quite enough. And if you even make things modular (even if they're not ready for outright reuse, which is three times more costly according to Brooks) you will be adored by the engineers who implement your research.

I think that the world would be a better place if researchers threw away their hubris and stopped haughtily thinking that they are above the "dirty coders" who do the menial job of writing good code. Writing good code is not just a job for "stupid programmers". It is a really valuable thing everyone should strive for. Without it, your experimental ground, your code, your brainchild will just die.

Some strategies:

  1. Good comments. Hard to reuse what you can't find or understand later.
  2. Save every query to a folder that is backed up or is under source control.
  3. Have a common library of useful functions that you "promote" something to once it has been reused.

You could also borrow the idea of unit tests from the TDD (test-driven development) folks. You need to make sure that the throwaway code actually works OK anyway, so why not express the check as a small unit test? This would have two advantages:

  1. Reading the test code communicates the intent of the throwaway quite clearly: after all it expresses its expectations in the same language: code.

  2. It would also help with the 4th problem of your self-reply: "does it still work?". Well, it's easy: just run the unit tests and they tell you what, where and (with a bit of luck) why it doesn't work.
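To make the unit-test suggestion concrete, here is a minimal sketch using Python's standard `unittest` module. The function `edit_distance` is a hypothetical stand-in for a piece of experimental string-matching code; it is not the code discussed in the question. The tests double as documentation of what the experiment was expected to do.

```python
import unittest


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (stand-in experiment)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


class EditDistanceTest(unittest.TestCase):
    """Each test records one expectation held when the code was written."""

    def test_identical_strings_have_distance_zero(self):
        self.assertEqual(edit_distance("align", "align"), 0)

    def test_distance_to_empty_string_is_length(self):
        self.assertEqual(edit_distance("", "abc"), 3)

    def test_known_example(self):
        self.assertEqual(edit_distance("kitten", "sitting"), 3)
```

Saved in a file, the tests can be run years later with `python -m unittest <file>`, which answers "does it still work?" without having to remember how the experiment was originally checked.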

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow