Scaling Literate Programming?

https://stackoverflow.com/questions/299076

08-07-2019

Question

Greetings. I have been looking at Literate Programming for a bit now, and I like the idea behind it: you basically write a little paper about your code. In it you record the design decisions, possibly the code surrounding the module, the inner workings of the module, the assumptions and conclusions that follow from the design decisions, and potential extensions, all written down nicely using TeX. Granted, the first objection is that it is documentation, so it must be kept up to date. That should not be too bad, though, because every change should have a justification, and you can write that down.

However, how does Literate Programming scale to larger systems? Overall, Literate Programming is still just text. Very human-readable text, of course, but still text, and thus it is hard to follow large systems. For example, I reworked large parts of my compiler to use >> and some magic to chain compile steps together, because code like "x.register_follower(y); y.register_follower(z); y.register_follower(a);..." got really unwieldy. Changing that to x >> y >> z >> a made it a bit better, even though this is at its breaking point, too.
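For readers unfamiliar with this kind of chaining, here is a minimal Python sketch of how a >> operator could wrap register_follower. The question does not show the real compiler code or say which language it uses, so the Step class and its attributes are hypothetical. Note that this version only builds a linear chain; a branching net such as y feeding both z and a would still need an extra statement.

```python
class Step:
    """A hypothetical compile step that keeps a list of follower steps."""

    def __init__(self, name):
        self.name = name
        self.followers = []

    def register_follower(self, other):
        self.followers.append(other)

    def __rshift__(self, other):
        # "x >> y" registers y as a follower of x and returns y,
        # so "x >> y >> z" is evaluated as "(x >> y) >> z".
        self.register_follower(other)
        return other


x, y, z, a = (Step(name) for name in "xyza")
x >> y >> z >> a  # builds the chain x -> y -> z -> a
```

Returning the right-hand operand is what makes the chain read left to right; returning self instead would attach every step to x.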

So, how does Literate Programming scale to larger systems? Does anyone try to do that?

My thought would be to use LP to specify components that communicate with each other using event streams and chain all of these together using a subset of graphviz. This would be a fairly natural extension to LP, as you can extract a documentation -- a dataflow diagram -- from the net and also generate code from it really well. What do you think of it?
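As a rough illustration of the idea, the sketch below keeps the component net as a plain list of edges and derives both a Graphviz DOT description and a follower table from it. The component names and the to_dot and wire helpers are made up for this example; only the digraph and -> syntax are taken from DOT itself.

```python
def to_dot(edges, name="pipeline"):
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f'    "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


def wire(edges):
    """Build a follower table from the same edge list (the 'code' side)."""
    followers = {}
    for src, dst in edges:
        followers.setdefault(src, []).append(dst)
        followers.setdefault(dst, [])
    return followers


# Hypothetical component net: one description, two derived artifacts.
edges = [
    ("lexer", "parser"),
    ("parser", "typechecker"),
    ("parser", "pretty_printer"),
]
dot_text = to_dot(edges)
table = wire(edges)
```

The resulting dot_text can be saved to a file and rendered with the Graphviz dot command, for example dot -Tpng pipeline.dot -o pipeline.png.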

-- Tetha.

Solution

Excellent question. The motivation for literate programming will never go away, but I think it should be treated as fluid. It means "give the reader a break, and educate them about what you're trying to do". I don't think it means "make your code really wordy".

That said, the reader will have to put some effort into it, depending on what they already know. Presumably the code is worth understanding, and nothing comes for free.

I also think it means more than just making readable code. Most likely, someone is reading the code because they need to make a change. You should anticipate the changes that might be needed and, where necessary, tell the reader how to make them.

OTHER TIPS

The book "Physically Based Rendering" (pbrt.org) is the best example of large-scale literate programming that I'm aware of. The book implements a complete rendering system, and both the book text and the raytracer code are generated from the same "source".

In practice, I've found that just using a system like Doxygen, and really digging in and making use of all of its features, is better than full-blown "literate" programming, except for cases like this one, i.e. textbooks and educational materials.

I did some literate programming with WEB some 15 years ago. More recently I tried extracting code from a wiki and generating documentation from a Squeak Smalltalk environment.

The bottom-up part can be handled relatively well by generating documents from TDD/BDD frameworks, but LP focuses on explaining the code to the reader.

There are a few issues:

the story to tell is different for different stakeholders/readers;
the project structure in most environments is not the structure needed for story-telling;
support for successive refinement/disclosure is missing;
in addition to text, support for pictures is needed;
from the comments in the source control system one can derive how the system was built. The story should be how the system could have been built (with perfect hindsight).

For LP to work for larger systems, you need better IDE support than a wiki or an object browser.

"Overall, Literate Programming is still just text"

False.

Diagrams are fine.

"My thought would be to use LP to specify components that communicate with each other using event streams"

That's just architecture, and that's fine.

"you can extract a documentation -- a dataflow diagram -- from the net and also generate code from it really well. What do you think of it?"

Data flow diagrams aren't really all that helpful for generating detailed code. They're a handy summary, not a precise source of information.

A good writing tool (like LaTeX) can encode the diagram in the document. You could probably figure out a way to generate the diagram from other parts of the documentation.

Bottom Line

In the long run, you're better off generating the diagram as a summary of the text.

Why?

Diagrams intentionally elide details. A diagram is a summary or an overview. But as a source for code, diagrams are terrible. In order to provide all the details, the diagrams become very cluttered.

But a diagrammatic summary of some other LP markup will work out fine.

pbrt is a physically based ray tracer written in the literate style for the education of computer science graduates (and me); it is a moderately large-scale system. For a non-specialist programmer like me, this level of documentation is pretty essential for understanding what the program does and why it does it.

I also have access to a research renderer, in Java, which is well written but relatively undocumented apart from a few SIGGRAPH papers. It is also relatively understandable, but I have access to the authors too.

I've also used ImageJ quite a lot and looked under the hood at the underlying Java. It's pretty difficult to follow without an idea of the underlying philosophy.

In sum, my view is that literate programming is great if someone can find the time to do it well, and that is most likely to happen in educational settings. It's difficult to see it being done in commercial code production. I'm skeptical of the idea that code can be entirely self-documenting.

The idea behind literate programming is emphasis on the documentation, with code sprinkled through the documentation, rather than comments sprinkled through code.
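To make "code sprinkled through the documentation" concrete, here is a small Python sketch of the extraction ("tangle") step. It assumes a simplified noweb-like convention in which a chunk starts with a line of the form <<name>>= and ends at a line starting with @. The tangle function is illustrative only and does not expand chunk references the way real LP tools do.

```python
import re

# A chunk definition line looks like "<<name>>=" (simplified noweb-style).
CHUNK_START = re.compile(r"^<<(.+)>>=\s*$")


def tangle(document):
    """Collect code chunks by name; chunks sharing a name are concatenated."""
    chunks = {}
    current = None
    for line in document.splitlines():
        start = CHUNK_START.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.startswith("@"):
            current = None  # back to prose
        elif current is not None:
            chunks[current].append(line)
    return {name: "\n".join(body) for name, body in chunks.items()}


doc = """\
We greet the reader first, because that is the point of the program.
<<main>>=
print("hello")
@
Later we say goodbye, appending to the same chunk.
<<main>>=
print("goodbye")
@
"""
code = tangle(doc)["main"]
```

The prose stays the primary artifact and the program is derived from it, which is the reverse of comments embedded in code.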

This is an essentially different philosophy, and differences like longer variable names, namespaces, and classes don't affect the philosophy. Literate programming advocates meaningful variable names.

It scales up to larger systems, because the amount of documentation grows linearly with the size of the code, so the basic ratio of documentation to code stays roughly constant.

Literate Programming was developed in an era when long variable and function names were simply not possible. Because of this, code really wasn't that readable.

Obviously, a lot has happened since then.

In today's world, the code itself is the documentation, hence the term "self documenting code." The realization is that no set of comments or external documentation can ever stay in sync with the underlying code. So, the goal of a lot of today's programmers is to write the code in such a way that it is readable to others.

Try NanoLP, an extensible LP tool. It supports many document formats (Markdown, OpenOffice, Creole, TeX, AsciiDoc and others), importing of other LP programs, templating and more. Users can add their own commands/macros (in Python), for example to do special importing from a VCS... http://code.google.com/p/nano-lp

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow