Question

I would like to render a CFG out to high-level code. Normally this is very easy: walk the graph, render each basic block in turn, and glue it all together with gotos.
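To make the "easy" goto-based rendering concrete, here is a minimal sketch in Python. The dictionary encoding of the CFG, the function name `render_with_gotos`, and the single `cond` flag that every conditional branch tests are all assumptions made up for this illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical CFG encoding: block name -> (body statements, successors).
# Zero successors means return, one is an unconditional jump, two is a
# conditional branch (taken target first, fall-through target second).
CFG = Dict[str, Tuple[List[str], List[str]]]


def render_with_gotos(cfg: CFG, entry: str, cond: str = "cond") -> str:
    """Emit C-like text: one label per block, glued together with gotos."""
    # The entry block goes first; the other blocks can follow in any order
    # because every transfer of control is an explicit goto.
    order = [entry] + [name for name in cfg if name != entry]
    lines: List[str] = []
    for name in order:
        body, succs = cfg[name]
        lines.append(f"{name}:")
        lines.extend(f"    {stmt}" for stmt in body)
        if len(succs) == 0:
            lines.append("    return;")
        elif len(succs) == 1:
            lines.append(f"    goto {succs[0]};")
        else:
            lines.append(f"    if ({cond}) goto {succs[0]};")
            lines.append(f"    goto {succs[1]};")
    return "\n".join(lines)


# A simple counting loop expressed as four basic blocks.
example: CFG = {
    "entry": (["i = 0;"], ["head"]),
    "head": (["cond = i < 10;"], ["body", "done"]),
    "body": (["i++;"], ["head"]),
    "done": ([], []),
}
print(render_with_gotos(example, "entry"))
```

Nothing here depends on the shape of the graph, which is exactly why this approach is easy and why the goto-free version is not.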

Unfortunately, gotos are out of fashion these days, and most modern languages don't support them. So I need some way to glue my basic blocks together using only those control flow statements that exist in the language: for, while, do...while, if, break and continue. (I'm not willing to consider building a state machine using variables.)

It would appear that while there are algorithms to do this, they will not work in every case. That is, it's possible to construct a CFG that cannot be flattened to structured code using only the above limited set of control flow structures.

This seems intuitively obvious to me, but I can't prove it (and the documentation for the algorithms I've found doesn't go into more detail). And I haven't been able to find an example of a CFG which can't be flattened like this.

I would like to know, definitively, if this is possible or not.

Option (a): does anyone have an example of a CFG which cannot be flattened as described above? (Which will tell me that it's not possible.)

Option (b): does anyone have a proof that CFGs can be flattened as described above? (Which will tell me that it is possible.) An algorithm to do it would be highly desirable, too, as I would then have to make it work...


Solution 2

I think I have a result.

The answer seems to be: it is not possible. This is from Communications of the ACM, volume 9, pages 366 to 371, in a 1966 paper called "Flow Diagrams, Turing Machines and Languages with only Two Formation Rules" by Corrado Böhm and Giuseppe Jacopini. CiteSeer link. (Which, amusingly, I found referenced from Dijkstra's seminal (and, from my point of view, incredibly annoying) Go To Statement Considered Harmful.)

Disappointingly, they don't have a proof, saying they were unable to find one.

The good news is that the paper does describe a strategy for converting an arbitrary CFG into a CFG using only limited control-flow mechanisms in an efficient fashion, using as little state as possible. The paper is pretty hard going but it looks promising.
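For a concrete candidate of the kind the question asks for, the usual textbook example is a loop with two entry points, i.e. an irreducible flow graph. This example is my own addition and is not taken from the paper. Code built only from if/while/for/break/continue produces reducible flow graphs, so a graph like this has no goto-free rendering that keeps each block exactly once and adds no variables. The sketch below checks reducibility with the classic T1/T2 reduction. The function name and the dictionary encoding are hypothetical, and it assumes every node appears as a key and is reachable from the entry.

```python
from typing import Dict, Set


def is_reducible(succs: Dict[str, Set[str]], entry: str) -> bool:
    """T1/T2 reduction: the graph is reducible iff it collapses to one node."""
    graph = {node: set(targets) for node, targets in succs.items()}  # work on a copy
    changed = True
    while changed and len(graph) > 1:
        changed = False
        # T1: remove self-loops.
        for node in graph:
            if node in graph[node]:
                graph[node].discard(node)
                changed = True
        # T2: merge a non-entry node that has exactly one predecessor into it.
        for node in list(graph):
            if node == entry:
                continue
            preds = [p for p in graph if node in graph[p]]
            if len(preds) == 1:
                pred = preds[0]
                graph[pred].discard(node)
                graph[pred] |= graph.pop(node)
                changed = True
                break  # predecessor sets changed, so rescan from the top
    return len(graph) == 1


# An ordinary while loop: reducible.
while_loop = {
    "entry": {"head"},
    "head": {"body", "done"},
    "body": {"head"},
    "done": set(),
}

# The cycle A <-> B can be entered at A or at B: irreducible.
two_entry_loop = {
    "entry": {"A", "B"},
    "A": {"B"},
    "B": {"A", "exit"},
    "exit": set(),
}

print(is_reducible(while_loop, "entry"))
print(is_reducible(two_entry_loop, "entry"))
```

Duplicating a block (node splitting) or adding a state variable makes the two-entry loop expressible again, which matches the trade-off discussed in this thread.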

OTHER TIPS

Although this question was asked a long time ago, this actually seems to be possible. Mozilla had a similar problem when compiling LLVM to JS (or now WebAssembly). JS and WebAssembly only allow structured control flow, while LLVM allows arbitrary control flow.

They've written a paper about this which is also used for WebAssembly:

This idea is modeled on the Relooper algorithm from 2011. There is a proof there that any control flow can be represented in a structured way, using just the available control flow constructs in JavaScript, and using a helper variable like label mentioned in the Tilt semantics, without any code duplication (other approaches split nodes, and have bad worst-case code size situations). The relooper has also been implemented in Emscripten, and over the last 4 years we have gotten a lot of practical experience with it, showing that it gives good results in practice, typically with little usage of the helper variable.
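To illustrate what "a helper variable like label" can look like, here is a hand-written sketch in Python. It is not output of the Relooper or of Emscripten. It executes a hypothetical two-entry loop (entry -> A or B, A -> B, B -> A or exit) using only while/if/break plus one `label` variable. The function name, the block names, and the `limit` parameter that decides when B exits are all invented for this example.

```python
from typing import List


def run_two_entry_loop(start_at_b: bool, limit: int) -> List[str]:
    """Run entry -> {A, B}, A -> B, B -> {A, exit} without any goto."""
    trace: List[str] = ["entry"]
    # Helper variable naming the block to run next; this is the extra state.
    label = "B" if start_at_b else "A"
    visits_to_b = 0
    while True:
        if label == "A":
            trace.append("A")
            label = "B"          # A always continues to B
        elif label == "B":
            trace.append("B")
            visits_to_b += 1
            if visits_to_b >= limit:
                break            # B -> exit
            label = "A"          # B -> A
    return trace


print(run_two_entry_loop(start_at_b=False, limit=2))
print(run_two_entry_loop(start_at_b=True, limit=2))
```

Each block appears once and no code is duplicated. The cost is the dispatch on `label`, which is the kind of state variable the question wanted to avoid, so this only shows why such a variable is sufficient, not that it can be dispensed with.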

In general, you can't just flatten a CFG by walking the tree. This will work for LL(k) grammars, if you have k look-ahead tokens. However, for more complex grammars, like LR(k) grammars, more sophisticated techniques are required. See, for example, http://en.wikipedia.org/wiki/LR_parser.

In general, there is no known linear-time algorithm that parses ANY CFG (general methods such as CYK or Earley handle every context-free grammar, but in cubic time in the worst case), although most CFGs that are useful can be written as an LR(k) grammar. More research improves on this, and large classes of CFGs can be parsed efficiently. I don't think that the problem is undecidable (though I'm not sure), so it's certainly possible that this can always be done, but I think this is a research problem, and not something that will be answered yes/no for you here.

I should also add that all worthwhile languages today are Turing-complete, which means anything you can do with GOTOs can be done with if/while/for/... type constructions, though possibly only by adding extra state variables. New languages aren't the limitation, it's the theoretical building blocks that need help.

In practice though, you won't be able to parse any CFG you want. But that doesn't mean we won't know how in the future...

Licensed under: CC-BY-SA with attribution