Question

Is there any paper describing any algorithm/technique to infer subroutines from a compiled program? In other words: is there an algorithm to find blocks of code that appear more than once in the program? These blocks could have the instructions reordered (without program behavior change, of course) so that it's more likely to find a match.

This process can be seen as the opposite of subroutine inlining that is done by compilers to avoid calls, but increasing the binary size.

It seems to me that this is a very hard theoretical problem.

Was it helpful?

Solution

Well, it's an interesting problem. People did actually work on this. A quick search returns these two:

But there are probably many more. You could use Google Scholar to find more recent papers that reference these old ones.

OTHER TIPS

What you are looking for is called a "clone detector". You can do this on source code, or object code. The key idea is to decide what points of variability you want to accept.

You can read about our CloneDR clone detector, which finds duplicated code by comparing the syntax trees of the source files, finding exact and near-miss matches. It does across many files rather than just within one source file. This is kind of like "common subexpression" detection, but it works on declarations as well as executable code. When the match isn't exact, it can determine parameters for the "subroutine" (abstraction).

See my paper on Clone Detection Using Abstract Syntax trees for an algorithmic description.

CloneDR does this for many languages, using language-precise front end parsers.

The site describes how CloneDR works, and compares CloneDR with a number of other clone detection tools.

CloneDR doesn't handle "instruction reordering". Less scalable methods that find duplicates by comparing PDGs can do this. These come pretty close to comparing data flow graphs, which might be good for finding machine instruction code matches.

Maybe this is dumb .. but consider "diff". It basically does a restricted version of this.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top