Question

Let's say I have a project that I have released under the GPL, with the source available to anyone. Later I find a very similar product, but closed source and distributed binary-only by someone else.

Is there a good way to find out whether they are using my source code in their product?

If the solution is to reverse-engineer the binary, is it possible to automate that?

EDIT: Clarification. Bug hunting is one option, but it is not definitive, especially if the project is a library and the binary has added its own GUI, for example. The situation I'm interested in is when it's not blatantly obvious that the code was lifted.

Solution

Look for software birthmarks. This approach tries to establish links between programs based on their binary code or dynamic behavior. Christian Collberg is an expert on software watermarks, from which birthmarks were derived. This is all still in research land.

OTHER TIPS

Bugs.

If the closed source release shares most of its bugs with your project, it's probably 'lifted'.

You could also try comparing a decompiled version of your own binary with a decompiled version of the closed source binary... though this would probably not be reliable.

Obviously, if the suspected binary is not stripped, you can just look for any symbols whose names match those in your code.

There's a large body of work on decompiling and reverse-engineering binaries. The world expert is probably Cristina Cifuentes. She's done a lot with decompilation. It would also be interesting to write to Alex Aiken and ask if his tool MOSS (Measure of Software Similarity) could be adapted to binaries.

An obvious method is to search for strings. Run the Unix strings tool and see if the binary contains any of the literal strings from your code, mainly things like error messages and the text in message boxes.
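
The string comparison can be automated. Below is a minimal sketch in Python that mimics what strings does (runs of printable ASCII) and intersects the results from two binaries. The function names and the minimum length of 6 are assumptions chosen for illustration, and it only looks at ASCII, so wide-character strings would be missed.

```python
import re


def printable_strings(data, min_len=6):
    """Return the set of printable-ASCII runs of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return set(pattern.findall(data))


def shared_strings(mine, theirs, min_len=6):
    """Strings that occur in both byte blobs."""
    return printable_strings(mine, min_len) & printable_strings(theirs, min_len)


# Usage (paths are placeholders):
#   with open("my_build", "rb") as a, open("suspect_binary", "rb") as b:
#       for s in sorted(shared_strings(a.read(), b.read())):
#           print(s.decode("ascii"))
```

Expect noise: both binaries will share strings that come from the C runtime or common libraries, so only strings that are unique to your own code are meaningful evidence.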

You could try to disassemble both programs and compare the assembly, but if they used a different compiler (or different compiler settings), their program could differ from yours in many small ways. There are a few free disassemblers, or you could step through the code at the assembly level in a debugger.
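
One rough way to make the assembly comparison less sensitive to addresses and register choices is to compare only short sequences of instruction mnemonics. The sketch below is an illustration, not an established tool: it assumes the tab-separated listing layout that GNU objdump -d prints (address, raw bytes, instruction), and the helper names and k=4 default are made up for this example.

```python
def mnemonics(disassembly):
    """Extract instruction mnemonics from an objdump-style listing.

    Assumes lines of the form '  401126:<TAB>55<TAB>push   %rbp'.
    Instruction prefixes such as 'rep' or 'lock' are taken as the mnemonic.
    """
    ops = []
    for line in disassembly.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0].strip().endswith(":"):
            parts = fields[2].split()
            if parts:
                ops.append(parts[0])
    return ops


def kgrams(ops, k=4):
    """Set of all length-k windows over the mnemonic sequence."""
    return {tuple(ops[i:i + k]) for i in range(len(ops) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Usage (listings produced separately, e.g. with a disassembler):
#   score = jaccard(kgrams(mnemonics(mine_asm)), kgrams(mnemonics(theirs_asm)))
```

A high score is only a hint. Different compilers or optimization levels can push the score down for identical source, and unrelated programs still share common instruction idioms.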

Other than that there really isn't an easy way to find out that kind of thing.

The most surefire way I can think of is similar to the fictitious word 'esquivalience' that was planted in the New Oxford American Dictionary as a copyright trap.
Simply add a binary array with unique content somewhere in the code, and don't forget to make some simple use of it so the compiler and linker won't optimize it away. You should probably obfuscate it somewhat so that it is not obvious to the casual reader that it's redundant. Note that this only helps with code copied after you added the marker.
Then open the suspect binary with a hex editor and look for it.
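
Instead of searching by hand in a hex editor, the lookup can be scripted. This is a minimal sketch; the marker bytes shown are a made-up example, and you would substitute the exact bytes as they are stored in your compiled binary (after any obfuscation you applied).

```python
# Hypothetical marker; replace with the bytes actually embedded in your code.
MARKER = bytes.fromhex("9f3c17e2a4d85b60c1ee0742")


def find_marker(data, marker=MARKER):
    """Return every byte offset at which marker occurs in data."""
    offsets = []
    start = data.find(marker)
    while start != -1:
        offsets.append(start)
        start = data.find(marker, start + 1)
    return offsets


# Usage (path is a placeholder):
#   with open("suspect_binary", "rb") as f:
#       print(find_marker(f.read()))
```

If the suspect binary is packed or compressed, the marker will not appear verbatim in the file, so an empty result does not prove the code is absent.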

Why don't you look at the symbol table using nm?

$ nm a.out
...
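
If both binaries still have their symbol tables, the comparison of nm output can be scripted. The sketch below assumes the default three-column output of nm (address, type letter, name); the function names and the list of ignored generic symbols are assumptions for illustration.

```python
import subprocess

# Symbols present in almost every program; sharing them proves nothing.
GENERIC = {"main", "_start", "_init", "_fini"}

# nm type letters for symbols defined in text, data, bss and read-only data.
DEFINED_TYPES = set("TtDdBbRr")


def defined_symbols(nm_output):
    """Names of symbols defined in the binary, parsed from nm output."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[1] in DEFINED_TYPES:
            names.add(parts[2])
    return names


def shared_symbols(mine_nm, theirs_nm):
    """Defined symbols common to both listings, minus generic ones."""
    return (defined_symbols(mine_nm) & defined_symbols(theirs_nm)) - GENERIC


def run_nm(path):
    """Run nm on a file and return its text output (requires nm on PATH)."""
    return subprocess.run(["nm", path], capture_output=True,
                          text=True, check=True).stdout


# Usage: print(sorted(shared_symbols(run_nm("my_build"), run_nm("a.out"))))
```

On a fully stripped executable nm reports no symbols, so this only works when the symbol table was left in place.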
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow