If you manage a BSD-licensed open source project, how do you guard against someone illegally contributing GPL-licensed code?

softwareengineering.stackexchange https://softwareengineering.stackexchange.com/questions/367401

Question

An open source project licensed under BSD, MIT or another permissive license accepts code contributions from the community.

How can I prevent someone from taking GPL-licensed code that they don't own and submitting it to my BSD-licensed project? I might accept the contribution without knowing that it was taken from a GPL-licensed project.

I do not wish to accept such contributions, so as not to make the entire project GPL. But I have no way of knowing whether the contributor actually holds the copyright to code they are contributing. So if someone illegally contributes GPL-licensed code to my project I do not know of any way to stop them (short of not accepting any contributions at all).

Surely there are plenty of BSD- and MIT-licensed projects out there, so there must be a solution.

Thanks!

Solution

One does not simply "guard against" illegal contributions.

You should never accept a contribution blindly. Have a process to vet contributions (including your own) for several kinds of trouble:

  • unit tests (automated)
  • backdoors and security flaws (static analysis might help, other tools exist)
  • code smells (automated)
  • poor code logic (peer review, "enough eyes to make bugs shallow", etc. - the story of OpenSSL shows that this might not be enough)
  • I'm sure I missed several others - contributions welcome

"All" that you need to do would be to add a check for plagiarism. This can be done, to a point, with automatic tools by just googling the relevant lines of code.

I tried this just now by lifting some samples of code from projects, and it works. I simply extracted strings, formats, comments, function names and prototypes from the code, googled them all, and looked for a single site that appeared in multiple matches. In 17 tests out of 19 the source site was the first of five candidates; in all cases, the site appeared among the first five. By contrast, pieces of my own code only triggered false positives in three (four) cases out of twenty, with very low quality targets, so by quickly perusing half a dozen sites I was able to dismiss the alert. With the GPL code, moreover, the snippet shown by Google Search was visibly the same as the code I had test-filched.

At this point I'm confident that you could do this by hand. Take a look at the code and at the comments (do they make sense? If not, that's another kind of red flag. If yes, either they've all been reworded(!) or you'll find them), pick a couple of text strings, and plug them into Google and/or other search engines.

And you only need to do this for sizeable contributions.
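The extract-and-tally procedure described above can be sketched roughly as follows. This is only an illustration, not the exact procedure I ran: the regular expressions are crude heuristics for C-like code, the length thresholds are arbitrary, and `search` is a hypothetical callable that you would have to supply yourself (it takes a phrase and returns a list of result URLs from whatever search engine you use).

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Crude heuristics for C-like source; the thresholds are arbitrary assumptions.
STRING_RE = re.compile(r'"((?:[^"\\]|\\.){12,})"')    # string literals of 12+ chars
COMMENT_RE = re.compile(r'//\s*(.{12,})')              # "//" comments of 12+ chars
FUNC_RE = re.compile(r'\b([A-Za-z_]\w{7,})\s*\(')      # called/defined names of 8+ chars


def extract_phrases(source):
    """Collect distinctive strings, comments and function names, without duplicates."""
    phrases = []
    for pattern in (STRING_RE, COMMENT_RE, FUNC_RE):
        for match in pattern.finditer(source):
            phrase = match.group(1).strip()
            if phrase not in phrases:
                phrases.append(phrase)
    return phrases


def rank_sites(phrases, search):
    """Count, for each site, how many distinct phrases produced a hit there.

    `search` is a hypothetical callable: phrase -> list of result URLs.
    """
    hits = Counter()
    for phrase in phrases:
        # A set, so a site counts once per phrase however many pages matched.
        sites = {urlparse(url).netloc for url in search(phrase)}
        hits.update(sites)
    return hits.most_common()
```

A site that turns up for many unrelated phrases from the same contribution is the one worth opening and comparing by eye; a site that matches a single generic string is most likely a false positive.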

From a legal point of view - I am quickly wading out of my depth - I imagine that you need your contributors to accept some form of waiver or agreement in which they state that they're going to contribute honestly.
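One existing convention of this kind is the Developer Certificate of Origin, where every commit message carries a `Signed-off-by:` trailer (what `git commit -s` adds). Whether that suits your project is an assumption on my part, and it is a statement of good faith rather than a technical guarantee. A minimal sketch of checking for the trailer, with hypothetical function names:

```python
import re

# Matches a trailer line such as: Signed-off-by: Jane Doe <jane@example.com>
SIGN_OFF_RE = re.compile(
    r"^Signed-off-by:\s*(?P<name>[^<>\n]+?)\s*<(?P<email>[^<>\s]+)>\s*$",
    re.MULTILINE,
)


def sign_offs(message):
    """Return (name, lower-cased email) pairs for every sign-off trailer found."""
    return [
        (m.group("name"), m.group("email").lower())
        for m in SIGN_OFF_RE.finditer(message)
    ]


def is_signed_off_by(message, author_email):
    """True if the commit message carries a sign-off from the given address."""
    wanted = author_email.lower()
    return any(email == wanted for _, email in sign_offs(message))
```

A check like this could run in CI and reject commits whose author has not signed off; it only proves that the contributor made the statement, not that the statement is true.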

When they don't, and you missed that (e.g. they morphed/obfuscated some GPL code so that it isn't found by googling), IMHO chances are that it'll never be found out unless it was done for entrapment purposes, and they themselves blow the whistle. At that point your project will be in violation of the GPL and you can:

  • remove the GPL code from the project
  • declare the rest of the project to fall under the GPL

For the whole scenario to be any kind of realistic trouble, the "contributor" would need to:

  • locate a meaningful GPL code section of useful proportions,
  • remove all licensing information,
  • thoroughly rewrite it - comments, function names, non-trivial variable names, text strings - so that a search won't find the code, and yet leave it recognizably the same, and at the same time leave it working
  • let the code "stew" in the codebase and other contributors rely on its functionality, to the point where removing it would be an issue

The whole scenario, especially the last point, seems to me really far-fetched. Once the plagiarism check is in place, I would cease worrying.
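Of the steps listed above, removing the licensing information is the one a careless copy-paster is most likely to skip, so a cheap scan for leftover GPL markers is worth running before any web search. The sketch below is an assumption about what such a scan might look like: the marker list is illustrative and incomplete, and because it works line by line it will miss a licence notice whose wording is wrapped across two lines.

```python
import re

# Illustrative, incomplete list of tell-tale markers (an assumption, not a standard).
GPL_MARKERS = [
    re.compile(r"GNU\s+(?:Lesser\s+|Affero\s+)?General\s+Public\s+License", re.IGNORECASE),
    re.compile(r"SPDX-License-Identifier:\s*\(?\s*(?:A|L)?GPL-", re.IGNORECASE),
    re.compile(r"Free\s+Software\s+Foundation", re.IGNORECASE),
]


def find_gpl_markers(text):
    """Return (line_number, stripped_line) pairs for lines carrying a GPL marker."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in GPL_MARKERS):
            found.append((number, line.strip()))
    return found
```

An empty result proves nothing, since a determined plagiarist strips these lines first; a non-empty one is a reason to stop and ask the contributor where the code came from.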

Licensed under: CC-BY-SA with attribution