How to retrieve the third-party library licenses
https://softwareengineering.stackexchange.com/questions/280425
08-10-2020
Question
I am looking for a way to automatically collect the licences of all the third-party libraries that my project uses. Currently I collect the licences by hand from GitHub.
So far, I don't have a clear idea of how to get a third-party library's licence automatically. What is the most reliable way to do this?
Some ideas:
Most GitHub projects contain a licence text, e.g. https://github.com/square/dagger. But can you map a dependency such as 'com.squareup.dagger:dagger:1.2.2' to its GitHub URL?
Most JVM artifacts can be found on mvnrepository.com. I don't know whether mvnrepository.com lists the licence.
The .jar files may contain a licence text. How can it be extracted?
Related: What is the best practice for arranging third-party library licenses "paperwork"?
Solution
One possible way to automate part of this is the following algorithm:
Add the project GAV to queue
For each GAV in queue
    Add all dependencies from GAV to queue // optional after first run?
    Download jar
    Extract/unzip jar and search root directory of jar for file containing "license" // see Java zip classes
    Parse root pom.xml for license information
    if neither works
        output that license information could NOT be found
    else
        save license information for GAV
// end for loop
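The per-GAV steps of the algorithm can be sketched in Python roughly as follows. This is only an illustration under some assumptions that are not part of the answer: the artifact is published on Maven Central in the standard repository layout (no classifier), license files may also sit under META-INF/ rather than only in the jar root, and the function names (maven_central_url, find_license_files, parse_pom_licenses, license_info) are made up for this example. A license declared only in a parent POM would not be found this way.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumption: artifacts are fetched from Maven Central's standard layout.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def maven_central_url(gav, ext):
    """Build the URL of the .jar or .pom for a 'group:artifact:version' string."""
    group, artifact, version = gav.split(":")
    group_path = group.replace(".", "/")
    return f"{MAVEN_CENTRAL}/{group_path}/{artifact}/{version}/{artifact}-{version}.{ext}"


def find_license_files(jar):
    """Return jar entries in the root or META-INF/ whose file name contains 'license'."""
    found = []
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            directory, _, base = name.rpartition("/")
            base = base.lower()
            # Assumption: also look in META-INF/, where many jars keep the file.
            if directory in ("", "META-INF") and ("license" in base or "licence" in base):
                found.append(name)
    return found


def parse_pom_licenses(pom_xml):
    """Return the <name> of every <license> element declared in the POM text."""
    names = []
    for elem in ET.fromstring(pom_xml).iter():
        # POM tags are usually namespaced ('{namespace}license'); compare local names only.
        if elem.tag.rsplit("}", 1)[-1] == "license":
            for child in elem:
                if child.tag.rsplit("}", 1)[-1] == "name" and child.text:
                    names.append(child.text.strip())
    return names


def license_info(jar, pom_xml=None):
    """Combine both lookups; return None when neither finds anything."""
    files = find_license_files(jar)
    declared = parse_pom_licenses(pom_xml) if pom_xml else []
    if not files and not declared:
        return None
    return {"files": files, "declared": declared}
```

Here jar can be a path or a file-like object, so a downloaded jar can be inspected from memory via io.BytesIO without writing it to disk. A None result corresponds to the "license information could NOT be found" branch.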
You could create a Maven plugin that does this and writes the file to the root directory of your project (instead of a build directory), so that you notice when the file changes. Otherwise, a Perl/Python script might be easier (but also more of a hack... :) ).
Given that it's easy to use transitive dependencies in your code without knowing it, you should also look at using the Ban Transitive Dependencies enforcer rule.
If you do not do that, then I would definitely make sure to scan all transitive dependencies for their licenses (that is, always run the 3rd line of the algorithm).
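To make the transitive scan concrete, the queue part of the algorithm can be sketched as a breadth-first walk in Python. The names collect_gavs and get_dependencies are hypothetical: get_dependencies stands for whatever you use to list the direct dependencies of a GAV (for example, something built on the output of mvn dependency:list or on parsed POMs), and is not an existing API.

```python
from collections import deque


def collect_gavs(root_gav, get_dependencies):
    """Return every GAV reachable from root_gav, in breadth-first order.

    get_dependencies is a caller-supplied (hypothetical) function that maps
    a GAV string to an iterable of its direct dependency GAVs.
    """
    seen = {root_gav}
    queue = deque([root_gav])
    order = []
    while queue:
        gav = queue.popleft()
        order.append(gav)
        for dep in get_dependencies(gav):
            # Skip GAVs already queued, so shared or cyclic dependencies
            # are visited only once.
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return order
```

Each GAV in the returned list would then go through the download and license lookup steps. The seen set matters because the same library is often pulled in by several dependencies.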