Description

We need to check that all of our bundled works are properly accounted for in our LICENSE/NOTICE files.

At a minimum, it looks like HADOOP-10075 introduced some changes that have not been accounted for.

For example, the jsTree plugin found at hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jt/jquery.jstree.js does not show up in LICENSE.txt, which should (a) indicate that we're redistributing it under the MIT option and (b) give proper citation of the original copyright holder per ASF policy.

Activity

Sean Busbey
added a comment - 01/Nov/16 15:18 Important to note that the jsTree example is not meant to be exhaustive; I did not look to see what else wasn't updated, I just randomly searched for a copyright string. I also have not yet looked to see whether the binary bundlings properly account for the update (see HBASE-12894 for where folks over in HBase are checking for the same).
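A slightly less random version of that spot check is to walk the source tree and print every line that mentions a copyright in a bundled web asset, then compare the hits against LICENSE.txt by hand. This is a hypothetical sketch, not a tool used in this jira; the function name find_copyright_lines and the extension list are assumptions.

```python
import os
import re

# File types treated as bundled third-party web assets (an assumption).
ASSET_EXTENSIONS = (".js", ".css", ".html")
COPYRIGHT_RE = re.compile(r"copyright", re.IGNORECASE)


def find_copyright_lines(root):
    """Yield (path relative to root, stripped line) for each line that
    mentions 'copyright' in a file ending with one of ASSET_EXTENSIONS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(ASSET_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps oddly encoded or minified files from aborting the scan.
            with open(path, encoding="utf-8", errors="replace") as f:
                for line in f:
                    if COPYRIGHT_RE.search(line):
                        yield os.path.relpath(path, root), line.strip()
```

Usage would be something like iterating find_copyright_lines("hadoop-yarn-project") and printing each pair. Expect noise: it only finds files that carry an explicit copyright line, and it says nothing about whether LICENSE already covers them.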

Akira Ajisaka
added a comment - 04/Nov/16 19:37 Updated the list of bundled jars which was originally created for HADOOP-12893 .
https://gist.github.com/aajisaka/6f61ae083770739d57720745bcb12f0d/revisions

Robert Kanter
added a comment - 21/Nov/16 21:36 HADOOP-10075 didn't add any new js files; they were all there this entire time, but many of them were only gzip versions (e.g. jquery.jstree.js.gz --> jquery.jstree.js). I guess we've been missing this for a long time then.

Xiao Chen
added a comment - 05/Dec/16 23:47 Thanks Sean Busbey for reporting this. I'd like to take a shot at this one to move alpha2 forward.
It seems more things have been added since Akira's last update (188 lines now in my run today: https://gist.github.com/xiao-chen/336b64b1b17e8813fd5b980013ac7eb4 ).
I plan to do the following things here:
1. Fix the diff in L&N since HADOOP-12893, in a similar way.
2. Manually fix the jstree entry and any others that turn out to be missing. It looks like this has to be manual, short of some sophisticated tooling. As Robert said, HADOOP-10075 only extracted jquery.jstree.js.gz, which was committed by... YARN-1.
3. Add a way to verify this in pre-commit, so that in the future this work happens up front.
1 and 2 should unblock the release; 3 would make our lives easier.
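A pre-commit check for item 3 could start small: given the paths a patch adds, report any bundled web asset that no LICENSE entry covers. This is a hypothetical sketch, not an existing Yetus or Hadoop check. The function names and extension list are illustrative, and it assumes LICENSE lists bundled files one per line as repository paths starting with "hadoop-". It treats names with and without a .gz suffix as the same file, since that naming mismatch is what made jquery.jstree.js look unlisted.

```python
# File types assumed to be bundled third-party assets (illustrative list).
BUNDLED_EXTENSIONS = (".js", ".css", ".gz")


def _strip_gz(path):
    """Drop a trailing '.gz' so compressed and extracted names compare equal."""
    return path[:-3] if path.endswith(".gz") else path


def listed_paths(license_text):
    """Collect LICENSE lines that look like repository paths.

    Assumes bundled files are listed one per line as paths that start with a
    'hadoop-' module directory, as in the jQuery section of LICENSE.txt.
    """
    return {
        _strip_gz(line.strip().rstrip("/"))
        for line in license_text.splitlines()
        if line.strip().startswith("hadoop-")
    }


def unlisted_bundled_files(added_paths, license_text):
    """Return added asset paths that no listed LICENSE path covers.

    A path is covered if it equals a listed path (ignoring '.gz' on either
    side) or lies underneath a listed directory.
    """
    listed = listed_paths(license_text)
    missing = []
    for path in added_paths:
        if not path.endswith(BUNDLED_EXTENSIONS):
            continue
        norm = _strip_gz(path)
        if norm in listed or any(norm.startswith(p + "/") for p in listed):
            continue
        missing.append(path)
    return missing
```

In a pre-commit hook, a non-empty result would fail the run. The path matching is deliberately crude, and it does nothing about NOTICE, which would still need a human.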

Xiao Chen
added a comment - 06/Dec/16 01:31 Regarding #1, thanks to Akira Ajisaka for the commands from HADOOP-12893; I built a new output at https://gist.github.com/xiao-chen/6131ec9718ec4b1af286f048bd714c6f . I also looked at Apache Rat, which seems too naive, and Apache Whisker, which isn't documented clearly enough (for me).
Quick look at #2:
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jt/jquery.jstree.js is actually noted in LICENSE! It is listed with a .gz extension. Since HADOOP-10075 removed the .gz and left the extracted .js, I think updating the name and moving it to the MIT License section of our LICENSE should suffice. This is legal since the header of that file says it's MIT, and Apache doesn't require it to be in the NOTICE.
The bad news is that js, css, or anything else outside of a Maven dependency isn't checked by the tool.

Xiao Chen
added a comment - 07/Dec/16 01:46 Update for #1: got a parsed result at https://docs.google.com/spreadsheets/d/1jpeVlwydkgM01FNW4GPdAgzch5kuLC8ka09yiewKy7w/edit#gid=1885055871 .
We had 211 rows when doing HADOOP-12893; now it is 356. I don't see any disallowed license at first glance, but will go through them in detail. If anyone knows any lawyer super weapon to automate or shortcut this, please shout.

Xiao Chen
added a comment - 13/Dec/16 20:27 In the interest of unblocking the 3.0.0-alpha2 release, how do people feel about doing #1 and #2 from my comment above in this jira, and deferring the automation (#3) to another jira? #1 is almost done, and #2 shouldn't be too hard, so I should be able to post a patch this week.
I have some nasty local scripts that sort of automate this, with some things that need manual inspection. However, even I feel those scripts are not up to standard... I don't want them to block our mighty hadoop release.

Xiao Chen
added a comment - 14/Dec/16 07:56 Thanks Sean for the comment.
I have finished up a first draft of #1, shown in the 'Dependencies' tab of this jira's linked spreadsheet. Will work on closing the final gaps, and start on #2.
Among those dependencies:
jdiff is LGPL, but according to HADOOP-12893 it's not bundled, so we're good.
ldapsdk is new; I did a quick search in the poms but didn't find it. Will look more.
JSON needs some help: Akira Ajisaka, Andrew Wang, apologies for my fading memory, but do you recall what was done for it in HADOOP-12893? (I searched the jira but found no mention in the comments, and in the spreadsheet it's marked Done? == N... I seem to remember all things were done when we posted patches/resolved that jira.) Anyway, bundled? is also N, so I'm guessing that's why it was omitted at that time.
Ping me if anyone wants edit permission on the spreadsheet. Note that the Dependencies and parsed tabs are entirely script-generated and are meant to be replaced in later runs. In case anyone is curious, here's how to (nastily) generate them:
xiao-MBP:license xiao$ cat step1.sh
#!/bin/sh -x
# First save spreadsheet to local:
# 'Licenses' tab to licenses.tsv
# 'Overrides' tab to overrides.tsv
# 'parse.py script' tab to parse.py
# 'standardize.py' tab to standardize.py
# 'generate.py script' tab to generate.py
mvn license:aggregate-add-third-party
OUTPUT_DIR=~/Downloads/license/
cp target/generated-sources/license/THIRD-PARTY.txt $OUTPUT_DIR
xiao-MBP:license xiao$ cat step2.sh
#!/bin/sh -x
python parse.py > parsed.tsv
xiao-MBP:license xiao$ cat step3.sh
#!/bin/sh -x
python standardize.py
# will generate a standardized.tsv, which is the 'Dependencies' tab in the spreadsheet.
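The real parse.py lives in the spreadsheet and isn't reproduced here. As a hypothetical stand-in, the core of step2 might look like the following, assuming each dependency line of THIRD-PARTY.txt follows the license-maven-plugin's default shape "(License) [(License) ...] Name (groupId:artifactId:version - url)". License names that themselves contain parentheses would break this simple regex.

```python
import re

# Trailing "(groupId:artifactId:version - url)" part of a THIRD-PARTY.txt line.
COORD_RE = re.compile(r"\(([^():\s]+):([^():\s]+):([^()\s]+) - ([^()]*)\)\s*$")
# One leading "(License Name)" group.
LICENSE_RE = re.compile(r"\s*\(([^()]*)\)")


def parse_third_party_line(line):
    """Split one line into (licenses, name, 'g:a:v', url).

    Returns None for lines without a trailing coordinate, such as the
    'Lists of N third-party dependencies.' header.
    """
    coord = COORD_RE.search(line)
    if coord is None:
        return None
    head = line[:coord.start()]
    licenses = []
    pos = 0
    while True:
        m = LICENSE_RE.match(head, pos)
        if m is None:
            break
        licenses.append(m.group(1).strip())
        pos = m.end()
    group, artifact, version, url = coord.groups()
    return licenses, head[pos:].strip(), f"{group}:{artifact}:{version}", url.strip()


def to_tsv_row(parsed):
    """Render a parsed line as one TSV row (column order is illustrative)."""
    licenses, name, coords, url = parsed
    return "\t".join([coords, name, "; ".join(licenses), url])
```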

Andrew Wang
added a comment - 14/Dec/16 18:47 I think JSON is JSON.org, which should be covered based on notes in HADOOP-13794 .
I'm sad if we've somehow added over 100 dependencies in a few months since the last L&N update, but I think I massaged the list last time. We can remove our own Apache Hadoop deps from that list for instance, and there are entries for Apache DS and Maven that can be collapsed.

Xiao Chen
added a comment - 14/Dec/16 19:24 Cool, I will leave JSON out of the L&N since it's test-only, and have HADOOP-13794 deal with it.
The dependency growth surprised me too, but as you said, they won't necessarily all be listed in LICENSE/NOTICE. The list also includes transitive deps; for example, as I found out, the ldapsdk from my comment above comes in this way (test scope only), so it should be fine too:
$ mvn license:aggregate-add-third-party -X -e
...
[INFO] Forking Apache Hadoop Auth 3.0.0-alpha2-SNAPSHOT
...
[DEBUG] org.apache.hadoop:hadoop-auth:jar:3.0.0-alpha2-SNAPSHOT
...
[DEBUG] org.apache.directory.server:apacheds-server-integ:jar:2.0.0-M21:test
...
[DEBUG] ldapsdk:ldapsdk:jar:4.1:test
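Tracing ldapsdk back through that output is the kind of step that could be scripted. The sketch below is hypothetical: it only understands bare "[DEBUG] group:artifact:type[:classifier]:version:scope" lines like the ones above, skips everything else in the -X output (including annotated lines), and reports artifacts that only ever show up in test scope.

```python
# Maven dependency scopes; any other trailing token means "not a plain coordinate".
SCOPES = {"compile", "provided", "runtime", "test", "system", "import"}


def parse_debug_coordinate(line):
    """Parse '[DEBUG] group:artifact:type[:classifier]:version:scope'.

    Returns a dict, or None for lines that are not scoped coordinates.
    """
    text = line.strip()
    if not text.startswith("[DEBUG]"):
        return None
    parts = text[len("[DEBUG]"):].strip().split(":")
    if len(parts) not in (5, 6) or parts[-1] not in SCOPES:
        return None
    return {"group": parts[0], "artifact": parts[1],
            "version": parts[-2], "scope": parts[-1]}


def artifacts_only_in_test_scope(lines):
    """Return sorted 'group:artifact' keys that appear only with scope 'test'."""
    scopes = {}
    for line in lines:
        dep = parse_debug_coordinate(line)
        if dep is not None:
            key = dep["group"] + ":" + dep["artifact"]
            scopes.setdefault(key, set()).add(dep["scope"])
    return sorted(key for key, seen in scopes.items() if seen == {"test"})
```

Fed the four [DEBUG] lines above, this would flag ldapsdk:ldapsdk and org.apache.directory.server:apacheds-server-integ. The project line for hadoop-auth has no scope and is skipped.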

Xiao Chen
added a comment - 19/Dec/16 07:04 Attaching patch 1, which takes care of everything HADOOP-12893 did for alpha-1.
I also have the automated scripts at https://github.com/xiao-chen/hadoop/tree/13780/dev-support/license ; step1 through step4 should generate 2 files, notices and licenses (instructions are in step1). Merging them into the current L&N files is manual.
The only thing left for this jira is to look for (compressed) files like jstree and include those as well. I think this is just a matter of time, and should be less than half a day of work (step5.sh will do this; it is currently not working).
Any review/comments appreciated!
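step5.sh isn't shown here, so this is only a hypothetical sketch of the search it describes: find every .gz file under a directory and pull any copyright or license lines out of the first few decompressed lines, so each hit can be checked against LICENSE by hand. The function name and the 20-line header window are assumptions.

```python
import gzip
import os
import re

NOTICE_RE = re.compile(r"copyright|licen[cs]e", re.IGNORECASE)


def compressed_asset_headers(root, max_lines=20):
    """Map each *.gz file under root (relative path) to the lines among its
    first max_lines decompressed lines that mention a copyright or license."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".gz"):
                continue
            path = os.path.join(dirpath, name)
            hits = []
            # Text mode decompresses on the fly; errors="replace" tolerates binary junk.
            with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
                for i, line in enumerate(f):
                    if i >= max_lines:
                        break
                    if NOTICE_RE.search(line):
                        hits.append(line.strip())
            found[os.path.relpath(path, root)] = hits
    return found
```

An empty list for a file is the interesting case: it means the header says nothing, so its origin has to be tracked down manually.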

Xiao Chen
added a comment - 20/Dec/16 23:35 I was thinking the scripts are not elegant enough for the hadoop code base and would need some extra reviews, so I left them out to avoid distracting from the L&N themselves. There are also still some manual steps (e.g. merging the generated L&N into the current files, checking what NOTICE a dependency needs, etc.). But if an all-inclusive patch is desired here, I can try.

Xiao Chen
added a comment - 03/Jan/17 06:28 Attached a rebased patch 3, and reran the scripts to reflect the current state.
Also adding the best-effort scripts on top of patch 3; at least they're self-documenting now. Notably, there are still some manual steps and improvements to be done. The lawyer's road isn't easy.

Andrew Wang
added a comment - 03/Jan/17 17:26 Thanks for your hard work on this Xiao! +1 LGTM to unblock the release, though we need a follow-on to improve the scripts. I'm sure these are already on your todo list, but a few thoughts along those lines:
We should try to remove the dependency on the externally managed GDoc. Checking in an exported sqlite DB or some csvs would be an improvement.
generate.py is still in the spreadsheet
the manual merge step is unfortunate, ideally everything is fully-generated by a single script and input data.
Also, did we ever file a JIRA to do a precommit check?
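As a rough illustration of the checked-in alternative to the GDoc: tabs exported as TSV can be loaded into an in-memory sqlite database and queried without any external service. This is a hypothetical sketch; the table and column names in the usage are made up, and the TSV header is assumed to be trusted because it is interpolated into SQL.

```python
import csv
import io
import sqlite3


def load_tsv(conn, table, tsv_text):
    """Create `table` from TSV text whose first row holds column names.

    Column names come from a trusted, checked-in file and are quoted
    rather than escaped. Returns the number of data rows inserted.
    """
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    cols = ", ".join(f'"{c}"' for c in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', body)
    return len(body)
```

With conn = sqlite3.connect(":memory:"), questions like "which dependencies are not bundled?" become a plain SELECT, and the TSVs themselves stay reviewable in patches.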

Xiao Chen
added a comment - 03/Jan/17 18:09 Thanks Andrew; I filed 2 jiras, linked here: one for pre-commit and one for automation.
Agree on getting rid of the gdoc. We can just use tsvs, but for this jira the gdoc is the one place that holds all of them. Interesting idea about a sqlite db, will play with it.
Full automation is possible, but needs more work than the current state (what I do manually now):
Those raw files (js/css/etc.) need a doc to be managed in, and merged.
Need to add a new kind of entry to the overrides so we can intentionally ignore some dependencies (jdiff, json, ldapsdk as we have found so far).
Once those are done, we will need a wiki/instruction page on how to use it.
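The "ignore" entry for the overrides could be as simple as a sentinel value next to the existing license corrections. A hypothetical sketch follows: the IGNORE sentinel, the function name, and the artifact coordinates in the usage are illustrative only, not what the spreadsheet actually uses.

```python
# Sentinel meaning "drop this artifact from the generated L&N" (hypothetical).
IGNORE = "IGNORE"


def apply_overrides(dependencies, overrides):
    """Apply overrides to (artifact, license) pairs from the generated list.

    overrides maps an artifact to a corrected license name, or to IGNORE
    to drop it entirely (e.g. test-only or not-bundled deps).
    """
    result = []
    for artifact, license_name in dependencies:
        override = overrides.get(artifact)
        if override == IGNORE:
            continue
        result.append((artifact, override if override is not None else license_name))
    return result
```

Keeping corrections and ignores in one table means every manual decision stays recorded in a single reviewable place.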

Sean Busbey
added a comment - 04/Jan/17 23:14 The entries for the bundled HBase libraries are slightly incorrect.
in LICENSE:
The jQuery entries look to cover things other than the bundling that's in the HBase Server 1.1.3 jar. Also the jQuery Foundation copyright entry is missing any year(s). The bit bundled in the HBase jar is version 1.8.3 with (c) 2012.
There's no entry for the bundled Orca Logo from the HBase Server jar. It's mentioned in NOTICE, but LICENSE should have a complete reference for the CC-BY 3.0 license (found in LICENSE from the hbase-server-1.1.3.jar)
in NOTICE:
I don't see any actual inclusion of HBase Shell 1.1.3, HBase IT Tests 1.1.3, or HBase Testing Utility 1.1.3 artifacts. I'm not sure if this is an oversight in the constructed NOTICE or in the bin distribution tarball.

Xiao Chen
added a comment - 05/Jan/17 06:17 Thanks a lot for the review, Sean Busbey!
The jQuery entries look to cover things other than the bundling that's in the HBase Server 1.1.3 jar. Also the jQuery Foundation copyright entry is missing any year(s). The bit bundled in the HBase jar is version 1.8.3 with (c) 2012.
The jquery entry has actually been like this for a long time... Looking at http://www.apache.org/dev/licensing-howto.html#permissive-deps , is the year required? I'm guessing the current entry is written without a year because the first 2 files are 2005, 2013 and the last is 2012:
...
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-1.10.2.min.js
hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/jquery.js
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery
--------------------------------------------------------------------------------
Copyright jQuery Foundation and other contributors, https://jquery.org/
...
I'll be happy to update accordingly, but wanted to make sure: the Apache licensing how-to seems to say 'add a pointer' and 'short note summarizing', and its example doesn't even mention copyright...
There's no entry for the bundled Orca Logo from the HBase Server jar. It's mentioned in NOTICE, but LICENSE should have a complete reference for the CC-BY 3.0 license (found in LICENSE from the hbase-server-1.1.3.jar)
Copied from there and added to hadoop LICENSE.
I don't see any actual inclusion of HBase Shell 1.1.3, HBase IT Tests 1.1.3, or HBase Testing Utility 1.1.3 artifacts. I'm not sure if this is an oversight in the constructed NOTICE or in the bin distribution tarball.
Good catch; that inspired me to look into the mvn license:aggregate-add-third-party -X -e output from step1. I think we can run with -Dlicense.excludedScopes=test when generating.
The attached patch 4 is based on the new run excluding test scope. I was able to take out a few test-only, not-bundled deps from patch 3.

Sean Busbey
added a comment - 05/Jan/17 10:56
The jQuery entries look to cover things other than the bundling that's in the HBase Server 1.1.3 jar. Also the jQuery Foundation copyright entry is missing any year(s). The bit bundled in the HBase jar is version 1.8.3 with (c) 2012.
The jquery entry has actually been like this for a long time... Looking at http://www.apache.org/dev/licensing-howto.html#permissive-deps , is the year required?
I'm guessing the current entry is written without a year because the first 2 files are 2005, 2013 and the last is 2012:
...
hadoop-hdfs-project/hadoop-hdfs/src/main/webapps/static/jquery-1.10.2.min.js
hadoop-tools/hadoop-sls/src/main/html/js/thirdparty/jquery.js
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/resources/webapps/static/jquery
--------------------------------------------------------------------------------
Copyright jQuery Foundation and other contributors, https://jquery.org/
...
I'll be happy to update accordingly, but wanted to make sure: the Apache licensing how-to seems to say 'add a pointer' and 'short note summarizing', and its example doesn't even mention copyright...
The jQuery license specifically says the copyright notice has to be reproduced, so I'd presume that means the year is a relevant part of that reproduction. It's pretty easy to just list:
Copyright 2005, 2012, 2013 jQuery Foundation and other contributors, https://jquery.org
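Building that combined line from the copyright headers of every bundled copy is mechanical. A hypothetical sketch: it collects every four-digit year from the given header strings and lists the distinct years in order. The header strings in the check are illustrative, not copied from the actual files, and any stray four-digit number in a header that looks like a year would also be picked up.

```python
import re

# Four-digit years from 1900 to 2099; version strings like 1.10.2 do not match.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def merged_copyright(headers, holder):
    """Build one copyright line listing every distinct year found in headers."""
    years = sorted({int(y) for text in headers for y in YEAR_RE.findall(text)})
    return "Copyright {} {}".format(", ".join(str(y) for y in years), holder)
```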
Whether or not we include the copyright date, in v4 the jquery LICENSE section still needs to call out that there's a copy of 1.8.3 bundled in the hbase server jar.
It looks like the hbase version changed from 1.1.3 in v3 to 1.2.4 in v4. I don't think there was any substantial LICENSE/NOTICE change between those versions, but I don't have time to confirm ATM. I don't think it's worth holding things up for that; I'll just file a follow-on if I find something.
While reviewing the update for v4, I noticed there's an added blurb for a dependency that's BSD 4-clause. BSD 4-clause is the variant "with advertising clause" that's called out in the legal FAQ as not being Category A. It's not listed as any particular category, and isn't listed by the OSI. We can file a LEGAL issue asking if it's fine, but I suspect it isn't. Are we sure the version of JDOM we're using is BSD 4-clause? The current version of JDOM uses a one-off license that reads as Category A to me (possibly calling for a NOTICE inclusion as well as LICENSE).
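Spotting the advertising clause itself can be automated, even though deciding the category can't. A hypothetical sketch: it flags license texts containing the original 4-clause BSD wording "All advertising materials mentioning features or use of this software must display", tolerating line wrapping. A hit only means a human should read the license, as with the one-off JDOM text.

```python
import re

# Wording of the advertising clause in the original 4-clause BSD license.
ADVERTISING_RE = re.compile(
    r"all\s+advertising\s+materials\s+mentioning\s+features\s+or\s+use\s+of\s+"
    r"this\s+software\s+must\s+display",
    re.IGNORECASE,
)


def has_advertising_clause(license_text):
    """True if license_text contains the BSD 4-clause advertising requirement."""
    return ADVERTISING_RE.search(license_text) is not None
```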

Xiao Chen
added a comment - 05/Jan/17 18:33 Thanks for the detailed explanations, Sean.
jQuery
Updated the copyright line to include the 3 years, and added a line for the v1.8.3 in hbase server.
the hbase version changed from 1.1.3 in v3 to 1.2.4 in v4
Yep, this is from YARN-5976, which was recently committed.
JDOM
You're correct: I was looking at https://github.com/hunterhacker/jdom/blob/jdom-1.1/core/LICENSE.txt and marked it as 4-clause BSD. But as you said, with the additional text it should be considered a one-off license. So I updated it and also added a callout in NOTICE. (JDOM itself doesn't have a NOTICE file, so I followed the current style of pointing to its license and homepage.)
I also double-checked that the other new deps' licenses are correct; this can be verified from the spreadsheet's 'Overrides' tab.
Patch 5 attached to reflect the above.

Xiao Chen
added a comment - 06/Jan/17 00:10 Committed (the L&N-only patch 6) to trunk. Thanks a lot Akira Ajisaka, Andrew Wang and Sean Busbey for the reviews and help!
I'd still prefer including the scripts
After HADOOP-12893 and this jira, I'm very eager to have the automation done. The scripts won't be lost. I will make sure the follow-on HADOOP-13948 gets worked out so no one has to play lawyer for alpha3.