Have your say - License tags

We're thinking of ways to make package licenses easier to understand. One way we could do that is by showing tags with short license names, such as 'Apache License' or 'MIT License', next to packages. Here are a few related questions I wanted to ask the community.

If you could choose between
a) having a tag on your license, with your license being hosted on nuget.org or
b) having no tag to describe your license, with your license hosted off nuget.org on your own site,

which would you choose and why?

Is there anyone who publishes packages under a well-known license such as the 'Apache License' and does not want their package described with such a license tag?

Would license tags be helpful? Would anything else be helpful for understanding package licenses?

I've often thought about this idea too for packaging in general. A couple issues that need to be considered:

Licenses must have versions.

Some software components are dual-licensed (or even tri-licensed). For example, at one point Mozilla code was licensed under the MPL, GPL and LGPL.

Different package components might have different licenses. For licensing purposes, it'd be ideal if all the pieces of a package had the same license but I don't think that's an assumption we can make.

I think having a mechanism for standard licence tagging would be useful. For example, all my open source work to date is licensed under the MIT licence and I'd be more than happy for this to be reflected in the NuGet package metadata. I imagine that the current method of offline hosting of licenses would need to remain in place for those with more complex needs.

I think tagging a package with a license(-set) makes more sense than the current pointer to a licenseUrl. The license(s) should stick to the package once published, which isn't the case with the current licenseUrl mechanism (the license terms behind that URL can change after package publication).

This would also still leave room for usage of the <licenseUrl> element, allowing publishers to combine online and offline licenses in case this is needed/desired (which they hopefully explain in a readme.txt). The usage of <licenseUrl> could be translated into a custom/other licenseTag on the site, linking to the specified URL.

Another scenario came up today: dual-licensed packages. The jQuery package is currently MIT licensed, but it used to be dual-licensed MIT and GPL. In the dual-licensed scenario, it's conceivable that when prompted to accept license terms, the user could choose which license is being accepted. I'm anxious about this though, as I really want to avoid choices during package installation.

I like the idea of owners tagging licenses. We would really like to avoid inaccurate or misleading tags, which could lead someone into, e.g., legal trouble. Some competing ideas we have considered for avoiding misleading tags are:

Idea 1 - Allow package owners to set tags on their packages. Allow anyone to report misleading tags on packages, and have misleading tags automatically removed.

Idea 2 - Give package owners a feature that lets them choose a license via their NuGet package. They are not just tagging their license with a tag; they are specifically choosing to publish their package with that license, and thereby making a legal decision to license their software with that license for all time. In this case, instead of <licenseTags>Apache;MIT</licenseTags> you might get <licensedWith>nuget.org/licenses/Apache;nuget.org/licenses/MIT</licensedWith> (where we recognize some specific set of tags), and publishing a package like this would cause nuget.org to create not just tags, but a full license agreement page for your package. This is doable, but would require changes to nuspec.
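To make Idea 2 a little more concrete, here is a minimal Python sketch of how a value like the one above might be split into license tags. The <licensedWith> element is only a proposal in this thread, and the function name, the prefix constant, and the error handling are assumptions made for illustration, not an existing NuGet feature.

```python
# Hypothetical prefix taken from the example value in Idea 2; not a real NuGet endpoint.
LICENSE_PREFIX = "nuget.org/licenses/"


def parse_licensed_with(value):
    """Split a semicolon-separated <licensedWith> value into license tags.

    Entries that do not start with the recognized prefix are rejected,
    since Idea 2 only recognizes a specific set of licenses.
    """
    tags = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing or doubled semicolon
        if not entry.startswith(LICENSE_PREFIX):
            raise ValueError("unrecognized license reference: " + entry)
        tags.append(entry[len(LICENSE_PREFIX):])
    return tags
```

For the example value from Idea 2, this would yield the two tags 'Apache' and 'MIT', which the site could then use to generate a license agreement page.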

Idea 3 - [note: not a strong contender right now] Use well-known license URIs only (such as OSI license pages, or pages such as mit-license.org) to recognize licenses and tag them automatically. Problems we've already identified with this idea include:
- Only 5-10% of current packages already use such a link.
- Barriers to adoption: some OSI license pages are really 'license templates' where you need to fill in some blanks, so they are not really that good for linking to.
- Automatic tag generation could lead to legal quagmires for nuget.org, since the claim about the license could no longer be said to come from the package author.
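For reference, the recognition step in Idea 3 could look roughly like the Python sketch below. The lookup table, its URL entries, the tag names, and the function name are all illustrative assumptions; a real list of recognized URIs would have to be curated, and the problems listed above still apply.

```python
from urllib.parse import urlparse

# Hypothetical table of (host, path) pairs mapped to tags; entries are examples only.
KNOWN_LICENSE_URLS = {
    ("opensource.org", "/licenses/mit"): "MIT",
    ("opensource.org", "/licenses/apache-2.0"): "Apache-2.0",
    ("mit-license.org", ""): "MIT",
}


def tag_from_license_url(license_url):
    """Return a tag for a recognized license URL, or None if it is not recognized."""
    parsed = urlparse(license_url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.org and example.org alike
    path = parsed.path.rstrip("/").lower()
    return KNOWN_LICENSE_URLS.get((host, path))
```

Anything not in the table returns None, so a package linking to its own site would simply get no automatic tag.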

I assume by Idea 1 you mean this data would only be in the database? I like idea #2 much better. It really should be with the package. When you install the package, and then someone clones your project, everything they need to know is in that file.

Just throwing ideas out there, but a very common convention for indicating a license in source code is to add a LICENSE file. In GitHub repos, you often find them named as:

LICENSE

LICENSE.txt

LICENSE.md

We could extend that convention by adding a license tag to the file name (and allowing multiple different file extensions).

LICENSE.mit.txt

LICENSE.apache.md

Benefits:

Supports dual and multi-licensing.

No need to parse the license text to figure out the license. The owner tags it via the file name.

NuGet.org could host and link to the license files.

These files would ostensibly have the correct info regarding copyright holders etc., so this solves the problem that license URLs pointing to opensource.org are templated and don't have the correct information.
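As a rough illustration of the file name convention, the Python sketch below pulls license tags out of a list of file names. The set of allowed extensions, the lower-casing of tags, and the function name are assumptions for the example; the thread has not settled on any of them.

```python
# Assumed set of extensions; the proposal only says multiple extensions would be allowed.
ALLOWED_EXTENSIONS = {"txt", "md"}


def license_tags(filenames):
    """Collect license tags from names like LICENSE.mit.txt or LICENSE.apache.md.

    Plain LICENSE, LICENSE.txt and LICENSE.md carry no tag and are skipped,
    as is any file whose name does not start with LICENSE.
    """
    tags = []
    for name in filenames:
        parts = name.split(".")
        if parts[0].lower() != "license":
            continue  # not a license file at the package root
        rest = parts[1:]
        if rest and rest[-1].lower() in ALLOWED_EXTENSIONS:
            rest = rest[:-1]  # drop the file extension
        if rest:
            # Rejoin so that a dotted tag such as apache-2.0 survives intact.
            tags.append(".".join(rest).lower())
    return tags
```

A package containing both LICENSE.mit.txt and LICENSE.apache.md would produce two tags, which is how the convention covers dual and multi-licensing without any text parsing.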

As far as I know, any sort of tagging mechanism, whether automatic or from the package developer, is not legally enforceable if there is an actual license agreement that comes with the software. The tags are a convenience to the end user but they don't absolve the end user of ultimately complying with the license.

I really like Phil's idea; it seems very NuGet-y. It also prevents the case of a package creator setting some sort of license information on the package but not including the actual license.

One question for me is how descriptive the license tags in the file name (or wherever they are) will get. I think a license and version is enough but it won't cover all cases. (One example that comes to mind is the OpenSSL exception to the GPL.) It should be clear to package users that the license information provided by NuGet is, again, just a convenience and isn't legally binding.

I really like Phil's idea too! No nuspec changes required, based on conventions, allows the gallery to serve the file(s), puts the file(s) on the consumer's machine, allows for multiple licenses, and enables the gallery to serve the data. Is there any downside? I can't think of any.

License.txt in the package is great for working around the nuspec issues and for following existing conventions about distributing licenses, and I really like it. There's one issue remaining to discuss: can we trust the license.txt contents to match the license filename enough to show it in search results if it's user-supplied? I'd definitely like the answer to be yes, but I feel the urge to be conservative here.

Maybe a reasonable cop-out, to keep people from hurting themselves by trusting the license info in search results, would be to display in search results not 'License: GPL v3.0' but instead 'License files: License.GPLv3.txt * (*footnote: license filenames are not a license, only license file contents determine the license)' [ed: OK, might have to work on the wording a bit, but something like this...].
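A small Python sketch of that conservative display might look like this. The footnote wording, the fallback text for packages without license files, and the function name are placeholders, in line with the note that the wording still needs work.

```python
# Placeholder footnote wording, adapted from the suggestion above.
FOOTNOTE = "* License filenames are not a license; only license file contents determine the license."


def license_search_line(license_filenames):
    """Build the search-result line from file names, never asserting a license outright."""
    if not license_filenames:
        return "License files: (none found)"
    return "License files: " + ", ".join(license_filenames) + " *"
```

The point of the design is that the line only repeats what the package owner named the files, and the footnote pushes the reader to the file contents for the actual terms.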

We can maybe even make the filename link to a page which displays the actual text extracted from the package contents.