Solving License Compliance at the Source: Adding SPDX License IDs

SPDX License Identifiers can be used to indicate relevant license information at any level, from package to the source code file level.

Accurately identifying the license for open source software is important for license compliance. However, determining the license can sometimes be difficult because the information is missing or ambiguous. Even when some licensing information is present, the lack of a consistent way of expressing it makes automated license detection very difficult and leaves significant manual effort for humans. Some commercial tools apply machine learning to this problem to reduce false positives and train license scanners, but a better solution is to fix the problem at the upstream source.

The U-Boot project took this approach, explaining the change in the commit that introduced it:

Licenses: introduce SPDX Unique Lincense Identifiers

Like many other projects, U-Boot has a tradition of including big blocks of License headers in all files. This not only blows up the source code with mostly redundant information, but also makes it very difficult to generate License Clearing Reports. An additional problem is that even the same lincenses are referred to by a number of slightly varying text blocks (full, abbreviated, different indentation, line wrapping and/or white space, with obsolete address information, ...) which makes automatic processing a nightmare.

To make this easier, such license headers in the source files will be replaced with a single line reference to Unique Lincense Identifiers as defined by the Linux Foundation's SPDX project [1]. For example, in a source file the full "GPL v2.0 or later" header text will be replaced by a single line:

SPDX-License-Identifier: GPL-2.0+

We use the SPDX Unique Lincense Identifiers here; these are available at [2].

. . .

[1] http://spdx.org/
[2] http://spdx.org/licenses/

The SPDX project liked the simplicity of this approach and formally adopted U-Boot's syntax for embedding SPDX-License-Identifiers. Initially, the syntax was documented on the project wiki; it was later formalized in version 2.1 of the SPDX specification as "Appendix V: Using SPDX short identifiers in Source Files". Since then, other upstream open source projects and repositories have adopted these short identifiers to identify the licenses in use, including GitHub in its Licenses API. In 2017, the Free Software Foundation Europe created a project called REUSE.software that provided guidance for open source projects on how to apply SPDX-License-Identifiers. Later that year, the REUSE.software guidelines were followed when SPDX-License-Identifiers were added to the Linux kernel.

The SPDX-License-Identifier syntax, combined with the short-form identifiers from the SPDX License List (referred to here as SPDX LIDs), can indicate relevant license information at any level, from the package down to the individual source file. The "SPDX-License-Identifier" phrase followed by a license expression built from SPDX LIDs, placed in a comment, is a precise, concise and language-neutral way to document licensing that is simple to process by machine. It also makes source code easier to read, which appeals to developers, and lets the licensing information travel with the source code.
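Because the tag is a fixed phrase followed by an expression, extracting it takes little more than a pattern match that ignores the surrounding comment syntax. The Python sketch below assumes the expression runs to the end of the line, minus an optional block-comment closer (*/ or -->); the function name parse_spdx_line is hypothetical, and the sketch does not validate the expression against the SPDX License List.

```python
import re
from typing import Optional

# Match the tag anywhere on the line so the comment leader (//, #, --, /*,
# <!--, ...) does not matter. Capture the expression after the colon,
# dropping surrounding whitespace and an optional trailing "*/" or "-->".
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*(?:\*/|-->)?\s*$")


def parse_spdx_line(line: str) -> Optional[str]:
    """Return the license expression declared on `line`, or None if absent."""
    match = _SPDX_TAG.search(line)
    return match.group(1) if match else None
```

For example, parse_spdx_line("/* SPDX-License-Identifier: MPL-2.0 OR MIT */") returns "MPL-2.0 OR MIT", and an ordinary code line returns None.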

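To use SPDX LIDs in your project's source code, just add a single line in the following format, tailored to your license(s) and the comment style for that file's language:

SPDX-License-Identifier: <SPDX License Expression>

For example, a C source file under the MIT license could start with

// SPDX-License-Identifier: MIT

while a Python script licensed under GPL version 2.0 or later could use

# SPDX-License-Identifier: GPL-2.0-or-later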
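Generating the line for a given file is mostly a matter of choosing the right comment delimiters. The sketch below uses a small, hypothetical extension-to-comment-style table (COMMENT_STYLES) and helper (spdx_header); adjust the table to your project's own conventions. As one example of such a convention, the Linux kernel uses // comments in .c files and /* */ comments in .h files, which the table mirrors.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to (comment opener, comment closer).
# Extend or change it to match the languages and conventions of your project.
COMMENT_STYLES = {
    ".c": ("//", ""),
    ".h": ("/*", " */"),
    ".py": ("#", ""),
    ".sh": ("#", ""),
    ".go": ("//", ""),
    ".rs": ("//", ""),
    ".html": ("<!--", " -->"),
}


def spdx_header(filename: str, expression: str) -> str:
    """Build a one-line SPDX-License-Identifier comment for `filename`."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in COMMENT_STYLES:
        raise ValueError(f"no comment style known for {suffix!r}")
    opener, closer = COMMENT_STYLES[suffix]
    return f"{opener} SPDX-License-Identifier: {expression}{closer}"
```

With this table, spdx_header("main.c", "MIT") gives "// SPDX-License-Identifier: MIT", and an unknown extension raises an error rather than guessing a comment style.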

In addition to U-Boot and Linux transitioning to SPDX LIDs, newer projects such as Zephyr and Hyperledger Fabric adopted them from the start as a best practice. Indeed, to achieve the Core Infrastructure Initiative's gold badge, each source file must carry a license statement, and the recommended way to provide one is an SPDX LID.

When SPDX LIDs are used, gathering license information across your project's files can become as simple as running grep. If a source file is reused in a different package, the license information travels with it, reducing the risk of license identification errors and making license compliance easier for the recipient project. Using SPDX LIDs in license expressions also conveys the meaning of license combinations precisely. Saying "this file is MPL/MIT" is ambiguous and leaves recipients unclear about their compliance requirements. Saying "MPL-2.0 AND MIT" or "MPL-2.0 OR MIT" specifies precisely whether the licensee must comply with both licenses or with either one when redistributing the file.
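The difference between AND and OR can be made concrete by expanding an expression into the alternative sets of licenses a recipient could comply with. The sketch below is a hypothetical helper (compliance_options), not part of any SPDX tooling. It handles AND, OR, WITH and parentheses, gives WITH the highest precedence and AND higher precedence than OR as the SPDX specification does, and does not check identifiers against the SPDX License List.

```python
import re
from itertools import product
from typing import FrozenSet, List, Optional

# One alternative = a set of licenses that must all be honored together.
Options = List[FrozenSet[str]]

# Parentheses are tokens of their own; everything else splits on whitespace.
_TOKEN = re.compile(r"\(|\)|[^\s()]+")
_OPERATORS = {"AND", "OR", "WITH"}


def compliance_options(expression: str) -> Options:
    """Expand an SPDX license expression into alternative license sets.

    Satisfying any one returned set satisfies the whole expression.
    Precedence: WITH binds tightest, then AND, then OR.
    """
    tokens = _TOKEN.findall(expression)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take(expected: Optional[str] = None) -> str:
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"unexpected token {token!r} in {expression!r}")
        pos += 1
        return token

    def parse_or() -> Options:
        options = parse_and()
        while peek() == "OR":
            take("OR")
            options = options + parse_and()  # either side may be chosen
        return options

    def parse_and() -> Options:
        options = parse_term()
        while peek() == "AND":
            take("AND")
            right = parse_term()
            # Both sides apply: combine every left choice with every right one.
            options = [left | other for left, other in product(options, right)]
        return options

    def parse_term() -> Options:
        if peek() == "(":
            take("(")
            options = parse_or()
            take(")")
            return options
        license_id = take()
        if license_id in _OPERATORS or license_id == ")":
            raise ValueError(f"expected a license, got {license_id!r}")
        if peek() == "WITH":
            take("WITH")
            exception_id = take()
            # Keep "license WITH exception" together as one obligation.
            license_id = f"{license_id} WITH {exception_id}"
        return [frozenset([license_id])]

    options = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r} in {expression!r}")
    return options
```

With this helper, "MPL-2.0 AND MIT" yields a single set containing both licenses, while "MPL-2.0 OR MIT" yields two single-license sets, one per acceptable choice. Returning every alternative, rather than a yes/no answer, keeps the sketch useful for checking an expression against whatever licenses a given recipient is prepared to honor.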

As illustrated by the transition underway in the Linux kernel, SPDX LIDs can be adopted gradually. You can start by adding SPDX LIDs to new files without changing anything already present in your codebase. A list of projects known to be using SPDX License Identifiers can be found at: https://spdx.org/ids-where, and if you know of one that’s missing, please send email to outreach@lists.spdx.org.
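Gradual adoption is easier to track with a small inventory script, essentially a structured version of the grep mentioned earlier: it groups files by the expression they declare and lists those that still lack one. The sketch assumes the identifier sits within the first five lines of each file (leaving room for a shebang line) and treats only a hypothetical set of extensions as source files; both HEADER_LINES and SOURCE_SUFFIXES are illustrative choices to adapt, and audit is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Find the tag regardless of comment syntax and capture the expression after it.
_SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(\S.*?)\s*(?:\*/|-->)?\s*$")

# Hypothetical policy: only look near the top of each file.
HEADER_LINES = 5

# Hypothetical set of extensions treated as source files.
SOURCE_SUFFIXES = {".c", ".h", ".py", ".sh", ".go", ".rs"}


def find_identifier(path: Path) -> Optional[str]:
    """Return the first SPDX expression in the file's header lines, if any."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        for _, line in zip(range(HEADER_LINES), handle):
            match = _SPDX_TAG.search(line)
            if match:
                return match.group(1)
    return None


def audit(root: Path) -> Tuple[Dict[str, List[Path]], List[Path]]:
    """Group source files under `root` by license expression.

    Returns (files_by_expression, files_missing_an_identifier).
    """
    by_expression: Dict[str, List[Path]] = {}
    missing: List[Path] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        expression = find_identifier(path)
        if expression is None:
            missing.append(path)
        else:
            by_expression.setdefault(expression, []).append(path)
    return by_expression, missing
```

Calling audit(Path("src")) returns the files grouped by declared expression together with the files still missing an identifier, which can serve as a to-do list as identifiers are added over time.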