Who owns your code and text and who can use it legally? Copyright and licensing basics for open-source

I am not a lawyer (“IANAL” in web-speak); but even if I were, you should take this with a grain of salt (same way you take everything you hear from anyone). If you want the straight dope for U.S. law, see the U.S. government Copyright FAQ; it’s surprisingly clear for government legalese.

What is copyrighted?

Computer code and written material such as books, journals, and web pages are subject to copyright law. Copyright is for the expression of an idea, not the idea itself. If you want to protect your ideas, you’ll need a patent (or to be good at keeping secrets).

Who owns copyrighted material?

In the U.S., copyright is automatically assigned to the author of any text or computer code. But if you want to sue someone for infringing your copyright, the government recommends registering the copyright. And most of the rest of the world respects U.S. copyright law.

Most employers require as part of their employment contract that copyright for works created by their employees be assigned to the employer. Although many people don’t know this, most universities require the assignment of copyright for code written by university research employees (including faculty and research scientists) to the university. Typically, universities allow the author to retain copyright for books, articles, tutorials, and other traditional written material. Web sites (especially with code) and syllabuses for courses are in a grey area.

The copyright holder may assign copyright to others. This is what authors do for non-open-access journals and books—they assign the copyright to the publisher. That means that even they may not be able to legally distribute copies of the work to other people; some journals allow crippled (non-official) versions of the works to be distributed. The National Institutes of Health require all research to be distributed openly, but they don’t require the official version to be so, so you can usually find two versions (pre-publication and official published version) of most work done under the auspices of the NIH.

What protections does copyright give you?

You can dictate who can use your work and for what. There are fair use exceptions, but I don’t understand the line between fair use and infringement (like other legal definitions, it’s all very fuzzy and subject to past and future court decisions).

Licensing

For others to be able to use copyrighted text or code legally, the copyrighted material must be explicitly licensed for such use by the copyright holder. Just saying “common domain” or “this is trivial” isn’t enough. Just saying “do whatever you want with it” is in a grey area again, because it’s not a recognized license and presumably that “whatever you want” doesn’t involve claiming copyright ownership. The actual copyright holder needs to explicitly license the material.

There is a frightening degree of non-conformance among open-source contributors, largely, I suspect, due to misunderstandings of the authors’ employment contracts and of copyright law.

Derived works

Most of the complication from software licensing comes from so-called derived works. For example, I download open-source package A, then extend it to produce open-source package B that includes open-source package A. That’s why most licenses explicitly state what happens in these cases. The reason we don’t like the GNU General Public Licenses (GPL) is that they restrict derived works with copyleft (forcing package B to adopt the same license, or at best one that’s compatible). That’s why I insisted on the BSD license for Stan: it’s maximally open in terms of what it allows others to do with the code, and it’s compatible with GPL. R’s licensed under the GPL, so we released RStan under the GPL so that users don’t have to deal with both the GPL and a second license to use RStan.
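
As a rough illustration of that reasoning, here is a minimal Python sketch of how one might pick a license for a package that bundles others. The ranking table and the function name license_for_combined_work are made up for this example, it only knows the three licenses mentioned in this post, and it is a toy model rather than legal advice.

```python
# Hypothetical sketch: choose a license for a work that bundles other packages.
# Assumption: licenses are ranked by how much they restrict derived works,
# with permissive licenses (MIT, BSD) below copyleft (GPL).
RESTRICTIVENESS = {"MIT": 0, "BSD": 0, "GPL": 1}


def license_for_combined_work(own_license, dependency_licenses):
    """Return the most restrictive license among our own and our dependencies'."""
    candidates = [own_license] + list(dependency_licenses)
    for name in candidates:
        if name not in RESTRICTIVENESS:
            raise ValueError(f"unknown license: {name}")
    # max() returns the first maximal item, so our own license wins ties.
    return max(candidates, key=lambda name: RESTRICTIVENESS[name])


print(license_for_combined_work("BSD", ["GPL"]))  # the RStan-like case
print(license_for_combined_work("BSD", ["MIT"]))  # purely permissive case
```

The first call mirrors the RStan situation (BSD-style code that needs a GPL dependency), where the sketch lands on GPL; the second stays on BSD because nothing in the bundle is copyleft.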

Where does Stan stand?

Columbia owns the copyright for all code written by Columbia research staff (research faculty, postdocs, and research scientists). It’s less clear (from our reading of the faculty handbook) who owns works created by Ph.D. students and teaching faculty. For non-Columbia contributions, the author (or their assignee) retains copyright for their contribution. The advantage of this distributed copyright is that ownership isn’t concentrated with one company or person; the disadvantage is that we’ll never be able to contact everyone to change licenses, etc.

The good news is that Columbia’s Tech Ventures office (the controller of software copyrights at Columbia) has given the Stan project a signed waiver that allows us to release all past and future work on Stan under open source licenses. They maintain the copyright, though, under our employment contracts (at least for the research faculty and research scientists).

For other contributors, we now require them to explicitly state who owns the copyrighted contribution and to agree that the copyright holder gives permission to license the material under the relevant license (BSD for most of Stan, GPL or MIT for some of the interfaces).

The other good news is that most universities and companies are coming around and allowing their employees to contribute to open-source projects. The GNU General Public License (GPL) is often an exception for companies, because they are afraid of its copyleft properties.

C.Y.A.

The Stan project is trying to cover our asses from being sued in the future by a putative copyright holder, though we don’t like having to deal with all this crap (pun intended).

Luckily, most universities these days seem to be opening up to open source (no, that wasn’t intended to continue the metaphor of the previous paragraph).

But what about patents?

Don’t get me started on software patents. Or patent trolls. Like copyrights, patents protect the owner of intellectual property against its illegal use by others. Unlike copyright, which is about the realization of an idea (such as a way of writing a recipe for chocolate chip cookies), patents are more abstract and are about the right to realize ideas (such as making a chocolate chip cookie in any fashion). If you need to remember one thing about patent law, it’s that a patent lets you stop others from using your patented technology—it doesn’t let you use it (your patent B may depend on some other patent A).

Or trademarks?

Like patents, trademarks prevent other people from (legally) using your intellectual property without your permission, such as building a knockoff logo or brand. A trademark can involve names (including confusingly similar ones), font choices, color schemes, etc. But they tend to be limited to areas, so we could register a trademark for Stan (which we’re considering doing), without running afoul of the down-under Stan.

There are also unregistered trademarks, but I don’t know all the subtleties about what rights registered trademarks grant you over the unregistered ones. Hopefully, we’ll never be writing that little R in a circle above the Stan name, Stan®; even if you do register a trademark, you don’t have to use that annoying mark—it’s just there to remind people that the item in question is trademarked.

Also, registration allows “statutory damages,” which is hugely problematic… It’s for this reason that pretty much every person in the world commits about $35,000,000 in copyright infringement before lunchtime.

(Hint: do you have a song as a ringtone? Every time your phone rings it’s $150,000 in statutory damages for public performance of a copyrighted work… etc.)

Andrew has already admitted to bicycling around NY with a radio blaring (for safety, no headphones while biking). This is willful public performance of copyrighted works. If he goes for a 30-minute bike ride on NY streets and plays songs averaging 3 minutes long, that’s 10 songs, so that’s $1.5 million in statutory damages. Seriously.
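
The arithmetic behind that figure can be spelled out in a few lines of Python. This is only a sketch of the comment’s own back-of-the-envelope calculation: the function name worst_case_damages is made up here, and the $150,000 per song is the figure quoted above (the U.S. statutory maximum for willful infringement), used as a worst case rather than a prediction of what a court would award.

```python
# Figures taken from the comment above; the per-work amount is the statutory
# maximum for willful infringement, used here as a worst case.
STATUTORY_DAMAGES_PER_WORK = 150_000  # dollars


def worst_case_damages(ride_minutes, minutes_per_song):
    """Worst-case statutory damages for the songs played during a ride."""
    songs_played = ride_minutes // minutes_per_song  # whole songs only
    return songs_played * STATUTORY_DAMAGES_PER_WORK


print(worst_case_damages(30, 3))  # 1500000
```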

People have been successfully intimidated into settlements, and people have been successfully prosecuted for “making available” music downloads, if I remember correctly (that is, they turned on Napster, no one actually downloaded anything from them, but their IP address was collected by the RIAA and they were prosecuted successfully for infringement).

It used to be a weekly occurrence that some lawsuit was being brought against some single mother of a 13 year old girl or whatever. There were whole blogs devoted to following these legal shenanigans.

Rahul: I am familiar with the case of a piece of open source software with copyright owned by a university being sued by a major publishing corporation with a product with very similar outputs. The case dragged on for several years before the major publishing company finally gave up, although the legal costs to the university were not inconsiderable. [Sorry for the obliqueness here, but I am not sure the details were ever made public.]

IAL, not an IP lawyer. Just for anyone reading this as a primer I would add that if you were employed to produce a work, the employer owns the copyright under the “work for hire” doctrine. People sometimes get confused about this because the rule differs for patents, where you would be the “inventor” unless you expressly assigned that right. In many publishing contexts no assignment of copyrights is necessary.

“And most of the rest of the world respects U.S. copyright law.” Um, no. Most of the rest of the world respects the Berne convention. A lot of places (including the European Union) add copyright lasting 70 years after the creator’s death, which is the same as in the US. But Europe (and many other places) doesn’t respect the US-specific term of 95 years after publication (which can be much more than 70 years after death for works published late in someone’s life or posthumously).

If you’re thinking about copyright of your own work, these distinctions don’t look very important. But for reproductions of classical music, literature, etc, there’s quite a large body of things that are public domain in “most of the rest of the world” but not in the U.S.

(I am not a lawyer either. You can get this information from Wikipedia or many other reputable sources.)

First of all, you can’t copyright a recipe. It’s one of those weird things. That’s because it is just a list of ingredients and the steps. You can copyright a cookbook, which includes all the ancillary content like commentary and images, but not the recipe itself.

Trademark registration is in some ways similar to copyright in the sense that you can still sue for infringement if you don’t have a registered trademark, but if you are registered it shows a specific date and also that you and the trademark office have done some investigation to determine that there was no existing mark in the same areas and that public notice of the pending mark was posted. So if it comes to a fight, it helps. One thing to consider is getting the EU mark, if you aren’t already, and considering some other locales. Getting a TM in the US has nothing to do with other countries. Many jurisdictions are “first to file,” which means that someone could take your name and logo and file for it. Interestingly, they can just copy your repo and be distributing downloadable software. I think it is good to put TM on an unregistered mark because that indicates the intent to enforce.

It is driving me crazy right now that someone posted a useful function on RPubs without a license and hasn’t responded to email. It’s derived from GPL code and posting on RPubs is a kind of distribution so it really has to be GPL, but it’s not stated in the file so it really does not fly for anyone else to use it.

IANAL either but I spent a while handling these issues/talking to IP lawyers for an open source project.
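
One way to avoid the unlicensed-file problem described above is to state the license in the file itself, for example with an SPDX-License-Identifier comment, a widely used tagging convention. The Python sketch below checks whether a piece of source text carries such a tag. The function name declared_license and the sample R snippets are invented for illustration, and the regular expression is a simplification that only handles a single identifier, not full SPDX license expressions.

```python
import re

# SPDX-License-Identifier is a widely used convention for tagging a source file
# with its license; the pattern below is a simplified assumption, not the full
# SPDX expression grammar.
SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def declared_license(source_text):
    """Return the SPDX identifier declared in the text, or None if absent."""
    match = SPDX_PATTERN.search(source_text)
    return match.group(1) if match else None


# Hypothetical file contents: one tagged, one not.
tagged = "# SPDX-License-Identifier: GPL-3.0-or-later\nf <- function(x) x + 1\n"
untagged = "f <- function(x) x + 1\n"

print(declared_license(tagged))    # GPL-3.0-or-later
print(declared_license(untagged))  # None
```

A tag like this does not replace the license text or the copyright holder’s permission; it just makes the intended license visible to anyone who wants to reuse the file.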

RStan’s code doesn’t have to be GPL—it could be BSD licensed depending on how it depends on R.

But the combined product, RStan plus R, is going to be governed by both GPL and the BSD licenses. And RStan’s useless without R. So we thought it’d be easier for everyone to just go with the more restrictive of the two licenses, GPL. Profit has absolutely nothing to do with licensing.

I wouldn’t assume that just because a for-profit company does something, there’s no IP risk. Just look at all the lawsuits floating around Google and Oracle for Java or IBM and SCO for Unix.

Do you distribute this combined product (R + RStan) though? If not, that is, if all that is provided/distributed is RStan itself as a package on CRAN, it need not be GPL, which it seems you agree with, and the statement that RStan “must also be released under the GPL” is incorrect in this case. If the decision is based on GPL being “easier for everyone”, then sure, but that is a different reason, not one required by the license. Pointing to examples (e.g., dplyr is under the MIT license and depends on R and other GPL packages) is merely meant to demonstrate the degree of latitude here. You’re right, there’s still risk; I shouldn’t have mentioned risk. My point is to understand what drives the choice: the license, or other factors.

What does releasing dplyr under the MIT license give dplyr users? That wasn’t a rhetorical question, by the way—I’m just curious. Is there something in there that’s not dependent on R? RStan is a 100% R application and I don’t see what releasing it under a license other than GPL would do for users. We could change the license to BSD if there’s an upside to doing so.

That’s a great question. I don’t know. Only real difference is avoiding copyleft, but given the R environment (R being GPL and also interpreted so you almost have to ship source anyway)… I can’t really see any benefit. I’d be very interested to know why as well. Better ask one of them.