Wikilegal/Database Rights

Note: This page shares the Wikimedia Foundation’s preliminary perspective on a legal issue. This page is not final - if you have additional information, or want to provide a different perspective, please feel free to expand or add to it.

Please Remember - This Is Not Legal Advice!

This page may not be accurate, and may fall out of date over time.

The purpose of these pages is to present the Wikimedia Foundation's perspective on an issue. However, because these pages may be edited and updated by the community, they may not continue to represent the viewpoints of the Wikimedia Foundation.

The legal team can only represent the Wikimedia Foundation on legal matters, so if you feel you need personal legal advice, please contact a lawyer.

Because the legal team represents the Foundation, we cannot provide consultations with community members. Contacting the legal team does not create an attorney-client relationship, or any of the duties that come with such a relationship, such as confidentiality.

Under the Wikimedia Foundation’s licensing policy, all content hosted by Wikimedia projects is expected to be distributed only under a license compatible with the Free Cultural Work definition. Specific projects, like Wikipedia or Wikidata, may have additional, stricter policies tied to specific licenses, such as Creative Commons BY-SA or CC0. Given the increase in content drawn from databases, complying with these policies may require an understanding of how copyright and other laws protect databases and their contents. To help with that, this page provides a general overview of the legal protections available for databases in the US and EU.

From a legal perspective, a database is any organized collection of materials — hard copy or electronic — that permits a user to search for and access individual pieces of information contained within the materials. No database software, as a programmer would understand it, is necessary. In the US, for example, Black’s Law Dictionary defines a database as a "compilation of information arranged in a systematic way and offering a means of finding specific elements it contains, often today by electronic means."[1] Databases may be protected by US copyright law as "compilations." In the EU, databases are protected by the Database Directive, which defines a database as "a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means."

US copyright law protects creative expression. To understand what this means in the context of a database, it is helpful to think about a database as having two components. The first component is the structure and organization of the database, which means the types of data that the author chose to include and how he or she chose to organize it. In copyright terms, this is often referred to as the "selection or arrangement" of the database. The second component is the specific data contained in the database.

A database is protected by copyright when the selection or arrangement is original and creative.[2] The level of creativity required is low: as long as the author had some discretion and made some choices in what to include or how to organize it, the database is likely to be protected. For example, a list of "best poems published in 1973" would likely be protected because selecting the "best" poems requires making some choices. A list of "all poems published in 1973" would likely not be, because it doesn’t involve choice, just uncreative analysis of the date of publication. One classic example of a database that is not protected by copyright is a telephone directory. Arranging names, addresses, and telephone numbers of subscribers in alphabetical order is not creative enough to meet even the low threshold required for copyright protection. This is true no matter how much work went into the creation of the telephone directory, or any other database. Copyright law protects the creative expression in a work, not the labor that went into its creation (the author’s "sweat of the brow," as it’s often referred to in the law).[3]

Databases that meet this low threshold level of creativity are protected as compilations.[4] However, this protection covers only the selection and organization of the data, not the data itself. For the individual pieces of data to be protected, they must independently qualify for copyright protection. In other words, the data itself must qualify as creative expression under the Copyright Act.

For many databases, the data contained in them is not protected even if the database itself is protected as a compilation. For example, factual data is not protected by copyright. This includes dates, names, locations, heights, weights, other measurements and statistics, and many other types of purely factual data. A database containing the dates and locations of important historical events would likely be a protected compilation, because the author made certain choices in deciding which events to include. However, the dates and locations themselves are unprotected facts. It would likely not be a violation of copyright law to extract and use that information, but it would likely be a violation to copy the entire database.

Similarly, titles, short phrases, and lists or tables taken from public sources are generally not protected by copyright. It would not be a violation to extract and use this information, even if the database as a whole is a protected compilation.

In the EU, databases are protected under the Database Directive.[5] The Directive has two purposes. The first is to protect the intellectual creativity involved in the selection and arrangement of the database. In this respect, the Directive is similar to copyright protection in the US, where databases are protected as compilations.

However, the second purpose of the Directive is to provide a sui generis database right[6] that is granted to the person or entity that takes "the initiative and the risk of investing" in "obtaining,[7] verifying or presenting the contents" by deploying "financial resources" or expending "time, effort and energy".[8] In this limited case, EU law differs from US law by protecting the author’s "sweat of the brow".

The sui generis right is infringed when "all or a substantial part of the database" is transferred into another document or database. Unfortunately, the Directive does not define "substantial," but does say that it is "evaluated qualitatively and/or quantitatively".[9] Extracting and using an insubstantial portion does not infringe, but the Directive also prohibits the "repeated and systematic extraction" of "insubstantial parts of the contents of the database".[10]

A wide variety of common data sources qualify as databases under the Database Directive, including hard copy or electronic lists, tables, directories, and archives, and sets of interlinked web pages,[11][12] such as the directory of the French National Assembly. The Directive also applies to government databases containing public data.[13]

However, the Directive only protects databases created by EU citizens or residents, or EU-based corporations.[14] It does not apply to, or protect, databases created outside of the EU. Database rights can also be waived by the creator of the database if he or she so chooses.

It can be difficult to determine whether, or the extent to which, a database or its contents are protected by law. Concepts like creative choice and substantiality are inherently subjective, and it is rarely easy to predict how a court might rule under either US copyright law or the EU Database Directive.

Whenever possible, the best course is to use only content that is made available by the author under an open license. In particular, for EU databases, the license should expressly license or waive the sui generis database right. In the absence of a license, copying all or a substantial part of a protected database should be avoided. Extraction and use of data should be kept to a minimum and limited to unprotected material, such as uncopyrightable facts and short phrases, rather than extensive text. For EU databases, bots or other automated ways of extracting data should also be avoided because of the Directive’s prohibition on "repeated and systematic extraction" of even insubstantial amounts of data.

See Database Directive art. 7(1), (2)(a). The term "quantitatively" "refers to the volume of data extracted from the database and/or reutilised, and must be assessed in relation to the volume of the contents of the whole of that database". The term "qualitatively" "refers to the scale of the investment in the obtaining, verification or presentation of the contents". Case C-203/02, Brit. Horseracing Bd. Ltd. & Others v. William Hill Org. Ltd, 2004 E.C.R. I-10415, §§ 70 et seq.