Database Protection And Access Issues, Recommendations

Patent and Trademark Office Report on Recommendations from the April 1998 Conference on Database Protection and Access Issues

U.S. Patent and Trademark Office, Department of Commerce, Washington, D.C.

July 1998

I. EXECUTIVE SUMMARY

In the 1991 Feist Publications v. Rural Telephone Service Corp case, the U.S. Supreme Court ended the "sweat of the brow" doctrine that had conferred some degree of copyright protection on non-creative compilations of information. The Feist decision has produced subsequent case law in which databases resulting from a substantial investment have been taken by others to produce competing products; lack of copyright leaves the database maker with no recourse against third party predators -- except what state misappropriation law might offer -- and only limited recourse, based on contract law, against contracting parties. Beginning in the late 1980's, Member States of the European Union (EU) sought to harmonize the copyright laws of their various legal systems. That effort resulted in an awareness that some EU States -- Ireland, the U.K., the Netherlands, and the Nordic countries -- provided greater protection to non-creative compilations than other Member States. Eventually, efforts to harmonize the EU copyright laws for the TRIPS Agreement left the EU without any intellectual property protection for non-creative compilations of data. After considering varied proposals, in March 1996 the EU adopted a Database Directive requiring all Member States to provide a sui generis form of intellectual property protection for databases. The EU Database Directive became the basis for the EU's proposal for a draft international treaty that was submitted to the World Intellectual Property Organization (WIPO). In anticipation of a WIPO Diplomatic Conference in December 1996, and because of substantial concerns about provisions of the EU proposal, the U.S. submitted its own proposal to WIPO. Ultimately, the 1996 Diplomatic Conference focused on copyright and neighboring rights; database protection was left unaddressed. Nonetheless, WIPO established a timetable to resume discussions on database protection in 1998. 
In the United States, a proposal for sui generis protection was introduced in the House in 1996 by then-Congressman Carlos Moorhead. That proposal generated considerable opposition from the scientific, education, and library communities. In the 105th Congress, Howard Coble, Chairman of the House Subcommittee on Courts and Intellectual Property, introduced H.R. 2652, which would provide a database maker with protection against misappropriation of any substantial part of its database, where such misappropriation would harm the actual or potential market for the database. In hearings in late 1997 and early 1998, scientists and educators -- as well as telecommunications companies -- expressed significant concerns over many aspects of the bill. Nonetheless, on May 19, 1998, H.R. 2652 passed the House on voice vote from the suspension calendar. Recently, a corresponding bill was introduced in the Senate (S. 2291); as of the time of this memorandum, S. 2291 was co-sponsored by Senators Grams, Cochran, Faircloth, and Helms.
In an effort to help policy makers understand the concerns of all parties, the Patent and Trademark Office ("PTO") held a one-day conference on database protection and access issues ("PTO Database Conference") on April 28, 1998. At that time, H.R. 2652 had only been approved by the House Judiciary Committee. The conference was held at the Brookings Institution and attracted over 175 attendees representing academia, the business community, libraries, government, non-profits, and the scientific community. The conference did not -- and was not expected to -- produce consensus on any issues, including the most fundamental issue of whether or not database protection is needed. We believe, however, that the proceedings (a) helped initiate dialog, then and subsequently, among various parties, and (b) helped identify areas where disparate interests may be accommodated through further legislative developments.
After reviewing the conference proceedings, we believe that the Administration should be willing to support database protection legislation that meets five widely-supported principles:

1. A change in the law to protect commercial database developers from Warren Publishing-like situations is desirable.

2. Consistent with Administration policies, databases generated with Government funding should not be placed, de jure or de facto, under exclusive control of private parties.

3. Any database protection regime must carefully define and describe databases and prohibited acts, so as to avoid unintended consequences, including undue disruption of existing business relationships and non-profit research.

4. Any database protection regime should be subject to exceptions largely co-extensive with "fair use" principles of copyright law.

5. Consistent with U.S. trade policy, it is desirable to secure for U.S. companies the benefit of the EU Database Directive and laws in other countries protecting database products.

This document provides a brief summary of the April 28 conference; our analysis of these principles, both generally and as they relate to H.R. 2652; and a few areas where we believe further work may be needed to produce an acceptable legal regime for databases.

II. THE PTO DATABASE ISSUES CONFERENCE

The PTO Database Conference was held on April 28, 1998 at the Brookings Institution in Washington. Preparations for the conference began in late January. The format of the conference was a series of plenary sessions with mid-day "breakout" sessions devoted to more specialized topics. In planning conference topics and possible panelists, we reviewed all testimony given before the House Subcommittee on Courts and Intellectual Property in its hearings in October 1997 and February 1998. We also met with representatives of the Information Industry Association (IIA), the Information Technology Association of America (ITAA), and the National Research Council (NRC). We had on-going discussions with representatives of these organizations as well as conversations with the American Library Association (ALA), the Association of American Publishers (AAP), the Association of Research Libraries (ARL), and the Business Software Alliance (BSA).
The 23 panelists and moderators consisted of 18 Americans and 5 Europeans. These included seven legal or economic academics (divided roughly equally between supporters and critics of database protection proposals); six scientists and representatives of scientific organizations; two library representatives; and five business groups. The conference panelists/moderators also included representatives of the State Department, the Copyright Office, and the European Commission.
Approximately 175 people attended one or more sessions of the conference. In addition to many people from trade associations and Washington law firms, participants included: from the scientific community and related government agencies, representatives of the Centers for Disease Control, Chemical Abstracts Service, the House Science Committee, the National Science Foundation, the National Research Council, OSTP, the State Department, and the U.S. Geological Survey; from the private sector, representatives of ABC Cable and News Media, BellSouth, Dun & Bradstreet, Eli Lilly, Fujitsu, IBM, Intermetrics, Lexis-Nexis, McGraw-Hill, Reuters, MCI and several smaller businesses, including information firms for realtors and insurance companies; from non-profit organizations, representatives of the Modern Language Association, the Church of Jesus Christ of Latter Day Saints, and National Public Radio.
The conference had four plenary sessions with seven mid-day "breakout" discussion groups. Plenary session topics reflected neutral statements of general issues that have arisen repeatedly in Congressional hearings and scholarly writings on database protection and access issues; several "breakout" sessions were dedicated to thorny issues identified by specific groups. The first plenary panel discussed whether there is need for additional database protection; the second plenary was devoted to the concerns of the scientific and research communities; and the third plenary session explored the "fair use" needs of libraries, non-profit entities, and database producers who rely on government data. The fourth plenary session consisted of reports to the Assistant Secretary on discussions in the mid-day breakout sessions. Attachment A is a program of plenary topics and breakout sessions from the conference.
Individuals interested in obtaining copies of videotapes of the first and fourth plenary sessions (the only sessions recorded) may do so for the cost of reproduction by contacting Justin Hughes, Office of Legislative and International Affairs, Patent and Trademark Office, Department of Commerce, Washington, D.C. 20231, justin.hughes@uspto.gov.

III. EMERGING PRINCIPLES AND ISSUES

A. BASIC PRINCIPLES

In light of the conference proceedings and after reviewing Congressional testimony, scholarly writings, and reports on these issues, we believe that a set of principles emerges that should shape the Administration's position on database protection. These principles could be embodied in any number of approaches, including H.R. 2652 with appropriate modifications to reflect these goals. After listing the principles, we discuss each principle and analyze how H.R. 2652 fulfills or fails to achieve those goals.

1. A change in the law to protect commercial database developers from Warren Publishing-like situations is desirable.

2. Consistent with Administration policies, databases generated with Government funding should not be placed, de jure or de facto, under exclusive control of private parties.

3. Any database protection regime must carefully define and describe databases and prohibited acts, so as to avoid unintended consequences, including undue disruption of existing business relationships and non-profit research.

4. Any database protection regime should be subject to exceptions largely co-extensive with "fair use" principles of copyright law.

5. Consistent with U.S. trade policy, it is desirable to secure for U.S. companies the benefit of the EU Database Directive and laws in other countries protecting database products.

The discussion which follows elaborates on each of these principles.

1. A change in the law to protect commercial database developers from Warren Publishing-like situations is desirable.

There was considerable, albeit not complete, consensus at the conference that some type of legislative "fix" would be reasonable to provide commercial database producers with protection for their products. This has been stated by leading scientists and by legal scholars identified as critics of database protection. A handful of people remain who insist that case law will develop and/or that a combination of technology and contract, copyright, and trade secrecy law offer database producers sufficient incentive. But on the whole, there seems to be agreement that situations like Warren Publishing and the ProCD case are likely to arise in digital commerce and that some protection in such situations is desirable. A recent report by the Japan Institute of Intellectual Property reaches the same conclusion:

"In today's society the database industry has proved to be of vital support for governmental, educational, and commercial purposes. Since databases are plainly open to full-scale misappropriation a lack of adequate legal protection obviously could have a range of damaging effects on the everyday life of society. . . . Once disclosed to the public, information can be used generally speaking and leaving aside contractual or tortious liability, freely without the database provider's permission or an obligation to reimburse him for his investment. This holds equally true for the off-line as well as the on-line market."

While the NRC principally advocates scientists' concerns and believes science has a specific, public-minded paradigm for data-gathering, its seminal study of database issues, Bits of Power, recognized the problem existing in the commercial sector:

"In the private sector, by contrast, commercial compilers of data have long suffered from a risk of market failure owing to the intangible, ubiquitous, and above all, invisible nature of information goods and the ease with which free riders may have appropriated the fruits of the compilers' investment once the information goods were made available to the public in print media."

These sorts of cases are only likely to increase with digital media. The ProCD v. Zeidenberg case provides an example of a fact pattern that may become commonplace without appropriate legal safeguards. In ProCD, defendant Matthew Zeidenberg purchased ProCD's CD-ROM database of 3,000 telephone directories from around the country. He then formed a company to sell the telephone directory information online -- for far less than the price of the CD-ROM set. ProCD prevailed in this case at the appellate level because the Seventh Circuit panel ruled that the "shrink-wrap" license, which limited the defendant to non-commercial use of the CD-ROMs, was enforceable. Had Zeidenberg instead given the CD-ROM set to someone else who later started the same company, ProCD would have had no privity of contract with the defendant company and would have lost control of its database.
Similarly, in Warren Publishing v. Microdos Data Inc., Warren Publishing's "Directory of Cable System" classified cable television systems by the principal communities they served. The directory was apparently taken and reproduced by Microdos Data in a competitor product sold in software format. The Eleventh Circuit, sitting en banc, ruled that there were no copyrightable aspects to Warren Publishing's database that had been taken by the defendant.
The database protection regime set out in H.R. 2652 would clearly meet the goal of addressing these situations. At the same time, this goal could probably be met with a modified "NBA v. Motorola" approach (as amended by suggestions of Professors Ginsburg and Reichman) built on the following elements of a misappropriation claim: (i) the plaintiff generates or collects information at some expense, (ii) the defendant's use of the information constitutes free-riding on the plaintiff's costly efforts to generate or collect it, (iii) the defendant's use of the information is in competition with a product or service offered by the plaintiff or likely to be offered by the plaintiff, and (iv) the ability of other parties to free ride on the efforts of the plaintiff would so reduce the incentive to produce the product or service that the existence or quality of the product would be substantially threatened. We think, however, that these are largely the principles that govern H.R. 2652. Where H.R. 2652 diverges from an NBA v. Motorola model, there may be good reasons.
Some participants at the conference also raised concerns about the constitutionality of different database protection proposals. We believe that there are two principal concerns. The first is whether the Supreme Court's interpretation of the Intellectual Property Clause (Article I, Section 8, Clause 8) as set forth in Feist pre-empts Congressional exercise of Commerce Clause power to legislate in this area under the doctrine of Railway Labor Executives' Ass'n v. Gibbons ("Clause 8 pre-emption"). Given Congress's creation of discrete intellectual property rights in areas previously treated as related to copyright or patent (trademark, semiconductor mask protection) and the Supreme Court's continued recognition of "non-copyright grounds" for protection of information, we believe that a database protection bill can be properly crafted to avoid Clause 8 pre-emption. The second concern is what limits the First Amendment imposes on any database protection regime.
This is not a new problem; courts have frequently dealt with the relationship between trademark law and the First Amendment, copyright law and the First Amendment, and trade secrecy law and the First Amendment. All of these laws limit "speech" in which citizens may engage but remain, nonetheless, compatible with the First Amendment. We believe that First Amendment concerns can be addressed as long as any database protection regime (a) permits unhampered independent collection of information, (b) permits use of data for criticism, news reporting, and de minimis personal communications, and (c) recognizes a wide berth of "fair" uses that do not substantially affect the commercial activities of the database owner. We understand that the Department of Justice's Office of Legal Counsel is in the process of preparing a preliminary analysis of constitutionality issues concerning H.R. 2652; we look forward to reviewing this preliminary analysis.

2. Consistent with Administration policies, databases generated with Government funding should not be placed, de jure or de facto, under exclusive control of private parties.

There seems to be general agreement that compilations of data generated with U.S. Government funding should not be subject to any protection regime. There are several reasons for this. First, if U.S. Government-funded databases were subject to some type of protection regime, taxpayers might "pay twice" for access to data. Second, the principal argument for a protection regime is that, absent such protection, private parties will lack adequate incentives for database production. But government funding provides the incentive in the case of publicly-financed compilations, such as weather information, census data, and medical studies funded by NIH grants. As the Office of Management and Budget has stated:

"Government information is a valuable national resource. It provides the public with knowledge of the government, society, and economy -- past, present, and future. It is a means to ensure the accountability of government, to manage the government's operations, to maintain the healthy performance of the economy, and is itself a commodity in the marketplace."

For many government agencies, the responsibility to make government-generated information widely available is a statutory obligation. For example, the Agriculture Department works under a wide directive to "diffuse among people of the United States, useful information on subjects connected with agriculture . . " (7 U.S.C. section 2201) while statutes such as the Freedom of Information Act and the Government in the Sunshine Act "establish a broad and general obligation on the part of Federal agencies to make government information available to the public and to avoid erecting barriers that impede public access."

a. A Wide Definition of Government Data

While there is wide agreement on this general proposition, some questions have been raised as to whether data generated by the government (for example, from government-owned satellites) is distinct from data generated by non-government entities funded by the government (for example, private researchers working with NIH grants). We believe that even if it were desirable to draw a distinction of this sort, no statutory language could adequately capture it, particularly at a time when efforts to "reinvent" government may lead to private parties gathering datasets under government contracts that might have been gathered previously by government employees. For example, many private contractors participate in gathering data for the decennial Census; the individuals who work for these private entities are sworn in as "special census employees" only for purposes of statutory confidentiality requirements and are not federal employees under Title 5 of the U.S. Code. H.R. 2652 presently addresses this issue with the following broad section 1204(a) exclusion:

"Protection under this chapter shall not extend to collections of information gathered, organized, or maintained by or for a government entity, whether Federal, State, or local, including any employee or agent of such entity, or any person exclusively licensed by such entity, within the scope of the employment, agency, or license. Nothing in this subsection shall preclude protection under this chapter for information gathered, organized, or maintained by such an agent or licensee that is not within the scope of such agency or license, or by a Federal or State educational institution in the course of engaging in education or scholarship."

We believe that this provision serves the general policy goal of making all forms of government information available to the public, but we believe that the language can be improved. In response to concerns raised at the "publicly-funded data" breakout session about the different government contractual arrangements with laboratories and private companies, we suggest that the drafters of H.R. 2652 should examine existing definitions of "government information" for descriptions that capture a fuller range of government-sponsored data collection. For example, OMB Circular A-130 states that "the definition of 'government information' includes information created, collected, processed, disseminated, or disposed of both by and for the Federal Government." At a minimum, we are concerned that the present language does not adequately cover situations in which the government contracts for information gathering. It was pointed out at the conference that government contracts sometimes expressly preclude the private entity from being an "agent" or "licensee" of the government -- thus removing their activities from the ambit of section 1204(a) as presently written. One way to address this would be inclusion of "contracting for the government" language. Another possibility would be to add statutory language providing that the section 1204(a) exclusion also applies to data gathering "funded by the government," together with discussion in the legislative history making clear that section 1204(a) applies to databases developed by a private entity as a necessary part of a government-funded contract, whether or not "gather[ing], organiz[ing], or maintain[ing]" a collection of information was the purpose of the government contract.
For example, if a company working on airport safety under contract from the FAA builds a database of airport characteristics that is required to complete its contract with the FAA, then the company should not be able to assert any exclusionary rights over the airport database. It may be possible to develop standards for when a database is necessary for a government contract from existing standards for when government agencies must collect data. In any case, the same rationales apply to government contracting as to data generated by the government itself: government funding already provides an adequate incentive and there is no reason taxpayers should pay "twice" for data gathering. The distinction that needs to be drawn is between (a) compilations of data made as a necessary element of a Government-funded activity, and (b) compilations of data made by private entities over and above the activity being funded. This appears to be the intent of the section 1204(a) language that:

"Nothing in this subsection shall preclude protection under this chapter for information gathered, organized, or maintained by [a government] agent or licensee that is not within the scope of such agency or license . . ."

This appears to protect other activities of a government licensee and to permit protection of value-added databases that the licensee generates from government data. Nonetheless, we think that this section could be clarified by express language (or discussion in the legislative history) that transformative developments from government compilations of data can be protected, i.e. that value-added activities outside the ambit of a government contract can produce protected databases, subject to the general principle -- drawn from copyright law -- that where government-funded data and value-added data are commingled and the government-funded data predominates, then the private data producer should take affirmative steps to distinguish the two types of information.

b. State and Local Government Data

Another minor question in this area has been whether data generated from funding by state or local governments should be treated differently than data generated from funding by the Federal Government. The above language takes the position that data generated with funding from any level of government, federal, state, or local, may not take advantage of the H.R. 2652 database protection regime. The Committee Report notes that this "exclusion is broader than the similar provision in section 105 of the Copyright Act" in that it applies to state and local governments. This raises some interesting questions. Given the rationale that taxpayers should not "pay" for databases twice, this does create the possibility that, for example, a database whose creation was funded by the California state government will be used by private citizens of Arizona -- giving the Arizonans a free-ride on the California taxpayers' investment. Nonetheless, we agree with the Committee's approach because of the importance of developing a strong, clear principle that government-generated data is not subject to exclusion.

c. University Generated Databases

Section 1204(a) is currently worded to ensure that data gathered by state-funded colleges and universities may enjoy 2652 protection:

"Nothing in this subsection shall preclude protection under this chapter for information gathered, organized, or maintained by such an agent or licensee that is not within the scope of such agency or license, or by a Federal or State educational institution in the course of engaging in education or scholarship."

According to the Committee Report, "educational institutions that happen to be government owned should not be disadvantaged relative to private institutions when producing databases unrelated to the provision of regulatory government functions." This is a topic where guiding principles may conflict. What happens with a database gathered by medical researchers at a state university working under a federal grant from NIH? Should this be excluded from 2652 protection on the ground that it is government-funded research (and data for which the American public has already paid)? Or should the database be eligible for 2652 protection on the grounds that it comes from "a Federal or State educational institution in the course of engaging in education or scholarship" and the principle that state-funded schools should not be disadvantaged relative to private universities? Administration policies clearly establish that the U.S. Government has a right to disseminate data produced by any federal grant to institutions of higher education, hospitals, and non-profit research organizations. OMB Circular A-110 states the general framework, including the U.S. Government's right to a "royalty-free, non-exclusive and irrevocable" license to any copyright and, concerning compilations of information:

(c) Unless waived by the Federal awarding agency, the Federal Government has the right to (1) and (2):
(1) Obtain, reproduce, publish or otherwise use the data first produced under an award.
(2) Authorize others to receive, reproduce, publish, or otherwise use such data for Federal purposes.

In keeping with this policy and our belief that Government-funded data should not be subject to 2652 protection, we believe that databases resulting from research directly funded by the government, whether generated by a for-profit entity or a non-profit entity, should be ineligible for 2652 protection.
No distinction should be drawn between the research being funded at Sloan-Kettering, Harvard, Michigan State, or a Kaiser Permanente hospital as long as the research is directly funded by the government. On the other hand, we think that a professor working at a state university without any government grant beyond her state university salary and laboratory funds should be able to apply 2652 protection to a database resulting from her work. This would address what might otherwise be an inequitable situation between private institutions like Amherst College and USC versus state institutions like the University of Massachusetts at Amherst and UCLA. It may be difficult to craft statutory language that absolutely resolves this problem, but we believe this should be thoroughly addressed in the legislative history. The issue can and should be expressly addressed in government grants.

d. Realistic Government Action in an H.R. 2652 Environment

All parties should recognize that § 1204(a), whether as currently worded or amended along the lines suggested, will require diligence on the part of government contracting agents to ensure that delivery of data (in a reasonable form) to the public is part of the described government-funded activity. Otherwise, licensees could argue that the form in which they were making the data available to the public was a value-added format and "outside" the scope of their government contract. We think that any future legislative report should be clarified on this count: that when the government contracts with a private firm to produce data, usually the goal is not only to produce data, but also to make that data reasonably available to the relevant public in at least raw form. At the same time, we think that the discussion about database protection and the need to keep government-generated data in the public domain has ignored one fact: that the U.S. Government has already undertaken some programs intended to generate scientific data and not place it in the public domain. For example, the Sea-viewing Wide Field-of-view Sensor ("SeaWiFS") is a "cost-sharing collaboration" between NASA and Orbital Sciences Corporation (OSC) "wherein NASA's Goddard Space Flight Center . . . specified the data attributes and bought the research rights to these data" while "OSC provided the spacecraft, instrument, and launch" and retains "the operational and commercial rights to these data." The Space Commercialization Act is a broader example of government/private sector collaboration in which the government partially funds research efforts conscious that the resulting data will be commercialized. Federal agencies are under direction to ensure that "information systems do not unnecessarily duplicate information systems available . . . from the private sector." For example, NOAA buys substantial amounts of data from private entities and negotiates the terms for data usage in such buys.
How would § 1204(a) relate to these efforts? There are two possible, alternative answers. First, it would be credible to take the position that while the Government may engage in collaborative programs with private entities, both the Government and the private entities do so without the benefit of any database protection law, i.e., the results of their collaborative projects can be protected by any of the means now available -- technological means of controlling access, contract law, etc. -- but not by the new law. This would suggest that the first sentence of § 1204(a) should be written to "govern" all public/private joint ventures. The second alternative is to say that the second sentence of § 1204(a) governs: depending on how the government/private entity contract is crafted, certain uses of data can be outside the government license, contract, or agency, such that a private company like OSC can enjoy database protection rights. Given the existence of the Space Commercialization Act, we think that a final resolution between these two alternatives is a broader question than H.R. 2652. Our hope is that H.R. 2652 will be compatible with either view.

3. Any database protection regime must carefully define and describe databases and prohibited acts, so as to avoid unintended consequences, including undue disruption of existing business relationships and non-profit research.

Defining a database or "compilation of information" is one of the most daunting tasks in drafting any database protection or access law. We believe that a database protection law should exclude the following from the ambit of protection: (a) audio-visual works, despite the fact that they are arguably "compilations" of film frames; (b) narrative texts, whether fiction or non-fiction, regardless of length, despite these being "compilations" of words; and (c) pieces of music, whether in sheet music or recorded performance form, despite these being "compilations" of chords, lyrics, musical notes, etc. We are also unsure that the present bill adequately addresses concerns about datasets embedded in the nation's telecommunications infrastructure. This challenge of defining "compilations of information" is one area where we believe there is room for improvement of H.R. 2652, either in the statutory language or in legislative history which can clarify Congress' intent. At present, H.R. 2652 defines a compilation of data as follows:

"1201" As used in this chapter:

"(1) Collection of information. -- The term 'collection of information' means information that has been collected and has been organized for the purpose of bringing discrete items of information together in one place or through one source so that users may access them.

"(2) Information. -- The term 'information' means facts, data, works of authorship, or any other intangible material capable of being collected and organized in a systematic way."

And provides the following legislative report on the subsection: "Section 1201 . . . defines 'collection of information' . . . . The definition is intended to avoid sweeping too broadly, particularly in the digital environment, where all types of material when in digital form could be viewed as collections of information. It makes clear that the statute protects what has been traditionally thought of as a database, involving a collection made by gathering together multiple discrete items with the purpose of forming a body of material that consumers can use as a resource in order to obtain the items themselves. This is in contrast to elements of information combined and ordered in a logical progression or other meaningful way in order to tell a story, communicate a message, represent something, or achieve a result. Thus, a novel would not be considered a 'collection of information' even if it appears in electronic form, and therefore could be described as made up of elements of information that have been put together in some logical way. Similarly, materials such as interface specifications would not ordinarily be covered, although a collection of such specifications created in order to provide consumers access to the individual specifications could be covered."

In terms of the general definition, we think that this present language takes a viable approach, but that it can be improved. For example, the EU Directive differs from the present definition in H.R. 2652 in requiring that the information be "arranged in a systematic or methodical way and individually accessible by electronic or other means." [Article 1(2)] The problem with the EU definition is that single frames of films and specific parts of songs are already "individually accessible" and will become more so with increasing digitization; we think that the true difference between a database and, on the other hand, a film or song is that the elements of a database are intended to be accessed individually. They are also intended to be accessed in sets and subsets, as when one uses a column of information in a spreadsheet database. This suggests a definition of a compilation based on the intention that elements be accessed in a particular way: a database is information collected for the purpose of allowing users to access items of information both individually and in sets or subsets of related items of information. We understand that this may have been the intent of the § 1201(1) language that a collection of information is "information that has been collected . . . for the purpose of bringing discrete items of information together in one place or through one source so that users may access them" but the language could more clearly convey this intention by shifting where the "purpose" is located and introducing the notion of accessing data individually or in sets, i.e. a collection is "information that has been collected . . . in one place or through one source for the purpose of allowing users to access items of information both individually and in sets or subsets of related items of information." We believe, however, that no abstract definition of a database will give us a bright-line border between databases and non-database works. 
Therefore, we think that clear legislative history on this question is especially important. For example, where the current legislative report gives the example that "a novel would not be considered a 'collection of information' even if it appears in electronic form . . . " we think that the legislative history should enumerate several examples of work with a "logical" or "linear" progression (or a representational nature) that are not intended to be protected as databases: audio-visual works, video games, computer software code, fictional narrative texts, non-fictional narrative texts, and photographs. We think that the single example of a fiction novel in the present legislative report is especially troublesome because it does not sufficiently clarify the important point that a non-fiction narrative text should also fail to qualify as a database. A second area of concern with the current definition of a database relates to computers and the Internet. The statute expressly states in §1204(b)(2) that "[a] collection of information that is otherwise subject to protection under this chapter is not disqualified from such protection solely because it is incorporated into a computer program." Read by itself, this strongly suggests that all databases in computer programs are protected. Many such embedded databases are not intended for human perception; we believe that these databases should be protected on a "sweat of the brow" justification to avoid situations in the future in which competitors steal significant unprotected value-added from software makers. This appears to be something the House Subcommittee did not fully consider. (There was no testimony before the Subcommittee on this subject during its two days of hearings.) 
While we believe that protection should be afforded to datasets built into software and made through substantial investments, regardless of whether they are "accessed" by humans or not, there seems to be some equivocation in the bill and its legislative report. First, the definition of a "collection of information" in §1201(1) speaks of information arranged "so that users may access them." On the one hand, this ambiguous term coupled with the software inclusion provision of §1204(b)(2) would suggest that non-human "users" might qualify. On the other hand, the legislative report states:

". . . material such as interface specifications would not ordinarily be covered, although a collection of such specifications created in order to provide consumers access to the individual specifications could be covered." [discussion of §1201]

The use of "consumers" in this phrase suggests a human-use standard is intended, but this is not clear. We agree that the "interface specification" problem should be resolved as the legislative report states, but we also believe that the software-embedded database problem apparent in the Gates Rubber opinion should be resolved favorably for the parties investing in these databases; this would, at a minimum, suggest different language in the legislative history.

4. Any database protection regime should be subject to exceptions largely co-extensive with "fair use" principles of copyright law.

There seems to be general agreement that any database protection regime should be subject to exceptions with approximately the same scope as copyright "fair use." Some critics would call for exceptions with at least the same scope as "fair use." The most significant detractors from this view are those who argue that such discussions of fair use demonstrate that any database protection regime is actually a copyright law -- lurking under a different label and forbidden by the Supreme Court's ruling in Feist.

A. The § 1203(d) Exception

H.R. 2652 does not provide exceptions from liability parallel to those in the copyright law. Some take the position that the bill's exceptions are not as broad as copyright fair use; some argue that the bill gives broader exceptions. We think this issue merits further attention. The main exception from liability provided by the bill is § 1203(d) which provides as follows:

"(d) Nonprofit Educational, Scientific, or Research Uses. -- Nothing in the chapter shall restrict any person from extracting or using information for nonprofit educational, scientific, or research purposes in a manner that does not harm the actual or potential market for the product or service referred to in section 1202."

We agree that this language does not solve the problem of databases actually developed for scientists or researchers. At least one representative of the scientific community at the April 28 conference has further criticized this proposal as "illusory"; we understand this criticism, i.e. that § 1203(d) really adds nothing to § 1202. But we believe that the § 1203(d) language recognizes a wide range of exceptions. For example, § 1203(d) would permit the following research uses:

+ A statistician uses lists from the AMA's directory of physicians and the Martindale-Hubbell directory of attorneys to do a statistical analysis of the distribution of recently graduated medical specialists correlated to different legal specialties, particularly personal injury lawyers, among major metropolitan areas;

+ A sociologist reproduces some of Warren Publishing's list of cable operators in a book on the effects of mass media in America;

+ A statistician and an economist reprint sections of Phillips Business Information's Electronic Commerce Directory and Canadian Electronic Commerce Directory in their comprehensive study of e-commerce developments in NAFTA countries;

+ A biologist specializing in mammalian metabolism integrates drug testing data from a study done and publicized by a pharmaceutical company (to promote the efficacy of its drug) in her scholarly analysis of mammal reactions to certain chemical compounds;

+ A medical researcher uses grocery shopping data generated from checkout scanning equipment in supermarkets (which is marketed back to supermarkets and to food companies) to study the possible effects of consumption patterns on cancer rates.

One concern is that businesses will try to define their "actual" and "potential" market broadly to include these research uses, either in litigation claims or (for the far-sighted party) in their business plan for any new database. We think this is a possibility, but not a great danger. 
As with any legislation, some private parties will try to manipulate their behavior to gain undue advantage from statutory language and courts must curb such activities. We think that this concern about harm to a "potential" market for a database can be addressed through some improvement of § 1203(d) discussed below. Given the amount of discussion at the conference on fair use, it is worthwhile to directly compare how § 1203(d) and other exculpatory provisions of H.R. 2652 would work in comparison to copyright's principal fair use provision, 17 U.S.C. § 107. Section 107 states that "fair use" is the use of copies "for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research . . . ." But not all uses in these categories are fair uses; instead a court must consider four factors:

"(1) the purpose and character of the use, including whether such use is of commercial nature or is for nonprofit educational purposes;

"(2) the nature of the copyrighted work;

"(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

"(4) the effect of the use upon the potential market for or value of the copyrighted work."

Initially, it should be noted that § 1203(d) of H.R. 2652 offers a stronger exception than 17 U.S.C. §107 because §1203(d) is absolute -- if a party falls into its description, the exception applies. In contrast, 17 U.S.C. §107 requires a court to weigh the four factors, so a use that falls in the "teaching" or "research" description may still infringe. Of the four fair use factors, §1203(d) already addresses "(1)" by stating that the present exclusion applies to "nonprofit educational, scientific, or research purposes". There may be some question whether the word "nonprofit" modifies only "educational" or modifies all three adjectives "educational, scientific, or research". The legislative report sheds limited light on this point, particularly because it uses the same grammatical construction twice. The report does say that §1203(d) is intended to "alleviate concerns expressed by members of the research, scientific, and university communities"; since none of those concerns have been expressed by for-profit researchers, we take §1203(d) to refer to nonprofit activities, whether educational, scientific, or research. We think that for-profit research, as in research laboratories at companies like Amgen, IBM, or Ford, would fall outside the ambit of §1203(d). Opponents of the legislation have criticized H.R. 2652 for not including the second and third of the four § 107 fair use factors:

"(2) the nature of the copyrighted work;

"(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole."

Supporters of the bill have responded that these criteria are already built into the H.R. 2652 framework and do not need to be restated in the exception(s). As concerns factor (2) of §107 -- which calls for consideration of the "nature of the copyrighted work" -- proponents of H.R. 2652 argue that since it only covers databases, a court enforcing H.R. 2652 would not need to engage in the same type of "nature of the . . . work" analysis. We generally agree that a court enforcing a law modeled on H.R. 2652 would not face the wide range of works that are copyrightable -- from feature films to sculptures to non-fiction scholarly articles. A court enforcing a law modeled on H.R. 2652 would not face what is arguably the principal distinction in § 107(2) analysis: whether the work is fictional or factual. At the same time, we recognize that there will still be significant variations in the kinds of databases that would be subject to this law. We address this point in the "balancing" discussion below. Similarly, the bill's proponents have pointed out that the third fair use factor of §107 calls for analysis of the "substantiality" of the infringement and that H.R. 2652 largely achieves the same effect by creating liability only if there has been a "substantial" taking of the database. Many critics of H.R. 2652 at the conference seemed to prefer a balancing test that would allow courts to consider degrees of substantiality in the taking. The fourth fair use factor under copyright law is the effect of the use on the "market" for the protected work. Supporters of H.R. 2652 say that it provides this test because §1203(d) shields researchers from liability unless there is "harm" to "the actual or potential market for the product or service referred to in section 1202." 
On this point, we believe that the § 1203(d) exception could be improved by inserting "substantially" or a similar standard before "harm" so that any person may extract or use "information for nonprofit educational, scientific, or research purposes in a manner that does not substantially harm the actual or potential market for the product or service referred to in section 1202." Such a "substantial harm" standard is familiar to courts; would focus judges on the primary market for a database; and, in the face of a database owner contending that "science" or "research" were his intended markets, would tend to exculpate researchers who used the database. Another possibility would be an exemption "for nonprofit educational, scientific, or research purposes in a manner that does not unreasonably harm the actual or potential market for the product or service referred to in section 1202." This test follows the spirit of Article 9(2) of the Berne Convention that exemptions from copyright protection are permitted which do not "unreasonably prejudice the legitimate interests of the author." Yet another option that might be considered would be to expose nonprofit researchers and scientists to liability only for harm to an actual market and eliminate any potential liability for effects on "potential" markets. There is a reasonable basis for drawing this distinction: commercial actors are more likely to know the potential market of a competitor through market research and business planning than nonprofit actors who are not market participants.

H.R. 2652 also provides exceptions for "extracting information . . . for the sole purpose of verifying the accuracy of information independently gathered, organized or maintained by that person" [§ 1203(c)] and for "extracting or using information for the sole purpose of news reporting, including news gathering, dissemination, and comment," unless the information has been gathered by a news agency for a like purpose -- an exception to the exception intended to capture the INS case [§ 1203(e)]. The bill also includes an express protection for independent gathering of the same data [§ 1203(b)]. In general, we believe that these are all reasonable, appropriate, and in the spirit of fair use. To parallel First Amendment concerns manifest in copyright law, H.R. 2652 could include an express exception for "criticism" similar to the existing § 1203(e). Finally, even if the § 1203(d) exception largely captures the substantive content of § 107 of the Copyright law, one of the concerns repeatedly expressed at the April conference was that H.R. 2652 does not include a "balancing" mechanism to give judges more leeway in determining what uses of compilations of data should be shielded from liability. We would not be opposed to the addition of a balancing mechanism in H.R. 2652 that indicated to judges that they could exercise more leeway in considering the "nature" of the work and the "amount" of the copying above "substantiality" in determining what kind of liability a non-profit "educational, scientific, or research" entity should face; the possibility of such a revision, however, turns on a clear understanding of how the remedies provisions of H.R. 2652 function.

B. Remedies-Delineated Exceptions

A second area where the fair use-like elements of H.R. 2652 might be clarified or strengthened for the benefit of nonprofit researchers and educators is the bill's remedies provisions. According to proponents of H.R. 2652, it has already been amended to effectively shield scientists, libraries, and researchers from monetary damages, i.e. such institutions and individuals would only be subject to injunctive relief. This is best seen by a review of the remedies provisions.

i. Civil Remedies

Civil remedies are provided in § 1206 of the bill. Subsection 6(a) provides for federal court jurisdiction "without regard to the amount in controversy"; subsections 6(b), (c), and (d) empower the court to award, respectively, temporary and permanent injunctions; impoundment and "modification or destruction of copies"; and defendant's profits, treble damages, and attorney's fees. Subsection 6(d) also provides that where a court determines that a database producer brought an action "in bad faith against a nonprofit educational, scientific, or research institution, library, or archives, or an employee or agent of such an entity, acting within the scope of his or her employment" the court shall award costs and attorney's fees against the database producer. This is clearly intended as a disincentive to frivolous lawsuits against nonprofit entities. More importantly, subsection 6(e) provides:

"Reduction or Remission of Monetary Relief for Nonprofit Educational, Scientific, or Research Institutions. -- The court shall reduce or remit entirely monetary relief under subsection (d) in any case in which the defendant believed and had reasonable grounds for believing that his or her conduct was permissible under this chapter, if the defendant was an employee or agent of a nonprofit educational, scientific, or research institution, library, or archives acting within the scope of his or her employment."

We believe that it would be desirable to consider ways this exception from monetary liability could be clarified or strengthened. In particular, we think that the following changes should be considered:

(a) The existing language in subsection 6(d) concerning costs and attorney's fees, including the provision for mandatory costs and fees against a plaintiff who sued a nonprofit entity in bad faith, could be moved to a new subsection 6(f); and/or

(b) The remaining subsection 6(d) could be amended to make it clear, immediately, that the monetary damages described therein are "subject to the limitation described in subsection 6(e)"; and/or

(c) Subsection 6(e) could be amended to clarify that the burden of proof would fall on the plaintiff to establish that the defendant knew or had reasonable grounds to know that its actions were not permitted under the law; and/or

(d) Subsection 6(e) could be amended to eliminate any initial awarding of damages. As presently written, subsection 6(e) intimates that a court would first award monetary relief (damages, profits, etc.) against a nonprofit defendant and then be required to "reduce or remit entirely" that monetary relief; and/or

(e) Subsection 6(e) could be amended to require a court to deny any monetary relief absent a showing that the defendant knew or had reasonable grounds to know that its actions were not permitted under the law.

Any of these changes, singly or in combination, could make it easier for nonprofit institutions to establish the "ground rules" for when they might face monetary liability. To the degree that clear ground rules can be established for researchers so that they know they will, at worst, be subject only to injunctive relief, we believe that this would substantially eliminate any "chilling effect" H.R. 2652 might have on non-profit educational and research activities.

ii. Criminal Remedies

H.R. 2652 includes criminal sanctions in § 1207 which provide for a fine up to $250,000 and up to five years imprisonment. Subsection 1207(a)(2) provides a very clear exception from any criminal liability for any "employee or agent of a nonprofit educational, scientific, or research institution, library, or archives acting within the scope of his or her employment." We believe that some criminal provisions are desirable to handle LaMacchia-like situations, i.e. in which judgment-proof individuals might seek to disseminate protected databases without any profit incentive. We also believe that the protection against criminal prosecution for nonprofit entities and individuals is adequately strong. The Department of Justice has informally recommended that § 1207 be amended to distinguish between "misdemeanor" and "felony" liability, with the latter available only for damage to a database producer exceeding $20,000. We understand that Justice is concerned that a statute establishing a relatively new form of liability should not have too low a threshold for criminal liability. We think that such a change would be appropriate, although it will only impact commercial and private entities and individuals -- not the nonprofit entities and individuals already exempted from the criminal provisions of the bill.

5. Consistent with U.S. trade policy, it is desirable to secure for U.S. companies the benefit of the EU Database Directive and laws in other countries protecting database products.

There was much discussion at the April conference of the effect of the EU Directive's "reciprocity" provision on American database producers. Unlike in a "national treatment" scheme, US companies do not automatically enjoy the protections afforded by the Directive's sui generis protection scheme. Presently, a database of a U.S. company is protected under the EU laws only if the U.S. company has a substantial economic presence in an EU Member State. A recent comparative study from Japan has concluded that "the existing disparity between US and EU database protection gives European database producers a distinct advantage" and that "[i]t may be argued that this reciprocity requirement enables European database producers to grow by exploiting US databases as long as the US . . . fails to provide an equivalent level of protection for European databases." An American firm that does not enjoy protection under the EU Directive faces several possible competitive disadvantages. First and most obviously, its noncopyrightable database may be duplicated and remarketed by others. Second, European data sources looking for a firm to "process" and market raw data will be more likely to enter into a contract with a European company that can guarantee protection of the database versus an American company that cannot. Thus, even if the American firm could effectively protect the database with technology and contract law, it may be at a disadvantage in obtaining "suppliers" of data. Could the U.S. force the EU to protect American databases in the absence of a U.S. database protection law? The U.S. has already cited the reciprocity provision of the Database Directive as one reason the EU was placed on the Priority Watch List in this year's Special 301 review process. Nonetheless, the U.S. has limited pressure it can bring to bear on the EU. We believe that the failure of the EU Directive to provide national treatment probably does not violate TRIPS. 
Because the Directive offers copyright protection to databases on virtually the verbatim terms required by TRIPS (Article 10(2)), the additional protection of the EU sui generis regime is probably not subject to the TRIPS national treatment requirement. This means that in order to protect all U.S. database producers, the U.S. would have to adopt domestic legislation that the European Commission would judge to be comparable to the EU Directive.

A set of more abstract arguments is pitted against the general desirability of giving American firms the benefit of the EU Directive's reciprocity provision. First, there is the argument that given U.S. advocacy of national treatment, we should not condone the EU's use of reciprocity in their Database Directive because it will embolden both the EU and other countries to use reciprocity in other policy areas. The concern is that this would cause a breakdown of the national treatment doctrine under international law and "further balkanization of data availability conditions." We agree that there will be some superficial inconsistency between opposing the Directive's reciprocity approach and any U.S. adoption of a database protection regime that appears intended to meet the reciprocity requirement. But the U.S. often responds to the acts of other countries while disagreeing with those acts; the true inconsistency with our stated international policy would only be if a U.S. database protection law required reciprocity.

The question remains whether H.R. 2652 would be sufficiently comparable to the EU Directive. We believe that H.R. 2652 offers protection that is equivalent to the EU Directive and would give the United States a strong position to insist with the EU Commission that U.S. nationals enjoy the full benefits of the EU Directive:

+ Like the EU Directive, H.R. 2652 protects investment, qualitative or quantitative, in a database [EU art. 7(1); HR § 1202];

+ Like the EU Directive, H.R. 2652 prohibits unauthorized takings of the whole or a substantial part of a database [EU art. 7(1); HR § 1202];

+ Like the EU Directive, H.R. 2652 permits insubstantial takings [EU art. 8(1); HR § 1203(a)], but prohibits unauthorized repeated takings of insubstantial parts of the database [EU art. 7(5); HR § 1203(a)];

+ Like the EU Directive, H.R. 2652 applies separately from copyright [EU art. 7(4); HR § 1205(c)];

+ The EU Directive permits exceptions for "teaching or scientific research" [EU art. 9(b)] of the sort set out in H.R. 2652 [HR § 1203(d)];

+ Like the EU Directive, H.R. 2652 provides a fifteen year term of protection [EU art. 10; HR § 1208(c)];

+ Like the EU Directive, H.R. 2652 provides that it does not alter the effect of any other intellectual property laws [EU art. 13; HR § 1205(a)].

The principal differences between the two approaches include:

+ While the EU Directive establishes a sui generis property right "located in the neighborhood of copyright," H.R. 2652 adopts a misappropriation approach that targets particular acts;

+ The EU Directive appears to permit renewal of protection for an entire database when the database is revised [EU art. 10(3)] while H.R. 2652 permits a new term of protection only for the new elements of the revised database [HR § 1208(c)];

+ The EU Directive arguably has a narrower definition of a database than H.R. 2652;

+ The EU Directive and H.R. 2652 take different approaches on the exemptions carved out of the protection regime.

We believe, on the whole, that the comparable aspects of the two regimes far outweigh the differences. The case that H.R. 2652 provides comparable protection is strengthened by the fact that direct comparisons are not appropriate: the Directive provides guidance to the EU Member States for implementing legislation. Thus, each provision of H.R. 2652 that arguably diverges from the Directive should be compared to the parallel provision in each of the fifteen Member States' implementing laws. Only if all fifteen Member States adopted implementing legislation completely different from the H.R. 2652 provision would this be grounds for concluding that the two are not "comparable" in that respect.

B. OTHER ISSUES

1. Databases Prepared for Scientific Markets

We believe that there remains at least one place where the interests of database producers and scientists/educators may be in a "zero sum" conflict: how to handle collections of information specifically prepared and marketed to scientists and educators. The problem is apparent in the § 1203(d) exception that shields "extracting or using information for nonprofit educational, scientific, or research purposes" as long as such activity "does not harm the actual or potential market for the product or service referred to in section 1202." Many people have pointed out that this does not exempt from liability extraction/use from databases marketed to the nonprofit scientific or research communities. This is a place where the desire to provide proper incentives for the production of databases runs squarely into the desire to provide as much access to information as possible to researchers and educators. If a commercial firm creates a database intent on educators/researchers being a substantial part of the market for that database, then consistent application of the incentive rationale requires that the firm have the same protection against educators/researchers that it would have against others in the marketplace. This is also consistent with Congress' recognition that a number of types of copyrighted works -- such as informational newsletters targeted to particular audiences, textbooks, testing materials, and other materials prepared for the school market -- may not enjoy as wide a range of fair use as other types of materials.

2. "Sole-Source" Database Issues

Both prior to and during the conference, the debate over database protection has frequently turned to the issue of "sole source" databases. Critics of database protection proposals have often advocated that databases which are the only source for certain types of information should be treated differently from other databases. The argument is that, otherwise, any database protection scheme would create a monopoly over access to the facts in these sole-source databases. A frequently heard proposal is that such sole source databases should be subject to some type of mandatory licensing system. There is an initial problem in defining what is meant by a "sole source" database. Is it an absolute sole source for the data? Or is it a practical sole source for the data? We believe that there is a tremendous difference between the two and that critics of database protection frequently use the former extreme cases to advocate mandatory licensing or similar restrictions on a broader range of compilations. Examples of an absolute sole source database would be (a) measurements of solar flares during a specific period that were done at only one telescope, (b) temperature and air content measurements made inside a cave by the initial spelunkers who discovered it and opened it to the surface, and (c) historic climatological measurements for a specific location that were made by only one party. In fact, scientific measurements are among the most likely candidates to be absolutely unique datasets. There are also many unique sources of historic data; e.g., the Mormon Church's genealogical records might qualify. If it is correct that these account for the vast majority of true sole-source databases, then access to information in sole-source databases may not be a significant issue in any database protection regime which (a) does not apply to government-funded data and (b) has a reasonably defined sunset on database protection rights.
Critics of database protection have, however, broadened their view of "sole-source" databases to include those where, while the raw information still exists in the world and could be collected independently, the information has been collected and commercialized by only one party. The argument is that the information is, for practical purposes, under the control of a single entity and, because there is no competition, the database owner will extract monopolist rents from users. The problem with this argument is that it cuts too wide. There will inevitably be many small markets that can only be viably served by one firm; we should expect that the number of such niche markets will only increase with time. Instituting a mandatory licensing system would, in effect, penalize those who are "first to market" in serving these niche demands. It is undesirable to create an IP regime that dissuades firms from entering such small markets. Our country takes, for example, the opposite approach with the "orphan drug law" -- which is intended to give firms an incentive to fill and stay in niche markets for which R&D costs cannot be easily recovered. Similarly, in the copyright field, there has been recognition that fair use should be drawn more narrowly when the producer of the work is supplying a small market. H.R. 2652 offers a limited response to possible sole source monopolist pricing by expressly providing in section 1205(d) that nothing in the statute affects "Federal and State antitrust laws, including those regarding single suppliers of products and services." This raises a minor concern: under patent and copyright law, courts have developed "misuse" doctrines independent of antitrust law. Does the express mention of antitrust law in H.R. 2652 preclude a "database protection misuse" doctrine? We think the answer is unsettled, albeit probably 'no.'
To clarify this possible ambiguity, we suggest that § 1205(d) be written in such a way as to ensure that courts remain free to develop any equitable doctrines that would be appropriate in this area. We think that this would be the easiest way to unambiguously preserve the possible use of doctrines like unclean hands or "misuse" against database producers.

If such language were not adopted in the act, we would recommend that the legislative history make clear that express consideration of the antitrust laws in the statute does not prevent the courts from denying relief to a database producer on equitable grounds and the possible development of a "database protection misuse" doctrine.

3. Distinguishing Protected from Unprotected Material: the issues of "perpetual" protection and value-added compilations of government-generated data

One of the places where a neutral observer might wonder if the sides are speaking about the same issue is the question of the duration of protection. Critics of database protection frequently claim that a regime of "perpetual" protection would be created or that proposals call for protection greater than copyright protection -- yet the current legislative proposal calls for a 15-year duration (and copyright endures for the life of the author plus 50 years). For reasons we will explore below, this problem has certain contours in common with the issue of privately-held, sole source databases from government-generated data. The critics' concern about "perpetual protection" is rooted in the need to provide some type of protection for revisions of databases. If legislation were passed that provided protection to new databases, but did not provide protection to revisions of databases, this would skew investment. There would be a disincentive to revise proven, useful databases in favor of creating new databases. Reassembling (largely) the same information in a new database would be inefficient not only for data gatherers, but for data users who -- in order to use the most current data -- would have to accustom themselves to the format of the new database. The drafters of H.R. 2652 believe they resolve this problem with the general definition of what is protected and the 15-year statute of limitations: "[N]o action can be maintained more than fifteen years after the investment of resources that qualified that portion of the collection of information that is extracted or used. This language means that new investments in an existing collection, if they are substantial enough to be worthy of protection, will themselves be able to be protected, ensuring that producers have the incentive to make such investment in expanding and refreshing their collections. At the same time, however, protection cannot be perpetual; the substantial investment that is protected under the Act cannot be protected for more than fifteen years. By focusing on that investment that made the particular portion of the collection that has been extracted or used eligible for protection, the provision avoids providing on-going protection to the entire collection every time there is an additional substantial investment in its scope or maintenance." (Legislative Report)

We believe that this does not wholly address the concerns of those who believe that the bill could create "perpetual protection." While the bill provides no de jure perpetual protection, many users believe that the digital environment might be manipulated in some situations to produce de facto perpetual protection. This potential problem is limited to a discrete set of databases. Some databases are revised extensively and constantly; for these databases, the period in which the data has value is much shorter than 10 or 15 years. Stock exchange price listings are the most extreme example, but other lists -- realtors' sale listings and used car valuations -- also fall in this category. Other databases will be revised rarely, if ever, once a definitive version is completed, e.g., a database of Union warships in the Civil War or the passengers on the Mayflower. The databases for which the "perpetual protection" problem arises are ones that have value over many years and require substantial, but not total, revision. An example would be a historical database of the batting statistics of all baseball players in the major leagues or a database of medical compounds.
Our understanding of the "perpetual protection" problem with these databases is as follows. In the classic case of a copyrighted book, the text loses protection at the end of its term, although new, revised versions of the text may enjoy fresh periods of protection. This means that one can find unprotected texts of Antigone or Pride and Prejudice in libraries all over the country. At the same time, new versions of these books can be under some copyright protection (including new introductions, translations, "notes," artwork, etc.). It is possible to compare the two versions -- old, unprotected and new, protected -- side-by-side. In the digital, on-line environment, content producers may choose not to alienate copies of their works; instead access to a database may be licensed to users. The advantage is that the database user can receive the most current version of the compilation. The disadvantage is that the user may lack access to any old version of the database against which to compare old and new entries.

Imagine that in 2000, a database producer makes a database; we will designate the first twelve entries alphabetically:

A B C D E F G H I J K L

In 2003, it "expands and refreshes" the database, so that the first fifteen entries are as follows:

A B BB C D E F FF G H I J K KK L

In theory, under H.R. 2652 in the year 2016, all of the entries except BB, FF, and KK lose protection -- and can be copied in their entirety. The problem is that if the database is provided via on-line services, there may be no means for the user to know which entries are unprotected because they were original entries and which entries are protected because they are the result of maintenance investment within the past 15 years. Critics of database protection are correct to point out that this could produce "chilling" effects on those who want to use the database after the initial term of protection. One commentator has suggested that new entries be electronically "tagged," so that a user can readily determine what is protected and what is not, for example:

A B

BB

C D E F

FF

G H I J K

KK

L

To the extent they have considered this idea, the protection advocates have not been favorable to the "tagging" idea. We too recognize that it might create substantial technological problems or costs, depending on the database. Another possible solution would be to require any database producer that wanted to enjoy protection for a revision of its database after the fifteen-year period to make (or have made) the original, no-longer-protected database available in a reasonable format. This would be the electronic equivalent of the old copy of Wuthering Heights in the public library. The original database need not be as available as the new version -- just as old library books usually are not as available as books at retail stores -- but it should reach some standard of public access. On this count, it is possible that the problem of "perpetual protection" could be addressed by establishing a limited, well-defined archiving right for libraries, possibly taking ideas from 17 U.S.C. § 108 and § 403 of the Digital Millennium Copyright Act, which modifies 17 U.S.C. § 108 to cover digitized archiving.

This archiving approach does not, however, resolve the similar problem that could arise when a private entity (a) adds value to government-generated information and (b) distributes the new, value-added compilation, and (c) the government withdraws from supplying the data to the public. In such situations, there is the possibility that the private entity will use a minimal amount of value-added processing to claim that the entire compilation of information is protected. This could frustrate the goal of making government-generated data widely available; at the same time, we do not want to adopt any regime which will take away from incentives to "value-add" to government-generated data. Copyright law has addressed a parallel problem: mixtures of privately-generated (copyrightable) materials with government-created (noncopyrightable) materials. In such cases, 17 U.S.C.
§ 403 provides that where a work is "predominantly" U.S. Government material, the copyright notice should include a "statement identifying, either affirmatively or negatively, those portions" protected under the copyright law as contrasted with the "works of the United States Government". If a copyright holder fails to include such a statement, 17 U.S.C. § 403 provides that the defendant in an infringement action can claim a defense based on innocent infringement to mitigate any damages. We think that it would be appropriate to consider whether a similar provision, possibly linked to "tagging" or otherwise identifying government-generated data, should be included in H.R. 2652.
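
As an illustration only, the "tagging" idea and the fifteen-year limit discussed above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any actual system or of how H.R. 2652 would be applied: the Entry record, its investment_year and government_generated tags, and the helper functions are all hypothetical names; the term is counted in whole calendar years; and government-generated entries are assumed to be unprotected.

```python
from dataclasses import dataclass

TERM_YEARS = 15  # the fifteen-year limit discussed in the report


@dataclass(frozen=True)
class Entry:
    key: str
    investment_year: int                # hypothetical tag: year of the qualifying investment
    government_generated: bool = False  # hypothetical tag: data supplied by the government


def is_protected(entry: Entry, current_year: int) -> bool:
    """Assumption: an entry is protected until more than 15 calendar years
    have passed since its investment year, unless it is government data."""
    if entry.government_generated:
        return False
    return current_year - entry.investment_year <= TERM_YEARS


def protected_keys(entries, current_year):
    """Return the keys of the entries that are still protected in current_year."""
    return [e.key for e in entries if is_protected(e, current_year)]


# The report's hypothetical: twelve entries made in 2000, three added in 2003.
ORIGINAL = "A B C D E F G H I J K L".split()
ADDED = ["BB", "FF", "KK"]
database = sorted(
    [Entry(k, 2000) for k in ORIGINAL] + [Entry(k, 2003) for k in ADDED],
    key=lambda e: e.key,
)

if __name__ == "__main__":
    print(protected_keys(database, 2016))  # ['BB', 'FF', 'KK']
```

With tags of this kind, a user in 2016 could see at once that only BB, FF, and KK remain protected. Without them, the user of an on-line service has no way to separate those three entries from the twelve originals.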