THE STATE OF THE LAW ON DATA SCRAPING

LinkedIn pushed back – for now

Data is king on the Internet. Real commerce occurs online in huge and growing volumes, and SaaS offerings are proliferating. But whatever the activity, valuable user data is always being collected, aggregated, analyzed and often monetized. In fact, wherever you find a useful free online service, you can be fairly certain you will also find extensive data collection and monetization. In such cases, you and your data effectively become the product.

It should not be surprising that when one website successfully gathers large volumes of valuable user data, others will want to access and use that data. The two primary means of gaining such access are licensing and scraping. Licensing is done with permission and usually comes with a fee. Scraping is usually done without permission, is free and is legally risky.

So, what is the state of the law on data scraping?

In August 2017, the United States District Court for the Northern District of California, in hiQ Labs, Inc. v. LinkedIn Corporation, may have broken new ground in enhancing the rights of data scrapers (or maybe not). But first, let’s examine the legal issues involved.

Potential legal violations of data scraping

To evaluate the risks and benefits of a data scraping business model, it is necessary to understand the legal violations that scraping might involve.

Computer Fraud and Abuse Act (CFAA). The CFAA is a federal statute that imposes liability on anyone who “intentionally accesses a computer without authorization or exceeds authorized access, and thereby obtains…information from any protected computer.” A determination of liability under the CFAA will usually focus on whether the data scraper had actual or constructive knowledge that the terms governing access to the website prohibit the scraping activity. Establishing the user’s knowledge often depends on how explicitly the website’s terms of service prohibit data scraping and whether the user is legally bound by those terms (i.e., were they accepted through a clickwrap or a browsewrap process?). If the user never accepted the terms, the question becomes whether they were conspicuous enough to give the user constructive knowledge. Violations of the CFAA can be punished by fines and imprisonment, and the statute also allows a party harmed by a violation to bring a civil action.

Breach of Contract. If a user is bound by terms of service that clearly prohibit data scraping and the user violates those terms, then the user is in breach of the terms of service. Such a breach can be the basis for barring the user from continuing to access the website and scrape data. Whether the breach would also result in monetary liability for the user depends on whether the website can establish that it incurred damages as a result of the breach.

Copyright Infringement. Data scraping by definition involves copying content from a website. If the content is protected by copyright and the terms of service do not allow such copying, then the data scraper could be liable for copyright infringement. Copyright infringement claims can be costly under the U.S. Copyright Act, which provides for statutory damages of up to $150,000 per work infringed for willful infringement, an award of attorneys’ fees, and injunctive relief.

The complication is that not all data on a website is copyrightable, and copyrightable content is not necessarily owned by the website, in which case the website cannot bring an infringement suit. For instance, many data-heavy websites (like LinkedIn, Facebook, YouTube, Instagram and Twitter) consist largely of user-generated content, which in almost all cases is owned by the user rather than the website. So, the website usually cannot sue for copyright infringement. Additionally, much scraped content is mere data, meaning facts, and facts, like ideas, are not protectable under U.S. copyright law. Lastly, even if the content is copyrightable and owned by the website, the scraping might qualify as fair use. Courts weigh several factors in deciding fair use, including whether the use is transformative, that is, whether it adds something new with a further purpose or different character rather than merely superseding the original. See Campbell v. Acuff-Rose Music, 510 U.S. 569 (1994).

Trespass to Chattels. Okay, this one sounds a bit archaic, but it actually can apply to a website. A chattel is property other than real estate, and trespass to chattels is a tort that occurs when one party intentionally interferes with another person’s lawful possession of a chattel. Here, the website owner has an enforceable property right in the servers hosting the website, so unauthorized access could constitute this tort. Courts have found such a trespass mostly where the scraping burdened or impaired the website’s operation.

Recent LinkedIn case may open new possibilities (or not)

Now back to hiQ Labs, Inc. v. LinkedIn Corporation. hiQ was scraping data from public LinkedIn profiles and running an algorithm to estimate the likelihood that specific employees would stay with their current employer or were seeking other employment. hiQ sold this analysis to employers for HR planning purposes. The court granted a preliminary injunction barring LinkedIn from implementing technology that would block hiQ’s data scraping activities.

Although hiQ had been performing this scraping for several years with LinkedIn’s knowledge, LinkedIn had a change of heart: it sent hiQ a cease and desist letter, threatened legal action and blocked hiQ’s access. hiQ responded by seeking an injunction. LinkedIn claimed that continued scraping was a breach of contract (i.e., it violated the LinkedIn User Agreement) and a violation of the CFAA, among other laws. The court did not address the breach of contract issue. However, on the CFAA claim, the court importantly stated that: “A user does not ‘access’ a computer ‘without authorization’ by using bots, even in the face of technical countermeasures, when the data it accessed is otherwise open to the public.” And herein lies the crux of the decision: the data that hiQ was accessing was not behind a username and password.

This case is very fact specific. The court did not hold generally that the CFAA does not apply to data scraping. In this case, (1) hiQ did not have to log in to LinkedIn to scrape the data because the profiles scraped were publicly available (so the terms of service might not apply), (2) the profiles did not contain copyrightable material owned by LinkedIn (it was user-generated content), and (3) LinkedIn had knowingly permitted hiQ’s activities for a number of years, and blocking hiQ now might drive hiQ out of business.

If the data could have been scraped only after logging in, the court would likely have found a CFAA violation. The court also concluded that LinkedIn’s use of technical blocking measures did not turn a user’s circumvention of those measures into unauthorized access under the CFAA, so long as the data remained publicly available.

An additional important factor in the court’s decision to grant a preliminary injunction was that LinkedIn was hiQ’s sole source of data, so allowing LinkedIn to block hiQ’s access would put hiQ out of business and cause irreparable harm. It is important to note that this is only a preliminary injunction, and LinkedIn is likely to appeal.

So, after hiQ Labs, Inc. v. LinkedIn Corporation, where are we?

Assessing the legality of data scraping requires examining (1) whether the terms of service prohibit data scraping (and whether the terms of service are binding on users), (2) whether a password is required to access the website data, (3) whether copyrightable content is being scraped (and whether the fair use exception applies), and (4) whether the data scraping places an unreasonable burden on the website.

If you are a data scraper, then based upon hiQ, not much has changed unless the data is publicly accessible without a password.

If you are a website operator seeking to prohibit data scraping, hiQ teaches that your terms of service must clearly prohibit data scraping and that access to the data should be password protected.

Bottom Line

Data scrapers should proceed with caution and with legal advice. Even though the law may be moving slightly in data scrapers’ favor, websites still have substantial grounds to issue cease and desist letters and threaten legal action. hiQ was fighting for its life and could afford to hire the formidable (and expensive) Laurence Tribe, the Harvard constitutional law professor, as counsel. So, you need to consider whether you have a legal war chest available.

Websites should carefully review the anti-data scraping provisions of their terms of service and the process by which users accept those terms, and should consider putting valuable data behind a login. Of course, all technological measures for preventing data scraping should be considered and implemented.

If you want to discuss this post or any other legal issue with the author, contact him via inquiries@galkinlaw.com or by calling (410) 484-2500. We'd like to hear from you!


William Galkin manages GalkinLaw. Mr. Galkin has dedicated his legal practice to representing Internet, e-commerce, computer technology and new media businesses across the U.S. and around the world. He serves as a trusted adviser to both startup and multinational corporations on their core commercial transactions.