Everyone's Doing It, But Is It Legal?

Watch out. Are your big data feeds in blatant violation of terms of service agreements?

The Big Data movement taking hold in financial services is based in large part on aggregating data from different sources, often using web scraping tools such as bots or spiders. And while it's easy to see the web as an open source of information for the taking, it's worth remembering that it really is not. So before finding yourself at the wrong end of a cease and desist letter or court order, take a good look at the terms of service.

Web scraping is when a company uses automated programming to retrieve information or content made public by another company. If the scraping company makes competitive use of the other firm's information, it could potentially be copyright infringement or a breach of contract (terms of use), among other potential violations.

"You may be surprised how few websites have prohibitions in place from web scraping bots," said Dale Cendali, Esq., of Kirkland & Ellis LLP. However, a site can always change its terms of use, and recent court cases against crawlers show there can be real consequences for violators.

Some Pointers

Are you scraping data from other sites? Interested in scraping? Or worried about being scraped? Cendali and Anthony Dreyer, Esq., of Skadden, Arps, Slate, Meagher & Flom LLP, offer this advice:

If you are interested in scraping:
- Be aware of terms of use for sites you may want to crawl
- Consider what information you need to crawl, and how you intend to use it (is it copyrighted?)
- Consider how often you need to crawl (repeated crawling can weigh on a site's servers, potentially triggering liability)
- Consider what others in the same industry are doing
- Respect Robots.txt files (Robots.txt is a text file that website owners can put in the website hierarchy to instruct automated software not to crawl the site)
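
As a rough illustration of the last two pointers, the sketch below uses Python's standard urllib.robotparser module to check whether a page may be crawled and how long to wait between requests. The sample Robots.txt content, the "ExampleBot" name, the example.com URLs, and the five-second fallback delay are all assumptions made for the example, not anything taken from a real site. Passing this check says nothing about what a site's terms of use allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt content. A real crawler would download the file
# from the root of the target site before requesting any other page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /quotes/
Crawl-delay: 10
"""


def build_parser(robots_txt):
    """Parse Robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser, user_agent, url):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


def polite_delay(parser, user_agent, default_seconds=5.0):
    """Seconds to wait between requests: the site's Crawl-delay if it
    declares one, otherwise an assumed conservative default."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default_seconds


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/about", "https://example.com/quotes/ibm"):
        print(url, "allowed" if may_crawl(rules, "ExampleBot", url) else "disallowed")
    print("wait", polite_delay(rules, "ExampleBot"), "seconds between requests")
```

A crawler built along these lines would skip any URL for which may_crawl returns False and sleep for polite_delay seconds between requests, which also addresses the concern about weighing on a site's servers.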

If you are concerned about being scraped:
- Draft terms of use to prevent scraping
- Prominently post terms of use and/or have people click to accept terms of use
- Use Robots.txt
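
For site owners, a minimal sketch of what the Robots.txt step might look like is shown below: a hypothetical helper writes the file's text, and a second helper confirms that a standards-following parser would treat a given page as off limits. The function names, the "AnyBot" user agent, and the example.com URL are assumptions for illustration. Robots.txt is a request that well-behaved crawlers honor, not a technical barrier, so it complements terms of use rather than replacing them.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(disallowed_paths, user_agent="*"):
    """Return Robots.txt text asking the given user agent to skip the listed paths."""
    lines = ["User-agent: " + user_agent]
    lines += ["Disallow: " + path for path in disallowed_paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="AnyBot"):
    """Return True if a compliant crawler would refuse to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Disallow: /" asks every crawler to stay away from the whole site.
    text = render_robots_txt(["/"])
    print(text)
    print("blocked:", is_blocked(text, "https://example.com/prices"))
```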

3 Kinds of Contract Agreements for Websites

Perhaps the best defense, for the scraper or the scrapee, is the terms of service. If the site's user agreement does not explicitly prohibit scraping, the scraper may have better legal ground to stand on in court. Naturally, ironclad terms of service (coupled with cease and desist orders) help protect the site being scraped.

Of course, things aren't always that simple. According to Dreyer, court cases in this arena have demonstrated that how the contract agreement is displayed is also an important element in a defense. There are three kinds of online agreements for websites:

- Click-wrap: These require users to consent to terms and conditions by clicking that pesky "I Agree" or "I Accept" button before they can proceed to use a website. These are generally considered enforceable because of the clear, affirmative assent. Although courts acknowledge that users don't really read the terms of agreement, users skip them at their own risk.
- Browse-wrap: This is the posting of a link to the terms and conditions on a website for users to click on if interested; clicking is not required to use the site. The link is usually found at the very bottom of a webpage on a toolbar. In this case user consent is implied by continued use of the site. However, the visibility and accessibility of the link play an extremely significant role in court.
- Contract implied by conduct: Less common, this is when terms of use are presented after a user first accesses information on a website. On subsequent visits it is understood the user is on notice. Consider it a one-free-pass situation.

Cendali adds that while marketers are often at battle with legal over the size and prominence of terms of service, a company's best defense is to make sure all the terms of service are prominent. Use all the relevant terms, such as scraping, crawling, spidering and data harvesting, and don't feel bad about bothering viewers with bigger notices.

Case in point: after a few interesting court cases of its own over data scraping, Ticketmaster now has what can be considered a very clear, very bold and all-capitalized browse-wrap link at the bottom of the webpage. "People may not like it on their site but you find creative ways to show it," offers Cendali.

In the world of financial services, where data is being pulled from beyond US borders, it would be wise to also consider international laws around terms of use and copyright infringement.

Becca Lipman is Senior Editor for Wall Street & Technology. She writes in-depth news articles with a focus on big data and compliance in the capital markets. She regularly meets with information technology leaders and innovators and writes about cloud computing, datacenters, ...

Those pesky "terms and conditions" that require one to click "I agree" or "I disagree" seem to be the most direct method of informing the user about copyright infringement and the no-scraping, no-crawling policy. As you say, that one is enforceable in court. I think many users click through the documents as an annoyance, but a business that intends to grab data should think twice.

I am absolutely not a lawyer, so this is not legal advice, but it seemed that in all the cases they used as examples, providing the offending scrapers with some kind of documented warning (and proof they ignored that warning) helped their cause. And I believe they said that warning them in advance of any court action was important. Again, not a lawyer. Good luck! As Nathan says, who knew this was such a large battleground?

This was an interesting session. Who knew there was such a "cold war" around scraping each others' sites for information even today in 2014? Financial firms (if they're not doing it themselves, of course) would be wise to be careful what exactly they put on their public sites for competitors to glean.

What if the one being scraped is a startup that has taken all the precautionary measures, such as posting a robots.txt file and ironclad terms of service, but does not have the means to hire a lawyer for any cease-and-desist actions?