The Black, White And Gray of Web Scraping

06 Jun 2014

There are many reasons for wanting to scrape data or content from a public website. I think these reasons can be represented as different shades of gray: the darker the gray, the less legal the scraping is likely to be considered, and the lighter the gray, the more legal. You with me?

An example of darker gray would be scraping classified ad listings from Craigslist for use on your own site. An example of lighter gray would be pulling a listing of veterans hospitals from the Department of Veterans Affairs website for use in a mobile app that supports veterans. One is corporate-owned data, and the other is public data. The motives for wanting each set of data are potentially radically different, and the restrictions on each set of data are different as well.

Many opponents of scraping don't see the shades of gray; they just see people taking data and content that isn't theirs. Proponents of scraping hold an array of opinions, ranging from "if it is on the web, it should be available to everyone" to scraping only openly licensed or public data and staying away from anything proprietary.

Scraping of data is never a black-and-white issue. I'm not blindly supporting scraping in every situation, but I am a proponent of sensible approaches to harvesting valuable information, and of developing open source tools and services that assist users in scraping.