Techdirt. Stories filed under "scraping"
Easily digestible tech news
https://www.techdirt.com/

Disappointing: LinkedIn Abusing CFAA & DMCA To Sue Scraping Bots
by Mike Masnick -- Tue, 16 Aug 2016 08:30:00 PDT
https://www.techdirt.com/articles/20160812/17204235230/disappointing-linkedin-abusing-cfaa-dmca-to-sue-scraping-bots.shtml

[…] absolutely should know better, look to abuse the CFAA to attack people using tools to scrape public information off of their websites. In the past few years, we've seen Facebook and Craigslist do this (with Facebook recently winning in court).

The latest lawsuit appears to be more of the same, claiming that the scraping violates both the CFAA and the DMCA:

During periods of time since December 2015, and to this day, unknown persons and/or entities employing various automated software programs (often referred to as "bots") have extracted and copied data from many LinkedIn pages. To access this information on LinkedIn's site, the Doe Defendants circumvented several technical barriers employed by LinkedIn that prevent mass automated scraping, and have knowingly and intentionally violated various access and use restrictions in LinkedIn's User Agreement, which they agreed to abide by in registering LinkedIn member accounts. In so doing, they have violated an array of federal and state laws, including the Computer Fraud and Abuse Act, 18 U.S.C. §§ 1030, et seq. (the "CFAA"), California Penal Code §§ 502 et seq., and the Digital Millennium Copyright Act, 17 U.S.C. §§ 1201 et seq. (the "DMCA"), and have engaged in unlawful acts of breach of contract, misappropriation, and trespass.

This is bullshit. Courts have directly held that violating a site's terms of service does not equate to a CFAA violation for "unauthorized access" or "exceeding authorized access." Here, LinkedIn appears to be hoping that the combination of a terms of service violation and attempts to get around technological protection measures will add up to a CFAA violation.

I completely understand that LinkedIn may not like the fact that people are scraping its data, and that they've found ways around LinkedIn's attempts to block such scraping via technological means. But it's a dangerous slippery slope for a company to claim that a terms of service violation violates the CFAA, and that getting around simple blocks becomes a DMCA 1201 anti-circumvention violation. Both theories are problematic: treating a terms of service violation as a CFAA violation is a stretch, and treating the mere act of getting around protection technology -- even when no copyright infringement is involved -- as a DMCA violation is just as troubling.

Of course, this lawsuit, like the last one, is probably really designed to just sniff out who's running the bots, and to push them into a settlement where they'll stop doing so.

Still, this lawsuit seems particularly ridiculous coming just weeks after LinkedIn's founder and chairman, Reid Hoffman, funded a $250,000 disobedience award at MIT's Media Lab. The point of that award is to encourage people to engage in disobedience that changes society in a positive way -- something people often use scraping for. And yet here his company is waging a legal battle that will make exactly that kind of scraping much riskier. I know and like Hoffman, who is a smart, thoughtful and principled guy, and I have no idea whether he even knew this lawsuit was going to be filed. But it sends the wrong message when he's encouraging useful hacking with one hand while his company (which, yes, was just sold to Microsoft) sues people for that very kind of hacking with the other.

from the please-don't-do-this dept.

No Craig Newmark Did Not Donate To EFF; He Helped Make CFAA Worse Instead
by Mike Masnick -- Wed, 1 Jul 2015 14:33:00 PDT
https://www.techdirt.com/articles/20150701/14150431519/no-craig-newmark-did-not-donate-to-eff-he-helped-make-cfaa-worse-instead.shtml

[…] donating $1 million to EFF when the money is not actually from Craig. It's from a startup that Craigslist sued out of business, under a dangerous interpretation of the CFAA that harms the open internet. Obviously, EFF getting an additional $1 million in resources is really great. But it's troubling to see so many people congratulate Craigslist and Craig Newmark for "supporting EFF." Craig himself has contributed to this misleading perception with a tweet implying he's giving his own money to EFF.

Plenty of smart people are cheering on Craig for supposedly being so generous. But that's wrong. This isn't Craigslist being generous. This is Craigslist abusing the CFAA to kill a company that was making the internet better, and then handing over some of the proceeds to the EFF, which actively opposed Craigslist's lawsuit.

Now, I should note upfront that I like Craigslist and very much like Craig Newmark personally. I think the company has been genuinely innovative in taking a longer-term view of its business (even if it's been losing ground more recently). However, this lawsuit was always really sketchy: Craigslist sued a few companies for making Craigslist more valuable. Those companies were scraping Craigslist data, but only to overlay additional information, and they always pointed people back to Craigslist. In other words, the companies Padmapper and 3taps were adding value to Craigslist in the same manner that much of the internet was built -- by providing more value on top of the work of others.

And yet Craigslist sued these companies under a tortured definition of the CFAA, arguing that the mere scraping of its data to provide value on top of it (none of which took away any value from Craigslist) was "unauthorized access." The EFF filed an amicus brief against Craigslist, slamming the company (which it has frequently supported in other circumstances) for abusing the law:

The CFAA does not and should not impose liability on anyone who accesses information publicly available on the Internet. Because the CFAA and Penal Code § 502 imposes both civil and criminal liability, it must be interpreted narrowly. That means information on a publicly accessible website can be accessed by anyone on the Internet without running afoul of criminal computer hacking laws. In the absence of access, as opposed to use, restrictions, Craigslist cannot use these anti-hacking laws to complain when the information it voluntarily broadcasts to the world is accessed, even if it is upset about a competing or complementary business.

[....]

Craigslist’s enormous success is a result of its openness: anyone anywhere can access any of its websites and obtain information about apartments for rent, new jobs or cars for sale. Its openness means that Craigslist is the go-to place on the web for classified ads; its users post on Craigslist because they know their ads will reach the largest audience.

But what Craigslist is trying to do here is to use the CFAA’s provisions to enforce the unilateral determinations it has made concerning access to its website, an Internet site that it has already chosen to open up to the general public, attempting to turn a law against computer hacking into a new tool. But prohibiting access to an otherwise publicly available website is not the type of harm that Congress intended to be proscribed in the CFAA, and nowhere in the legislative history is there any suggestion that the CFAA was drafted to grant website owners such unbridled discretion.

That's the EFF directly arguing against Craigslist in this case. Unfortunately, the initial district court ruling agreed with Craigslist, leading EFF to note just how dangerous the ruling was:

There's a serious potential for mischief that is encouraged by this decision, as companies could arbitrarily decide whose authorization to "revoke" and need only write a letter and block an IP address to invoke the power of a felony criminal statute in what is, at best, a civil business dispute.

Judge Breyer’s opinion appears to mix up two different aspects of the CFAA. The first aspect is the prohibition on unauthorized access, and the second is its associated mental state element of intent. The CFAA only prohibits intentional unauthorized access; merely knowingly or recklessly accessing without authorization is not prohibited. So whatever unauthorized access means, the person must be guilty of doing that thing (the act of unauthorized access) intentionally to trigger the statute. Breyer seems to mix up those elements by focusing heavily on the fact that 3taps knew that Craigslist didn’t want 3taps to access its site. According to Judge Breyer, the clear notice meant that the case before him didn’t raise all the notice and vagueness issues that prompted the Ninth Circuit’s decision in Nosal.

So now the case has been settled, and, as a result, at least one of the companies involved, 3taps, is shutting down altogether. 3taps points out that it's 3taps' money, not Craigslist's, that is going to EFF:

As part of the settlement, 3taps and its founder, Greg Kidd, have agreed to pay craigslist $1 million, all of which must then be paid by craigslist to the EFF, which supported 3taps' position on the CFAA in this litigation, and continues to do great work for Internet freedom generally. Mr. Kidd's investment firm, Hard Yaka, has also committed to make a substantial investment in PadMapper to provide it with the resources to continue to innovate and serve the post-craigslist marketplace.

Although 3taps lacks the resources to continue the fight, this settlement provides much needed resources to the EFF, as there is still much to be done on the issues raised in this case.

For example, the question remains whether private companies that maintain public websites can selectively exclude visitors, exposing the banned visitor to civil and criminal liability under the CFAA.

Furthermore, this is unlikely to be the last litigation involving craigslist's copyrights, particularly given craigslist's current practice of selectively obtaining copyright assignments and registrations (the prerequisite to a copyright infringement lawsuit) in certain user-generated posts, but failing to inform its visitors which posts it owns. This effectively creates a copyright litigation trap for unwary visitors.

Finally, it remains unresolved whether craigslist's well-recognized practice of "ghosting" (the hiding or interception of user postings and emails) without the users' knowledge or consent is legal or ethical.

Given all that, it's fairly disappointing to see lots of prominent people backslapping Craig and Craigslist for "donating" this money to EFF. It's not Craig's money. And, according to the settlements, it appears that the $1 million isn't all that Craigslist is getting. That's just the money 3taps is paying. Another company in the dispute, Lovely, is paying an additional $2.1 million. It's unclear if Craigslist is giving that money to EFF or anyone else -- or keeping it.

Again, on most issues, I think Craig and Craigslist are on the right side of things. He fought strongly against SOPA and for net neutrality, and I think the company does the right thing in many cases. But in this case it clearly did not, and the fact that people are now cheering him on -- when it's not even his money, and when it's only available because his bad lawsuit forced another company to shut down -- is really disturbing.

from the eff opposed the lawsuit dept.

NY Times 'Uses' Scare 'Quotes' To Highlight How 'They' Don't 'Understand' How Snowden 'Copied' Documents
by Mike Masnick -- Mon, 10 Feb 2014 05:40:00 PST
https://www.techdirt.com/articles/20140208/23402526145/new-york-times-uses-scare-quotes-to-highlight-how-they-dont-understand-how-snowden-copied-documents.shtml

[…] got access to various documents. The report uses bizarre scare quotes around perfectly ordinary words, more or less emphasizing what the reporters clearly don't understand:

Intelligence officials investigating how Edward J. Snowden gained access to a huge trove of the country’s most highly classified documents say they have determined that he used inexpensive and widely available software to “scrape” the National Security Agency’s networks, and kept at it even after he was briefly challenged by agency officials.

Using “web crawler” software designed to search, index and back up a website, Mr. Snowden “scraped data out of our systems” while he went about his day job, according to a senior intelligence official.

Lots of people who read this started quickly mocking it online. Matt Blaze joked about the fact that children (children!) might download wget:

@kataclyst @csoghoian I understand wget is available on the Internet where children can download it. Children!

But, perhaps the most entertaining mocking came from Marc Andreessen who rattled off a series of similar sounding lines with scare quotes that might highlight how silly the NYT report sounds to anyone even marginally familiar with technology.

The NY Times has a bunch of the Snowden documents. It might be a good idea for them to reach out to someone who actually understands technology before reporting on any more of them.
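For anyone as baffled as the NYT seems to be by the quoted terms: a "web crawler" is little more than a loop that fetches a page, extracts its links, and follows them. A minimal sketch in Python, run against a made-up in-memory "site" rather than a live server (all page names and markup here are invented for illustration):

```python
from html.parser import HTMLParser

# A tiny fake "site": page name -> HTML body. Purely illustrative.
SITE = {
    "index.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "a.html": '<a href="b.html">B again</a>',
    "b.html": "no links here",
}

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href")

def crawl(start):
    """Breadth-first crawl: fetch a page, extract links, follow new ones."""
    seen, queue = set(), [start]
    while queue:
        page = queue.pop(0)
        if page in seen or page not in SITE:
            continue
        seen.add(page)
        parser = LinkExtractor()
        parser.feed(SITE[page])    # "scrape" the page
        queue.extend(parser.links) # follow its links
    return seen

print(sorted(crawl("index.html")))  # all three pages get visited
```

wget's --mirror mode does essentially this loop, plus saving each fetched page to disk -- which is the "back up a website" part.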

from the wtf? dept.

Dating Site's Plans To Create Profiles By Scraping Social Networks: Publicity Stunt Or Just Dumb?
by Mike Masnick -- Fri, 21 Jan 2011 01:06:00 PST
https://www.techdirt.com/articles/20110119/04141612721/dating-sites-plans-to-create-profiles-scraping-social-networks-publicity-stunt-just-dumb.shtml

[…] dead or fake profiles, just imagine the legal issues facing an Australian dating site that claims it's going to scrape social networking profiles and turn them into dating site profiles. I'm not even going to mention the name of the company, because I'm pretty sure this was just a publicity stunt to get its name in the press, before "backing away" from the plan. If it's an actual plan, it's stupid. Not just because of the potential privacy concerns and lawsuits, and not just because some of the social networks from which they scrape the info may find ways to sue them as well, but because this seems like a terrible strategy for a dating site. I mean, if you're looking to find a dating site where you're likely to actually meet someone, are you going to use the site where the vast majority of the "members" don't even know they're members? It's hard to see how that makes for a compelling pitch. And I'm not even getting into what will happen when it starts creating profiles of people who are married or in long term relationships...

from the lawsuit-waiting-to-happen dept.

Why Doesn't Century 21 Canada Want More People Viewing Its Real Estate Listings?
by Mike Masnick -- Wed, 9 Sep 2009 06:41:00 PDT
https://www.techdirt.com/articles/20090908/1251356129.shtml

[…] suing telco Rogers and its subsidiary Zoocasa for creating what appears to be a real estate info portal/search engine. At issue: Zoocasa apparently scrapes various real estate listings, including those from Century 21 Canada, to provide them in its own search results, along with some additional info -- but still links back to the original Century 21 listing. In other words, it acts like a basic search engine. It's difficult to see how or why that should be against the law.

Of course, the real estate business has always been built on bogus exclusivity over listing data through the MLS system -- and apparently the industry doesn't like the idea of that data being more widely available. But, still, it's difficult to see what Century 21 has to complain about, since the site links to Century 21 postings and should only send it more traffic. Unless, of course, its fear is that it can't compete by offering enough useful info on its own site.

from the someone-please-explain dept.

Power.com Says Facebook Can't Block Access To User Data
by Mike Masnick -- Fri, 10 Jul 2009 15:20:00 PDT
https://www.techdirt.com/articles/20090710/0222325507.shtml

[…] Facebook's reasoning for suing Power.com, a site that tried to aggregate a variety of social network sites into a single interface (something that seems rather useful). However, Facebook insisted that Power.com violated its copyright, and in a slightly troubling ruling in the case, the judge seemed to find that any scraping could be copyright infringement, even if the scraping was just to get at non-infringing content. The court's reasoning was that in order to get at the non-infringing content, you first have to copy the copyrighted page around it too.

Now the case is getting odder, as Power.com has countersued Facebook, claiming that Facebook is "unlawfully withholding the data that users own (as stated in Facebook’s own ToS)." Of course, even if that's true, I'm not sure Power.com has standing to make that claim. Wouldn't that be an issue for the users to raise themselves? Besides, even if a site lets you retain the copyright on your content, I don't think there's any rule that it has to make that content easy to access. So now we have lawsuits coming from both sides that don't make much sense. The two sites should just learn to play nicely with each other.

from the seems-like-a-tough-claim dept.

Can Scraping Non-Infringing Content Become Copyright Infringement... Because Of How Scrapers Work?
by Mike Masnick -- Wed, 10 Jun 2009 21:21:00 PDT
https://www.techdirt.com/articles/20090605/2228205147.shtml

[…] made any sense. Power.com tried to aggregate various social networking accounts in a single place, so you could manage them all at once through a single interface. Yet Facebook hit the company with all sorts of claims, including copyright and trademark infringement, unlawful competition and violation of the Computer Fraud and Abuse Act. Power.com asked for the case to be dismissed; last month the judge sided with Facebook, but did so in a troubling way, basically suggesting that because Facebook's terms of service prohibited these uses, the scraping amounted to copyright infringement. Michael Scott points us to lawyer Jeff Neuberger's take on the ruling, and separately Tom O'Toole has a good analysis of it. Neuberger states the following:

Judge Fogel concluded that the allegations of the complaint made out a sufficient claim of copyright infringement because Power Ventures "need only access and copy one page to commit copyright infringement." The court also found that the ToU prohibited downloading, scraping or distributing content from the Facebook Web site except that belonging to the user, and that in any event, using automated methods, i.e., "data mining, robots, scraping, or similar data gathering or extraction methods," to access any content was also prohibited by the ToU. Thus, the court found that the allegation that Power Ventures accessed Facebook via automated means made out a claim of direct copyright infringement, while the allegation that Facebook users utilized the Power.com interface to access their own profile pages made out a claim of secondary copyright infringement.

Thus, because the terms of service said you can't do any automated scraping of the site, it's suddenly infringing? Even worse, the court found that even though the data being used by Power.com isn't owned by Facebook (it's the users') the scraping was still copyright infringement, because in order to scrape the non-infringing content, Power.com had to first "scrape" the whole page. O'Toole explains:

OK, so far the court has found that Power.com made unauthorized copies of the Facebook Web site. What about the fact that Facebook does not own the copyright in its users' profile data? Facebook surmounted this hurdle by arguing that the content of the Facebook page that surrounded the user's data is copyrightable and is owned by Facebook. According to Facebook, the Power.com scraper operated in a manner that required it to copy the entire Web page in order to extract the user's profile data....

Note that the court is conditioning its ruling on the assertion that the Power Ventures scraper necessarily copied the entire Web page before it processed the page and extracted the profile data. That comports with my (limited) understanding of how a Web scraper works. But is it true? If it were true, couldn't an argument be made that this is a fair use of the page? I'll leave that for better lawyers.

All of this seems a bit troubling, as it would effectively rule out scraping even non-infringing content, just because the scraper had to first read through copyrighted content to get to the non-infringing stuff. But that seems to go against the entire purpose of copyright law. The fact that the scraper reads copyrighted content shouldn't mean that it's infringing: it's not doing anything with that content other than using it to find the content it can make use of. Anyway, this ruling probably doesn't mean all that much, since it merely rejected a motion to dismiss, but it is odd that the judge gave so much weight to Facebook's terms of service, and seemed to indicate that the mere act of scraping can be copyright infringement.
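The technical premise the court leaned on is easy to see in practice: an HTML parser necessarily consumes the entire document, copyrighted surroundings included, even when the caller wants only one user-owned fragment. A hedged sketch in Python (the page markup and profile text are invented for illustration):

```python
from html.parser import HTMLParser

# A stand-in for a fetched profile page. The scraper receives ALL of this
# markup -- navigation, footer, the works -- even though it only wants
# the profile <div>.
PAGE = """
<html><body>
  <div id="nav">Site navigation (site-owned markup)</div>
  <div id="profile">Alice, 30, likes scraping law</div>
  <div id="footer">Copyright the site</div>
</body></html>
"""

class ProfileExtractor(HTMLParser):
    """Keeps only the text inside <div id="profile">."""
    def __init__(self):
        super().__init__()
        self.inside = False
        self.profile = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("id", "profile") in attrs:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.profile.append(data)

extractor = ProfileExtractor()
extractor.feed(PAGE)  # the parser necessarily reads the whole page...
print("".join(extractor.profile).strip())  # ...to yield just one piece
```

The parser touches every byte of the page; selecting the "profile" fragment happens only after the whole document has been read. That mechanical fact is what the infringement theory rests on.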

from the this-seems-troubling dept.

New Consortium Says If Others Can Monetize Better Than We Can... We Deserve Their Money?
by Mike Masnick -- Wed, 22 Apr 2009 04:39:00 PDT
https://www.techdirt.com/articles/20090421/1319244600.shtml

[…] silly it is to be worried about various spam/scraper sites that take content from sites (including ours) and repost it on their own. Those sites never add any real value, but just repost the content. They get no significant traffic and retain no real audience. They tend to come and go pretty quickly. Worrying about them is a total waste of time (time that can be used making sure your own site is more valuable). Yet, apparently a group of publishers has put together a "Fair Syndication Consortium" that has decided that rather than go after these sites directly, it will simply try to get the ad networks that serve ads on such sites to hand over some money to the original content creators. As far as I can tell, that's basically the content creators saying "well, if others can monetize our content better than we can, we deserve some of that cash."

That makes no sense to me. If you can't monetize your own content better than other sites, you don't deserve to be in business. If other sites are actually getting traffic and ad revenue that you think you deserve, it means you're doing a bad job giving people a real reason to visit your site and to interact with your community. Simply demanding money from the sites that have done things better makes no sense. Of course, the reality is that most of these sites haven't done things better, and don't make any money. So all the grandstanding seems like rather wasted effort.

Focus on making your own site worth visiting. Stop worrying what others are doing with your content.

from the please-explain dept.

Airline Plans To Cancel All Flights Booked Through 3rd Party Websites
by Mike Masnick -- Fri, 8 Aug 2008 11:11:38 PDT
https://www.techdirt.com/articles/20080808/1057031933.shtml

[…] not to be listed on the sites where people search for airfare, and easyJet's plan to sue the sites that send it customers, but Irish-based airline Ryanair is taking this all to a new level. Beyond just being upset about those 3rd party sites (i.e., sites that send it business!), it's planning to cancel the flights for everyone who booked through one of those services (thanks to Sean for the link).

Yes, we understand that these airlines prefer people to purchase flights from the airlines directly, but it still seems bizarre to try to cut off a great promotional channel. People already know to go look at 3rd party sites for airfare, so actively working against having your flights promoted doesn't make much sense. Then actively pissing off a bunch of your customers who booked through those sites by canceling their flights is even more braindead, as you've just formed a huge group of customers who will complain about your airline and spread the word about how you canceled their legitimately purchased flight for no reason other than spite and a confusion over business models. When Ryanair started promoting how some of its seats might come with sexual gratification, I'd bet many passengers didn't realize it would end with them getting screwed.

from the piss-off-your-customers-much? dept.

The Airlines' Ongoing Struggle With Price Aggregation Sites
by Tom Lee -- Tue, 29 Jul 2008 01:50:25 PDT
https://www.techdirt.com/articles/20080725/1322411794.shtml

It's proving pretty difficult to figure out exactly what happened between American Airlines and Kayak last week. Last Wednesday TechCrunch reported that American Airlines was pulling its listings from the airfare search engine. Comments left by Kayak's CEO Steve Hafner and VP Keith Melnick chalked the split up to Kayak's display of AA fares from Orbitz: American had demanded that Kayak suppress the Orbitz listings, and Kayak refused.

Presumably one of two things is making American want to avoid comparison to Orbitz prices: either, as TechCrunch speculates, users clicking the Orbitz option put AA on the hook for two referral fees -- one to Kayak and one to Orbitz; or AA has struck a deal with Orbitz that provides the latter's users with cheaper fares than can be found on aa.com.

Either way, the news doesn't appear to be as dire as it first sounded. It doesn't seem that AA flights will be disappearing from Kayak -- it's just the links to buy them at aa.com that will go missing. As Jaunted points out this might wind up costing flyers a few more dollars, but it shouldn't be a major inconvenience for Kayak customers.

The more interesting aspect of this episode is how it reveals the stresses at play in the relationship between the airlines and travel search engines like Kayak. It's no secret, of course, that the airlines are having a rough time as rising fuel prices put even more pressure on their perennially failing business model. But while an airline attempting to control the distribution of its prices is nothing new, one can't help but wonder whether ever-narrowing margins might lead to a shakeup of this market.

Kayak, like most travel search sites, gets its data from one of a handful of Global Distribution Systems (GDSes): businesses that charge airlines a fee to aggregate price and reservation information. Some airlines, like Southwest, opt out of the GDS system in order to avoid those fees. Others, like American, participate in the system but try to send as much online business as possible to their own sites. Presumably each airline tries to find an equilibrium point at which the business brought in by participation in a GDS and the payments associated with it add up to the most profit.

But so long as the financial temptation to retreat from the GDSes persists, GDS data will be less than complete. And that creates an opportunity for another kind of fare-aggregation business -- one based upon scraping the data from the airlines' websites. It's been done before, after all, albeit on a limited scale. And since most people recognize that prices can't be copyrighted, there doesn't seem to be any legal barrier stopping such an aggregator from stepping in (nothing besides the need to write a lot of tedious screen-scraping software, that is). Of course, that won't stop airlines from suing, but the legal basis for their argument seems pretty weak.
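The "tedious screen-scraping software" amounts mostly to pattern-matching against each airline's HTML, with a new pattern needed every time a site changes its layout. A toy sketch in Python (the markup, CSS class and fares are all invented; a real scraper would fetch live pages and need one pattern per site):

```python
import re

# Invented fragment of an airline results page.
HTML = """
<tr><td>BOS -> SFO</td><td class="fare">$312.40</td></tr>
<tr><td>BOS -> SFO</td><td class="fare">$289.00</td></tr>
<tr><td>BOS -> SFO</td><td class="fare">$305.99</td></tr>
"""

# One pattern per site layout; this one grabs dollar amounts in fare cells.
FARE = re.compile(r'class="fare">\$(\d+\.\d{2})<')

def cheapest(html):
    """Return the lowest fare found in the page, as a float."""
    return min(float(m) for m in FARE.findall(html))

print(cheapest(HTML))  # 289.0
```

The tedium comes from maintaining one such pattern per airline and updating it whenever a redesign silently breaks it -- which is also why GDS feeds, when complete, are so much more attractive.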

Whether such a business is likely to emerge and succeed, I couldn't say. But it does seem certain that as fuel prices rise we'll be seeing more and more travel industry infighting -- and more and more hoops for online fare-shoppers to jump through.

from the airlines-vs.-aggregators? dept.

Just Assume Any Info You Put Online Is Public
by Tom Lee -- Fri, 4 Jan 2008 15:20:07 PST
https://www.techdirt.com/articles/20080104/151042.shtml

[…] characteristically insightful post on the Robert Scoble/Facebook story. But Facebook and screen-scraping are two of my favorite things to talk about, so I can't resist pointing out that I disagree with some of Julian's analysis.

Having noted that a script acting on Scoble's behalf can only access information that Scoble himself can reach manually, Julian argues that this can't be considered the only criterion in evaluating the situation:

[P]rivacy is not just a function of the publicity of your personal information, but of the searchability and aggregability of that information. Public closed-circuit surveillance cameras, for instance, typically capture the same information that a casual observer on the street is already privy to. But we recognize that being spotted by diverse random pedestrians, or even being captured on diffuse and disconnected private security cameras, is not intrusive in the same way as being captured on a citywide surveillance system that is searchable from a centralized location.

All of this seems true: individuals' attitudes about privacy are rightly driven by a pragmatic appraisal of the likelihood of someone doing something bad with the available information — a judgment based on the information's value and the cost of obtaining it. Ripping up your credit card statement before throwing it in the trash doesn't make it impossible for a dumpster-diving thief to target you, but it increases the difficulty of ripping you off enough that you'll probably be safe.

But I think Julian makes a mistake when he assumes that this is a viable way to conduct your life online. The problem with applying this approach to a digital context is that a user's estimation of the accessibility of a given piece of online information is almost invariably going to be too low — and will be getting more so by the second. The costs of automatically collecting data are very small and getting smaller.

There are a few reasons for this. First, the tools are getting better. Libraries like WWW::Mechanize are simple for any programmer to use and available in a variety of languages. And GUI-based applications like Dapper and Piggy Bank aim to make things even simpler. Second, if done properly, it's very difficult to prevent, detect or punish automated data collection. Facebook's script detection technology is impressively existent relative to that of its competitors, but it's still almost certainly trivial to subvert it with proxies, faked user agents and plausibly human delays. Third, once the data is collected it can, of course, be easily distributed.
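To make the second point concrete, here is how little it takes to fake a user agent and schedule plausibly human delays; a sketch in Python using only the standard library (the User-Agent string and URL are invented, and constructing a Request sends nothing over the network):

```python
import random
import urllib.request

# An illustrative browser-like User-Agent string (real scripts rotate many).
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.0; rv:2.0) Gecko Firefox"

def disguised_request(url):
    """Build a request that no longer advertises itself as a script."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_UA})

def human_delay(base=2.0, jitter=3.0):
    """A plausibly human pause between fetches: between 2 and 5 seconds."""
    return base + random.random() * jitter

req = disguised_request("https://example.com/profile/123")  # hypothetical URL
print(req.get_header("User-agent"))  # urllib capitalizes header names this way
```

Detection systems that key on a default script user agent or on metronomic request timing see nothing unusual here -- which is why sites end up leaning on terms of service and lawsuits instead.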

And the situation is only going to get worse! In fact, it's getting worse at such a rapid rate that counting on the privacy of any even slightly public online information is a mistake.

The negative reaction to Scoble's script is coming from users who think of it as a violation of the covenant they perceived to surround their data. But that covenant was based upon their own mistaken understanding of the internet. Scoble's actions shouldn't be viewed by these users as a transgression against them, but rather as a pleasantly benign lesson.

It's fine to lament the situation, or to applaud Facebook for taking steps to keep its valuable, freely-acquired user data away from competitors (and, while they're at it, script-employing users). But this assertion of community norms is unlikely to stop those who, unlike Scoble, are genuinely acting in bad faith. The technology for containing digital cats in digital bags is woefully inadequate, and it's unlikely to improve anytime soon.

Permalink | Comments | Email This Story
Is There A Conflict Between Open Social Graphs And Your Privacy?
by Julian Sanchez, Thu, 3 Jan 2008 15:11:26 PST
https://www.techdirt.com/articles/20080103/124455.shtml
Robert Scoble has apparently been barred from Facebook for running a script from Plaxo to export his relationship information (or "social graph," as the kids say), in violation of the site's terms of service. On one read, this makes him a martyr to the cause of open social graphs. I'm a bit more ambivalent.

Intuitively, it makes sense for users to be able to make whatever use they please of information about their own social networks. But in a social network, "your" information is someone else's as well. And on a site like Facebook, much of that information will have been provided in the context of a set of individually calibrated privacy controls, by people who expected it to be used in that context by a limited audience. Exporting that information without permission, then, raises important privacy questions.

Within Facebook, users have a fair amount of control over who can access what information about them. I can choose to block particular users on Facebook, rendering myself wholly invisible to them, as though I weren't even on the network. I can decide how much of my profile information will be visible to friends, to people who live in my region, to the general Facebook membership, and to the Internet at large. I can even decide how aggressively public, so to speak, such information will be. Lots of Facebook users are happy to let friends view their relationship status, but disable those status notifications in their news feeds, to prevent everyone they know from being simultaneously blasted with the news that "Bob has gone from being in a relationship to being single." Automated data collection "liberates" information from those constraints, possibly against the wishes of the people who provided it.

It's true that a script can only sweep up information that would already have been visible to a particular user anyway. But privacy is not just a function of the publicity of your personal information, but of the searchability and aggregability of that information. Public closed-circuit surveillance cameras, for instance, typically capture the same information that a casual observer on the street is already privy to. But we recognize that being spotted by diverse random pedestrians, or even being captured on diffuse and disconnected private security cameras, is not intrusive in the same way as being captured on a citywide surveillance system that is searchable from a centralized location. By the same token, I may be unhappy with the possibility of someone forming an external public database full of data I've freely shared with more narrow communities—personal, regional, or whatever.

None of this is to deny the initial intuition that it's desirable for users' social graphs to be portable to some extent. But as with all forms of intimacy, openness and privacy complement each other: We feel free to share information about ourselves to the extent that we have some assurances about how that information will be used. So while it's one thing to argue that Facebook should enable greater openness or portability in some particular way, subject to user control, it seems like quite another to criticize them for enforcing a rule about indiscriminate automated data collection.

AP Sues VeriSign For Copyright Infringement; Mostly Pointless
by Mike Masnick, Wed, 10 Oct 2007 09:41:00 PDT
https://www.techdirt.com/articles/20071009/185531.shtml
The AP is suing VeriSign's Moreover for copyright infringement, though the details are woefully unclear (even in the AP version of the article). It's unclear if the complaint is over the fact that Moreover scrapes and links people to AP content, or if there's something else going on (Update below). If it is just that Moreover is pointing people to AP content, then this is quite ridiculous -- but most likely driven by the AP's ability to get Google to pay up for the same thing. The article hints that there may be a bigger problem with Moreover providing the full text of the content, but details are lacking. If it is true that Moreover provides the full text -- then they probably are violating the copyright. However, if that content is simply stored away for indexing purposes, and people are sent to legitimate AP sources, then it's hard to see how this is a copyright violation at all. If anything, it's the opposite -- pointing more people to AP content. The AP is also complaining that Moreover lists the AP as a news source on its site -- but that's just a petty complaint from the AP. Listing out news sources is hardly a violation of trademark. Hell, the AP is a "news source" for the content we write here on Techdirt all the time -- and there's nothing wrong with saying so. All in all, unless more details prove otherwise, this sounds like the AP continuing to struggle with the changing marketplace it's facing, and lashing out at one of the companies that helps deliver more traffic to AP content for not paying the AP for the privilege of promoting AP content.

Update: Rafat Ali from PaidContent stopped by in the comments to point to the full lawsuit documents, posted on his site. From there, it appears that the AP's lawsuit is mostly ridiculous, with just a little tiny bit of reasonable thrown in. Most of the claims are about the fact that Moreover is spidering and scraping AP news feeds, and providing both free and paid subscribers headlines and the opening lede. However, it's pretty difficult for the AP to make a copyright claim here, since those links are almost definitely fair use, especially since they point people to legitimate AP licensees. There's a little gray area where Moreover indexes and caches the articles on its own servers -- but Google has been doing that for years without much of a problem -- and if the AP is really upset about it, there's always the old robots.txt solution. The one area where the AP may have a claim (though, the evidence does not seem clear from the exhibits) is in saying that there are times when Moreover will show subscribers a full AP article hosted on its own servers, rather than passing them through to a licensee. If true, then that would likely be copyright infringement -- though the "damages" would be minimal, if anything. Finally, the claim that this is an AP trademark infringement by listing the AP as a news source seems laughable. All in all, the original assessment stands: this is the AP unable to adapt and lashing out at those who are helping to promote their content.
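For what it's worth, the "old robots.txt solution" mentioned above really is this simple -- a sketch, with the caveat that "MoreoverBot" is an assumed user-agent token for illustration, not Moreover's actual crawler name:

```
# Hypothetical robots.txt entry on an AP licensee's site.
# "MoreoverBot" is an assumption, not Moreover's real crawler token.
User-agent: MoreoverBot
Disallow: /
```

Any crawler that honors the Robots Exclusion Protocol would stop indexing the site entirely; no lawsuit required.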