June 30, 2007

Greetings. In the wake of my recent posting regarding Wikipedia and the Benoit murder/suicide case, I've received a number of responses that boil down to: "Why are you blaming Wikipedia for anything relating to this situation? Wikipedia isn't supposed to be authoritative."

I definitely agree that in a perfect world everyone would understand that Wikipedia is not authoritative -- and cannot be under its current structure.

But in the real world, Google searches on a vast array of topics will return Wikipedia articles as the top or near top results (and/or in other contexts), and a vast number of sites use Wikipedia entries as convenient explanatory text or links -- despite most WP entries' lack of attribution, lack of documented fact checking, and being subject to mutation and alteration at any time. But Wikipedia entries are free, they're easy to link to, and hell, if any particular Wikipedia page is wrong at any particular moment, people can always say "it's not my problem."

Unfortunately, it is not necessarily obvious to many Web users following such links -- or reading related excerpted texts -- that Wikipedia articles "aren't supposed to be authoritative." Many people who find their way to Wikipedia items or texts don't know what Wikipedia really is about, and many persons understandably assume it's like any other "real" encyclopedia (that is, authors are attributed somewhere, facts get a modicum of checking at least most of the time, entries aren't subject to random editing on a whim, etc.).

The Wikipedia folks created the system under which they operate. They need to take some responsibility when that structure causes damage. This isn't the first example of Wikipedia abuse screwing around with people's lives.

I am frankly very tired of hearing some people use the Internet as an excuse for anonymous attacks and abuses, with, it seems, relatively few persons having enough guts to take responsibility for the impacts that then result.

We want to let people post anonymously, at least with the pseudo-anonymity (subject to tracing in many cases) offered by the Internet? Fine. Anonymous speech definitely has its role. But the buck has to stop somewhere, and these systems should not be an excuse for a hit-and-run mentality.

In most such cases a significant amount of the responsibility when damage occurs must rest on the publisher of the unattributed information, if they have voluntarily chosen to operate in that manner. I'm not talking about common carriers and ISPs. I'm referring to sites that set themselves up in a way that serves to isolate posters/editors of material in public forums from attribution.

Again, if you want to operate this way, that's a perfectly valid choice. But realize that you're transferring part of the responsibility onto yourself. I do not believe that as a society we can accept the premise that anonymous systems erase all aspects of responsibility from all involved parties.

In the current Benoit situation, I likely wouldn't throw the book at that hoax poster. It's easy to be suckered in by the "devil-may-care" attitude that Wikipedia tends to foster. The hoaxer didn't realize that, in this case, they were falling into a serious and painful trap.

June 29, 2007

Greetings. After causing law enforcement and the news media to spin their wheels uselessly, a Wikipedia user has apparently confessed to planting a rumor as fact on the Wikipedia page for wrestler Chris Benoit, claiming his wife was dead hours before the bodies of Benoit and his family were found.

The ease with which this was done by a still anonymous party, triggering investigations and consternation at a time that was already intensely emotional for everyone involved with the Benoit case, demonstrates once again a fundamental flaw in Wikipedia's usually anonymous, non-moderated editing framework for most Wikipedia pages.

The fact that such editing can usually be undone (and redone later for that matter) doesn't change the fact that Wikipedia can never be an authoritative source while it is subject to this kind of anonymous abuse -- whether by jokesters out to get their kicks or well-meaning contributors simply unwilling to check their facts. Such events can easily turn Wikipedia pages into rumor and defacement billboards rather than encyclopedia-quality content. The damage is already done.

If Wikipedia expects to really be taken seriously in the long run, it needs to rethink its standards for item creation, modification, and attributions.

Wikipedia, it's time to grow up.

--Lauren--

Blog Update (June 29, 2007 18:22 PDT): Law enforcement officials now know the identity of the Wikipedia user involved in the situation discussed above, and have noted that, "It is unbelievable what a hindrance this has put on our investigation." More details here.

June 25, 2007

Greetings. There's a fascinating and apparently singular page on Google that you've probably never seen. In the normal course of searching on Google you'd only find it if you followed an unusual "sponsored link" -- sponsored by Google itself -- above the regular search results for a single, very ancient word. The page URL itself is nondescript and seemingly generic, but the contents are remarkable, for they are explicitly a blanket apology for many of the query results returned for that one very specific short word. Here is the page.

Search on the same word at Yahoo or AltaVista, and you won't find a similarly placed explanation or apology, even though the search results are similar.

The laudable presence of this Google explanatory page makes explicit an acknowledgment by Google that search results can have real impact on real people, and that the referenced Web sites in these results may at times be misleading, defamatory, or otherwise seriously damaging to actual lives.

I discussed some issues related to the possible ways in which search engines could help mitigate the impact of serious "attacking" misinformation on Web pages referenced by search results. In my view, the key aspect of this problem is finding a way to empower people who are being seriously demeaned, defamed, threatened, or otherwise attacked by specific Web sites -- sites and pages that are frequently beyond the targets' financial or jurisdictional ability to impact, even with court orders in hand. Real people, real lives.

It's known that search research is increasingly looking at how human input beyond inter-page linking activities can be usefully harnessed toward improving the relevance of search results, and it can be reasonably argued that such mechanisms may also be useful to help deal with serious Web site abuse, with distributed, virtual community input as a particularly intriguing possible approach.

I do not view such postulated Web dispute resolution mechanisms -- which I have broadly termed "dispute links" -- as a means for furthering arguments between creationists and evolutionists, political battles, or other "general" disagreements. Rather, the threshold for activating such systems would likely be fairly high, and focused on very specific attacks -- especially on individuals.

Ultimately, we must consider whether the status quo, where the targets of serious, life-ruining Web-based attacks are often essentially impotent to effectively respond, is an ethically acceptable situation. Will our ethical systems rule the machines, or will we allow the machines to reduce our lives to the lowest common denominator of automaton-like, void existence?

To be sure, these are complex matters, and if anyone tells you that they have simple answers for such questions, you're either talking to a liar or a fool. Even setting forth the fundamental precepts toward solving such problems is very difficult. Workable, possible solutions will be philosophically and technically challenging. I won't even try to offer my take on the more technical details here -- I have a white paper in progress that I'll offer to the community for dissection and possible evisceration as soon as possible.

In the meantime, it might be wise to muse more on that Google page noted above. For it tells us very plainly that among major search engines, Google understands that Search Results Matter. They matter now to everyone who uses the Web, and even to people who don't have Internet access at all -- but whose lives are impacted by the Web nonetheless. And that's the entire population of the planet.

The Web, after all, isn't really computers and routers, fiber and spinning disk arrays, databases and blogs. The Web is people. Our job now is to find the path toward helping make sure that the power of Web search enhances people's lives while not incidentally creating asymmetric opportunities for seriously damaging innocent lives in the process.

Even though a single search query word is explicitly referenced by that Google explanation page, Google has with that very clear published text already gone a step beyond other search engines in its acknowledgment of search impacts.

It therefore seems reasonable for the community to look toward Google, as the industry leader in search technology, to also be a leading force at forging that path toward the next steps -- the route that will help keep search engines as tools to benefit us all, while preventing that technology from being perverted by outside players for evil purposes.

June 17, 2007

Greetings. In a very recent blog item, I discussed some issues regarding search engine dispute resolution, and posed some questions about the possibility of "dispute links" being displayed with search results to indicate serious disputes regarding the accuracy of particular pages, especially in cases of court-determined defamation and the like.

While many people appear to support this concept in principle, the potential operational logistics are of significant concern. As I originally acknowledged, it's a complex and tough area, but that doesn't make it impossible to deal with successfully either.

Some other respondents have taken the view that search engines should never make "value judgments" about the content of sites, other than those made (which are substantial) for result ranking purposes.

What many folks may not realize is that in the case of Google at least, such more in-depth judgments are already being made, and it would not necessarily be a large leap to extend them toward addressing the dispute resolution issues I've been discussing.

Google already puts a special tag on sites in their results which Google believes contain damaging code ("malware") that could disrupt user computers. Such sites are tagged with a notice that "This website may damage your computer." -- and the associated link is not made active (that is, you must enter it manually or copy/paste to access that site -- you cannot just click).

Also, in conjunction with Google Toolbar and Firefox 2, Google collects user feedback about suspected phishing sites, and can display warnings to users when they are about to access potentially dangerous sites on these lists.

In both of these cases, Google is making a complex value judgment concerning the veracity of the sites and listings in question, so it appears that this horse has already left the barn -- Google apparently does not assert that it is merely a neutral organizer of information in these respects.

So, a site can be tagged by Google as potentially dangerous because it contains suspected malware, or because it has been reported by the community to be an apparent phishing site. It seems reasonable then for a site that has been determined (by a court or other agreed-upon means) to be containing defaming or otherwise seriously disputed information, to also be potentially subject to similar tagging (e.g. with a "dispute link").

Pages that contain significant, purposely false information, designed to ruin people's reputations or cause other major harm, can be just as dangerous as phishing or malware sites. They may not be directly damaging to people's computers, but they can certainly be damaging to people's lives. And presumably we care about people at least as much as computers, right?

So I would assert that the jump to a Google "dispute links" mechanism is nowhere near as big a leap from existing search engine results as it may first appear to be.

In future discussion on this topic, I'll get into more details of specific methodologies that could be applicable to the implementation of such a dispute handling system, based both within the traditional legal structure and through more of a "Web 2.0" community-based topology.

But I wanted to note now that while such a search engine dispute resolution environment could have dramatic positive effects, it is fundamentally an evolutionary concept, not so much a revolutionary one.

June 15, 2007

Greetings. I'd appreciate feedback from the Internet community regarding the following issue.

Search engines have of course become the primary means by which vast numbers of people find all manner of information. For many firms, if you don't have a high rank with Google, it's as if you don't exist (or at least, many companies appear to feel that way).

Increasingly, cases are appearing of individuals and organizations being defamed or otherwise personally damaged -- lives sometimes utterly disrupted -- by purpose-built, falsified Web pages, frequently located in distant jurisdictions. Search engine results are typically the primary means by which such attacks are promulgated and sustained by providing a continuing stream of viewers to those Web pages. Due to ranking algorithms, attempts to counter such attacks with other Web pages may not be widely seen since they are not directly associated with the attacking pages.

Courts appear generally reluctant to order offending Web page take downs in such cases, except where intellectual property (e.g. DMCA orders) is involved, and take downs do not necessarily inform viewers of the ongoing controversy in a logically connected manner. Additionally, "remedies" that result in suppression of information, rather than providing additional information, are generally ineffective and counter to the "open information whenever possible" philosophy that many of us share.

Question: Would it make sense for search engines, only in carefully limited, delineated, and serious situations, to provide on some search results a "Disputed Page" link to information explaining the dispute in detail, as an available middle ground between complete non-action and total page take downs?

Search engine firms have generally taken the view that they are akin to telephone directories, and bear no responsibility for the content of the pages that they reference. Similarly, when ostensibly aggrieved parties approach these firms with concerns about "offending" pages, the usual response is that the search firms can do nothing about those pages, and that any complaints need to be taken to the Web page owner or associated ISP. From a practical and jurisdictional standpoint, this turns out to be impossible in many cases.

We clearly do not want to hold search engines responsible for other sites' content, even when locally cached. To do so would likely obliterate the entire search engine model and industry under a storm of litigation, to everyone's detriment. It must be noted, however, that increasing calls for holding search engines responsible in just such a manner are being heard in some political and judicial circles, likely out of frustration with the status quo, which currently tends not to offer reasonable dispute resolution paths in most situations. This is a serious warning sign, suggesting that we should consider some new approaches on our own, or risk draconian and damaging legislation.

The telephone directory argument also has some problems. Unlike typical phone books, search engines are not passive publishers of information. In addition to third-party ads tied to the core listings, a key facet of search engines is intensive ranking and decision-based ordering of content listings, usually via highly proprietary algorithms. Such ranking provides a high percentage of the value-added represented by search engine results.

So while search engines are not responsible and should not be held responsible for the content of the outside pages and data they index, they are very much directly involved as decision-making gatekeepers (albeit, usually through fully automated algorithms) that determine to a major extent which individual Web pages are likely -- or unlikely -- to be discovered by Internet users.

More questions: Given the power that search engines possess in these regards, do they bear any responsibility for helping to untangle serious disputes regarding the pages they reference and often profit from? If search engines do not voluntarily move in this direction, do they risk damaging legislation written without a genuine understanding of the complex technical and business issues involved?

In my view, an evolution by search engines to deal with these situations should be predicated on that key concept of maximizing the availability of information. Page take downs -- which are likely to be ineffective in the long run as noted -- should be a last resort. Similarly, a total laissez faire approach is also unlikely to be tolerated indefinitely by the political and judicial establishments.

So returning to where we started... Could some sort of "dispute link" -- tied directly to information regarding particularly serious page disputes -- provide a reasonable means to help ameliorate these situations without risking the more destructive alternatives? If so, how would such a system be effectively implemented in a practical fashion? How could such a system be structured to avoid being swamped by relatively trivial complaints?

Would providing related dispute links only to persons with court orders make sense to limit potential abuse of the mechanism, or would requiring the use of the expensive and delay-prone courts be far too restrictive a qualification? Could such a dispute system operate purely on a voluntary basis? (Voluntary would be very much preferred in my opinion.) What are the cost factors involved in such a system and how could they be reasonably addressed?

Overall then, is it possible to structure such a system along these lines so that it is practical, workable, and also palatable to the major search engine firms, as an alternative to barreling along toward an onerous and likely politically motivated crackdown down the line?

Or would this concept just never work -- and that crackdown is inevitable?

June 14, 2007

Greetings. News stories (such as this example) are appearing widely about an AT&T plan to try to block pirated content at the network level.

The implications of this sort of network snooping are immense. One might assume that a primary target will be file sharing technologies. But to actually pick out particular content from those streams would imply the need to actually examine and characterize the payload of files to locate and block potentially offending music and/or video content.

AT&T will no doubt suggest that this activity is akin to virus and spam filtering of e-mail for their customers. This would be a specious analogy. Spam filtering can usually be controlled by the user, and virtually all AT&T mail processing can be avoided by their customers if AT&T servers are not used.

However, it sounds as if AT&T is planning a network monitoring regime that would not be dependent on the use of AT&T servers. What's more, the "benefits" of this monitoring would not be directed to the customers whose traffic is being monitored, but rather to unrelated third parties.

"Fingerprinting" of content for anti-piracy purposes is not always unacceptable. For example, Google/YouTube is reportedly starting tests of a system for characterizing and blocking copyrighted material. Since users submitting videos to YouTube are doing so with the expectation of that content being hosted there, it is not unreasonable for YouTube to avoid hosting pirated materials whenever practicable.

However, AT&T's proper role in this context (among an ever smaller number of ISP choices) is simply to move customer data traffic between points, not to be a content policing agent for third-party commercial interests, or a mass data conduit for government interests without appropriate legal authority, for that matter. The traffic under discussion, based on news reports about the AT&T plans so far, would typically not be directed to AT&T servers, and should not be subject to content inspection by AT&T, in the absence of specific targeted court orders or the like.

We can get into a discussion of if and how common carrier considerations play into any of this anymore, and how encryption (and attempts to control and suppress encryption) will enter the mix, but the very fact that these AT&T plans have gotten this far is extremely disturbing.

Finally, perhaps the most illuminating aspect of this situation is a statement by James W. Cicconi, an AT&T senior vice president, who is quoted as saying that AT&T wouldn't look at the privacy and other legal issues involved until after a monitoring technology has been chosen.

June 11, 2007

In both public and government circles, concerns are rising regarding important aspects of Google's ongoing operations. Some of these concerns are very real, and some are more a matter of perception than reality -- often magnified simply because Google is involved. In either case, the situation is exacerbated by the extremely limited opportunities for the public to interact directly with Google in a meaningful way regarding increasingly sensitive matters that can have highly personal and very widespread impacts.

A dedicated, at-large, public ombudsman to deal with these issues is urgently needed at Google, to interact directly and routinely with the public regarding Google, YouTube, and other affiliated operations.

The privacy, content-related, and many other concerns of ordinary users and organizations, expressed to Google through currently available feedback channels, appear to routinely vanish into what is effectively a "black hole" -- with a lack of substantive responses in most cases. If you don't have a court order or a DMCA "take down" notice, Google can appear impenetrable to expressed concerns.

Privacy International's reported inability to receive a response to their queries prior to the release of a new report regarding Google privacy is but one example of a seemingly pervasive situation at Google. I won't present here a critique of that report itself, but it's clear that both individuals and organizations commonly feel impotent when attempting to resolve many important issues with Google directly.

In general, both politicians and government agencies appear increasingly unsatisfied with this status quo, and their reactions could be extremely damaging to Google and the broader Internet.

I'm not suggesting another Google counsel. The ombudsman would have a role wholly different from that of Peter Fleischer's Global Privacy Counsel position, or Nicole Wong's Deputy General Counsel role. In fact, this would likely not primarily be a policy "development" role per se, though policy evolution over time would of course be significantly involved.

The ombudsman would be a non-lawyer who would be assigned full-time to act as an easily approachable and highly available front-line interface between the public and Google operational/R&D teams. This individual would be the primary initial contact for most queries from individuals and organizations who have specific problems related to Google content, privacy, or a range of other related policy matters. This technically knowledgeable individual would be well-versed regarding the relevant issues and ideally already possess a high degree of trust within the larger Internet community.

Such an ombudsman, by fostering open lines of communications, could immediately interact with members of the public and push relevant matters quickly up the chain of command inside Google for action as appropriate.

There's simply no legitimate excuse for a public communications void of such a magnitude at this stage of Google's development, especially with an organization of Google's size, market share, influence, and immense technical competence. At a minimum, ordinary Google users should be able to get quick, reliable, and substantive responses and resolving dialogue for their Google-related concerns, even irrespective of any final dispositions.

Communication is incredibly important in this sphere. The current situation is seriously and increasingly dangerous to Google. Backlash and reactive, knee-jerk legislation by ambitious politicians could easily unreasonably constrain and seriously damage Google, the broader Internet, and Net users around the world.

A Google at-large ombudsman along the lines that I've outlined could be the best and most practical way to help avoid such negative outcomes, while not disrupting Google's operations and growth. It would most decidedly not be an easy job for anyone, but would be an important position that definitely needs to exist.

I make this recommendation with what I believe are the best interests of both Google and the Net's users in mind. I want to see Google continue in its success. But a regulatory and public relations train wreck -- with major collateral damage across the Internet -- is increasingly likely unless serious and comprehensive improvements in Google's handling of this area are forthcoming in the extremely near future.

The appointment of a qualified and dedicated ombudsman, with the sincere support and confidence of Google high-level management, could go a long way toward making Google an acknowledged leader in responsive operations, to the benefit of us all.

Of course, it's not impossible that this call for a Google ombudsman will itself be ignored by Google. But in the final analysis, we can all hope that Google management will realize that creating this position is very simply the right thing to do.

June 07, 2007

Greetings. Get ready, New Yorkers: his honor Mayor Bloomberg is pushing ahead to impose London-style "congestion pricing" for Manhattan, with the approving nod of the feds.

Such a system, which has already helped to turn London into the "Big Brother" capital of the western world, entails placing hundreds of cameras to read vehicle license plates as they enter the "designated" areas of the city. Vehicle owners are then charged for the privilege of driving in those locales.

Of course, such a system is also dandy for building and maintaining a massive database of driver activities for a range of other purposes. This is likely (regardless of any claims of data privacy) to become fodder for all manner of officials and clever attorneys -- just as "FasTrak" toll data in the San Francisco Bay Area already has. But with the NYC system, you won't have any way to "pay cash" and avoid being tracked.

If New York City proceeds with this plan, you can bet that other U.S. communities will consider following suit as quickly as they can pour concrete for the camera mounts.

This is the kind of invasive technology -- with massive "data creep" potential -- that privacy-conscious people should really be concerned about today, not services like Google's existing Street View application.

June 04, 2007

Greetings. The Google Street View controversy hasn't even had time to cool off yet, but today we have some commentators musing about the desirability of major censorship of Google Earth and similar services. You see, word is that the wannabe terrorists who were talking about blowing up JFK Airport fuel supplies were making extensive use of Google Earth imagery, as do millions of law-abiding non-terrorists of all stripes, of course.

Calls for massive imagery censorship, presumably to blot out every conceivable terrorist target from the public's online view, have a certain appeal among those who always view the Internet and most of its users with suspicion. The logical outcome of this reasoning could vastly alter Google's imagery data storage requirements -- removing enough photos to make the lords of censorship happy would reduce the Google Earth file system to something akin to a single "404 Not Found" page.

Let's get this straight. While there are admittedly a very limited number of extremely sensitive locations for which censorship of satellite imagery at Google Earth resolutions can be justified, attempts to extend such imagery blocking to broadly cover possible terrorist targets would not only be ineffective at their stated purpose, but actually a potential disaster for public safety.

Ineffective, since there are myriad sources for photographic data relating to the vast number of sites -- from shopping centers to power substations and transmission lines, from schools to chemical plants -- that might potentially be targets. Most of these are subject to much more detailed photography by average citizens who have ordinary access to the "targets" and environs. Surprise, removing images from Google Earth doesn't make the locations themselves vanish!

Trying to hide most such images can be a public safety disaster in much the same way as overblown efforts to block the public's access to so-called Critical Infrastructure Information (CII). All too frequently, these are merely convenient cover-ups for sloppy and dangerous operations that themselves expose the public to immediate safety risks that are far more likely than the theoretical risk of terrorism at such locations.

Given the world we live in, it would be ludicrous to argue for total public access to all information. On the other hand, there are those whose fear of information would turn much of the Internet into that dead-end "404" -- under government edict. Attempts to link Google Earth with the JFK plot fall firmly into this category.

Absolute safety, or absolute government power? There's really no difference.