
April 2010

When I was working on natural language processing and speech recognition systems in the 90s, one of our mantras was "there's no data like more data", i.e., all things being equal, the accuracy of recognition tends to increase with the addition of more labeled data. The Linguistic Data Consortium at the University of Pennsylvania was [and, I suspect, still is] the primary source for labeled text and speech data, and it was available - for a fee - to all members, most of whom were researchers and developers in academia and industry. Three developments in the past week have prompted me to reflect on the broader power of data ... and on the people and organizations that have access to it.

Every public tweet, ever, since Twitter’s inception in March 2006,
will be archived digitally at the Library of Congress. That’s a LOT of
tweets, by the way: Twitter processes more than 50 million tweets every
day, with the total numbering in the billions.

We thought it fitting to give the initial heads-up
to the Twitter community itself via our own feed @librarycongress. (By
the way, out of sheer coincidence, the announcement comes on the same
day our own number of feed-followers has
surpassed 50,000. I love serendipity!)

We will also be putting out a press release later with even more
details and quotes. Expect to see an emphasis on the scholarly and
research implications of the acquisition.

On the one hand, I believe this is a very positive development. Google, Yahoo and Microsoft all pay for real-time access to the Twitter "firehose", and now researchers and developers with shallower pockets will be able to access the entire Twitter public data archive ... after some yet-to-be-announced delay (it's not clear when the archive will become available, how often it will be updated, or how often developers or their applications will be able to access it).

A related development, also announced during Twitter's recent developer conference (Chirp), was that Twitter is offering a Streaming API to supplement its REST API and Search API. As with the other APIs, there are limitations imposed on its use, lest fail whales become a significantly more common sight, but this still represents a positive development in making more data more openly accessible.

I don't want to draw too strong an analogy between private video rental records and public tweets. Still, given the broadening range of web services that enable people to automatically update their status[es] about their use of those services (e.g., Netflix users can automatically post their movie ratings on Facebook), I find myself speculating about how the Twitter archive might affect future judicial nominations and/or future elections for political offices. Given my biases toward a more transparent society, I suppose that if the data is out there, I'd rather have it publicly available than have limited access to it.

And speaking of sharing updates and other data across web services, the second recent development in the realm of open data to give me pause was the set of announcements at Facebook's developer conference (f8) last week. VentureBeat's f8 roundup offers a nice summary of these announcements, which included a Graph API and a "like" button that can be used on any web site ... vastly increasing the prospects for personalization and sociality across the web ... and placing Facebook squarely in the center of this hyperpersonalized and hypersocialized network. Lili Cheng, of Microsoft's FUSE Labs, wrote about the first Facebook partnership announced - and demonstrated - during the keynote at f8, a new Facebook app for sharing documents created by her group.

As with the Twitter announcement, I see many positive possibilities in these developments, but I see an even darker shadow being cast by the Facebook announcements. Marshall Kirkpatrick at ReadWriteWeb articulated some of my concerns in a post asking Is the New Facebook a Deal with the Devil?

Facebook blew people's minds today at its F8 developer conference but
one sentiment that keeps coming up is: this is scary. The company
unveiled simple, powerful plans to offer instant personalization on
sites all over the web, it kicked off meaningful adoption of the
Semantic Web with the snap of the fingers, it revolutionized the
relationship between the cookie and the log-in, it probably knocked a
whole class of recommendation technology startups that don't offer
built-in distribution to 400 million people right out of the market. It
popularized social bookmarking and made subscribing to feeds around the
web easier than ever before. And it may have created the biggest
disruption to web traffic analytics in years: demographically
verified visitor stats tied to people's real identities. There was
so much big news that the analytics part didn't even come up in the
keynote.

This is so much new technology and it's tied in so closely with one
very powerful company that there is big reason to stop and consider the
possible implications. There are reasons to be scared. The bargain
Facebook offers is very, very compelling - but it's not a clear win for
the web.

1. An iFrame on sites that points to Facebook. The iframe request is
data loaded so it knows where the user came from. Facebook shows
activity and friends that have interacted with the site but the data IS
NOT shared. You have to be logged into facebook for it to work. It
LOOKs like it is on that site but it isn’t. It is a little window into
facebook on a different page.

2. Applications can ask users for access to their data through the
service formerly known as ‘connect’. Each and every user has to agree
to share the data. If you don’t want to share then don’t use the App.

Facebook isn’t doing anything differently then they did before, it is
just easier and more integrated.

Although a subsequent commenter posted an unsubstantiated and rather abusive allegation that Austin works for Facebook (Austin's username is linked to Aqumin, a financial data and analysis firm), no one rebutted his argument.

I discovered Dave's post via Tim O'Reilly's tweet, and as one of the most prominent proponents of the open web, Tim's endorsement carries a great deal of weight (for me). He also tweeted a link to another positive perspective on the Facebook announcements, by Fred Wilson, a partner at Union Square Ventures, who raised doubts about One Graph to Rule Them All?:

These other social graphs [Twitter, Tumblr, Foursquare, Disqus, GetGlue, and others (remember
del.icio.us?)] can and will grow in the wake of Facebook. I
am not sure if Facebook's ambition is to create the one social graph to
rule them all but if it is, I don't think they will succeed with that.
If it is to empower the creation of many social graphs for various
activities and to be in the center of that activity and driving it, I
think they are already there and will continue to be there for many
years to come.

And referencing Tim brings me to the third (and final) recent development I wanted to mention regarding open data: his keynote on where open source and open data are going in the age of the cloud at the 2010 O'Reilly MySQL Conference and Expo last week. Some of the issues he raised in his talk are reflected in a blog entry he posted last month on The State of the Internet Operating System (a "part 2" followup is promised soon). If I were to highlight one theme from the keynote, it is his statement that the future actually belongs to the data, not the database. I'll highlight a few of his more specific observations and insights below.

The 21st century data challenge is how to deliver algorithmic real-time
cloud-based intelligence to mobile applications. This cloud future includes...

Devices acting as sensors for intelligent data collection

Devices whose UI is on the web rather than the device

Feeding data into multiple online services that will turn into a full-on sensor web

Setting the stage for robotics, augmented reality and the next generation of personal electronics

The Internet Operating System is a Data Operating System:

It helps applications find out about

People

Places

Things

Prices

Documents

Images

Sounds

Relationships

...

and helps people interact with them through services

Search

Payment

Matching and Recognition

...

Referencing an earlier blog post on The War for the Web, Tim asked "Who will own the Internet Operating System? Do we want anyone to own it? If not, we better get busy."

Invoking concepts from Wall Street, via the Money:Tech conference ("Where Web 2.0 meets Wall Street"), and applying them to the prospects for the open web, Tim noted that some financial companies that started out as brokers began trading for their own accounts, against their customers, and warned us to watch for this behavior on the Internet: "The giants of the internet are trading for their own accounts, building a platform on which all roads lead back to themselves."

Noting that each of the players (giants) in "the Internet Operating System game" tends to embrace open source for their own strategic reasons and is giving away something that is valuable to someone else, Tim suggested that we may see "some interesting open source moves around Microsoft's Bing search engine", and offered a partial list of potential open source supporters in different application areas:

Non-proprietary Data is available in a format over which no entity has exclusive
control.

License-free Data is not subject to any copyright, patent, trademark or trade
secret regulation. Reasonable privacy, security and privilege
restrictions may be allowed.

Toward the end of his talk, Tim referenced a recent O'Reilly Radar blog post by Nat Torkington on Truly Open Data, in which Nat notes that we have to build some tools to support open data, e.g., tools for provisioning and tracking. In short, we need to make it as easy to share data as it is to share code in the open source movement. So maybe a more appropriate title for this post would be "There's no data like more open data and tools" ... but I think I'll save that for a future followup post.

A few months ago, I wrote about the commoditization of Twitter followers, after discovering a number of automated, semi-automated and manual strategies that people - and non-human systems - were employing to artificially boost their Twitter follower counts. My earlier discovery was sparked by noticing some unusual numbers in the profiles of some recent followers of my Twitter stream. My latest discovery of yet another Twitter commoditization tool was similarly sparked by the profile of a new follower - who has since unfollowed me - that listed 1,983 followees, 787 followers and only 6 tweets. Clicking through to the Twitter homepage of this new follower revealed that 3 of these 6 tweets referenced TweetAdder, a tool that promises to "get more followers, instantly".
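To make the "unusual numbers" above a little more concrete, here is a minimal Python sketch of the kind of check one might run on a profile's counts. Everything in it is an assumption for illustration: the `Profile` class, the `looks_like_follower_farming` function and the three thresholds are hypothetical, chosen by hand rather than taken from Twitter, TweetAdder or any published method. It simply flags accounts that follow many people, are followed back by far fewer, and have tweeted almost nothing.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    followees: int  # accounts this user follows
    followers: int  # accounts following this user
    tweets: int     # total tweets posted


def looks_like_follower_farming(
    profile: Profile,
    min_followees: int = 1000,               # hypothetical threshold
    min_follow_ratio: float = 2.0,           # hypothetical threshold
    max_tweets_per_followee: float = 0.01,   # hypothetical threshold
) -> bool:
    """Rough heuristic: follows many, followed by fewer, tweets very little."""
    if profile.followees < min_followees:
        return False
    # Guard against division by zero for accounts with no followers.
    follow_ratio = profile.followees / max(profile.followers, 1)
    tweets_per_followee = profile.tweets / profile.followees
    return (follow_ratio >= min_follow_ratio
            and tweets_per_followee <= max_tweets_per_followee)


# The counts quoted in the paragraph above.
suspect = Profile(followees=1983, followers=787, tweets=6)
print(looks_like_follower_farming(suspect))  # prints True
```

A heuristic like this would certainly produce false positives (e.g., new users who follow many accounts before tweeting much), so it is better read as a way of describing the pattern than as a reliable detector.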

TweetAdder appears to be slightly less cynical than followe.rs, the fully automatic reciprocal following system I referenced in my earlier post, wherein new users who sign up are automatically followed by all existing users, and automatically follow all existing users in return. However, it does include the phrase "twitter follower bot" in the title field of the image used to promote the product.

TweetAdder, the self-proclaimed "Ferrari of Twitter Friend Adder and Promotion Software", is a semi-automatic follower acquisition tool, relying on the reflexive reciprocal "follow back" response exhibited by a significant proportion of Twitter users (TweetAdder claims that this represents 30%-50% of Twitter users). After purchasing the software, users need to spend some time targeting the Twitter users they want to lure into reciprocally following them, e.g., by specifying keywords, locations and/or other Twitter users whose followers they want to reach. The software purportedly provides for automating tweets and direct messages ... I wonder if future versions will provide for automatic retweets of targeted prospective followers, as I imagine that would be an even more effective lure.

At first, I thought "well, at least this is not yet another Ponzi scheme", but then I found that TweetAdder offers an "affiliates program" in which users are purportedly paid $10 to sign up, 50% commission on direct sales referrals and 10% on affiliates' sales referrals. The TweetAdder purchase page includes an icon for the SC Magazine Awards 2009, "organized to honor the professionals, companies and products that help fend off the myriad security threats confronted in today's corporate world". However, searching for "tweetadder" and "tweet adder" on the SC Magazine site returned 0 results. If SC Magazine does write an article about TweetAdder, I wonder how they would portray the product.

As in my earlier post, I want to explicitly state that this post is intended as a critique, not an endorsement, of such automated Twitter follower acquisition schemes. I was surprised to discover that TweetAdder was endorsed in an NBC News piece by Mike Wendland on Handy apps to help manage your Twitter account. Immediately following a reference to "lots of tips and tricks and scams out there", Wendland says "The best tool I've found is a program called TweetAdder." The end of the piece includes a link to his web site and his Twitter handle (@pcmike). I wonder if @pcmike, who has approximately 6000 followees and 8000 followers, is a member of the TweetAdder affiliates program.

The mainstream media has given considerable attention to a recent Pew Center for People and the Press survey that revealed that Americans have an increasingly negative view of government (25% positive, 65% negative). I think it's important to note, in this context, that the same survey revealed that Americans have an increasingly negative view of the national news media (31% positive, 57% negative) ... and, somewhat ironically, a rather positive view of small businesses (71% positive, 19% negative) and technology companies (68% positive, 18% negative). Perhaps future surveys might break out a new category of "Twitter-based companies" or "social media companies".

Last night, I watched a disturbing show on PBS, Worse than War, "the first major documentary to explore the phenomenon of genocide and how we can stop it". Daniel Jonah Goldhagen, narrator of the film and author of the book upon which it is based, argues that contrary to common conceptions of irrational and spontaneous combustion as the cause of genocide, it actually involves careful planning by rational actors, beginning with the identification of a political objective - typically the removal or elimination of an ethnic group - followed by the persistent demonization and vilification of members of that group through violent and virulent communication and other acts.

Goldhagen proposes that genocide could be more properly characterized as eliminationism:

the belief that one's political opponents are "a cancer on the body politic that must be excised — either by separation from the public at large, through censorship or by outright extermination - in order to protect the purity of the nation"

In nearly every case, the international community did little to stop the atrocities, and many actions - and inaction - of members of the local and global community reminded me of the social roles involved in the circle of bullying I wrote about in my last post (Be Impeccable with Your Word: Confrontation vs. Condescension and Intimidation): bullies, followers or henchmen, supporters or passive bullies, passive supporters or possible bullies, disengaged onlookers, possible defenders and defenders.

One of the most disturbing segments of the film (starting around the 1:03 mark) showed U.N. Peacekeepers in Rwanda abruptly abandoning the Ecole Technique Officielle school in Kigali, in which they had been protecting thousands of Tutsi from homicidal Hutus, who immediately moved in and massacred the unprotected and unarmed Tutsi. Goldhagen claims that the one post-WWII example of significant and effective intervention, the 1999 NATO bombing of former Yugoslavia, resulted in Slobodan Milošević, leader of the Serbian eliminationists, quickly ceasing atrocities and coming to the negotiation table. He argues that the biggest obstacle to preventing genocide is a lack of will on the part of world leaders.

Throughout the film, I was reminded of the concept of epidemic hysteria or Mass Psychogenic Illness (MPI) that I recently read about in Connected: The Surprising Power of Our Social Networks and How They Shape Our Lives. The authors, Nicholas Christakis and James Fowler, describe several instances of large-scale emotional contagion in which groups of people "catch" emotions from others through direct contact or observation over varying lengths of time. For example, in what has become known as the Tanganyika laughing epidemic, uncontrollable bouts of laughter lasting a few minutes to a few hours spread across a population of several hundred people during the first several months of 1962. Another, more recent, example was several waves of MPI at a high school in McMinnville, TN, during 1998, in which gasoline was purportedly smelled and dozens of people suffered from symptoms of nausea and dizziness; no objective evidence of gasoline or any other physical agent that may have caused the symptoms was ever found. Several other examples are provided, but the important thing I want to note here is that the characteristics that tend to mark episodes of MPI include a highly connected community that tends to be isolated and/or stressed ... characteristics that appear to apply to most, if not all, of the groups of genocide perpetrators depicted in Goldhagen's film.

Toward the end of their book, Christakis and Fowler discuss the "interpersonal spread of criminal behavior as an example of a bad network outcome". As with other viral effects, people observing the commission of a crime - or perhaps its after-effects (e.g., the broken window theory) - may be more likely to commit crimes themselves. They note that "the riskier or more serious the crime, the less likely others are to follow suit (though there can be frenzies of murder too, as in the Rwandan genocide)." Unfortunately, in this context, they do not explore these more serious types of criminal frenzies further.

Another book that came to mind was The Lucifer Effect: Understanding How Good People Turn Evil, by Philip Zimbardo, which reports on - among other things - his [in]famous Stanford Prison Experiment, in which a group of college students were randomly partitioned into groups of prison guards and prisoners and placed within a simulated prison. The experiment, which was intended to last 2 weeks, was stopped after just 6 days due to the unanticipated ferocity and sadism with which the "prison guards" adopted and performed their roles, and the depression and other signs of stress exhibited by those playing the "prisoners". I haven't actually read the book, but based on the broader coverage described in its synopsis, I believe that it provides many insights relevant to the types of genocide - or eliminationism - described in Goldhagen's film, e.g., the strength of "situational power" and the effects of "conformity, obedience to authority, role-playing, dehumanization, deindividuation and moral disengagement".

I wish I could say that Goldhagen's film depicts atrocities beyond anything I could ever imagine happening in this country ... at least in modern times (slavery, the Civil War, and other epochs in our history may represent approximations of eliminationism). However, all of the examples of eliminationism he examines were preceded by periods of persistent demonization and vilification of classes of people ... practices that seem to be on the increase among some media pundits and channels. In researching this blog post, I was simultaneously heartened and disheartened to discover that I am not alone in this concern.

This Article proceeds from the assumption that—from a less lofty, more grassroots perspective—modern, organized, formal, one-time venues for extremist political speech do not present the most potent threat to physical safety and a stable democracy. The greater danger emanates from pervasive right-wing extremist themes on radio, television, and some online news sources (often as a modern-day replacement for hard-copy newspapers and newsletters). These media support an increasingly passionate and virulent message in public discourse. This message encourages persons who feel uneasy or displaced in society to expiate their grievances not through the political process, but through murder.

...

This Article addresses pervasive, long-term, mixed messages that blend ostensible news with entertainment, politics, religion, and appeals to ethnic identity and general fear-mongering. Although such discourse receives the greatest coverage in the mass media, the better forum to mitigate and neutralize the incitement to action may be on a person-to-person level. This Article will explore interventions in Rwanda and Nigeria that adapted American dispute prevention and resolution methods to African media and dispute resolution traditions. The African collaborations offer a different view of justice, based on relationships, which may provide a better fit and forum for America to address extremist media messages and their impact on society.

I hope, for the sake of all Americans, that we can learn the lessons from other conflicts, find common ground, foster more civil and respectful relationships, and avoid the kinds of catastrophes we have witnessed in countries that may be, in some key respects, not so different from our own. And I also hope that we can find and employ the will to use our considerable power to stand up to bullies in other parts of the world.