The slides from the talk are available online — use the arrow keys or the navigation controls in the bottom-right to move between slides, or press the space bar to enter the overview and navigate more quickly.

Occupy Our Data: visualising a movement through social media

Part One: research goals

Occupy is, to the delight of some and the frustration of others, a movement largely characterised by the social media response. By the time of the Zuccotti Park eviction in November, Occupy was being mentioned over a quarter of a million times a day on Twitter, and it’s in a league of its own in terms of tweeted-about social movements in the English-speaking world. Regardless of what the actual relationship is between Occupy (the movement) and #Occupy (the hashtag), it’s obvious that a significant amount of discussion is taking place online, and in public. But what’s the importance of these conversations to us as activists and academics?

Firstly, relationships between different social media actors can identify key subjects for interview in research, and give some kind of “fingerprint” of the media response to an event. This gives us a starting point for comparing and contrasting the coverage of multiple events. We can also use this kind of relationship-based approach to analyse the relationship between social and traditional media in driving forward the narrative of an event, answering the question of which followed which. There is already some work in this area, for instance the Numeroteca newspaper front page / Twitter mash up (see slide), but we can go further with this: a good understanding of how media have interacted with each other in the past is useful for concocting future media strategy.
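One simple way to approach the “which followed which” question, assuming we already have counts of mentions per time bin for two sources (say, newspaper articles and tweets about the same story), is to find the time shift at which the two series line up best. The sketch below uses a plain lagged Pearson correlation; this is a textbook technique swapped in for illustration, not the method behind the Numeroteca mash-up or our tool, and the function names are hypothetical. A strong correlation at some lag only suggests that one source tends to lead the other; on its own it doesn’t establish causation.

```python
from statistics import mean


def lagged_correlation(a, b, lag):
    """Pearson correlation between a[t] and b[t + lag] over their overlap."""
    if lag >= 0:
        x, y = a, b[lag:]
    else:
        x, y = a[-lag:], b
    n = min(len(x), len(y))
    if n < 2:
        return 0.0  # not enough overlapping bins to say anything
    x, y = x[:n], y[:n]
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat series carries no timing information
    return cov / (vx * vy) ** 0.5


def best_lag(a, b, max_lag):
    """Lag (in bins) maximising the correlation between a[t] and b[t + lag].

    Positive result: a tends to lead b. Negative result: b tends to lead a.
    """
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(a, b, k))
```

With hourly bins, `best_lag(articles, tweets, 12)` returning 2 would hint that, for that story, Twitter activity trailed newspaper coverage by about two hours; a different bin width or window could change the answer.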

Secondly, Occupy is largely un-planned and de-centralised: a consequence of this is that there is no central point of record. Of course, a central record is not without problems, but the lack of one does leave us relying on autonomous action to write the (a?) history of the movement. The traditional format for distributed record-keeping has largely been academic publishing (for the moment leaving aside mainstream media, because of the profit incentive, and grassroots blogging, because of its relative incompleteness), but there are some limitations even here. Books and even journal papers take a while to publish, and once they exist, there’s no obvious pathway for that work to enter the public domain. It should hopefully be clear that it’s not appropriate to rely on copyrighted, immutable, and sometimes opaque analysis for the narrative record of our movement. So, setting the parameters of our solution of choice, we want a forum to edit shared knowledge collaboratively, accurately and fairly. Intuitively, this is starting to sound like the goals of the Wikimedia Foundation (the group behind Wikipedia) and, encouragingly, the first link in a search for “Occupy timeline” is the Wikipedia “Timeline of Occupy Wall Street” page. Clicking on the page, however, reveals a small pile of fairly unstructured information: it’s obviously not a comprehensive timeline of events (or even major events), it’s hovering on the brink of deletion as Wikipedia editors lock horns over whether the page should even exist, and the whole thing is out of date, to boot. The question of why Wikipedia’s answer is so limited merits a whole talk by itself, but one factor is certainly Wikipedia’s hard rule on citation: no “original research”, no blogs, mainly media articles and academic papers. There are valid reasons for these restrictions, but they do have a chilling effect.

So, what about another solution? Occupy has been around for months, and there’s a wealth of talent and energy associated with the movement: surely some enterprising person has started work on a shared history of an important social movement? It turns out that, with one caveat, the answer to this is “no”. Existing Occupy-related web projects seem mainly geared towards organising General Assemblies, facilitating the ongoing functions of working groups and communicating about actions; in short, goals that serve the movement’s future and present, but not its past. Apparently we are all too busy to take notes… The New York-based Activist Archivists group (and the somehow-related Occupy Archives Working Group) are doing amazing things with digital and physical record-keeping, but it’s too large a job for one team, and their mission is maintaining the artefacts of the movement, not necessarily the narrative of events linking them.

The consequence of these limitations and gaps is that it’s far too easy to make deliberate or accidental oversights, whether the writer is an academic, an activist (attempting to plan future actions) or a journalist. Occupy has enough of a communications problem already without having to engage in serious head-scratching every time we need to refer to the past. Thus, another project goal is to provide a platform for a collaborative shared history of Occupy, using key public “media objects” to build up a rich timeline.

In summary, there are three main things we’re trying to do:

Access & analyse the vast online presence of the Occupy movement. Interfacing with social media in this way is not a new problem, but it’s not yet one that has an obvious solution (particularly for our needs).

Try and draw some conclusions about relationships and causality from this data.

Use media objects as a baseline to start work on a shared history.

Part Two: data problems

There are several challenges with the data — we haven’t (yet) had to deal with serious data-cleaning, but there are plenty of other things to worry about, primarily:

Huge data volume. OK, it’s not DNA sequencing, but 13,000 tweets in 24 hours is still a lot (particularly given that’s only for one hashtag).

Limited metadata available

YouTube is important to us (and getting YouTube data was the original motivation for this project): the Occupy Research Demographic and Political Participation Survey revealed it to be the most-used social media platform behind Facebook (which is almost impossible to scrape). YouTube certainly collects excellent data, including views over time, but much of that is only available to the video uploader, and yet more is only available via the website (not the API).

Similar situation with live-streams; sites like Ustream and Bambuser do have “archive” sections, but there’s no motivation for them to make those records complete or useful; their value proposition for users is live broadcasting.

Aside: unsurprisingly, when you turn to companies to provide a public service, they fall short. Eventually the profit motive is going to cause a company’s interests to conflict with the needs of activist users of its services, whether that comes in the form of Facebook-style rolling over for law enforcement, live-streaming services’ poor facilities for keeping records or even an entire business model changing, as is the case with Twitter. It’s obvious that they’ve decided that the path to monetization lies in advertising, and for that to be effective they need control over the applications used to interact with the service: this means closing and limiting their API, and that’s exactly what they’re doing. In 2012, Twitter’s search feature goes back only 9 days. This is technologically embarrassing, but it’s not a technically-driven decision. All of this means that we need to rely on third-party services (we use Topsy), but they have their own conflicts of interest, and in any case it’s presumably only a matter of time before Twitter decides to assert authority over that kind of service, too. Whilst we still can (without losing huge chunks of our history), we should be moving to services specifically tooled for our needs, and not subject to the whims of some profit-seeking entity.

In addition to there being a general scale problem (see above), there’s also a relative scale problem. YouTube is the most-used network by participant (see above), but Twitter obviously dwarfs it in terms of the volume of output. How can we account for this uneven scale in our graphic?

Traditional wisdom: either treat tweets as more important than they usually are (narratives, like Storify) or as less important (aggregation / quantization, like the aforementioned newspaper front page visualisation). Rendering tweets textually is basically just the Twitter interface again (which obviously doesn’t scale to the numbers we need), and reducing them to their volume takes away our ability to show relationships, or drop back down to the underlying tweets.

Can’t we have our cake and eat it too? We can handle a lot of data with the right interface: we want to see volume and be able to follow conversations / drill down to particularly influential tweets.
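A minimal sketch of the “cake and eat it” idea, assuming a hypothetical `Tweet` record and using retweet count as a stand-in for influence: bin tweets by time so the volume can be drawn, but keep the tweets themselves inside each bin, so the interface can drill down to the most-retweeted ones instead of reducing them to a number.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


@dataclass
class Tweet:
    # Hypothetical minimal record; real tweet data carries many more fields.
    id: str
    created_at: datetime  # naive UTC timestamp
    retweets: int         # assumed here as a rough proxy for influence


def bin_tweets(tweets, width: timedelta):
    """Group tweets into fixed-width time bins, keeping each bin's tweets."""
    bins = defaultdict(list)
    for t in tweets:
        # Floor the timestamp to the start of its bin.
        start = EPOCH + ((t.created_at - EPOCH) // width) * width
        bins[start].append(t)
    return dict(sorted(bins.items()))


def volume(bins):
    """Aggregate view: number of tweets per bin, for drawing the timeline."""
    return {start: len(ts) for start, ts in bins.items()}


def top_tweets(bins, start, n=3):
    """Drill-down view: the n most-retweeted tweets in one bin."""
    return sorted(bins.get(start, []), key=lambda t: t.retweets, reverse=True)[:n]
```

The point of the design is that `volume` and `top_tweets` read from the same binned structure, so zooming from the aggregate view back to individual tweets never requires re-querying the raw data.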

Part Three: the tool

Before anything else, please note: our tool is still under heavy development; some kind of pre-pre-alpha. That said, we’ve started as we mean to continue and the code is already open source, on GitHub.

The general idea is to make something that’s part timeline, part node-link tree: a visualisation that can show relationships over time (because “relationships over time” is impact).
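To make “part timeline, part node-link tree” concrete, here is one possible layout under stated assumptions: each media object (a hypothetical `MediaObject` record) has a timestamp and an optional parent that it links to or responds to, and those parent links form a tree. The horizontal position is time; the vertical position comes from a depth-first walk, so responses sit close to what they respond to. This is an illustrative sketch, not the layout our tool actually uses.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class MediaObject:
    id: str
    kind: str                     # e.g. "youtube", "article", "tweet"
    time: datetime
    parent: Optional[str] = None  # id of the object this one links to, if any


def layout(objects, origin: datetime):
    """Return ({id: (x, y)}, [(parent_id, child_id), ...]).

    x is hours since `origin`; y is a row index from a depth-first walk,
    so each subtree occupies a contiguous band of rows.
    """
    ids = {o.id for o in objects}
    roots, children = [], {}
    for o in sorted(objects, key=lambda o: o.time):
        if o.parent is None or o.parent not in ids:
            roots.append(o)  # unlinked, or its parent isn't in this dataset
        else:
            children.setdefault(o.parent, []).append(o)

    positions, edges = {}, []
    stack = list(reversed(roots))  # earliest root is popped first
    while stack:
        node = stack.pop()
        x = (node.time - origin).total_seconds() / 3600
        positions[node.id] = (x, len(positions))  # next free row
        kids = children.get(node.id, [])
        edges.extend((node.id, k.id) for k in kids)
        stack.extend(reversed(kids))  # visit earlier children first
    return positions, edges
```

Objects whose parent falls outside the dataset are treated as roots rather than dropped, which matters when only part of a conversation has been fetched.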

Currently we’re working on a prototype, trying to create a social media narrative of the Zuccotti Park eviction and building the tools as we go. We’re automatically fetching YouTube videos and comments, uploading curated spreadsheets of mainstream media articles and fetching tweets featuring links to any of these resources. Importing bulk Twitter data also works (not shown in screenshot).
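Fetching tweets “featuring links to any of these resources” comes down to recognising which URLs in a tweet point at a resource we already hold. A hedged sketch for YouTube, assuming tweets arrive as hypothetical `(tweet_id, text)` pairs: extract URLs, pull the video id out of the common `youtube.com/watch?v=` and `youtu.be/` forms using Python’s `urllib.parse`, and group tweet ids by video. Shortened links would need expanding first, which this sketch doesn’t attempt.

```python
import re
from urllib.parse import parse_qs, urlparse

URL_RE = re.compile(r"https?://\S+")


def youtube_id(url):
    """Return the video id for the two common YouTube URL shapes, else None."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host == "youtu.be":
        # Short form: http://youtu.be/<id>
        return parsed.path.lstrip("/").split("/")[0] or None
    if host in ("youtube.com", "m.youtube.com") and parsed.path == "/watch":
        # Long form: http://www.youtube.com/watch?v=<id>&...
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def link_tweets(tweets, video_ids):
    """Map each known video id to the ids of tweets that link to it."""
    links = {vid: [] for vid in video_ids}
    for tweet_id, text in tweets:
        for url in URL_RE.findall(text):
            # Trim punctuation that often trails a URL in running text.
            vid = youtube_id(url.rstrip(".,;:!?)"))
            if vid in links and tweet_id not in links[vid]:
                links[vid].append(tweet_id)
    return links
```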

Obvious in the screenshot are several rough edges; timezones, management interface, etc.; this leads well into the question of “What next?”.

Finishing prototype development.

Research questions, starting with “what was the pattern of media coverage around the Zuccotti Park eviction, and how can we use that information going forward?”

Long-term, we need to work out how on earth we can use this technology to help tell the story of Occupy, and we need to be fantastically careful about the position “we” are in whilst doing so.

Part Four: hierarchy online

In short, the internet magnifies the challenges in creating an anti-hierarchical organising process. The usual bodge is to keep the group small, and rely on friend networks to keep the group consistent. This approach doesn’t really work on the internet, though: people expect better, and this invitation system doesn’t scale anyway, since you have no more reason to trust someone 100 degrees of separation away from you than a complete stranger.

A fundamental problem with the internet is that — even assuming everyone is being pleasant and personable — there is already a huge power imbalance along gender, language and economic lines (see a talk I gave on this last year for more). Occupy already struggles to create a reputation as a representative movement: how can this situation improve if all communication channels are dominated by people already in positions of relative power? We can mitigate some of these issues: multiple language support (planned) and excellent accessibility (OK, working on it). Some of the challenge is out of scope, at least in the short term. But what about the rest? What can we do to prevent another Wikipedia-like hierarchy of white male nerds?

And speaking of Wikipedia, unfortunately people are not always pleasant and personable online. Beyond the archetypal acts of “random vandalism”, Wikipedia suffers from a few common problems: personal disputes are transplanted from the outside world and enacted through the editing interface, activists from both sides of issues attempt to re-write articles to fit their point of view and, crucially, an entire industry of parasitic PR consultants has sprung up to leverage Wikipedia to massage truth online. The project deals with these issues through a supposedly meritocratic culture; you are respected based on your contribution and ability. The drawback of this system is that you’re not just being judged on contribution, you’re being assessed on your aptitude for some pretty technical writing skills, and your conformance to the Neutral Point of View, which is anything but.

So it’s an open question. How can we possibly devise a system that at once allows anyone to contribute, and attempts to challenge existing hierarchies in society? In short, how can we prevent Wall Street paying some intern to damage or pervert our shared history? We don’t yet know, but these are vitally important conversations that we need to start now, and continue in a permanent revolution of anti-hierarchy.