Data riot?

I wrote earlier this month about various invocations of a DIY ethos in Digital Humanities work, and in that post I suggested that if we’re going to use punk metaphors then I want a DIY practice modeled on riot-grrrl practices. I argued that this entails the creation of a “sophisticated DIY infrastructure that favors women – spaces, practices, active interventions that make it possible for women to enter into (DH) and promote themselves.” It would also necessitate a genuine feminist technosocial context within DH communities of practice.

In the intervening weeks I’ve found myself struggling to resolve the paradox created by a DIY framework – at the same time that it promises certain individual freedoms and celebrates the power of gumption and metaphorical elbow grease, it also places a large labor burden on the intrepid DIYer. Repeated conversations at various conferences about the “state of the field” and “the market” have emphasized that the dual path is still the route for tenure-track scholars (you still need that scholarly monograph, DIYer). Thus I find myself sighing at every exhortation that others should “do it themselves,” even as I work to find a space within the DIY metaphor for a more feminist methodology.

All of which has led me to think about data sharing. I mean data sharing on a really big scale – like a repository for humanities data. Now I’m new to “big data” in some ways, although, like most, I’m familiar with the distant readers and historical shift scholars that are out there. I’m used to long hours with single texts, hand-encoding, and thinking about interpretive markup. I’m particularly excited by the idea of “authorial” or “poetic” markup that Julia Flanders and I talked about at the annual DH meeting in Palo Alto (my talking was virtual, as a voice in the paper). My practice has been based around small data. Nevertheless, in thinking about scaffolding not just in the DH-inflected classroom, but also for scholars who are new to the field and excited, especially those at liberal arts colleges, I see real opportunities for shared, big data.

I also think I see ways in which a site for shared humanities data might be an important part of creating a “DIY infrastructure” that enables more feminist scholarship. Take a step back to my earlier riot-grrrl invocations and think about zines – the technology is fairly accessible through basic commercial outlets: pen, paper, glue, a photocopier (the history of how that last tool became physically and economically accessible is worth thinking about). Zines, as affordable, accessible, and fast productions, enabled a range of interventions at very local and then more distributed levels. The speed and production ease of zines meant that they were effective tools for identity and community construction.

Shift frames to the model of scholarly text encoding that I’m used to: it’s slow, dependent on grant funding if one is to build up a reasonably sized textbase or a particularly lovely digital edition, and it requires some degree of expertise. As someone who teaches text encoding, I don’t think the expertise barrier is onerous. Nevertheless, I am acutely aware that this is not a zine-like nimble practice. Nor do the issues of funding and speed recede once you know how to encode.

What I’m working on understanding are the possibilities for open, big datasets to enable a more zine-like approach to DIY digital humanities. Beyond my research interests, I’m also intrigued by the possibilities of leveraging open data in liberal arts contexts, where the infrastructure to develop new data might be harder to come by. Because these are new thoughts for me, thoughts hatched in the early morning hours when a recently restless baby was still asleep, I don’t yet have conclusions, only questions:

When thinking about texts, what constitutes a “dataset”? A textbase of 2,000? 200? 2?

Does the management of distributed data with something like the grid-enabling software discussed by Mark Hedges risk losing the specificity of diverse data? Do we really want to “hide the idiosyncratic heterogeneity”?

Are there ways to move back and forth between the idiosyncratic and the generalizable?

Are the problems of the archive – deterministic identification, silent exclusions, etc. – the problems of data repositories as well?

Can a feminist project use such data without succumbing to a “master’s tools” kind of problem?

Can a humanities data repository offer materials for mashup/remix in a way that lowers the barriers to participation?

What kinds of education about the presence of this infrastructure would we need? “Bring me my scissors!”

Could a data repository allow for a range of feminist interventions? After all, riot-grrrl has been critiqued as too white, too middle class, and too western.

Can big data resources be leveraged for small work? What if I want just one text from your textbase – can I do that too?