Archiving Usenet: Adopting an Ethics of Care

“On the Internet, nobody knows you’re a dog.” Most folks have no doubt encountered this adage, coined in a 1993 New Yorker cartoon, through one of the many, many cultural riffs and references, or maybe in a reproduction of the original cartoon. The idea, of course, represents public perceptions about anonymity, privacy, and the internet prevalent at the time of its publication: that one’s online and offline presences could be largely disconnected from each other.

When the cartoon was first published, the sentiment certainly seemed more likely to be true in theory (though not always in practice). Particularly throughout the 1990s into the mid-2000s, the internet was thought to be a safe space for engaging in a variety of identity play, and transgender individuals were uniquely poised to benefit. One’s offline identity was not always tightly bound to one’s online presence, certainly not as closely as social network sites like Facebook might wish it to be, a change reflected in a 2015 follow-up cartoon of the dogs reminiscing about their prior anonymity. Online, trans individuals could take steps to disconnect their offline selves from their online identities, where they might adopt different names and gender identities that better reflected their own self-understanding. While I didn’t identify as transgender at the time, I nevertheless engaged in these practices myself as a teenager, often failing to ‘correct’ individuals who, presciently, assumed I was male.

However, my online life at the time was entirely pseudonymous, and I made sure to keep a certain distance between my offline and online selves. This has allowed me to keep my prior online activities (as well as my past opinions on the state of the World of Warcraft endgame) largely divorced from my current online presence. Other individuals, particularly early users whose online access came through an employer or university, may not have been able to maintain such a clean separation. Bits of one’s offline identity (elements of a legal name used for an official company email address, a name in a message that differed from the one attached to the email account, or an “official” email signature) remained connected to online activities, including posting to Usenet. For trans individuals, these traces can reveal distinctly gendered or pre-transition names, employment, or activities they might otherwise wish were not widely known.

As I get closer to a launch-ready version of the Transgender Usenet Archive, much of my attention has been focused on thinking through my ethical responsibility to these users. At the core of the project are two impulses. On one hand, I hope to increase the accessibility and reach of an important, if undiscussed, part of recent transgender history. As a consequence, however, I am giving these posts a new kind of visibility beyond the initial level of access (which, admittedly, you can already get through the Google Groups archive). Given this increased access, I am also deeply invested in conscientiously respecting not only posters’ agency as authors, but also their privacy as individuals, who may have treated their posts as ephemeral communications, not meant for academic analysis.

Because there’s not a lot of guidance for working with Usenet materials, I’ve looked to other instances where archivists face similar concerns. Tara Robertson’s writing on the ethical implications of Reveal Digital’s scanning and posting of the On Our Backs back catalogue (since taken down) speaks compellingly to the importance of thinking carefully about consent, representation, and digital access. One way Usenet differs from OOB and other digitized materials is its status as the organizing umbrella under which a variety of public fora lived. Usenet newsgroups, and by extension users’ posts, were always ‘public’ in terms of accessibility. However, posts were not archived and made available on a mass scale until DejaNews started collecting them in 1995; the current Google archive, and thus the collections the archive is based on, are made up of what DejaNews collected, along with several other donated collections of pre-1995 material. Following DejaNews’s announcement, users concerned about privacy successfully advocated for DejaNews to adopt the “X-No-Archive” header, which signaled a post shouldn’t be archived. However, DejaNews’s choice to respect users’ wishes to XNAY (for X-No-Archive: yes) their posts was voluntary, a policy Google (which acquired DejaNews in 2001) has continued to follow to this day.
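As a rough illustration of how an archiving tool might honor that header, here is a minimal Python sketch using the standard library’s email parser. It is not DejaNews’s or Google’s actual logic; the function name `opted_out_of_archiving` is hypothetical, and the sketch assumes each post is available as raw text with its headers intact.

```python
from email import message_from_string


def opted_out_of_archiving(raw_post: str) -> bool:
    """Return True if the post carries an X-No-Archive header set to "yes".

    Hypothetical helper: assumes `raw_post` is the full article text,
    headers first, then a blank line, then the body.
    """
    msg = message_from_string(raw_post)
    # Header lookup is case-insensitive; a missing header yields "".
    value = msg.get("X-No-Archive", "")
    return str(value).strip().lower().startswith("yes")
```

A collection script could call this on every post and skip any for which it returns True, so the poster’s original wish is respected before anything is indexed.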

Nevertheless, the fact users had the option to XNAY posts when they were first written doesn’t guarantee they would want their posts to be publicly available now. With contemporary indexing and archiving tools, what might have seemed “privately public” in 1997 now can be made, in incautious hands, all too public. With some fairly simple Python scripts, I’ve been able to collect, count, and index thousands of user names and emails, including building a whole network of users’ communication.

The Google Groups archive has functionally performed such indexing on a massive scale, making all of these posts (and their attached content, some of it clearly not intended for such a mass audience) available to anyone who wishes to access them. Individuals can request that archived posts be removed, but the process for doing so is opaque at best. As Andy Baio rightly notes, Google’s primary interest here is not in acting as a good steward of the internet’s past but in maximizing profitability. In an internet landscape dominated by social network sites (including Google’s underwhelming entry into the field, Google+), personal data mining, and algorithmic filtering, Usenet is neither ripe for personal data mining nor very profitable. In fact, it’s the exact opposite: an unstructured, decentralized system now best known as a resource for illegal file sharing. Thus, there appears to be little financial incentive to invest energy in the archive.

In her discussion of the impact of Reveal’s choice to make OOB widely available, Robertson makes it a point to connect this act with the people it most directly impacts: those in the photographs. In reaching out to these individuals for their reactions, her opinion shifts as a result of her own community membership, as “‘the community’ wasn’t an abstract notion, it was the people who gave me those generous quotes. I could see their faces and empathize with their fears and feelings that institutions had screwed them over again.” These moments, Robertson suggests, require archivists, librarians, and others to act with an ethics of care, which Bethany Nowviskie argues focuses a researcher or practitioner on two key areas:

“The first is toward an appreciation of context, interdependence, and vulnerability—of fragile, little things and their interrelation. The second is an orientation not toward objective evaluation and judgment (as in the philosophical mainstream of ethics)—not, that is, toward criticism—but toward personal, worldly action and response.”

I, like Robertson, am both a professional (academic researcher, in this case) and a community member, and these roles shape my thinking. While I’m interested in making these discussions accessible, I also want to recognize and respect their contextual particularities and constraints. Robertson suggests the Zine Librarians’ Code of Ethics as a source of guidance, and I’ve drawn on it in designing the Transgender Usenet Archive.

In designing the archive, I’ve taken several steps to preserve individual privacy and encourage good, respectful practice. The archive will be publicly available to anyone who wishes to use it, but accessing it will require users to informally agree to use it for non-commercial personal, teaching, learning, or research purposes only. All of the posts included in the archive have been selectively indexed and do not include headers that might contain identifiable information, such as emails and names. However, I have not altered posts’ content in any way, so any message sign-offs and email signatures that were already included in posts will appear in the archive as is.
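To make the header-stripping step concrete, here is a minimal Python sketch of one way it could work. It is an illustration under stated assumptions, not the archive’s actual code: the allowlist `KEPT_HEADERS` and the function name `strip_identifying_headers` are hypothetical, and the sketch assumes plain-text, single-part posts.

```python
from email import message_from_string

# Hypothetical allowlist: headers kept for context. Everything else
# (From, Reply-To, Organization, Message-ID, ...) is dropped.
KEPT_HEADERS = ("Newsgroups", "Subject", "Date")


def strip_identifying_headers(raw_post: str) -> dict:
    """Keep only allowlisted headers plus the unaltered body text."""
    msg = message_from_string(raw_post)
    record = {name.lower(): msg.get(name, "") for name in KEPT_HEADERS}
    # The body is stored exactly as written, so sign-offs and
    # signatures inside it are left in place.
    record["body"] = msg.get_payload()
    return record
```

An allowlist is used here rather than a blocklist so that an unexpected header carrying a name or address is dropped by default. As the paragraph above notes, anything identifying inside the body itself would still remain.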

I’ve also manually removed any Base64-encoded image data (such as personal photographs) for images that include any possibly identifying features (such as full body or face shots); these images have been marked with <IMAGE REDACTED FOR PRIVACY>. There’s a long history of repurposing and reposting trans women’s photos online without their consent, and I don’t want to contribute to it through the archive. Because I can’t determine the particular provenance of these photos (especially given that many were attached to mass-mailed spam), I’ve chosen to err on the side of caution and redact these images.
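The redaction described above was done by hand, with a judgment call for each image. Purely as an illustration of how candidate blocks might be located for that kind of review, here is a hedged Python sketch that assumes the image data appears in the post body as runs of Base64 text. The function name `redact_encoded_runs`, the line-length bounds, and the three-line threshold are all assumptions, and unlike the manual process it replaces every matching run rather than only identifying images.

```python
import re

MARKER = "<IMAGE REDACTED FOR PRIVACY>"
# Assumption: encoded data shows up as lines of 40-76 Base64 characters.
B64_LINE = re.compile(r"^[A-Za-z0-9+/]{40,76}={0,2}$")


def redact_encoded_runs(body: str, min_lines: int = 3) -> str:
    """Replace each run of >= min_lines Base64-looking lines with MARKER."""
    out, run = [], []

    def flush():
        # A long enough run is treated as encoded data; shorter runs
        # are kept, since a lone matching line may be ordinary text.
        if len(run) >= min_lines:
            out.append(MARKER)
        else:
            out.extend(run)
        run.clear()

    for line in body.split("\n"):
        if B64_LINE.match(line.strip()):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return "\n".join(out)
```

A short final line of encoded data (under 40 characters) would be left behind by this pattern, which is one reason a pass like this could only supplement manual review, not replace it.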

Lastly, I want to do my utmost to respect and support posters’ right to refusal. Unfortunately, the scale and amount of content in the archive makes attempting to contact individual posters unfeasible. As part of the archive site, I’ll be offering a contact form for individuals who would like to ask whether their posts are included in the archive. In the meantime, this post is meant to offer individuals a chance to let me know if they’d like their posts not to be included. Please feel free to reach out to me via email (adame@umd.edu) if you think your posts might be in the archive and would like them removed, or if you have any other questions or concerns.