To Hell With Good Intentions: Linked Data, Community and the Power to Name

This is the written version of my keynote presentation from the 2015 LITA Forum in Minneapolis, Minnesota, on November 14, 2015. I am grateful for the thoughtful and critical feedback from my friends and colleagues Maureen Callahan, Jarrett M. Drake, Hillel Arnold, Ben Armintor, Christina Harlow, and Chela Weber in their review of earlier drafts of this text. My slides are also available.

Good morning. I am honored to be here in Minneapolis, and I have to admit it’s the first time that I’ve been able to attend the LITA Forum. Today I’ll be trying to provide some structure to a few longstanding thoughts that have been percolating about the work that we do with metadata, which I see as being at the intersection of libraries and technology, and the rhetoric around linked data in particular. My hope is that we can start to examine linked data, particularly within the context of cultural heritage, and to see it as decidedly not neutral, nor an intrinsic good, but instead as another space in which ideology and systematic oppression are likely to be reproduced. It is my belief that we have the power and the obligation to intervene in this, and that part of that intervention can involve both an intercession into our professional norms and practices and getting out of the way enough for communities to determine how they want to best document themselves.

I want to start my talk by defining my own “personal graph”1 in relation to this presentation by acknowledging two things: the work and thought of others who inspire me and have helped me understand the provocation behind this presentation, and the context that I bring to this presentation. I am lucky to have a group of thoughtful colleagues who have helped me form my thoughts, whether they realize it or not. This is certainly not a list of everyone who has ever inspired me to be an effective archivist and technologist willing to engage critically with the work that I do – it’s just the tip of the iceberg. Nonetheless, I would like to acknowledge the following people because of their profound impact on how this presentation has come into being. To Jarrett Drake, Christina Harlow, Benjamin Armintor, Bergis Jules, Ed Summers, Tara Robertson, Shirley Lew, Baharak Yousefi, Alison Macrina, Amelia Abreu, Maureen Callahan, and Hillel Arnold – thank you. And, if you’re a conference organizer, please consider one of them as a potential keynote speaker in place of someone who looks like me or talks like me. We might speak about similar things but we don’t always agree with each other, which makes it more interesting.

So, let’s talk about context. This is not to acknowledge my degrees and professional accolades, but rather to lay bare who I am and how some of the topics I’ll be navigating speak to me. I am called to do this in part because of my recent experience at the Digital Library Federation Forum in Vancouver. From the opening session and keynote presentation by Safiya Umoja Noble, to the concluding plenary panel, “Capacity and Community: Setting Agendas for #ourDLF,” the discussion of acknowledging the context that people, as well as institutions, bring to their professional lives and projects percolated throughout the conference. This in part relates to the need to acknowledge the potential bias that we bring to our professional projects, and to allow our potential audiences to identify points of commonality or discord.

So, who am I? Professionally speaking, I have worked in a variety of institutions, with an overwhelming focus on archives, metadata, and technology; the overwhelming majority of those institutions have been well-resourced and well-respected by people who find that important. I have been a project archivist, an assistant archivist, an application developer, an adjunct professor, a contractor, a digital archivist, and now a director of technology. I have worked with lots of metadata and have had lots of experience thinking about, implementing, and maintaining linked data and RDF over the last couple of years. Generally, I think that unfettered access to information is a good thing. Nonetheless, my professional experience is an odd one, because I’ve found myself between worlds, dwelling in a liminal space between archives and technology. At least in the past, the relationships between archivists and technologists in my experience have never been smooth, and throughout my career, I have been given the message by one group that I didn’t really belong there since I was perceived to be part of the other one. IT folk have seen me as the demanding archivist who probably, in their mind, doesn’t understand the complexity of their work. Archivists have labeled me as decidedly not an archivist, a data peddler, the resident geek, and someone unqualified to work in archives. In all fairness, this has gotten a lot better over the last few years – both for myself and the professions in general. Regardless, this is part of the reason why I say that I’m both an archivist and a technologist, even if at times I’m one more so than the other.

This is a feeling I have to acknowledge, because it is so familiar in a personal context as well. It starts with my family. I am a child of two lovely people, an American and a first-generation immigrant who came to the US to further his education. My father, who’s now been in the States for most of his life, seemed intent on minimizing difference between my family and everyone else around us. So what? Assimilation is common. Throughout my life, however, this difference seemed to be unavoidable. My skin is relatively pale, and I am deeply aware that on a daily basis I benefit from and am expected to contribute to the economy of white privilege. Nonetheless, I can’t shake that liminality. By some people’s read, I’m White; by others, I’m Latino. I don’t know what I am supposed to be, but I know I’m both. I also know that the way that people look at me in public changes when I’m by myself versus when I am with my father in the US, or when my mother and I are with him and his family when we visit them in Peru.

And such it is with other aspects of my identity. In the interest of time and perceived vanity I’m not going to itemize this for you further, but if you want to know, you can ask. Regardless, just look at me. I’ve got this white skin, and this masculine body and presentation. I have a reasonably secure income with a job I love that compensates me fairly. I am in a committed heterosexual partnership. I am not here to whine about myself precisely because I have a ton of privilege. Nonetheless, the context you should take with you is that identity - or your story - is a hard thing to assert when you have no control of how you will be read. People see what they want to, and elide or emphasize that difference. It is no surprise that this is painful and tiring. Sometimes the world drowns your voice, and more often than not, the voices in your head can do that too.

The other thing that weighs heavy on my mind, and adds to the context of my presentation today, is the stark reality that students of color are facing at University of Missouri, and other campuses across the country. It is an unfortunate truth that systemic racism and disenfranchisement are not news, but specific incidents are what draw attention. And such it is with what’s been playing out on the Mizzou campus. The racism experienced by people in Columbia, Missouri - itself less than five hundred miles from Minneapolis - is not new. These are all part of a broader series of events and historical realities, with a campus that was likely built on the labor of slaves. And even more recently, the opposition is nothing new; it’s part of a larger context.

I want to acknowledge these larger struggles even from the last three months at University of Missouri. Graduate students at MU organized, participated in a walkout, and began unionizing in response to cuts to health care coverage for domestic and international students and to address other aspects of their working conditions, including better wages, tuition and fee waivers, and access to housing and childcare. There were three “Racism Lives Here” rallies this fall on the Mizzou campus even before the campus discovered the hateful graffiti in a residence hall bathroom. In October, three immigrant students filed separate lawsuits against the University of Missouri, St. Louis Community College, and the Metropolitan Community College in Kansas City. The suits were filed in reaction to the recent rewording of Missouri House Bill 3 from the 2014 version, which prohibited students with an “unlawful presence” from receiving in-state tuition rates, to students having an “unlawful status.” Two undocumented MU students who were not publicly identified would have seen a $15,000 per year tuition increase based upon this change. The University of Missouri Health System drastically reduced access to women’s healthcare through the ending of “refer and follow” privileges and the cancellation of agreements that allowed graduate students to receive training at Planned Parenthood of Kansas and Mid-Missouri’s clinic in Columbia. I recommend looking at this timeline published by The Maneater, a student newspaper at University of Missouri, which allows you to start to see this greater trajectory.

Across all these aspects we see the interplay of issues of justice related to race, ethnicity and immigration status; to access to education and healthcare and the economic issues entangled within; and to gender. What speaks to me the most about this is the consciousness of the activists involved to want to ensure they are represented accurately and fairly. Specifically, I’m talking about the conscious decision to develop a “no media” safe space in an encampment set up on a campus quad, and the intentional reaction to eject an MU student who was working on assignment for ESPN.com who was trying to take photographs. To the Concerned Student 1950 movement, to all students, communities, and activists recognizing the need to protect themselves and designate that space for healing, fellowship, and organizing: bravo, I stand with you. To those of you decrying this as an assault on free speech, I offer the following inspired by Ivan Illich: to Hell with good intentions. This is a theological statement. The media will not help anybody by its good intentions.2

On Thursday, Tressie McMillan Cottom, assistant professor of sociology at Virginia Commonwealth University, wrote a blog post in response to a Twitter conversation she had with Roxane Gay and David Simon. Her post dismantles Simon’s assertions “that the Mizzou students were fascists in ‘intent’, the photographer was the real hero of recent events, and that these were the moments on the slippery slope to the decline of American democracy” through acknowledging the complicity of the media as being neither a rational, objective actor nor a neutral presence. Specifically, she asserts the following early on in her post:

The press is not a rational objective actor. The press shapes as much as it documents. All press benefits as much from social change as it benefits from the status quo. That means the press, especially corporate media, is always serving two masters. The press has rights but so do persons and sometimes we define those rights by working through the moments when they clash.3

I found out about Tressie’s post from my colleague Jarrett Drake, who posted a series of tweets that really blew my mind with his framing of the post. As an aside, I have to acknowledge that the process of how I learned and interpreted this itself was extremely serendipitous and haphazard. Jarrett was providing his own gloss on a blog post where Tressie was writing about an hours-long Twitter conversation. Jarrett tweeted the following:

Okay, read this and insert “the archives” where you read “the press” or “the media” and watch it still make sense.4

Especially this line: “The press shapes as much as it documents.” “The archives shapes as much as it documents.”5

The archives as shapers of the past, not merely documenters of it. I hope someone dissertates on this.6

To build off Jarrett’s reading of that critical sentence, I’ll ask you to do the same as he did. For “the press” or “the media,” replace that with “libraries.” Replace it with “library technology” or “library systems.” Replace it with “metadata.” Replace it with “linked data.”

I cannot help but emphasize how much this framing has helped me make sense of this presentation, particularly when read alongside “Locating the Library in Institutional Oppression,” an article that nina de jesus wrote for In the Library with a Lead Pipe, which demands that we see the complicity of libraries and librarianship in maintaining a perception of neutrality. Bess Sadler and Chris Bourg’s article in the Code4lib Journal, “Feminism and the Future of Library Discovery,” urges us to question our supposed professional neutrality in the context of developing systems and services to support discovery in a library context, and how to imbue a feminist agenda into not only how we think about our work, but also into those systems and services themselves. Bess and Chris acknowledge the work of Hope Olson, in particular her book The Power to Name: Locating the Limits of Subject Representation in Libraries, as establishing that subject classification systems are bound to their historical context and reflect bias.

More generally, Olson also establishes that subject classification itself is a type of “naming” information, or the creation of document surrogates. She continues (emphasis added):

Naming is the act of bestowing a name, of labelling, of creating an identity. It imposes a pattern on the world that is meaningful to the namer. Each of us names reality according to our own vision of the world built on past meanings in our own experience. … Dale Spender was speaking in general terms, but might have been describing the way librarians name information when she wrote: “All naming is of necessity biased and the process of naming is one of encoding that bias, of making a selection of what to emphasize and what to overlook on the basis of a strict use of already patterned materials.”7

Arguably, by extension, this also includes the creation of descriptions, or metadata about “real-world objects” in the parlance of RDF: books, places, people, topics, and so forth. It should be obvious that naming is power, as the title of the chapter of Olson’s book from which the previous quote is taken suggests. Geographer Yi-Fu Tuan further acknowledges this in the specific case of geographical names, and how naming itself can be used to construct a convenient thing that previously did not exist, most specifically in the construction of “Asia”:

We may trace the continent in its present shape and size back to the end of the seventeenth century, when modern Western people felt the need for a collective name to designate their own society and culture. … “Europe” came to be seen as the handy term with which to describe a geographical area and an assortment of peoples, which, by the late seventeenth century, did have a large measure of unity in linguistic and civilizational origin, in physical (racial) type, and in religion. Asia, then, was defined negatively as all that was not Europe. Asia’s reason for existence was to serve as the backward, yet glamorous because exotic, Other. It had no independent reality; and yet, in the course of time, people who lived in this European creation began to accept it and exploit the name of Asia, and the sociopolitical reality it could call into existence, for their own purposes.8

If we return to the world of libraries and the management of metadata, we can notice a refraction of related issues. Building on Hope Olson, Chris Bourg identifies cases relating to subject classification in her talk “Never neutral: Libraries, Technology, and Inclusion,” which she gave at the OLA Superconference earlier this year. She relates a discovery where Randy Shilts’ book Conduct Unbecoming: Gays & Lesbians in the US Military, by virtue of the call number assigned to it, was shelved with works related to “Minorities, women, etc. in armed forces,” which led to the book being, in her words, “literally shelved between Secrets of a Gay Marine Porn Star and Military Trade – a collection of stories by people with a passion for military men.” She also acknowledges the work of Myrna Morales in discovering that materials related to the Young Lords Party, a Puerto Rican nationalist and activist group, were classified under a subject heading of “gangs” within the Reader’s Guide to Periodical Literature.

Amber Billey, Emily Drabinski, and K.R. Roberto critically examine RDA Rule 9.7, which instructs catalogers to record gender with one of three options – “male,” “female,” and “not known” – as part of the process for constructing an authority record.9 Billey, Drabinski, and Roberto acknowledge that, from the perspective of queer identity and lives, the rule erases or oversimplifies the reality of how people view their own gender. Specific issues, in their view, are threefold: that, contra RDA Rule 9.7 and the Library of Congress interpretation of that rule, gender is not reducible to a binary and innate state; that the rule fundamentally misunderstands how queer, transgender, genderqueer, and gender non-conforming people understand their own identities; and that the rule assumes gender can fundamentally be “read” by a cataloger. Most troubling is the recommendation in Rule 9.7 that gender “changes” be recorded in association with a given date. Billey, Drabinski, and Roberto note this practice as “insensitive at best, painful at worst, and belies, the often decidedly non-linear paths gender changes can take.”

It should be clear that the naming function of metadata raises a contentious point in that it allows assumptions and oppression to be reproduced over time. That is not to say that metadata and the process of naming do not have any positive value within the context of libraries and cultural heritage institutions. Structure and conformity – the basis of standardization – allow us to build systems that support discovery at all. Even a consistent locally-defined or discipline-specific vocabulary provides the potential for marginally better discovery insofar as you understand the context enough. Moreover, naming is fundamentally unavoidable in knowledge representation. As such, we need to make a decision whether we choose to name with an intention of justice, or with the pretense of neutrality and objectivity. Karen Coyle, in the first chapter of Understanding the Semantic Web: Bibliographic Data and Metadata, her ALA Library Technology Report, also notes that metadata should be defined as “constructed, constructive, and actionable”; in other words, metadata is a manufactured artificiality, developed for specific purposes, and can be used to satisfy a particular need. In my read, this further emphasizes that by definition neither metadata nor its creation or interpretation can ever be neutral, and it is incumbent upon us to recognize the damage we can inflict or have inflicted on the communities represented, absent, or served by our institutions and collections.

So, if this is generally true for metadata, and if you’re an astute observer or maybe just opinionated, you’ll probably guess this is true for linked data as well. If you’re not familiar with it by now, linked data relies on the standardized identification and naming of things – which, again, include people, books, and so forth – in a global context using aspects of the Web as part of its core architecture. Specifically, and depending on how dogmatic you are, the preferred mechanisms are to use HTTP URIs for naming, and RDF as the underlying model for how you define the world. Linked open data assumes that your data is, well, open: in other words, publicly accessible, and ideally licensed for reuse.
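To make the mechanics concrete, here is a minimal sketch in Python of what naming with HTTP URIs and RDF amounts to: each statement is a subject, predicate, object triple. The example.org identifiers and the name string are hypothetical placeholders; the FOAF and Dublin Core Terms property URIs are existing vocabulary terms. The serializer is a simplified, N-Triples-style illustration, not a complete implementation of that format.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) triples.
# The example.org URIs are hypothetical placeholders.

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"          # existing FOAF property
DCTERMS_CREATOR = "http://purl.org/dc/terms/creator"  # existing DC Terms property

PERSON = "http://example.org/person/1"  # hypothetical identifier for a person
BOOK = "http://example.org/book/1"      # hypothetical identifier for a book


class Literal(str):
    """A plain string value, as opposed to a URI that names a thing."""


triples = [
    (PERSON, FOAF_NAME, Literal("An Example Author")),
    (BOOK, DCTERMS_CREATOR, PERSON),
]


def to_ntriples(statements):
    """Serialize triples in a simplified N-Triples style, one per line."""
    lines = []
    for s, p, o in statements:
        if isinstance(o, Literal):
            # Escape backslashes and double quotes inside the literal.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Even in a sketch this small, someone has chosen which things get identifiers, which properties describe them, and what string counts as the name.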

In a library context, then, we can see that there are any number of entities we can consider – people, organizations, bibliographic works, cultural heritage objects, concepts, places, and so forth. The presumption is that linked open data in libraries can hence help improve discovery – and, fundamentally, that this process of discovery is understood to be positive. In particular, this acknowledges one value of linked data for libraries in that it makes information more accessible to the Web. One specific expression of this value is the publishing of library linked data using Schema.org. The Schema.org initiative was originally started by Google, Bing, Yahoo, and Yandex to “create and support a common set of schemas for structured data markup on web pages.”10 In turn, the intent is for Schema.org to assist with discovery on the web, as it can be leveraged as part of search engine optimization.
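As a rough illustration of what publishing with Schema.org can look like, the following Python sketch builds a JSON-LD description of a book and wraps it in the script element commonly used to embed such markup in a web page. The identifier, title, and author name are hypothetical placeholders; `Book`, `Person`, `name`, and `author` are Schema.org terms. This is one common embedding pattern under those assumptions, not a statement of how any particular library system does it.

```python
import json

# Hypothetical record; only the Schema.org types and properties are real terms.
record = {
    "@context": "https://schema.org",
    "@type": "Book",
    "@id": "http://example.org/book/1",  # hypothetical identifier
    "name": "An Example Title",
    "author": {"@type": "Person", "name": "An Example Author"},
}


def to_script_tag(data):
    """Wrap a JSON-LD document in a script element for embedding in HTML."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(to_script_tag(record))
```

The markup says nothing about who decided on the type, the name, or the author; it simply makes those decisions easier for a search engine to consume.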

I am less convinced that this is really a revolutionary thing when you do this by itself, without critically examining the rest of the practice. Yes, our resources should be broadly discoverable even on the broader web. This value of SEO within the context of linked data and libraries really doesn’t change much about how the Web operates nor does it demand that we change our processes of naming. I think those of us who work at the intersection of libraries and technology are directly responsible for the implementation choices that we make, and by merely opting into linked data in this manner we are trafficking in a Web that is built by corporations who are opting out of this responsibility. That inaction is a conscious choice that allows for searching the Web to remain fundamentally undemocratic as described by Safiya Umoja Noble. In Dr. Noble’s words,

[N]ot all organizations have the ability to promote their URL via other media. One of the myths of our digital democracy is that what rises to the top of the pile is what is most popular. By this logic, sexism and pornography are the most popular values on the Internet when it comes to women.11

Another specific set of foundational concepts within linked data is ripe for questioning and critical analysis, and this set of concepts has two heavily interrelated aspects. The first aspect is the open world assumption, which is a theoretical premise underpinning linked data. The open world assumption states that a statement may be true irrespective of whether or not it is known to be true. As such, it emphasizes the fact that no single person or agent has comprehensive knowledge, and accordingly, we are limited in what we can infer from the knowledge to which we have access. The complementary aspect to the open world assumption is the view that “anyone can say anything about anything,”12 which over time has been changed from the original claim, to “anyone can make simple assertions about anything,”13 to “anyone can make statements about any resource.”14,15
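The difference the open world assumption makes can be sketched in a few lines of Python. Under a closed world reading, a statement that is not recorded is treated as false; under an open world reading, it is simply unknown. The URIs, the `wrote` property, and the function names below are all hypothetical and exist only for illustration.

```python
# Hypothetical statements from two independent sources about the same resource.
WROTE = "http://example.org/vocab/wrote"  # hypothetical property
PERSON = "http://example.org/person/1"    # hypothetical identifier
BOOK_1 = "http://example.org/book/1"      # hypothetical identifier
BOOK_2 = "http://example.org/book/2"      # hypothetical identifier

source_a = {(PERSON, WROTE, BOOK_1)}
source_b = {(PERSON, WROTE, BOOK_2)}


def closed_world(statement, facts):
    """Closed world: anything not recorded is treated as false."""
    return statement in facts


def open_world(statement, facts):
    """Open world: anything not recorded is unknown (None), not false."""
    return True if statement in facts else None


question = (PERSON, WROTE, BOOK_2)

print(closed_world(question, source_a))  # False: source A alone denies it
print(open_world(question, source_a))    # None: source A alone cannot say

# "Anyone can make statements about any resource": merging sources is a
# set union, and the merged graph can answer what one source could not.
merged = source_a | source_b
print(open_world(question, merged))      # True
```

Note that the merge is mechanical: nothing in it records who made which statement or whether the person described would agree with either one.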

These concepts are heralded by a handful of people as the power of linked data as a means for us to shift the narrative, to make it easier to assert truths as we know them. By publishing them as linked data, there’s also the possibility that someone else can pick up these narratives, read and interpret them, and help promulgate them further. In particular, the work of Tim Sherratt, an Australian historian and digital humanist, comes to mind. In a presentation to the 2012 National Digital Forum in Wellington, New Zealand, Tim stated the following:

But to really have access, for something to be truly open, people also have to have the power to create. To take what they’re given and build something new — to challenge, to criticise, to offer alternatives. That means allowing people the space to have ideas, giving them the confidence to experiment, providing useful tools and the knowledge to use them.

This is precisely where I think we can really make ourselves useful, but first, we need to step back a little. We need to begin having some serious conversations about how we can best serve our communities not only as repositories of authoritative knowledge or mere individuals who work within them. We should be examining the way in which we can best serve our communities to support their need to tell stories, to heal, and to work in the process of naming. Part of that involves knowing how to engage in these conversations and asking how we can help instead of constructing a representation of the world that is at best flawed and perhaps unrelatable, and at worst, knowledge organization that inflicts or reproduces violence on those whom we intend to serve.

In addition, we have to recognize the folly of imposing our good intentions with regard to the production of linked data, or any form of documentation, without listening to these communities. These spaces are not always ours, and like the students occupying the quad at University of Missouri, we should be ready to make the space they demand when they do so. Even when we directly engage members of a community and request their presence in a project to correct a perceived absence of voices, we must recognize that this in itself is a form of labor that also has political and emotional impact. In her recent article “Minor Threats,” Mimi Thi Nguyen relates a case where she was urged to add materials to the riot grrrl archive at the Fales Library at New York University, which viewed the absence of materials created by or about women of color from the collection as “a crisis, a decisive historical moment that demanded mediation.”16 She asks us to consider what might be lost or hidden in the process of “correction” of an absence and in how that correction is pursued. Without thought, without conversation, and without vulnerability on the part of those of you with good intentions, our process of correction can simultaneously introduce and spackle over its own violence. To Hell with good intentions, and to Hell with well-intentioned linked data.

None of this is easy, even from a technical perspective. There’s a William Gibson quote that futurists like to use: “the future is already here – it’s just not very evenly distributed.” This quote is in some senses true with the reality of linked data and the tools that support it. The technology is not new and is relatively mature, but few organizations have been able to wield it effectively. The thing is that I also think this quote is bullshit. The past and present are already here too, and neither is evenly distributed. Owen Stephens, in a recent talk on enhancing library data with linked data, quotes G.L. Holbert’s assertion that “With the Internet, we each have our own printing press.” In response, I offer A.J. Liebling’s aphorism that “freedom of the press is guaranteed only to those who own one.” There are far too many tools and pieces of infrastructure supporting the publication and consumption of linked data that are hard to set up. The possibility of using inference or questioning the provenance of linked data resources unfortunately remains somewhat out of reach.

In his National Digital Forum talk, Tim Sherratt demands “simple tools” and “no platforms,” arguing that publication of linked data should be as easy as uploading an HTML page to a web server. While that is a noble goal, I nonetheless think that a certain level of additional tool development is really necessary to help communities build out these narratives. There is a strong value for applications like Omeka and Mukurtu in this space, and I am particularly interested to see how easy to use, open source, and visually welcoming authoring environments for interactive fiction like Twine can be leveraged or can inform what direction these tools can take. We need to work more effectively together with our communities to produce these tools and to understand their impact.

I’d also like to add a word of warning here against being overly reliant on centralization despite the gap in infrastructure. In the blog post I referenced earlier, Tressie McMillan Cottom responds to David Simon’s labeling as “fascism” of the actions of the students at Mizzou who removed the photographer from their encampment:

Fascism means something more than a thing one does not like. Fascism means a system of social organization that concentrates power and doesn’t just discourage dissent but organizes the State against it. … It is that hand-waving about a fascist state can confuse us about what making democracy looks like.3

Centralizing the process of naming in any context overly concentrates that power. I am not saying that all vendors are inherently bad, or that institutions responsible for metadata standards and authority files are bad. Yes, there’s a lot of work to be done to make linked data itself easier to publish, consume, and reuse for all kinds of institutions and communities, but I worry about our deference to this centralization. Despite my concerns about the lack of access to effective user-facing tools for linked data, I still believe its power is in its ability to leverage that decentralization. Relying on centralized authority management or metadata creation for everything, and the corporatization of library infrastructure, actively resists that decentralizing force, further limiting our own effectiveness in the construction of radical democracy.

I will close here to acknowledge some initiatives that have thought about this carefully and are taking action. It’s not to say that there haven’t been some missteps, but I urge you to read about their work. First, please read about the People’s Archive of Police Violence in Cleveland, an online archive for collecting, preserving, and providing access to stories of police violence as experienced or observed by people living in Cleveland. It was organized as a collaboration between Cleveland residents and professional archivists across the United States in reaction to the epidemic of police violence over the last few years. Also, read about the Find & Connect site, developed to help Forgotten Australians and Former Child Migrants understand more about their past and about the historical context of child welfare. It provides contextual information not only on sites of “care,” but also on associated archival records and photos, and information about how to connect with support services including counseling, assistance in accessing archival records, and, when possible, reconnection with family. It also provides important context about how and what you might find on the site, and what might be confronting, disturbing, or otherwise upsetting. Finally, please read the papers as they are published from the annual conference organized by the Community Informatics Research Network. Unfortunately, this year’s conference was held earlier this week, and I found out far too late. All of these examples have given me a lot to think about in this space and I am looking to you to help determine what’s next.

So, think about what and how you name. Decentralize the ability to tell stories. Remember that the road to Hell is paved with good intentions. Shut up enough and step back far enough to listen, so you can build a lasting relationship with the people and communities you serve. Thank you.

Illich, Ivan. Address to the Conference on Inter-American Student Project, Cuernavaca, Mexico, April 1968. Published as “To hell with good intentions” in J.C. Kendall & Associates (eds.), Combining service and learning: A resource book for community and public service (Raleigh, NC: National Society for Internships and Experiential Education, 1990, 314-320).