In their ‘Critical Questions for Big Data’, danah boyd and Kate Crawford warn: ‘Taken out of context, Big Data loses its meaning’. In this short commentary, I contextualize this claim about context. The idea that context is crucial to meaning is shared across a wide range of disciplines, including the field of ‘context-aware’ recommender systems. These personalization systems attempt to take a user’s context into account in order to make better, more useful, more meaningful recommendations. How are we to square boyd and Crawford’s warning with the growth of big data applications that are centrally concerned with something they call ‘context’? I suggest that the importance of context is uncontroversial; the controversy lies in determining what context is. Drawing on the work of cultural and linguistic anthropologists, I argue that context is constructed by the methods used to apprehend it. For the developers of ‘context-aware’ recommender systems, context is typically operationalized as a set of sensor readings associated with a user’s activity. For critics like boyd and Crawford, context is that unquantified remainder that haunts mathematical models, making numbers that appear to be identical actually different from each other. These understandings of context seem to be incompatible, and their variability points to the importance of identifying and studying ‘context cultures’–ways of producing context that vary in goals and techniques, but which agree that context is key to data’s significance. To do otherwise would be to take these contextualizations out of context.

We all have preferences for how we work. Maybe you’re the kind of person who likes to work in complete isolation, in which case this blog post is not for you. But if you’re like me, there’s something appealing about being deeply engaged in your own work in proximity to people who are also being productive. This is why I have long struggled to work at home and instead tend to write in coffee shops and libraries. I’ve also experimented with more intentional forms of co-working. For many years, my most successful attempt was with my friend Stephen. As a DJ, Stephen would work on mixes and set lists, while I would typically revise papers – beyond the fact that we’ve been friends for years and enjoy hanging out, I think we both got a lot out of the gentle pressure/quiet support of collocated work. In the last few years, I’ve made several other efforts at co-working, spanning in-person, online and inter-species collaborations (#noclickbait – it’s not as exciting as it sounds), which I thought I’d share below. If you have other ideas for coworking, feel free to share them in the comments!

Presentation by intern Nathan Matias on the project he worked on during the summer at the SMC. He has continued to work on his research, so in case you have not read it here is a more updated post on his work:

We are happy to share SMC’s intern Aleena Chia’s presentation of her summer project titled “Co-creation and Algorithmic Self-Determination: A study of player feedback on game analytics in EVE Online”.

Aleena’s project summary and the videos of her presentation below:

Digital games are always already information systems designed to respond to players’ inputs with meaningful feedback (Salen and Zimmerman 2004). These feedback loops constitute a form of algorithmic surveillance that have been repurposed by online game companies to gather information about player behavior for consumer research (O’Donnell 2014). Research on player behavior gathered from game clients constitutes a branch of consumer research known as game analytics (Seif et al 2013).[1] In conjunction with established channels of customer feedback such as player forums, surveys, polls, and focus groups, game analytics informs companies’ adjustments and augmentations to their games (Kline et al 2005). EVE Online is a Massively Multiplayer Online Game (MMOG) that uses these research methods in a distinct configuration. The game’s developers assemble a democratically elected council of players tasked with the filtration of player interests from forums to inform their (1) agenda setting and (2) contextualization of game analytics in the planning and implementation of adjustments and augmentations.

This study investigates the council’s agenda setting and contextualization functions as a form of co-creation that draws players into processes of game development, as interlocutors in consumer research. This contrasts with forms of co-creation that emphasize consumers’ contributions to the production and circulation of media content and experiences (Banks 2013). By qualitatively analyzing meeting minutes between EVE Online’s player council and developers over seven years, this study suggests that co-creative consumer research draws from imaginaries of player governance caught between the twin desires of corporate efficiency and democratic efficacy. These desires are darned together through a quantitative public sphere (Peters 2001) that is enabled and eclipsed by game analytics. In other words, algorithmic techniques facilitate collective self-knowledge that players seek for co-creative deliberation; these same techniques also short circuit deliberation through claims of neutrality, immediacy, and efficiency.

The significance of this study lies in its analysis of a consumer public’s (Arvidsson 2013) ambivalent struggle for algorithmic self-determination – the determination by users through deliberative means of how their aggregated acts should be translated by algorithms into collective will. This is not primarily a struggle of consumers against corporations; nor of political principles against capitalist imperatives; nor of aggregated numbers against individual voices. It is a struggle within communicative democracy for efficiency and efficacy (Anderson 2011). It is also a struggle for communicative democracy within corporate enclosures. These struggles grind on productive contradictions that fuel the co-creative enterprise. However, while the founding vision of co-creation gestured towards a win-win state, this analysis concludes that algorithmic self-determination prioritizes efficacy over efficiency, process over product. These commitments are best served by media companies oriented towards user retention rather than recruitment, business sustainability rather than growth, and that are flexible enough to slow down their co-creative processes.

[1] Seif et al (2013) maintain that player behavior data is an important component of game analytics, which includes the statistical analysis, predictive modeling, optimization, and forecasting of all forms of data for decision making in game development. Other data include revenue, technical performance, and organizational process metrics.

UPDATE, Sept 16, 9pm ET: Redditors brilliantly spotted an important gap in my dataset and worked with me to resolve it. After taking the post down for two days, I am posting the corrected results. Thanks to their quick work, the graphics and findings in this post are more robust.

Academic research on the work of moderators would expect that the most important predictor of blackout participation would be the workload, which creates common needs across subs. Aaron Shaw and Benjamin Mako Hill argue, based on evidence from Wikia, that as the work of moderating becomes more complex within a community, moderators grow in their own sense of common identity and common needs as distinct from their community (read Shaw and Hill’s Wikia paper here). Postigo argues something similar in terms of moderators’ relationship to a platform: when moderators feel like they’re doing huge amounts of work for a company that’s not treating them well, they can develop common interests and push back (read my summary of Postigo’s AOL paper here).

Testing Redditors’ Explanations of The Blackout

After I posted an initial data analysis to reddit three weeks ago, dozens of moderators generously contacted me with comments and offers to let me interview them. In this post, I test hypotheses straight from redditors’ explanations of what led different subreddits to join the blackout. By putting all of these hypotheses into one model, we can see how important they were across reddit, beyond any single sub. (see my previous post) (learn more about my research ethics and my promises to redditors)

TLDR:

Subs who shared mods with other blackout subs were more likely to join the blackout, but controlling for that:

Default subs were more likely to join the blackout

NSFW subs were more likely to join the blackout

Subs with more moderators were slightly more likely to join the blackout

More active subs were more likely to join the blackout

More isolated subs were less likely to join the blackout

Subs whose mods participate in metareddits were more likely to join the blackout

Subs whose mods get and give help in moderator-specific subs were no more or less likely to join the blackout

In my research I have read over a thousand reddit threads, interviewed over a dozen moderators, archived discussions in hundreds of subreddits, and collected data from the reddit API— starting before the blackout. Special thanks to everyone who has spoken with me and shared data.

I account for changes in subreddit leadership (with some gaps for subreddits that have experienced substantial leadership changes since July). In this dataset, half of the 10 most active subs joined the blackout, 24% of the 100 most active, 14.2% of the 1,000 most active, and 4.7% of the 20,000 most active subreddits.
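The top-N figures above are plain shares: rank subs by activity, take the first N, and count how many joined. A minimal sketch of that arithmetic follows. It is not the code used for this analysis, and the function name and toy data are hypothetical.

```python
def blackout_share_in_top(activity, joined, n):
    """Fraction of the n most active subs that joined the blackout.

    activity: dict mapping subreddit name -> comment count (assumed non-empty)
    joined:   set of subreddit names that joined the blackout
    """
    # Rank subs from most to least active and keep the first n.
    top = sorted(activity, key=activity.get, reverse=True)[:n]
    return sum(1 for sub in top if sub in joined) / len(top)


# Hypothetical toy data, not taken from the reddit dataset.
toy_activity = {"a": 900, "b": 500, "c": 120, "d": 40, "e": 3}
toy_joined = {"a", "c", "e"}

share_top2 = blackout_share_in_top(toy_activity, toy_joined, 2)
share_top5 = blackout_share_in_top(toy_activity, toy_joined, 5)
```

If n is larger than the number of subs, the share is taken over all of them.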

To illustrate the data, here are two charts of the top 52,754 most active subreddits as they would have stood at the end of June. The font size and node size are related to the log-transformed number of comments from June. Ties between subreddits represent shared moderators. The charts are laid out using the ForceAtlas2 layout on Gephi, which has separated out some of the more prominent subreddit networks, including the ImaginaryNetwork, the “SFW Porn” Network, and several NSFW networks (I’ve circled notable networks in the network graph at the top of this post).
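For readers curious how a network like this can be assembled, here is a minimal sketch that turns a table of moderator roles into weighted ties between subreddits and computes a log-scaled node size. It is an illustration only: the function names and toy data are hypothetical, and using log(1 + comments) as the transform is an assumption, since the post only says the counts were log-transformed.

```python
import math
from collections import defaultdict
from itertools import combinations


def comoderation_edges(sub_mods):
    """Return {(sub_a, sub_b): number of shared moderators}.

    sub_mods: dict mapping subreddit name -> iterable of moderator names.
    """
    # Invert the mapping: which subs does each moderator look after?
    mod_subs = defaultdict(set)
    for sub, mods in sub_mods.items():
        for mod in mods:
            mod_subs[mod].add(sub)

    # Every pair of subs moderated by the same person gets one more tie.
    weights = defaultdict(int)
    for subs in mod_subs.values():
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] += 1
    return dict(weights)


def node_size(june_comments):
    """Log-scaled size; log1p keeps subs with zero comments defined (assumption)."""
    return math.log1p(june_comments)


# Hypothetical toy data.
toy_mods = {"pics": {"m1", "m2"}, "aww": {"m1", "m2", "m3"}, "knitting": {"m4"}}
toy_edges = comoderation_edges(toy_mods)
```

The resulting pairs and weights could be written out as an edge list for a graph layout tool.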

Redditors’ Explanations Of Blackout Participation

With 2,278 subreddits joining the blackout, redditors have many theories for what experiences and factors led subs to join the blackout. In the following section, I share these theories and then test one big logistic regression model that accounts for all of the theories together. In these tests, I consider 52,745 subreddits that had at least one comment in June 2015. A total of 1,342 of these subreddits joined the blackout.

The idea of blacking out had come up before. According to one moderator, blacking out was first discussed by moderators three years ago as a way to protest Gawker’s choice to publish details unmasking a reddit moderator. Although some subs banned Gawker URLs from being posted to their communities, the blackout didn’t take off. While some individual subreddits have blacked out in the intervening years, this was the first time that many subs joined together.

I tested these hypotheses with the set of (Firth) logistic regression models shown below. The final model (on the right) offers the best fit of all the models, with a McFadden R² of 0.123, which is pretty good.
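For readers unfamiliar with the fit statistic, McFadden's pseudo-R² compares the log-likelihood of a fitted model with that of an intercept-only model: 1 minus the ratio of the two. The sketch below shows the calculation on made-up outcomes and predicted probabilities. It uses the ordinary Bernoulli log-likelihood rather than the penalized likelihood of a Firth model, and none of the numbers come from this post.

```python
import math


def log_likelihood(outcomes, probs):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probs)
    )


def mcfadden_r2(outcomes, probs):
    """McFadden pseudo-R^2: 1 - LL(model) / LL(intercept-only model)."""
    # The intercept-only model predicts the overall base rate for every case.
    base_rate = sum(outcomes) / len(outcomes)
    ll_null = log_likelihood(outcomes, [base_rate] * len(outcomes))
    ll_model = log_likelihood(outcomes, probs)
    return 1.0 - ll_model / ll_null


# Hypothetical toy data: 1 = joined the blackout, 0 = did not.
toy_outcomes = [1, 0, 0, 0, 1, 0, 0, 0]
toy_probs = [0.6, 0.1, 0.2, 0.1, 0.5, 0.3, 0.2, 0.1]
toy_r2 = mcfadden_r2(toy_outcomes, toy_probs)
```

A model that does no better than the base rate scores 0, and values closer to 1 indicate a better fit.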

The network of moderators who moderate blackout subs is the strongest predictor in this model. At a basic level, it makes sense that moderators who participated in the blackout in one subreddit might participate in another. Making sense of this kind of network relationship is a hard problem in network science, and this model doesn’t include time as a dimension, so we don’t consider which subs went dark before which others. If I had data on the time that subreddits went dark, it might be possible to better research this interesting question, like Bogdan State and Lada Adamic did with their paper on the Facebook equality meme.

Hypothesis 1: Default subs were more likely to join the blackout

In interviews, some moderators pointed out that “most of the conversation about the blackout first took place in the default mod irc channel.” Moderators of top subs described the blackout as mostly an issue concerning default or top subreddits.

This hypothesis is supported in the final model. For example, while a non-default subreddit with 4 million monthly comments had a 32.9% chance of joining the blackout (holding all else at their means), a default subreddit of the same size had a 48.6% chance of joining the blackout, on average in the population of subs.

Hypothesis 2: Subs with more comment activity were more likely to join the blackout

Moderators of large, non-default subreddits also had plenty of reasons to join the blackout, either because they also shared the need for better moderating tools, or because they had more common contact and sympathy with other moderators as a group.

Even among subreddits that declined to join the blackout, many moderators described feeling obligated to make a decision one way or another. This surprised moderators of large subreddits, who saw it as an issue for larger groups. Size was a key issue in the hundreds of smaller groups that discussed the possibility, with many wondering if they had much in common with larger subs, or whether blacking out their smaller sub would make any kind of dent in reddit’s advertising revenue.

In the final model, larger subs were more likely to join the blackout, a logarithmic relationship that is mediated by the number of moderators. When we set everything else to its mean, we can observe how this looks for subs of different sizes. At the 50th percentile, subreddits with 6 comments per month had a 1.6% chance of joining the blackout, a number that adds up with so many small subs. At the 75th percentile, subs with 46 comments a month had a 2.5% chance of joining the blackout. Subs with 1,000 comments a month had a 5.4% chance of joining, while subs with 100,000 comments a month had a 15.8% chance of joining the blackout, on average in the population of subs, holding all else constant.
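The percentages quoted throughout this post are predicted probabilities: every other covariate is fixed, one is varied, and the model's linear predictor is passed through the inverse-logit function. Here is a minimal sketch of that step. The intercept and coefficients are HYPOTHETICAL placeholders chosen for illustration, not the fitted values from the final model, and the log(1 + comments) transform is an assumption.

```python
import math


def predicted_probability(intercept, coefs, values):
    """Inverse-logit of the linear predictor intercept + sum(coef * value)."""
    linear = intercept + sum(coefs[name] * values[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# HYPOTHETICAL numbers for illustration only, not the fitted model.
intercept = -4.5
coefs = {"log_comments": 0.25, "is_default": 0.7}


def chance_for(comments, is_default=0):
    """Chance of joining for a sub with the given monthly comment count."""
    values = {"log_comments": math.log1p(comments), "is_default": is_default}
    return predicted_probability(intercept, coefs, values)


small = chance_for(6)
large = chance_for(100_000)
large_default = chance_for(100_000, is_default=1)
```

Because activity enters on a log scale, each tenfold increase in comments adds the same amount to the log-odds rather than to the probability itself.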

Hypothesis 3: NSFW subs were more likely to join the blackout

In interviews, some moderators said that they declined to join the blackout because they saw it as something associated with support for hate speech subreddits taken down by the company in June or other parts of reddit they preferred not to be associated with. Default moderators denied this flatly, describing the lengths they went to dissociate from hate speech communities and sentiment against then-CEO Ellen Pao. Nevertheless, many journalists drew this connection, and moderators were worried that they might become associated with those subs despite their efforts.

Another possibility is that NSFW subs have to do more work to maintain subs that offer high quality NSFW conversations without crossing lines set by reddit and the law. Perhaps NSFW subs just have more work, so they were more likely to see the need for better tools and support from reddit.

In the final model, NSFW subs were more likely to join the blackout than non-NSFW subs. For example, while a non-default, non-NSFW subreddit with 22,800 comments had an 11.4% chance of joining the blackout (holding all else at their means), an NSFW subreddit of the same size had a 15.3% chance of joining the blackout, on average in the population of subs. Among less popular subs, a non-NSFW sub with 1,000 comments per month had a 5.4% chance of joining the blackout, while an NSFW sub of the same size had a 7.5% chance of joining, holding all else constant, on average in the population of subs.

Hypothesis 4: More isolated subs were less likely to join the blackout

In the interviews I conducted, as well as the 90 or so interviews I read on /r/subredditoftheday, moderators often contrasted their communities with “the rest of reddit.” When I asked one moderator of a support-oriented subreddit about the blackout, they mentioned that “a lot of the users didn’t really identify with the rest of reddit.” Subscribers voted against the blackout, describing it as “a movement we didn’t identify with,” this moderator said.

To test hypotheses about more isolated subs, I parsed all comments in every public subreddit in June 2015, generating an “in/out” ratio. This ratio consists of the total comments within the sub divided by the total comments made elsewhere by the sub’s commenters. A subreddit whose users stayed in one sub would have a ratio above 1, while a subreddit whose users commented widely would have a ratio below 1. I tested other measures, such as the average of per-user in/out ratios, but the overall in/out ratio seems the best.
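The overall in/out ratio described above can be sketched in a few lines. This is an illustration under assumptions, not the code used for the analysis: it takes one (user, subreddit) pair per comment, and it returns infinity for a sub whose commenters never comment anywhere else. The function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def in_out_ratios(comments):
    """Return {subreddit: in/out ratio} from (user, subreddit) pairs.

    "In" is the number of comments made within the sub; "out" is the number
    of comments the sub's commenters made in any other sub.
    """
    # Count each user's comments per subreddit.
    per_user = defaultdict(Counter)
    for user, sub in comments:
        per_user[user][sub] += 1

    inside = Counter()
    outside = Counter()
    for by_sub in per_user.values():
        total = sum(by_sub.values())
        for sub, n in by_sub.items():
            inside[sub] += n            # comments this user made in the sub
            outside[sub] += total - n   # comments this user made elsewhere

    return {
        sub: (inside[sub] / outside[sub]) if outside[sub] else float("inf")
        for sub in inside
    }


# Hypothetical toy data: ann mostly stays in one sub, bob roams.
toy_comments = (
    [("ann", "knitting")] * 4
    + [("ann", "pics")]
    + [("bob", "pics")] * 3
    + [("bob", "news")] * 2
)
toy_ratios = in_out_ratios(toy_comments)
```

In the toy data, the knitting sub has 4 comments inside and 1 elsewhere, for a ratio of 4.0, while the other two subs fall below 1.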

In the final model, more isolated subs were less likely to join the blackout, on a logarithmic scale. Most subreddits’ commenters participate actively elsewhere on reddit, at a mean in/out ratio of 0.24. That means that on average, a subreddit’s participants make 4 times more comments outside a sub than within it. At this level, holding everything else at their means, a subreddit with 1,000 comments a month had a 4.0% chance of joining the blackout. A similarly-sized subreddit whose users made half of their comments within the sub (in/out ratio of 1.0) had just a 1% chance of joining the blackout. Very isolated subs whose users commented twice as much in-sub had a 0.3% chance of joining the blackout, on average in the population of subs, holding all else constant.

Hypothesis 5: Subs with more moderators were more likely to join the blackout

This one was my hypothesis, based on a variety of interview details. Subs with more moderators tend to have more complex arrangements for moderating and tend to encounter limitations in mod tools. Subs with more mods also have more people around, so their chances of spotting the blackout in time to participate were also probably higher. On the other hand, subs with more activity tend to have more moderators, so it’s important to control for the relationship between mod count and sub activity.

The final model supports this, though only weakly. Subs with more moderators were slightly more likely to join the blackout. There is a very small relationship here, and the relationship is mediated by the number of comments. For a sub with 1,000 comments per month, with everything else at its average, a subreddit with 3 moderators (the average) had a 5.4% chance of joining the blackout. A subreddit with 8 moderators had a 6% chance of joining the blackout, on average in the population of subs.

Hypothesis 6: Subs with admins as mods were more (or less) likely to join the blackout

I heard several theories about admins. During the blackout, some redditors claimed that admins were preventing subs from going private. In interviews, moderators tended to voice the opposite opinion. They argued that subs with admin contact were joining the blackout in order to send a message to the company, urging it to pay more attention to employees who advocated for moderator interests. Moderators at smaller subs said, “we felt 100% independent from admin assistance so it really wasn’t our fight.”

None of my hypothesis tests showed any statistically significant relationship between current or past admin roles as moderators and participation in the blackout, either way. For that reason, I omit it from my final model.

Hypothesis 7: Subs with moderators who moderated other subs were more likely to join the blackout

I’ve been wondering if moderators with multiple mod roles elsewhere on reddit would be more likely to join the blackout, perhaps because they had greater “solidarity” with other subreddits, or because they were more likely to find out about the blackout.

In the final model, the reverse is supported. Subs that shared moderators with other subs were actually less likely to join the blackout, a relationship that is mediated by the number of moderators who also modded blackout subs. Holding blackout sub participation constant, a sub of 1,000 comments per month and 3 moderator roles shared with other subs had a 5.7% chance of joining the blackout, while a more connected sub with 6 shared moderator roles (in the 4th quantile) had a 4.2% chance of joining the blackout, on average in the population of subs, holding all else constant.

Hypothesis 8: Subreddits with mods who also moderate other blackout subs were more likely to join the blackout.

In the final model, subreddits with mods with roles in other blackout subs were more likely to join the blackout, a relationship on a log scale that is mediated by the number of moderator roles shared with other subs more generally. 19% of subs in the sample share at least one moderator with a blackout sub, after removing moderator bots. A sub with 1,000 comments per month that didn’t have any overlapping moderators with blackout subs had a 3.2% chance of joining the blackout, while a sub with one overlapping moderator had an 11.1% chance to join, and a sub with 2 overlapping moderators had a 21.1% chance to join. A sub with 6 moderators overlapping with blackout subs had a 57.2% chance of joining the blackout.
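One way to build a predictor like this is to count, for each sub, the human moderators who also moderate some other sub that joined the blackout. The sketch below does that on toy data. The exact definition used in the model may differ, so treat the counting rule, the function name, and the bot list as assumptions made for illustration.

```python
def blackout_overlap(sub_mods, blackout_subs, bots=frozenset()):
    """Return {sub: number of its non-bot mods who moderate ANOTHER blackout sub}.

    sub_mods:      dict mapping subreddit name -> iterable of moderator names
    blackout_subs: set of subreddit names that joined the blackout
    bots:          moderator accounts to ignore (assumed list)
    """
    # Map each human moderator to the blackout subs they moderate.
    mod_blackout = {}
    for sub in blackout_subs:
        for mod in sub_mods.get(sub, ()):
            if mod not in bots:
                mod_blackout.setdefault(mod, set()).add(sub)

    overlap = {}
    for sub, mods in sub_mods.items():
        # Exclude the sub itself so a blackout sub does not count its own mods.
        overlap[sub] = sum(
            1
            for mod in set(mods)
            if mod not in bots and mod_blackout.get(mod, set()) - {sub}
        )
    return overlap


# Hypothetical toy data.
toy_sub_mods = {
    "a": ["m1", "m2", "AutoModerator"],
    "b": ["m1", "AutoModerator"],
    "c": ["m3"],
    "d": ["m2", "m3"],
}
toy_overlap = blackout_overlap(toy_sub_mods, {"a"}, bots={"AutoModerator"})
```

Leaving the bot account in would count it as a shared moderator of sub "b", which is why bots are removed before counting.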

I tend to see the network of co-moderation as a control variable. We can expect that moderators who joined the blackout would be likely to support it across the many subs they moderate. By accounting for this in the model, we get a clearer picture of the other factors that were important.

Hypothesis 9: Subs with moderators who participate in metareddits were more likely to join the blackout

In interviews, several moderators described learning about the blackout from “meta-reddits” which cover major events on the site, and which mostly stayed up during the blackout. Just like we might expect more isolated subs to stay out of the blackout, we might expect moderators who get involved in reddit-wide meta-discussion to join the blackout. I took my list of metareddits from this TheoryOfReddit wiki post.

In the final model, subs with moderators who participate in metareddits were more likely to join the blackout, on a logarithmic scale. Most moderators on the site do not participate in metareddits. A sub of 1,000 comments per month with no metareddit participation by its moderators had a 5.3% chance of joining the blackout, while a similar sub whose moderators made 5 comments on any metareddit per month had a 6.3% chance of joining the blackout.

Hypothesis 10: Subs with mods participating in moderator-focused subs were more likely to join the blackout

Although key moderator subs like /r/defaultmods and /r/modtalk are private and inaccessible to me, I could still test a “solidarity” theory. Perhaps moderators who participate in mod-specific subs, who have helped and been helped by other mods, would be more likely to join the blackout?

Although this predictor is significant in a single-covariate model, when you account for all of the other factors, mod participation in moderator-focused subs is not a significant predictor of participation in the blackout.

This surprises me. I wonder: since moderator-specific subs tend to have low volume, one month of comments may just not be enough to get a good sense of which moderators participate in those subs. Also, this dataset doesn’t include IRC discussions (nor will it ever), where moderators seem mostly to hang out with and help each other. But from the evidence I have, it looks like help from moderator-focused subs played no part in swaying moderators to join the blackout.

So, how DID solidarity develop in the blackout?

The question is still open, but from these statistical models, it seems clear that factors beyond moderator workload had a big role to play, even when controlling for mods of multiple subs that joined the blackout.

In further analysis in the next week, I’m hoping to include:

Activity by mods in each sub (comments, deletions)

Comment karma, as another measure of activity (still making sense of the numbers to see if they are useful here)

The complexity of the subreddit, as measured by things in the sidebar (possibly)

Building Statistical Models of Online Behavior through Qualitative Research

The process of collaborating with redditors on my statistical models has been wonderful. As I continue this work, I’m starting to think more and more about the idea of participatory hypothesis testing, in parallel with work we do at MIT around Freire-inflected practices of “popular data”. If you’ve seen other examples of this kind of thing, do send them my way!

Wired recently selected its 21 “must-follow” feeds in the world of business, and the Social Media Collective blog was among them! See the entire list here. We’re thrilled, as so much of our goal is to span both scholarly and industry conversations around social media and its critical cultural implications. Stay tuned for more from this blog in the coming months.

About Us

The Social Media Collective (SMC) is a network of social science and humanistic researchers, part of the Microsoft Research labs in New England and New York. It includes full-time researchers, postdocs, interns, and visitors. The initiative began in 2009, and the researchers who now lead it are Nancy Baym, danah boyd, Kate Crawford, Tarleton Gillespie, and Mary Gray. Our primary purpose is to provide rich contextual understanding of the social and cultural dynamics that underpin social media technologies. We use a variety of methodologies and span multiple disciplines.
