We propose a set of methodological principles and strategies for the use of trace data, i.e., data capturing performances carried out on or via information systems, often at a fine level of detail. Trace data comes with a number of methodological and theoretical challenges associated with the inseparable nature of the social and material. Drawing on Haraway and Barad’s distinctions among refraction, reflection and diffraction, we compare three approaches to trace data analysis. We argue that a diffractive methodology allows us to explore how trace data are not given but created through the construction of a research apparatus to study them. By focusing on the diffractive ways in which traces ripple through an apparatus, it is possible to explore some of the taken-for-granted, invisible dynamics of sociomateriality. Equally important, this approach allows us to describe what distinctions emerge within entwined phenomena during the research process, and when. Empirically, we illustrate the guiding principles and strategies by analyzing trace data from Gravity Spy, a crowdsourced citizen science project on Zooniverse. We conclude by suggesting that a diffractive methodology may help us draw together quantitative and qualitative research practices in new and productive ways that also raise interesting design questions.

We present the design of a citizen science system that uses machine learning to guide the presentation of image classification tasks to newcomers to help them more quickly learn how to do the task while still contributing to the work of the project. A Bayesian model for tracking volunteer learning for training with tasks with uncertain outcomes is presented and fit to data from 12,986 volunteer contributors. The model can be used both to estimate the ability of volunteers and to decide the classification of an image. A simulation of the model applied to volunteer promotion and image retirement suggests that the model requires fewer classifications than the current system.

Open collaboration platforms involve people in many tasks, from editing articles to analyzing datasets. To facilitate mastery of these practices, communities offer a number of learning resources, ranging from project-defined FAQs to individually-oriented search tools and communal discussion boards. However, it is not clear which project resources best support participant learning, overall and at different stages of engagement with the project. We draw on Sørensen’s framework of forms of presence to distinguish three forms of engagement with learning resources: authoritative, agent-centered and communal. We analyzed trace data from the Gravity Spy citizen-science project using a mixed-effects logistic regression with volunteer performance as the outcome variable. The findings suggest that engagement with authoritative resources (e.g., those constructed by project organizers) facilitates performance initially. However, as tasks become more difficult, volunteers seek and benefit from engagement with their own agent-centered resources and community-generated resources. These findings suggest a broader scope for the design of learning resources for online communities.

The observation of gravitational waves from compact binary coalescences by LIGO and Virgo has begun a new era in astronomy. A critical challenge in making detections is determining whether loud transient features in the data are caused by gravitational waves or by instrumental or environmental sources. The citizen-science project Gravity Spy has been demonstrated as an efficient infrastructure for classifying known types of noise transients (glitches) through a combination of data analysis performed by both citizen volunteers and machine learning. We present the next iteration of this project, using similarity indices to empower citizen scientists to create large data sets of unknown transients, which can then be used to facilitate supervised machine-learning characterization. This new evolution aims to alleviate a persistent challenge that plagues both citizen-science and instrumental detector work: the ability to build large samples of relatively rare events. Using two families of transient noise that appeared unexpectedly during LIGO's second observing run, we demonstrate the impact that the similarity indices could have had on finding these new glitch types in the Gravity Spy program.

For peer-production projects to be successful, members must develop a specific and common language that enables them to cooperate. We ask what factors affect the development of shared language in open peer-production communities. Answering this question is important because we want the communities to be productive even when self-managed, which requires understanding how shared language emerges. We examine this question using a structurational lens in the setting of a citizen science project. Examining the use of words in the Gravity Spy citizen science project, we find that many words are reused and that most novel words that are introduced are not picked up, showing reproduction of structure. However, some novel words are used by others, showing an evolution of the structure. Participants with roles closer to the science are more likely to have their words reused, showing the mutually reinforcing nature of structures of signification, legitimation and domination.

Researchers studying user behaviors in online communities often conduct analyses of events collected in system logs, e.g., a system’s record of a comment post or of a contribution. However, analysis of user behaviors is more difficult if users make contributions without being logged in (i.e., anonymously). Since a user’s account will not be associated with contributions that user makes anonymously, conclusions about user behaviors that look only at attributed actions might not account for a user’s full experience. To understand the impacts of anonymous contributions on research, we conducted an analysis of system logs containing anonymous activities in two online citizen science projects. By linking anonymous events with user IDs, we found that (1) many users contribute anonymously, though with varied patterns of contribution; and (2) including anonymous activities alters conclusions made about users’ experience with the project. These results suggest that researchers of human behaviors in online communities should consider the possible impacts of anonymous interaction on their ability to draw conclusions about user behaviors in these settings.

Members of highly-distributed groups in online production communities face challenges in achieving coordinated action. Existing CSCW research highlights the importance of shared language and artifacts when coordinating actions in such settings. To better understand how such shared language and artifacts are not only a guide for, but also a result of, collaborative work, we examine the development of folksonomies (i.e., volunteer-generated classification schemes) to support coordinated action. Drawing on structuration theory, we conceptualize a folksonomy as an interpretive schema forming a structure of signification. Our study is set in the context of an online citizen-science project, Gravity Spy, in which volunteers label "glitches" (noise events recorded by a scientific instrument) to identify and name novel classes of glitches. Through a multi-method study combining virtual and trace ethnography, we analyze folksonomies and the work of labelling as mutually constitutive, giving folksonomies a dual role: an emergent folksonomy supports the volunteers in labelling images at the same time that the individual work of labelling images supports the development of a folksonomy. However, our analysis suggests that the lack of supporting norms and authoritative resources (structures of legitimation and domination) undermines the power of the folksonomy and so the ability of volunteers to coordinate their decisions about naming novel glitch classes. These results have implications for design. If we hope to support the development of emergent folksonomies, online production communities need to 1) facilitate tag gardening, a process of consolidating overlapping terms of artifacts; 2) demarcate a clear home for discourses around folksonomy disagreements; 3) highlight clearly when decisions have been reached; and 4) inform others about those decisions.

In this paper, we describe the results of an online field experiment examining the impacts of messaging about task novelty on the volume of volunteers’ contributions to an online citizen science project. Encouraging volunteers to provide a little more content as they work is an attractive strategy to increase the community’s output. Prior research found that an important motivation for participation in online citizen science is the wonder of being the first person to observe a particular image. To appeal to this motivation, a pop-up message was added to an online citizen science project that alerted volunteers when they were the first to annotate a particular image. Our analysis reveals that new volunteers who saw these messages increased the volume of annotations they contributed. The results of our study suggest an additional strategy to increase the amount of work volunteers contribute to online communities and citizen science projects specifically.

Research on newcomer roles in peer production sites (e.g., Wikipedia) is characterized by a broad and relatively well-articulated set of functionally and culturally recognizable roles. But not all communities come with well-defined roles that newcomers can aspire to occupy. The present study explores activity clusters newcomers create when faced with few recognizable roles to fill and limited access to other participants’ work that serves as an exemplar. Drawing on a mixed-method research design, we present findings from an analysis of 1,687 newcomers’ sessions in a citizen science project. Combining session- and individual-level analysis produced three findings: (1) newcomers’ activities manifest a diverse range of session types; (2) newcomers toggle between light work sessions and more involved types of production or community engagement; and (3) there is an interesting relationship between high-level contributors who do a lot of work but little talk and a small group that does a lot of talk but less work. The former group draws heavily on posts contributed by the latter group. Identifying shifts and regularities in contribution facilitates improved mechanisms for engaging participants and the design of online citizen science communities.

The paper explores the motivations of volunteers in a large crowdsourcing project and contributes to our understanding of the motivational factors that lead to deeper engagement beyond initial participation. Drawing on the theory of legitimate peripheral participation (LPP) and the literature on motivation in crowdsourcing, we analyze interview and trace data from a large citizen science project. The analyses identify ways in which the technical features of the projects may serve as motivational factors leading participants towards sustained participation. The results suggest that volunteers first engage in activities to support knowledge acquisition, later share knowledge with other volunteers, and finally increase participation in Talk through a punctuated process of role discovery.

Making the traces of user participation in primary activities visible in online crowdsourced initiatives has been shown to help new users understand the norms of participation, but participants do not always have access to others’ work. Through a combination of virtual and trace ethnography, we explore how new users in two online citizen science projects engage other traces of activity as a way of compensating. Merging the theory of legitimate peripheral participation with Erickson and Kellogg’s theory of social translucence, we introduce the concept of practice proxies: traces of user activities in online environments that act as resources to orient newcomers towards the norms of practice. Our findings suggest that newcomers seek out practice proxies in the social features of the projects that highlight contextualized and specific characteristics of primary work practice.