How would you design a collaboration community for scientists, given what we know about formal and informal scholarly communication in science; computer mediated communication; computer supported collaborative work; online communities; social software; and social studies of science? (This is a test mini essay for comps prep. I wrote this offline, so it does not have links to provide appropriate attribution or credit, nor does it have complete citations.)

1. Introduction

There is an announcement for the next “Facebook” for scientists almost every week. Frequently, these tools are just repurposed social software without any design features specifically meant to support how scientists collaborate and communicate within collaborations. As Preece (2000) says, it is not a matter of “if you build it they will come”. Using ideas from the diffusion of innovations literature (Rogers, 2003; Ilie et al.), a new tool must be compatible with how scientists work, it must be visible, it must be trialable, there must be a perceived relative advantage over other similar tools, and since it is an interactive information technology, it will have to get to critical mass for wide-scale adoption. Based on what we know about how scientists communicate and how information technologies have changed how scientists communicate, we can suggest some guidelines for what a successful tool should do. The next few sections of this essay will describe these guidelines developed from the various streams of research.

2. Online communities

2.1 Design Processes

First, the tool should be designed as an online community – a collaboration place that brings together people, with a common purpose, with policies, using information and communication technologies (Preece, 2000). In design, it is important to address both sociability and usability. To address sociability, there should be a clearly stated purpose, with clear policies for membership and behavior, and moderators to encourage appropriate contributions and discourage inappropriate contributions. These policies help users trust the system and other users, thereby encouraging contribution.

To address usability, navigation should be clear, the site should be intuitive to use, there should be adequate help when needed, and the design should conform to best practices in web design. The site should also be machine-usable and interoperable; that is, it should import standard data formats (from RSS/XML to scientific data formats for chemicals or genes to bibliographic formats such as RIS and BibTeX), it should provide data streams in machine-consumable formats, and it should have a well-documented API to enable users to develop their own re-uses of the data.
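As a rough illustration of what importing and re-exporting a bibliographic format could involve, here is a minimal sketch in Python that reads one RIS record and writes it back out as BibTeX. The tag mapping, the function names, and the sample record are all invented for illustration; a real importer would need to handle many more tags, entry types, and edge cases.

```python
import re

# Hypothetical mapping from a few RIS tags to BibTeX fields (far from complete).
RIS_TO_BIBTEX = {"TI": "title", "JO": "journal", "PY": "year", "VL": "volume"}

# An RIS line looks like "AU  - Doe, Jane": a two-character tag, a dash, a value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])\s+-\s*(.*)$")


def parse_ris(text):
    """Parse one RIS record into a dict of tag -> list of values."""
    record = {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.strip())
        if match:
            tag, value = match.groups()
            record.setdefault(tag, []).append(value.strip())
    return record


def to_bibtex(record, key):
    """Render a parsed RIS record as a BibTeX @article entry."""
    fields = {}
    if "AU" in record:
        # BibTeX joins multiple authors with the word "and".
        fields["author"] = " and ".join(record["AU"])
    for tag, name in RIS_TO_BIBTEX.items():
        if tag in record:
            fields[name] = record[tag][0]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


# Invented sample record, for illustration only.
sample = """TY  - JOUR
AU  - Doe, Jane
AU  - Roe, Richard
TI  - An invented article title
JO  - Journal of Examples
PY  - 2009
ER  - """

print(to_bibtex(parse_ris(sample), "doe2009"))
```

The same parsed record could feed an RSS or JSON data stream, which is the point of keeping the internal representation separate from any one export format.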

Most importantly in the design process, this community cannot be designed in a vacuum and handed off to users complete, like a Christmas present. It must be collaboratively constructed with lots of feedback from potential users, and development should continue once the service is online to address feedback from actual users. At minimum, other sites should be evaluated using content analysis to determine what they do successfully and how scientists are using them; potential users should be interviewed to determine what needs the system could address; focus groups should be held to get feedback on prototypes; and usability testing should be done to check the web design choices that were made in the design process. Early adopters should be asked to trial the site in an alpha or beta test to run the software through regular use prior to wider release.

2.2 Interaction and Membership

Blanchard makes the distinction between virtual settlements and communities. She extends sociological studies of communities in the offline world to online communities (Huberman et al. also address this). Communities provide support and a sense of belonging, whereas virtual settlements may just be places people congregate online. Scientists have multiple memberships and social identities already as part of invisible colleges; as part of colleges, research groups, and labs; as editors, reviewers, authors, and readers of journals; as members of general purpose online communities (e.g., Facebook, LinkedIn, FriendFeed, science blogs); and in their personal lives as friends, family, and so forth. Research and discussion with potential users is needed to determine if this tool should aim to be an online community in the Blanchard sense or merely a virtual settlement. It should not necessarily aim to replace anything, but to enable users to bring together some of this fragmentation.

In either case, it is clear from research by Lave and Wenger, from research done with open source software communities, and from research done on “lurkers” by Nonnecke and Preece, that the system should support various levels of participation. Lave and Wenger discuss legitimate peripheral participation. This is a way that new members can follow the activities of the community and learn how to participate on the way to becoming active members and contributors. In other words, new users should be able to “lurk” to learn more about the community and then move into more central roles by first commenting on the work of others and then finally creating their own work and forming their own subgroups.

Science is an international enterprise and a community should support widely distributed collaborations. This means different time zones, different cultures, different languages (although many participants in science will speak, read, and write English), and different expectations for social tools. This indicates that the system should focus on asynchronous tools that allow reflection, review, and revision. However, we know from studies by Olson et al. that distance does matter. Getting to common ground may be more difficult and may take longer with fewer cues (important article forgot the author – hope it will come back), particularly if the participants have not met in person at least once. Accordingly, this collaborative tool should offer support for linking to or embedding synchronous events such as meetings in Second Life or conference streaming as well as multimedia information such as YouTube videos or podcasts. In the case of blogs, trust is earned over time by establishing a personality through an archive of posts. This system can also provide histories for each person listing their contributions and memberships to enable other users to understand their point of view (see below in studies of scholarly communication and STS for discussion of attributes of the authors that should be shared).

3. Designing the System for Scientists

The previous sections have applied general research on online communities and computer mediated communication to the problem at hand, designing a collaboration tool for scientists; however, we know a great deal about how scientists communicate, and this information is very important to the design of a successful system.

3.1 Data types and representing scientific knowledge

Common research methods or materials can form boundary objects through which different groups of scientists can communicate (Fujimura). The issue at hand for this system is to represent these common objects such that users from different research areas can find them. For example, when searching an engineering digital library for Indium Tin Oxide, one would find many useful results by typing “ITO”. When searching a chemistry digital library this would have to be a linear formula (InSnO), and perhaps in Hill order (InOSn) (note: I’m not sure if any of these are 2 – this is for illustrative purposes). Likewise, mathematical or signal processing approaches may be shared by very diverse research groups, who do not read the same literature. A successful system would enable diverse users to collaborate around these boundary objects either on the same problem or just using the same method on different problems.
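To make the Indium Tin Oxide example concrete, here is a minimal sketch in Python of one way a system could map the names different communities use onto a single key. It orders element symbols by the Hill convention (carbon first, then hydrogen, then the rest alphabetically; purely alphabetical when there is no carbon) and keeps a small alias table. The function names and the alias table are hypothetical, and the 1:1:1 element counts simply mirror the illustrative formula in the text rather than the real stoichiometry.

```python
def hill_formula(counts):
    """Return a formula string in Hill order for a dict of element -> count."""
    symbols = sorted(counts)
    if "C" in counts:
        # With carbon present: C first, then H, then the rest alphabetically.
        rest = [s for s in symbols if s not in ("C", "H")]
        symbols = ["C"] + (["H"] if "H" in counts else []) + rest
    # A count of 1 is conventionally left implicit.
    return "".join(s + (str(counts[s]) if counts[s] != 1 else "") for s in symbols)


# Hypothetical alias table: every known name points at one canonical key.
ALIASES = {}


def register(counts, *names):
    """Register a material under its Hill formula plus any other names."""
    key = hill_formula(counts)
    for name in (key, *names):
        ALIASES[name.lower()] = key
    return key


def lookup(query):
    """Resolve a user's search term to the canonical key, or None if unknown."""
    return ALIASES.get(query.strip().lower())


# Illustrative counts only, matching the essay's simplified formula.
register({"In": 1, "Sn": 1, "O": 1}, "ITO", "indium tin oxide", "InSnO")

print(lookup("ITO"))                  # engineer's abbreviation
print(lookup("InSnO"))                # linear formula
print(lookup("Indium Tin Oxide"))     # full name
```

With a table like this, an engineer typing “ITO” and a chemist typing a formula would land on the same boundary object, whatever the literature they usually read.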

Some research areas in science seek to describe and model the physical world in terms of mathematical formulas. Typically, these formulas are created in LaTeX (a markup tool) or in a computational tool (like MATLAB, Mathematica, etc.), and then an image is generated and uploaded to the web. The picture of the equation is not searchable or machine usable. Early adopters of blogs and wikis had to program their own plug-ins to be able to display equations in a usable format. Likewise, scientists represent materials using graphical chemical structures. More recently, a machine readable but chemically meaningful representation, InChI, is being used, but its use is not entirely without controversy. This system must enable its users to represent scientific knowledge in the form of equations and chemical structures that are machine readable, but still fairly quick to input.

Borgman, Van House, and others describe the use of large collections of scientific data like GenBank and virtual observatories in astronomy. In eScience, some of these repositories are so large that the calculations and manipulations must take place at the data, instead of downloading the data to the scientist’s machine. The collaborative tool also must support collaborative work around scientific data and information that are hosted in these large repositories. Linking out to these data might not be enough; the link should be semantic such that it indicates how the data are to be used.

Finally, the product of scientific work is often the peer-reviewed scientific article. This is another form of data that is hosted externally, but around which collaborations can form. Community members should be able to refer to bibliographic data in a standard way and comment on scholarly articles. These comments should be made available to the journal publishers (as long as the commenter has agreed that his or her comments may be shared), so that they can display or use this information to provide context for the article. Likewise, a scientist who comments at the original article should be able to import his or her comment to this collaboration tool.

3.2 Attribution and Credit

There is continuing controversy about Mertonian Norms of Science and whether these norms are mythological or the lived experience of scientists. Likewise, there are many competing theories and explanations for why authors cite other work (see Nicolaisen’s review). In any case, attribution is still the currency of science (Polanyi) and this cycle of credit is very important to science (Latour). Grant proposals, hiring, promotion, tenure, and lab space are all determined in part by what the scientist publishes, in which venues, and how well those publications are cited. The publication venues are judged in part by their impact factor, which is a measure of how frequently they are cited. Unfortunately, contributions to collaborations and to collaborative online communities are frequently not valued in evaluating scientists’ work. In the current system, therefore, this collaborative tool might be most useful in helping scientists complete their offline work and publish it, as well as in helping them find collaborators and establish collaborations to complete offline work and publish it.

In preparation for promotion, tenure, and grant systems that do value online work, and to help in expertise location, work and contributions in this tool must be traceable to their contributor. In wikis, for example, edits are captured along with the time the change was made and the user name of the person who made the edit. Contributions should be retrievable both at the place the information is stored and at the contributor’s profile. Additionally, contributions could be rated by other users as to how useful they were. Authors who make a lot of valuable contributions might have some special icon in their profile or signature.
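The traceability described here can be sketched as a small data structure. The following is a minimal illustration in Python, not a proposal for the actual system: the class and method names are hypothetical, and everything is held in memory where a real service would use a database. Each contribution records who made it and when, and it is indexed both by the item it belongs to and by its contributor, with optional ratings from other users.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Contribution:
    """One traceable edit: who did it, to what, and when (UTC)."""
    contributor: str
    item_id: str
    summary: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class ContributionLog:
    """Hypothetical in-memory log indexed by item and by contributor."""

    def __init__(self):
        self._by_item = defaultdict(list)
        self._by_contributor = defaultdict(list)
        self._ratings = defaultdict(list)

    def record(self, contributor, item_id, summary):
        entry = Contribution(contributor, item_id, summary)
        self._by_item[item_id].append(entry)
        self._by_contributor[contributor].append(entry)
        return entry

    def rate(self, entry, score):
        """Let another user rate how useful a contribution was."""
        self._ratings[entry].append(score)

    def history(self, item_id):
        """Contributions retrievable where the information is stored."""
        return list(self._by_item.get(item_id, []))

    def profile(self, contributor):
        """The same contributions retrievable from the contributor's profile."""
        return list(self._by_contributor.get(contributor, []))

    def mean_rating(self, entry):
        scores = self._ratings.get(entry, [])
        return sum(scores) / len(scores) if scores else None


log = ContributionLog()
edit = log.record("alice", "page-1", "corrected an equation")
log.record("bob", "page-1", "added a reference")
log.rate(edit, 4)
log.rate(edit, 5)
print([c.contributor for c in log.history("page-1")])
print(log.mean_rating(edit))
```

A profile icon for frequent, highly rated contributors could then be derived from `profile` and `mean_rating` rather than stored separately.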

3.3 Member Profiles

We know from studies of document selection and relevance (see for example Wang and Soergel) that scientists judge the relevance of articles using information about the author, his or her advisor, and his or her affiliation. At the same time, as discussed above, members who are new to the system may want to be peripheral participants, and members whose contributions will not be valued by their home institutions (or may be used against them as “a waste of time”) may want to use a pseudonym or be anonymous (see discussions of women scientist bloggers). Member profiles should allow links to professional home pages, blogs, profiles on other networks, a listing of articles written, a picture or avatar, and semantic links to affiliations – but all of these must be optional. If the member has an existing persona used on his or her blog, then this can be used in the member profile.

4. Fitting into the existing information ecosystem

This essay has touched on various ways that this tool should fit into the existing ecosystem, but it is valuable to compile these thoughts and to close the essay with a discussion of compatibility with the existing systems. First, scientists have various workflows for identifying, retrieving, keeping, using, and refinding information used as inputs to their scholarly work. Bloggers have mentioned that the refinding process is simplified when their notes are kept on a blog instead of in individual files on their desktop or in lab notebooks. Likewise, personal information management tools such as bibliographic managers are helpful when reusing references to published information. It is not suggested that this one tool replace all of these existing tools, rather that it can take data streams produced by a wide variety of narrow-use tools, and compile them in one place so that they can be searched, shared, annotated, and reused more easily. FriendFeed does this, to a certain extent, but this system could be built to understand data streams used in science. The system could replace some tools that do not work well for science such as blogs that do not adequately support equations and scientific symbols.

Borgman and others note that finding and reusing data is very complicated. Whereas there is a very well-developed system to support archiving, organizing, and providing access to scholarly publications, we really do not have a similar system for data. This tool certainly cannot address digital preservation issues or management of large data sets, but it can enhance access by enabling users to link to and collaboratively work with information pulled from these data sets. By providing semantic links out to data stored in disciplinary repositories, the system can support and enhance the data’s reuse, findability, and value.

Some collaborations and collaborative work need to be done in private or in closed spaces and only shared when complete. This system should allow groups of members to create new private spaces where they can work together, with the support of the larger system, but without sharing their work until they are ready. Work done on the system should have permanent URLs which, at the option of the collaborators, can be assigned digital object identifiers for use in citations from the scholarly literature.

As discussed by Nielsen and by Gowers in his post on massively collaborative math, there needs to be a way to advertise for help, collaborators, or expertise wanted to likely audiences whether or not they are part of the system. Nielsen’s example is a mathematician who, in the middle of a proof, hits a sticking point that would take a couple of weeks to get around because it requires additional background reading, but which another mathematician might know how to resolve right off. The system should allow users to describe these points to the larger group and solicit help. Moderators could help the users describe their problem so that members in different research areas can find it and respond.

Finally, this entire discussion has been about a community for practicing scientists, but I found in my overview of engineering communities that fledgling communities could be scuttled by being inundated by college students trying to get homework help (or to get their homework done for them). Likewise, there seems to be a lot of interest in the science blogosphere in supporting science classrooms and public understanding of science. Separate areas could be created in the community to “ask a scientist”, “get homework help”, or “find a scientist speaker.” Any posts or contributions with these aims could be moved to these areas by the moderators so that interested scientists can respond, but scientists who are collaborating on scientific work are not interrupted.

That's a great rant, and the more active discussion is probably on friendfeed, but I'll comment here.

I agree that "facebook for scientists" or "LinkedIn for scientists" isn't going to work, but we're just starting to realize why. I think the main factor is that services launched with that concept in mind are thinking "if you build it, they will come" and it doesn't appear that that's true.

There are two categories of successful social sites: those that are popular just because they were first (Myspace/Facebook/LinkedIn), and those that are successful because they add value to content you already have, even if you don't use the social features at all. Examples of these would be Flickr or Youtube. Only one service gets to be first, so it's not going to work out too well to copy those that owe most of their initial success to being first. Services that brand themselves as "myspace/facebook for scientists" are essentially saying "I don't get what scientists need".

I like Flickr as an example because I can upload my photos and, without having to have a network of contacts, I can do more with them than if they just sat on my hard drive. I can email the link to friends, embed them on my blog, they can be commented on and shared, and so on. Tagging and putting them on a map makes them show up in searches, so that's cool, but not necessary. I do it because it adds value.

Even without adding any content, Flickr and Youtube are useful, because they do such a good job of discovery. I'm always seeing interesting pictures from Flickr and funny or educational videos from Youtube posted on blogs or in forums and that's how their discovery works.

I wrote a longish post about recommender systems being the killer app for social media, comparing the two approaches of last.fm and Pandora. Back then, I was thinking in terms of recommendations coming from the content on the site, like Amazon's related items or Flickr's "interestingness", but now I think that, in contrast to a purely algorithmically-derived (though userdata based) recommendation, recommendation from a valued source is more important. I never go to youtube.com and start looking for interesting stuff. I exclusively access youtube through links found elsewhere, and the same is true for flickr and even slideshare.

I've been working with a service called Mendeley to enable this kind of discovery for the scientific literature. I would love to see last.fm-style recommendations for papers I should probably read and I would love to see suggestions from my trusted sources that paper X is worth reading or giving a dissection of paper Y. The argument is often made that scientists are too busy to spend time doing that stuff, and it's a good argument. That's why harnessing the same dynamics that lead to Flickr pictures being re-shared and youtube videos being embedded makes so much sense.

Hmmm....guess I should have just written the blog post instead. Great rant, Christina. Do you think my point about how to make a successful collaboration community (up-front value) is a good one?

Do you think the community would need "personal value precedes network value"? For instance, you mention bloggers "refinding" their own writing.

Regarding digital preservation--you're a little softer on it than I would be. You say: "This tool certainly cannot address digital preservation issues or management of large data sets, but it can enhance access by enabling users to link to and collaboratively work with information pulled from these data sets." I can't tell whether you're letting it off the hook of digital preservation on the whole (bah!) or just of large datasets (reasonable, IMO).

I think digital preservation will be an important component of a collaboration community for science. I could envision a lifestream-like open notebook for science, which ingested large quantities of material. There would be various privacy-level settings ('just me', 'collaborators', 'this particular group', 'world-viewable') which would be set by default based on source (and overrideable). External materials would be cached (preserving the conversation, and available in case of data-loss or host-expiration elsewhere).

I think materials would need to be viewable as a timestream as well as by project.

It's interesting to think about and I'm very much looking forward to hearing how your research proceeds!

A very good rant Christina! The part about how mathematical symbols and equations have yet to be consistently rendered is a big sticking point....for sure. If enough of the scientific community gives their input to this situation then maybe some headway can be gained. Thanks.

This is my blog on library and information science. I'm into Sci/Tech libraries, special libraries, personal information management, sci/tech scholarly comms.... My name is Christina Pikas and I'm a librarian in a physics, astronomy, math, computer science, and engineering library. I'm also a doctoral student at Maryland. Any opinions expressed here are strictly my own and do not necessarily reflect those of my employer or CLIS. You may reach me via e-mail at cpikas {at} gmail {dot} com.