Social Bookmarking Tools (II)

A Case Study - Connotea

Introduction

Connotea [1] is a free online reference management and social bookmarking service for scientists created by Nature Publishing Group [2]. While somewhat experimental in nature, Connotea already has a large and growing number of users, and is a real, fully functioning service [3]. The label 'experimental' is not meant to imply that the service is in any way ephemeral or esoteric, but rather that the concept of social bookmarking itself and the application of that concept to reference management are both recent developments. Connotea is under active development, and we are still in the process of discovering how people will use it. In addition to Connotea being a free and public service, the core code is freely available under an open source license [4].

Connotea was conceived from the outset as an online, social tool. Seeing the possibilities that del.icio.us [5] was opening up for its users in the area of general web linking, we realised that scholarly reference management was a similar problem space. Connotea was designed and developed late in 2004, and soft-launched at the end of December 2004.

Usage has grown over the past several months, to the point where there is now enough data in the system for interesting second-order effects to emerge. This paper will start by giving an overview of Connotea, and will outline the key concepts and describe its main features. We will then take the reader on a brief guided tour, show some of the aforementioned second-order effects, and end with a discussion of Connotea's likely future direction.

The Key Concepts behind Connotea

Unashamedly inspired by del.icio.us as we were, we felt there were some additional characteristics of academic reference lists that warranted a separate, tailored service catering to, and making the most of, the needs of the academic community. Therefore, Connotea is a meld of existing reference management conventions and new social bookmarking concepts that are explored in a companion paper [6]. While none of these concepts is original to Connotea, the combination is. To spell them out:

1. Online storage of references and bookmarks

The current standard practice in personal reference management is to have a database of citations stored locally on a user's own computer, and manipulated using dedicated desktop software. Moving the database online has a number of advantages: it makes the collection available from any web-enabled computer, and it allows easy sharing of personal collections (or subsets of them). Storing references online also opens up some new possibilities (direct linking into the literature, for example) and is the foundation that supports the other three pillars of Connotea (see below). Of course, online storage has its disadvantages too: the responsiveness and usability of the application, for example, and the fact that the availability of your data is in the hands of others. To alleviate these risks, we intend to allow easy import and export between Connotea and other systems, including desktop reference management tools.

2. Simple, non-hierarchical organising

Instead of placing reference material in folders, or folders within folders, the organisational apparatus of Connotea creates a totally flat, but multi-faceted, space. The data can be viewed from the perspective of tags, or users, or links. The 'tags' of Connotea and other social bookmarking tools often lead commentators to decry the anarchy of unconstrained keywords, but this overlooks the fact that tags are intended first and foremost as a way for individual users to manage their own collections. In this way, they are similar in purpose to folder names in computer file systems. However, instead of creating sub-levels of organisation by nesting folders hierarchically, flat tagging achieves this by assigning multiple tags to each item, each tag being treated equally. This releases the user from some of the constraints of a traditional file system, and a web-based interface allows easy navigation of material organised in this way.
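To make the contrast with nested folders concrete, the following is a minimal sketch of a flat, multi-tag store. It is an illustration only and is not drawn from the Connotea code base; the class name `FlatLibrary`, its methods and the example URLs are all hypothetical.

```python
from collections import defaultdict


class FlatLibrary:
    """Toy in-memory store in which every bookmark carries a set of equal-ranking tags."""

    def __init__(self):
        self._by_tag = defaultdict(set)  # tag -> set of URLs
        self._by_url = defaultdict(set)  # URL -> set of tags

    def add(self, url, tags):
        """File one URL under every tag given; no tag is a parent of another."""
        for tag in tags:
            self._by_tag[tag].add(url)
            self._by_url[url].add(tag)

    def with_tags(self, *tags):
        """Return the URLs carrying all of the given tags (an intersection, not a path)."""
        if not tags:
            return set(self._by_url)
        return set.intersection(*(self._by_tag.get(t, set()) for t in tags))


# Hypothetical example data
library = FlatLibrary()
library.add("http://example.org/paper-a", {"folksonomy", "tagging"})
library.add("http://example.org/paper-b", {"tagging", "rss"})
```

Because a lookup is a set intersection, `library.with_tags("tagging", "rss")` and `library.with_tags("rss", "tagging")` return the same items, whereas a folder path fixes one ordering of the categories.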

3. Opening the list to others

At the time of writing, all bookmarks posted to Connotea are visible to all registered users and visitors. This takes the concept of sharing to a new level, but also brings new opportunities. We recognise that some users will want to keep some of what they are reading personal, so private bookmarks will soon be added as a feature, though the default will be to make the bookmark public. The main benefits of openness come not just from the ease with which it allows explicit sharing with friends and colleagues, but from many users storing their bookmarks in the same place. This allows Connotea to automatically discover and present connections between users. For example, if someone else has bookmarked the same things as you, that person's library will be a good candidate for a place to find interesting new content. In addition, shared lists allow more sophisticated collaborative filtering algorithms to make recommendations of the form "people who bookmarked this also bookmarked...".
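A recommendation of the form "people who bookmarked this also bookmarked..." can be sketched very simply once libraries are pooled. The snippet below is an assumption about how such a count might be made, not a description of any algorithm Connotea uses; the function name, user names and item names are hypothetical.

```python
from collections import Counter


def also_bookmarked(libraries, url):
    """Count the other items held by every user whose library contains `url`."""
    counts = Counter()
    for items in libraries.values():
        if url in items:
            counts.update(item for item in items if item != url)
    # Most frequently co-bookmarked items first
    return counts.most_common()


# Hypothetical pooled libraries: user name -> set of bookmarked items
libraries = {
    "alice": {"paper-1", "paper-2", "paper-3"},
    "bob": {"paper-1", "paper-2"},
    "carol": {"paper-4"},
}
```

With this data, `also_bookmarked(libraries, "paper-1")` ranks `paper-2` ahead of `paper-3`, because two of the users who hold `paper-1` also hold `paper-2`.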

4. Auto-discovery of bibliographic information

In the spirit of making it quick and easy to use, Connotea attempts to find and import the bibliographic information for any article or book that is added. In many cases, this information is freely available on the web, so why should a user have to retype it? There is also, of course, the benefit of eliminating typing errors. Along with the use of bookmarklets (see below), automatically adding the bibliographic information reduces the workload of saving a reference to two clicks and a small amount of typing for tag names.

The Key Features of Connotea

What do these concepts mean for the day-to-day activities of a practicing researcher and their use of Connotea? Here we describe the main features of Connotea that implement the concepts discussed above.

1. Bookmarklets

Firstly, users need to be able to quickly and easily switch to their bookmark manager while in the middle of another task. For this purpose, Connotea offers bookmarklets: snippets of JavaScript code that can be saved in a browser to provide custom functionality [n1]. The most important of these is a bookmarklet that allows a user to add to Connotea the web page they are currently viewing. Figure 1 shows this in action using the Firefox web browser (this should work in other common browsers too).

Fig. 1. Connotea bookmarklets in a user's browser. Area A is the 'Add to Connotea' bookmarklet. In this screenshot the user has just clicked the button, and the add form has appeared and has been automatically populated with the relevant bibliographic information. This information has been automatically read by Connotea from the website in question (in this case D-Lib's). Area B contains two other bookmarklets, for adding or viewing comments. Area C shows a standard bookmark for quickly accessing the user's collection of bookmarks on Connotea.

The user clicks the bookmarklet, the URL of the page is sent to Connotea, and Connotea presents a form for adding the page to the user's collection.

2. Recognising URLs from common archives and importing bibliographic data

Figure 1 also illustrates the second key feature of Connotea. When a URL is added to Connotea, it is first analysed in order to determine whether it belongs to the set of URLs that Connotea recognises. This can trigger a special behaviour for those web pages that represent academic articles or books: bibliographic data for the reference material is collected and added to Connotea. For example, for scholarly articles, Connotea stores the publication name, volume and issue numbers, publication date and the list of authors for the article. At the time of writing, Connotea supports four different article archives and websites [n2], and the Amazon websites for books.

This bibliographic data collection process is accomplished by means of a plug-in system. Each Connotea plug-in is specific to a particular archive or website, and must be able to follow the steps outlined in Fig. 2. We are happy to accept plug-ins written by third parties; volunteers should either contact the authors for details of the API (application programming interface) or refer to the Connotea source code and accompanying documentation [4].

As an example, we will now walk you through the bibliographic information import process for a particular PubMed article, in this case one about the discovery of new hominid fossils attributed to the ancient species Ardipithecus ramidus. The abstract resides at

First of all, we can recognise this as an article from PubMed by the domain portion of the URL: 'www.ncbi.nlm.nih.gov'. But, in addition, we need to check that it is an abstract page (not, for example, the search page or a results listing). So we look for the parameter 'list_uids', and confirm that it has only one value (in this case '15662421'). The second step is to extract the necessary information from this URL in order to unambiguously identify this article. In this case that is simple: the PubMed Identifier (PMID) we just found ('15662421') does this perfectly. The NCBI offers an easy-to-use web service [7] and Connotea queries it for the metadata associated with this PMID. This metadata can then be associated with the article in the Connotea database and displayed to the user. A short movie showing the bookmarking process for this article is available on the Connotea website [n3], and illustrates how quick the identification and fetching of metadata is: it all happens during the bookmarking process and goes almost unnoticed by the user.
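The recognition and extraction steps just described can be sketched as follows. This is a hedged illustration rather than the plug-in that ships with Connotea: the function name `extract_pmid` is hypothetical, the URL path in the usage note is a placeholder (only the host name and the 'list_uids' parameter come from the text), and the subsequent metadata query to the NCBI web service is deliberately left out.

```python
from urllib.parse import urlparse, parse_qs

PUBMED_HOST = "www.ncbi.nlm.nih.gov"  # domain named in the text


def extract_pmid(url):
    """Return the PMID if `url` looks like a single PubMed abstract page, else None."""
    parts = urlparse(url)

    # Step 1: recognise the archive by the domain portion of the URL
    if parts.hostname != PUBMED_HOST:
        return None

    # Step 2: require exactly one 'list_uids' value holding a single numeric id,
    # which rules out search pages and multi-record result listings
    values = parse_qs(parts.query).get("list_uids", [])
    if len(values) != 1 or not values[0].isdigit():
        return None

    return values[0]
```

For a placeholder URL such as `http://www.ncbi.nlm.nih.gov/hypothetical/path?list_uids=15662421`, the function returns the string `'15662421'`; a URL on another host, or one without a single numeric 'list_uids' value, yields `None` and would be stored as a plain bookmark.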

Fig. 2. Flow diagram for adding bibliographic data to Connotea.

3. Tagging

Once the URL has been sent to Connotea and the bibliographic import process has been completed, the user can add personalised information. The most essential information is the list of tags to associate with the article. Tags are the means by which references are organised in Connotea. Suitable tags should therefore be meaningful in the context of that particular article and that user. For this reason, Connotea allows tags to be almost anything (including both single words and phrases). As discussed above, tags can be thought of as a list of categories for the article, or as folder names, albeit without the potential inconvenience of hierarchy and with the bonus of being able to store the article simultaneously in multiple folders. Figure 3 shows a Connotea window for adding an article. The article in question has been identified, and a few suitable tags have been entered. There is also an option to add a personal description of the resource being bookmarked.

Fig. 3. The 'Add to Connotea' form, with article identified, a personal description added and tags entered. Note that multi-word tags can be differentiated from multiple single-word tags by enclosing the entire tag name in quote marks or by joining the words with underscore characters.
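The quoting convention described in the caption above behaves much like shell-style word splitting, so it can be approximated with Python's standard `shlex` module. This is a sketch under that assumption, not Connotea's own parser, and the function name `parse_tags` is hypothetical.

```python
import shlex


def parse_tags(text):
    """Split a tag entry string on whitespace, keeping quoted phrases together."""
    # Underscore-joined words contain no whitespace, so they survive as one tag
    return shlex.split(text)
```

For example, `parse_tags('"social bookmarking" Connotea')` yields the two tags `social bookmarking` and `Connotea`, while `parse_tags('social_bookmarking RSS')` yields `social_bookmarking` and `RSS`. Note that `shlex.split` raises `ValueError` on an unbalanced quote mark, so a real form handler would need to decide how to treat tags containing apostrophes.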

4. Comments

The other noteworthy piece of personal data is the user's comment. Each user can comment any number of times on any bookmark in their library, and comments from different users are combined to display a chronological, and conversational, thread about a resource. Figure 4 shows a list of comments drawn from the contributions of individual users. Connotea also has a bookmarklet (see Fig. 1) for adding a new comment about the current web page. The idea is that when a user is viewing an article that they already have in their Connotea library, they can quickly and easily add a public note about it.

Fig. 4. User comments in Connotea.

5. RSS

The bookmarks stored in Connotea can be viewed and navigated by user, by tag, or by a combination of users and tags. The tag and user criteria can be combined using Boolean operators, which allows Connotea visitors to view, for example, all bookmarks created by users 'ben', 'tony', or 'timo' matching the tags 'RSS' and 'social_bookmarking' (more on this below). Every list of bookmarks in Connotea, however generated, offers a corresponding RSS feed, and any user or visitor can subscribe to any of these. This means that if, for example, you find the collection of a particular user interesting, or find that informative articles are often being assigned a particular tag, you can be alerted, via RSS [n4], to any new items that are added.
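The Boolean combination in the example above (any of several users, all of several tags) can be expressed as a simple filter. The sketch below is illustrative only: the `Bookmark` record, the `select` function and the example URLs are hypothetical and say nothing about how Connotea stores or queries its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    user: str
    url: str
    tags: frozenset


def select(bookmarks, any_user=(), all_tags=()):
    """Keep bookmarks posted by any listed user (OR) that carry every listed tag (AND)."""
    users = set(any_user)
    required = set(all_tags)
    return [
        b for b in bookmarks
        # An empty user list means "no restriction on user"
        if (not users or b.user in users) and required <= b.tags
    ]


# Hypothetical example data
bookmarks = [
    Bookmark("ben", "http://example.org/1", frozenset({"RSS", "social_bookmarking"})),
    Bookmark("tony", "http://example.org/2", frozenset({"RSS"})),
    Bookmark("someone_else", "http://example.org/3", frozenset({"RSS", "social_bookmarking"})),
]
```

Calling `select(bookmarks, any_user=["ben", "tony", "timo"], all_tags=["RSS", "social_bookmarking"])` keeps only the first bookmark. Each such selection is the kind of list for which a feed could then be generated.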

A Guided Tour

In this section we conduct a brief guided tour of Connotea's uses and functions. In the spirit of the web- and article-centric nature of Connotea, we start by bookmarking this very paper. To do this, either click on the 'Add to Connotea' bookmarklet (if you already have it installed) or go to

If you are already logged in, you will be taken to a form for adding this paper. More likely, if you are not a registered user, you will need to register for an account (which is required to post items, but not to browse them). This process takes only a minute or two, and when completed you will then be taken to the bookmarking form. The only required task to perform at this stage is to add some tags, the choice of which is, of course, entirely up to you, the user. We chose 'social bookmarking' and 'Connotea' as our tags.

Now that we have this paper in our library, we can use it as a starting point for exploring the rest of the site and the content. Let us find out who else has bookmarked this paper. Go to your library at

This page lists the distinct 'bookmarking events' for the paper, so although the paper title appears many times on the page, each item shows a different user who has bookmarked it, and the tags they used to categorise it.

These tags offer another perspective on the content of Connotea. Click on a tag name and a list of articles that have been categorised using that tag is displayed. Optionally, this list can be restricted to articles posted just by one particular user, and so provide a convenient way of navigating one's own collection (or, for that matter, the collection of another person). Tag-filtered views also provide a simple content discovery route: browsing through lists of bookmarks by hopping between co-occurring tags. For example, at the time of writing, a visitor to Connotea viewing articles tagged (by any user) with 'open access', can (in one click) see articles tagged as being about 'scholarly communication', and from there it is only one more click to see links filed under 'science publishing', as can be seen by visiting the following links:

where (if you have bookmarked it as described above) you should again see this paper in the list. If any user has commented on the paper, the total number of comments will be displayed and linked below the title and bibliographic information. Clicking on

will take you to the comments list. See Fig. 4 for a similar list. If you are logged in, a form for submitting a new comment appears at the bottom of the list. We encourage readers of this paper to contribute to the discussion there.

The first comment by user 'ben' includes a link to the reference list for this paper. Alternatively, you can navigate to the library of one of the authors of the paper. (Connotea allows users to claim authorship, and this information is displayed in the list of users who have bookmarked the paper. See Fig. 5 for an example of a Connotea user being identified as an author.)

Fig. 5. Screenshot of a Connotea bookmark that has been posted by an author on the bookmarked article. This relationship can be optionally flagged by the user during the bookmarking process and is presented to Connotea visitors, as seen in area A.

As described in the companion paper [6], judicious use of tagging can be used to enable online sharing of reference lists for papers (or book chapters or lectures or meetings) and also to facilitate subsequent sharing of new, related links among readers (or students or attendees). One of the tags in the tag list for 'ben' is being used specifically for tagging the reference list of this paper, so clicking on the tag name 'dlib-connotea-refs' takes the user to the following page in Connotea:

(See Fig. 7 for a screenshot of a similar page.) This is a 'closed' list intended to contain only links to the references used in this paper. The closed nature of the list is enforced by selecting only articles that have been bookmarked with the tag 'dlib-connotea-refs' by the user 'ben', who is the lead author of this paper. Even if other users apply the same tag to other content, this 'closed' list will not grow because user 'ben' has committed not to use this tag for any other purpose.

The materials in the reference list are also tagged with 'dlib-connotea' which, in contrast to the tag above used to label a 'closed' list, is intended to indicate an open list of related links associated with the subject of this paper. Contributions are invited from all readers who wish to participate. To add an item, simply use Connotea to bookmark a web page and assign the 'dlib-connotea' tag (as well as any other descriptive tags of choice). For example, the page

Note that in this case, unlike that above, we have not specified any user in the query, so you will see items that have been posted by all Connotea users. If you prefer to see new items posted only by the author of this paper then go instead to

Of course, another username (or even several usernames) can be used in place of 'ben' to track other people's contributions.

Second-Order Effects

While the primary goal for Connotea is for it to be useful as a personal bookmarking and reference management tool, some other features and uses emerge as a consequence of pooling this information, and also as a result of the way that individuals use Connotea. Emergent properties that have become apparent to us so far include tag convergence, recommendations, and directory creation.

Tag Convergence

The term 'folksonomy' [9] refers to the vocabulary, or list of terms, that emerges from the overlapping usage of tags between users. When bookmarking a page, users will often choose their tags deliberately to converge with those of other users in order to gain the discovery benefits described above. In addition, Connotea's 'rename tag' feature not only allows individual users to change their mind about the way they organise their personal libraries, it also facilitates the development of shared tags, by allowing users to change their own taxonomies to better fit in with those of other users [n5].

Although still at an early stage of development, we are beginning to see the emergence of a folksonomy in Connotea. Of the 3359 unique tag names in Connotea at the time of writing, 460 (14%) of them are shared by two or more users, 188 (6%) by three or more, and 109 (3%) by four or more. In fact, the distribution follows a classic power law, as shown in Fig. 6.

Fig. 6. The distribution of shared tags in Connotea. The vertical axis is the number of users who share a tag; the individual tag names are ranged along the horizontal axis.
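Figures of the kind quoted above (tags shared by two or more users, three or more, and so on) can be derived from a list of (user, tag) assignments. The following sketch shows one way to do so; the function names and the sample data are hypothetical and are not taken from Connotea.

```python
from collections import defaultdict


def users_per_tag(assignments):
    """Map each tag to the number of distinct users who have applied it."""
    users = defaultdict(set)
    for user, tag in assignments:
        users[tag].add(user)  # a set ignores repeat uses by the same user
    return {tag: len(names) for tag, names in users.items()}


def shared_by_at_least(counts, k):
    """Number of tags used by k or more distinct users."""
    return sum(1 for n in counts.values() if n >= k)


# Hypothetical (user, tag) assignment events
assignments = [
    ("u1", "rss"), ("u2", "rss"), ("u3", "rss"), ("u1", "rss"),
    ("u1", "folksonomy"), ("u2", "folksonomy"),
    ("u1", "drosophila"),
]
counts = users_per_tag(assignments)
```

Sorting the values of `counts` in descending order gives the kind of curve plotted in Fig. 6, and dividing `shared_by_at_least(counts, k)` by `len(counts)` gives the corresponding percentage of unique tag names.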

Implicit Recommendation

By analysing the data voluntarily supplied by users, Connotea is able to offer them a way of finding new and related content. In other words, Connotea is not only a place to store your bookmarks but is also a community-driven recommendation system.

The Connotea interface is hyperlink-oriented: bookmark titles are linked to the external web pages, user names are linked to their libraries, and tag names are linked to lists of bookmarks to which that tag has been applied. These links provide one-click access to related material. In addition, and wherever possible, Connotea presents a list of related tags. These are tags that have been co-assigned at some point with the tags that are associated with the current list of bookmarks. Note that it is the users themselves who drive this: their tagging behaviour is the foundation of the 'related tags' algorithm. Figure 7 shows a real example of this.

Fig. 7. Tags related to folksonomy (http://www.connotea.org/tag/folksonomy). It is interesting to note that not only has Connotea identified related terms, it has also found the plural form, and a typo variant.
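Co-assignment of tags can be tallied with a small co-occurrence table. The sketch below assumes that two tags are 'related' whenever they appear together on the same bookmarking event; this is an assumption made for illustration, since the text does not spell out the details of Connotea's 'related tags' algorithm. The function name and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(tag_sets):
    """For every tag, count how often each other tag was assigned alongside it."""
    related = defaultdict(Counter)
    for tags in tag_sets:
        # Each unordered pair of tags on one bookmark counts once in both directions
        for a, b in combinations(sorted(tags), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


# Hypothetical tag sets, one per bookmarking event
tag_sets = [
    {"folksonomy", "tagging"},
    {"folksonomy", "tagging", "folksonomies"},
    {"rss"},
]
related = build_cooccurrence(tag_sets)
```

Here `related["folksonomy"].most_common()` lists `tagging` first and `folksonomies` second. A purely count-based approach like this surfaces plural forms and misspellings in the same way as any other tag, provided some user has applied them together.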

In addition to using co-assignment of tags as a relation indicator, Connotea also employs common bookmarking events as a route into related content. Once again, this is entirely user-driven: if two users bookmark the same article, then their interests clearly overlap to some degree. The best place to see this behaviour in action is on an individual user's library page. In the right-hand column of that page there is a list of related users; clicking on those user names takes you to their libraries.
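A list of related users can be approximated by ranking other libraries according to how many bookmarks they have in common with the current one. The ranking rule below (raw overlap count, ties broken alphabetically) is an assumption chosen for simplicity; the text does not state how Connotea orders its list. All names are hypothetical.

```python
def related_users(libraries, user):
    """Rank other users by the number of bookmarked items they share with `user`."""
    mine = libraries[user]
    overlaps = {
        other: len(mine & theirs)
        for other, theirs in libraries.items()
        if other != user
    }
    # Largest overlap first; alphabetical order breaks ties
    ranked = sorted(overlaps.items(), key=lambda pair: (-pair[1], pair[0]))
    # Users with nothing in common are not related at all
    return [(name, shared) for name, shared in ranked if shared > 0]


# Hypothetical libraries: user name -> set of bookmarked items
user_libraries = {
    "alice": {"a", "b", "c"},
    "bob": {"b", "c"},
    "carol": {"c", "d"},
    "dave": {"e"},
}
```

For `alice`, this returns `bob` (two shared items) followed by `carol` (one), and omits `dave` entirely.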

Semi-Automatic Directory Pages

Perhaps the most surprising emergent feature of Connotea is its rise to prominence in Google search results for search terms that coincide with certain Connotea tag names. Because all tag names are hyperlinked in Connotea, and because those links lead to pages listing material that has been bookmarked using that tag, Connotea becomes a user-generated directory of relevant and related resources, and one that Google's ranking algorithm often considers authoritative.

For example, at the time of writing a search on Google for 'Drosophila trachea'

gives the Connotea bookmark list for that tag as the first result. The term 'microflora' and various species names of ancient hominids that have been used as tag names are other good examples of this: while not claiming the number one spot, the relevant Connotea pages rank highly.

Future Plans and Prospects

The Connotea we have described here is the one that exists at the time of writing; it is not the same Connotea that will be available a few days, weeks or months hence. Indeed, one of the major benefits web-based applications have over their desktop counterparts is that it is relatively quick and easy to roll out new functionality and fixes. Updates can therefore be very frequent, sometimes even occurring daily. In this spirit we have a number of new features planned for release in the near future.

Meeting the Needs

As alluded to in our overview of Connotea, we plan to add the ability to keep certain bookmarks private, viewable only by the user who posted them. While this removes certain benefits to others of bookmark sharing, we hope that it will also encourage some users to participate who might not otherwise use the system at all, and that those users will make at least some of their bookmarks public. Such users can, of course, still benefit from other users' public bookmarks.

We also plan to expand the number of sites whose URLs Connotea recognises. As Connotea understands more, it will retroactively apply this knowledge to links that have already been posted, turning what were once simple bookmarks into full references. In addition, we are adding RIS and BibTeX import and export functionality to enable researchers to more closely co-ordinate their Connotea collection with their desktop applications and pre-existing reference stores.

As well as integrating with users' existing reference stores, it is important for Connotea to be able to integrate with the way they gain access to scholarly information. For this reason, we plan to support OpenURL linking from Connotea.

Another plan is to enable user groups. This would allow, for example, a research team to manage their reference lists collaboratively and selectively view recommendations generated only from within the team.

Above all, in deploying these and other future developments, we will be listening to feedback from Connotea users. Our priorities remain flexible and we are open to suggestions for other features that users would like to see.

The Open Source Opportunity

In addition to these development plans, we have also released Connotea's code under the open-source GNU General Public License [11]. The code, and related documentation, is available from

Some might ask why we would consider offering potential competitors a head start by releasing our code in this manner. There are a number of reasons.

One is the fact that we have already received several requests for the code from individuals and organisations whose feedback we would value highly. For example, one early adopter, Los Alamos National Laboratory, Research Library, is already experimenting with the Connotea code for possible deployment as an internal application. We invite others to explore similar possibilities and, where possible, supply us with feedback so that we can improve the publicly available implementation and code base.

In addition, opening up the source code will, we hope, help other communities, including non-scientific ones, to set up social bookmarking services of their own. And if any of these groups are able to offer us their own suggestions or improvements, then this will also benefit users of Connotea.

Finally, opening up the code allows webmasters to inspect it and to discover exactly how their own URLs are being treated. Those with sufficient motivation and programming expertise might even offer their own improvements, thus benefiting not only themselves but all Connotea users. Similarly, where a suitable plug-in for a website does not yet exist, the openness of the code should help webmasters (or other interested developers) to contribute one. Thus, by releasing the code we hope to be able to expand the number of sites from which Connotea can extract bibliographic information at a faster rate than we would be able to manage by keeping all development work in-house.

But above all, we believe that releasing the code is firmly in keeping with the spirit of Connotea as a piece of community software. It is easier to ask users to submit and share their data if we are willing to publish and share our code. Moreover, the main value of Connotea (and other social bookmarking services) lies not in the technology that runs them, but in the communities that use them. Anything that enhances the Connotea user community (whether that is providing them access to the source code or responding to their requests for new features) enhances Connotea.

Acknowledgements

The authors would like to thank Herbert Van de Sompel of the Los Alamos National Laboratory, Research Library and Tom Coates of BBC Radio and Music Interactive for reviewing this paper and for providing us with helpful insights and feedback.

Notes

n1. See the companion paper [6] for a fuller explanation of the use of bookmarklets in social bookmarking applications.

n2. The article archives and websites that Connotea currently supports are: