Trove: More than a treasure? How finding information just became easier

Author:

Rose Holley

Publication date:

Wednesday, 1 September, 2010

Abstract:

In late 2009 the National Library of Australia released version 1 of Trove to the public. Trove is a free search engine. It searches across a large aggregation of Australian content. The treasure is over 90 million items from over 1000 libraries, museums, archives and other organisations which can be found at the click of a button. Finding information just got easier for many Australians. Exploring a wealth of resources and digital content like never before, including full-text books, journal and newspaper articles, images, music, sound, video, maps, websites, diaries, letters, archives, people and organisations has been an exciting adventure for users and the service has been heavily used. Finding and getting information instantly in context, interacting with content, and social engagement are core features of the service. The paper highlights Trove features, usage, content building, and applications for contributors and users in the national context.

1. INTRODUCTION

I see tremendous opportunities for libraries this year because of advances in technology. The changes in technology mean that anyone can create, describe or recommend content, which means that many people and organisations are becoming librarians or libraries in their own way.

Librarians should not be threatened or dismayed by this but rather encouraged: it means that society is retaining its ongoing interest in the creation, organisation and dissemination of content, and we have an integral role in this. Libraries and librarians are more relevant than ever in this environment because we have vast amounts of data and information to share, a huge amount of information expertise, and an understanding of how technology can assist us in making information more accessible.

We need to have new ideas and re-examine our old ideas to see how technology can help us. What things have we always wanted to do that we couldn’t before, like providing a single point of access to all Australian information? Is this still pie in the sky or can we now achieve it? Libraries need to think big. As Charles Leadbeater would say, “Libraries need to think they are leading a mass movement, not just serving a clientele” [1]. Librarians are often thought of as gatekeepers with the emphasis being on closed access, but technology enables gatekeepers to open doors as well as close them and this is the opportunity I see. However, many institutions will need to change their strategic thinking from control/shut to free/open before they can make this transition, and take a large dose of courage as well. Harriet Rubin, an American author, says “Freedom is actually a bigger game than power. Power is about what you can control. Freedom is about what you can unleash” [2]. The National Library of Australia already took this step forward in 2008 with the advent of the Australian Newspapers beta service, which opened up the raw text of digitised Australian newspapers to the public for improvement, without moderation, on a mass scale [3]. Based on a long history of collaboration across the Australian cultural heritage sector [4] with regard to digitisation, storage, and service delivery, the National Library of Australia is well placed to take the lead with innovation in access to information.

Some people may say “But isn’t Google doing that, so why do we still need libraries?” There is no question in my mind that libraries are fundamentally different from Google and other similar services. Libraries are different from Google for these reasons: they commit to provide long term preservation, curation and access to their content; they have no commercial motives in the provision of information (deemed by various library acts); they aim for universal access to everyone in society; and they are ‘free for all’. To summarise, libraries are always and forever. Who can say that of a search engine, or of any commercial organisation regardless of size?

The National Library of Australia reviewed its strategic directions and thinking in light of changes in technology and society. The three strategic objectives for 2009 – 2011 [5] are now:

“We will collect and make accessible the record of Australian life… We will explore new models for creating and sharing information and for collecting materials, including supporting the creation of knowledge by our users”

We will meet our users’ needs for rapid and easy access to our collections and other information resources.

We will collaborate with a variety of other institutions to improve the delivery of information resources to the Australian public.

In addition the strategic directions for the Resource Sharing and Innovation Division [6] acknowledge “The changing expectations of users that they will not be passive receivers of information, but rather contributors and participants in information services.” The outcome is ‘Trove’.

2. ABOUT TROVE

The library has redesigned its underlying infrastructure for all its discovery services, and developed a new discovery service which has many features for both data engagement and social engagement. The service is called ‘Trove’ (http://trove.nla.gov.au) and, in a giant leap forward, it provides access not just to the Library’s collections but to any Australian content or collections. This is collaboration at its best. Warwick Cathro, advocate for collaboration and the strategic lead for the service, states “Collaboration requires effort, and it also requires a change of mindset. In particular it requires a willingness to examine services from a perspective which does not place one’s own institution at the centre.” [7]

At present over 1000 libraries, archives, museums, galleries and other organisations have enabled their data to be shared in the service, which currently provides metadata for over 90 million items. Trove is essentially a search engine. It harvests metadata, aggregating it into one place for searching. Trove does not store the content, only the metadata, so users end up on the site which holds the source of the data. Results are relevance ranked, not returned in contributor order, or biased towards any one source. There is minimal work for organisations wishing to contribute their collections and all the extra user traffic goes direct to their own site. The difference between Trove and other search engines is that most of the content discovered via Trove would not be found in other search engines because it is buried in the ‘deep web’, for example in collection databases; and Trove has an Australian focus, especially on unique Australian materials. To date most of the contributors are Australian cultural heritage organisations that hold unique Australian data (the gold in the Treasure Trove). Having said this, the Trove team consider it vital that the resources within Trove are more widely discoverable via tools in common public use (e.g. Wikipedia, Google, Yahoo), so part of the plan is to encourage other search engines to harvest Trove; to encourage use of Trove work IDs (persistent identifiers) in other sources so that linkages are created between sources; and to develop an API so that other sites can draw out information from Trove in different ways.
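The aggregation model described above can be illustrated with a small sketch in Python. It is not Trove's implementation: the record fields, the term-counting score and all sample data are assumptions made for illustration only. It shows two of the ideas in the text: only metadata is held, with each record pointing back to the contributor's own site, and ranking depends on relevance alone, never on which contributor supplied the record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Hypothetical harvested record: metadata only, no content."""
    title: str
    description: str
    contributor: str  # organisation that supplied the record
    source_url: str   # where the item itself lives


def score(record, query):
    """Count how often the query terms occur in the record's text."""
    words = (record.title + " " + record.description).lower().split()
    return sum(words.count(term) for term in query.lower().split())


def search(records, query):
    """Return matching records, most relevant first.

    The contributor field is deliberately ignored when scoring, so no
    source is favoured over another.
    """
    scored = [(score(r, query), r) for r in records]
    hits = [(s, r) for s, r in scored if s > 0]
    hits.sort(key=lambda pair: -pair[0])  # stable sort: ties keep input order
    return [r for _, r in hits]


# Invented sample data; the organisations and URLs are placeholders.
RECORDS = [
    MetadataRecord("Gold rush diary", "Diary from the Victorian gold fields",
                   "Hypothetical State Library", "http://library.example.org/1"),
    MetadataRecord("Gold nugget photograph", "Photograph of gold from the gold fields",
                   "Hypothetical Museum", "http://museum.example.org/7"),
    MetadataRecord("Shipping map", "Coastal map",
                   "Hypothetical Archive", "http://archive.example.org/3"),
]

results = search(RECORDS, "gold")
for record in results:
    # The user is sent on to the contributor's own site for the item.
    print(record.title, "->", record.source_url)
```

A production search index would use far richer relevance ranking than raw term counts; the point here is only that the ranking function never looks at the contributor.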

After the Trove infrastructure was developed, the content of the nine separate collaborative services that the National Library of Australia has been managing for several years was integrated. These were the Australian National Bibliographic Database (Libraries Australia), PANDORA (archived websites), Australian Research Online, Picture Australia, Register of Australian Archives and Manuscripts, Australian Newspapers, People Australia, Music Australia, and Australia Dancing. Added to this were other international sources of digital content such as the Open Library, Hathi Trust and OAISTER. This immediately gave Trove about 90 million items (books, journals, pictures, music, newspapers, sound, video, diaries, archives, and websites) which could be easily found in a single search. The search results are complemented with relevant information from other websites. Heavily used websites which have an open API and can therefore be added to Trove as a target are: Google Books, Amazon, Wikipedia, Flickr and Google Videos (includes YouTube). These results in context are differentiated from the main results by appearing to the left of them.

The Library has utilised several methods of data collection. Although its preferred method is the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), very little data is actually collected this way. This is because many libraries and archives are still unable to implement OAI. Because of this the Library now recognises that more flexibility is required with data collection. Other methods include using an API, crawling sitemaps, or transferring files over FTP or HTTP. The strength of cultural heritage institutions is that we have all used common data description schemas, and that we agree that data should be open and shared. Trove is the realisation and result of many, many years of working towards common standards (such as MARC, Dublin Core and EAC) in order to make information accessible and free.
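For readers unfamiliar with OAI-PMH, the following Python sketch shows roughly what a harvest involves. It is a minimal illustration, not the Library's harvester: the base URL, the sample response and the function names are invented. The `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` argument and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. To stay self-contained the sketch parses a canned response rather than contacting a real repository.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL asking for unqualified Dublin Core."""
    if resumption_token:
        # In the protocol, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract a few Dublin Core fields and the next resumption token."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "url": rec.findtext(".//dc:identifier", default="", namespaces=NS),
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Invented response from a hypothetical contributor's repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample diary</dc:title>
          <dc:identifier>http://example.org/items/1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
records, next_token = parse_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", next_token)
```

A real harvester would fetch `first_url`, parse the response, and keep requesting `next_url` until the repository returns no resumption token. Only the metadata and the link back to the item are collected; the content itself stays on the contributor's site.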

3. TROVE DEVELOPMENT

Trove was planned and under development for a number of years [8] (previously referred to in staff papers as the Single Business Discovery Project). The Australian Newspapers beta service was actually a test of the new technical infrastructure which was based on MySQL, and a Lucene search index. The Australian Newspaper service was also the test-bed for social and data engagement and had the library’s first implementation of tagging in 2008. Because the Australian Newspapers service was very successful and scalable it was decided to continue with the same infrastructure for Trove. Social and data engagement features were also to be implemented on all content the library would deliver. Trove was released in November 2009 at an early stage of development, which was 15 months after Australian Newspapers beta had been released, with key staff working on both projects.

During the first half of 2010 there have been new releases of Trove at least once a month. The development of Trove has largely been driven by public feedback. This mirrored the newspaper development process. Feedback from the public is actively encouraged and feedback is considered critical for the development of the service, so we are sure we are meeting user needs. The sentiments of Charles Leadbeater in his essay the ‘Art of With’ [9] are being carried out. Leadbeater advocates that cultural heritage institutions need to learn the art of doing things ‘with’ people rather than ‘for’ or ‘to’ them. This includes the development cycle as well as the end results. Collaboration with the users is key. This was a lesson the library had already learnt with the Australian Newspapers. How can we know if we are meeting the changing expectations and needs of our users if we are not in close contact with them, moving along new paths together?

4. TROVE FEATURES

The key features of Trove are listed below and I would encourage readers to explore Trove for themselves on a topic of their own interest:

Focus on Australian content

Single search (search across everything at once in a few seconds)

Find and Get (get options include digital, borrowing, buying, copying)

Results returned ‘in context’ (the context are the eight groupings or zones which were defined by users)

Restrict search to Australian items, online items, or items in your preferred locations

Items that are online are obvious from the brief results display.

Narrow down your search results by using ‘facets’ e.g. year, type, and language.

Check copyright status.

Cite the record using a persistent identifier and see different citation formats.

View main record only or all the versions that are within it (relevant for books and images).

Minimise some zones in result list.

Browse items by zone.

Registration and login are optional (required only to create a profile, search within ‘my libraries’, or create lists, private tags or private comments).

View related items.

‘Did you mean?’ help with search terms.

High quality relevance ranking (National Library resources are not favoured over those from other sources).

Features that have been implemented in 2009/2010 as a response to changing expectations of users can be grouped by data engagement and social engagement categories. The difference between the two is that data engagement is normally for the benefit of the individual only, whereas social engagement features encourage and nurture a virtual community to develop around the content or service.

User forum (add more information to your profile for others to see e.g. interests, location; contact other users, arrange to work collaboratively, help other users by answering their questions).

5. TROVE USAGE

Trove has already established a user base of 1 million people in the first 6 months, which is comparable to the initial audience of the Australian Newspapers beta. Expected usage of Trove is at least 10 million people (half the Australian population). This is based on the expectation that usage should exceed annual foot traffic through library doors. National, State and Territory Libraries of Australia (NSLA) had 7.7 million people through their doors in the year 2008-2009 [10]. Figures from the Australian Bureau of Statistics show that in 2006 half of the population belonged to a library and sought information on a regular basis, and that in December 2009 two thirds of Australian households had fast broadband access at home.

The National Library uses an extremely limited amount of its overall budget on marketing online services. Introducing a new service of this magnitude that is relevant and useful to a large section of the Australian population needs a targeted, well funded national marketing campaign. In the absence of such a campaign it is expected that general awareness and usage of Trove will not grow rapidly in the first year, instead growing slowly over 2-3 years. Word will spread by mouth, in the online environment and through library connections initially. Usage statistics gathered so far are as follows:

Feedback from thousands of users tells us that creating connections and linkages between data is important, that finding related information and showing information in context is really helpful. Users think sharing, repurposing, mashing and adding to information is equally as important as finding it in the first place. Consistent messages received from users are “Give us……”

As much access to as much information as possible in one place.

Tools to do stuff with the information we find.

Freedom and choices with finding, getting and interacting with the information.

Ways to work collaboratively together to achieve new things which have never before been possible so easily.

And in return our virtual community will give back their:

Enthusiasm to do stuff and help us

Expert subject knowledge

Time

Dedication

6. TROVE – FUTURE DEVELOPMENTS

In 2010 there are three main areas for development of Trove. These are to encourage new contributors to provide their data to the service; to continue development of the service, with key new features being a Trove API and improved access to journal articles; and to raise awareness and usage of Trove. On the last point libraries can help by referring to the marketing page in Trove http://trove.nla.gov.au/general/marketing. This shows you how to use a Trove logo on your website, add a Trove search box to your site, and put Trove into your browser bar as a search box. Also some libraries are considering usage of Trove as their primary discovery service, or integrating Trove content into their own single discovery service when the Trove API becomes available later this year.

The success of Trove depends on having a large body of relevant content; the usability and functionality of the service; being able to successfully migrate users from the previous eight separate existing services to Trove; and raising awareness of the existence and usefulness of Trove in the community. Migration strategies are being developed for services such as Picture Australia, Australian Newspapers and Music Australia. Structured usability testing will be undertaken to address both the basic functions of Trove and difficult areas highlighted in user feedback later this year. A ‘low cost’ marketing campaign has been developed for 2010 which mainly comprises distribution of bookmarks, attendance at conferences, speaking engagements by the Trove Team and publication of articles/adverts in library journals. Any organisation that has content in Trove is encouraged to undertake its own marketing. Trove did receive national media coverage in April 2010, but sustained ongoing marketing is important.

7. CONCLUSION

Trove would not have been possible without the long history of collaboration across cultural heritage institutions in Australia, the usage of common standards across this sector and the shared understanding that data should be open and accessible wherever possible. The National Library has taken a leadership role in demonstrating that a shift in strategic thinking and action must take place to respond to the changing expectations of users. Control of information is no longer the ultimate goal; the goal is rather to give users freedom and choices to interact with the data and each other, to create their own context within the information, and to add their knowledge and content to it. In the eyes of the users this is just as important as finding the information in the first place. Libraries are well placed to respond to these needs, having the technology, tools and information expertise, but more than that, not being driven by commercial gain. Their honest long term goals are simply to make finding and getting information easier, now and forever, and that is what Trove is all about.