Perspectives from HathiTrust
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/when-simple-search-just-wont-do
Reflections on the First HathiTrust Member Meeting
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/reflections-on-first-hathitrust-member-meeting
<span class='print-link'></span><p dir="ltr">By Mike Furlough, Executive Director, HathiTrust</p><p dir="ltr"><span style="font-size: 14.4444446563721px; line-height: 1.3em;">Since I started as Executive Director of HathiTrust in May of this year, I have done nothing but learn: learn about the organization, our operations, our finances, our people, and our partnership. I have traveled quite a bit, especially this fall, paying visits to HathiTrust members (thank you, libraries of </span><a href="http://search.library.cmu.edu/" style="font-size: 14.4444446563721px; line-height: 1.3em;">Carnegie Mellon</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, </span><a href="http://www.library.pitt.edu/" style="font-size: 14.4444446563721px; line-height: 1.3em;">Pittsburgh</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, </span><a href="http://library.harvard.edu/ssmps" style="font-size: 14.4444446563721px; line-height: 1.3em;">Harvard</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, and </span><a href="http://www.library.northwestern.edu/" style="font-size: 14.4444446563721px; line-height: 1.3em;">Northwestern</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">), several meetings of library organizations and consortia (thank you </span><a href="http://www.trln.org/" style="font-size: 14.4444446563721px; line-height: 1.3em;">TRLN</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, </span><a href="http://www.gwla.org/" style="font-size: 14.4444446563721px; line-height: 1.3em;">GWLA</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, </span><a href="http://www.coppul.ca/" style="font-size: 14.4444446563721px; line-height: 1.3em;">COPPUL</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">, and </span><a href="http://www.aserl.org/" style="font-size: 14.4444446563721px; line-height: 1.3em;">ASERL</a><span style="font-size: 14.4444446563721px; 
line-height: 1.3em;">), as well as a couple of special focus meetings on </span><a href="http://dhcs.northwestern.edu/" style="font-size: 14.4444446563721px; line-height: 1.3em;">digital humanities</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;"> and newspaper digitization (thanks to you all as well).</span></p><p dir="ltr">Although I usually give a talk about HathiTrust during these trips, I consider these listening, not proselytizing visits. I am there to learn about the turning points at which these organizations find themselves today and what strategic issues they are focusing on. And through their questions I find out what matters to them about HathiTrust—or what would matter to them more if we were to tackle this problem or that. It’s also useful to find out what people don’t know or don’t understand, because it sometimes means that users may not be getting the full benefit of HathiTrust.</p><p dir="ltr">The standout event of my first six months was the <a href="http://www.hathitrust.org/member_meeting_2014">2014 HathiTrust Members Meeting</a>, held in Washington, DC on October 11. This was the first meeting of our membership since <a href="http://www.hathitrust.org/constitutional_convention2011">the 2011 Constitutional Convention</a>, after which we developed our new governance structure, and adopted our current financial model. This was a unique chance to bring our partners together to update them on our current initiatives and engage them to begin planning for the future. Evident throughout the day were the membership’s strong sense of shared responsibility for the success of HathiTrust and the excitement for what we have done and will do together.</p><p dir="ltr">Here I’d like to offer some reflections on the day’s discussions, highlighting a few specific initiatives and some questions about where we are going as an organization. 
There’s no way in a short blog post to cover every single issue that came up that day, let alone in the last six months, so I hope you will forgive omissions for the sake of brevity and post questions or contact me directly. If you are interested, we have a <a href="member_meeting_2014_notes">more detailed report</a> on the Member Meeting, along with slides of most of the presentations.</p><p dir="ltr">First of all, the partnership is strong and continues to grow. After the 2011 Convention our membership increased from 64 to 101 member libraries and now includes four in Canada, one in Spain, and one in Australia. Over 60 individuals from 30 different member libraries currently serve on a HathiTrust working group, standing committee, or governance committee. Our infrastructure is strong and we have repeatedly confirmed that our work is grounded solidly in the law. A growing number of our members have identified staff to work with us to obtain access to the HathiTrust collection for users who have print disabilities. The collection has grown significantly. At the moment I am writing this we stand at 12.96 million volumes, 4.8 million of which are open for full-text access because they are no longer covered by copyright or because an author or publisher has made the material available using a Creative Commons license. We have struck some outstanding partnerships in the last several years, including <a href="http://dp.la/info/2013/06/18/hathitrust-to-partner-with-dpla/">one with the Digital Public Library of America</a>, which is now a notable source of viewers and readers of HathiTrust collections. In short, our preservation and access services provide a very solid basis for future work.</p><p dir="ltr">And that future work will continue to transform how libraries serve their users and manage collections. 
During its inaugural year the <a href="http://www.hathitrust.org/psc">Program Steering Committee</a> (PSC) launched working groups to plan programs passed as ballot initiatives at our 2011 Constitutional Convention. One of these, <a href="http://www.hathitrust.org/constitutional_convention2011_ballot_proposals#proposal1">a proposal to develop a shared and distributed print monographs archive</a>, will promote collective and coherent decisions about the retention and long-term management of print collections. By organizing a distributed print collection corresponding to the HathiTrust digital collection, we can strengthen our preservation commitments and better ensure future access to the cultural record. <a href="http://www.hathitrust.org/print_monographs_archive_charge">The working group studying these issues will make their first recommendations in early 2015.</a>&nbsp;The chair of this group, Tom Teper of the University of Illinois Urbana Champaign, reported on their work at the <a href="member_meeting_2014_notes#PrintMonographsArchive">Member Meeting</a>.</p><p dir="ltr">We have already taken action on another proposal from the Convention, <a href="http://www.hathitrust.org/constitutional_convention2011_ballot_proposals#proposal4">one to expand and enhance access to US federal government publications</a>. In 2013 we began the development of a <a href="http://www.hathitrust.org/usgovdocs_registry">Registry of US Federal Government Documents</a>. More recently, the <a href="http://www.hathitrust.org/usgovdocs_planning_charge">Government Documents Initiative Planning and Advisory Working Group</a>, led by Mark Sandler of the Committee on Institutional Cooperation, has made preliminary recommendations that are now under review by the Program Steering Committee. 
Currently HathiTrust holds over 575,000 known US federal publications, but we believe there to be a substantial number of unidentified documents in the collection, and a much larger number of documents left undigitized. The recommendations of the Advisory Working Group include several that will strengthen the Registry project, and others that will help us to identify, source, and collect federal documents over the next several years. Mark Sandler also provided a report at the <a href="member_meeting_2014_notes#USGovernmentDocuments">Washington meeting</a>.</p><p dir="ltr">Stephen Downie of <a href="http://www.lis.illinois.edu/">the Graduate School of Library and Information Science at the University of Illinois, Urbana Champaign</a>, reported on the HathiTrust Research Center. Our goal in supporting the <a href="http://www.hathitrust.org/htrc">Research Center</a> is to simplify advanced computational access to our digital collection through services and infrastructure developed by experts. Downie, who along with Beth Plale from the School of Informatics and Computing at Indiana University co-directs the Research Center, outlined an ambitious agenda of service development, which will be furthered with substantial funding from HathiTrust and from both Illinois and Indiana. These plans include the development of training and services that can be integrated into services in a library’s research commons or in similarly-defined programs of advanced support for faculty and students. In addition to the development of these services, the Research Center has received funding for research from the Alfred P. Sloan Foundation, the Andrew W. Mellon Foundation, and the National Endowment for the Humanities. There is tremendous potential for the work undertaken by the Research Center to enable great improvements in the metadata and the content of the HathiTrust collections. 
They have recently announced the date for their <a href="http://www.hathitrust.org/htrc_uncamp2015">next “Uncamp” (March 30-31, 2015 in Ann Arbor, MI)</a> and released a request for proposals from which they will select projects for advanced research support from HTRC staff. (Researchers, including faculty and students, from HathiTrust member institutions have priority in this call.) <a href="http://www.hathitrust.org/htrc/acs-rfp">The RFP includes detailed information</a>.</p><p dir="ltr">Because we have developed such a strong organization, collection, and infrastructure, we can readily address these challenges of print management, document identification, and services for computational research. Yet with all of this underway, we are still growing as an organization, and much of our discussion during the Member Meeting focused on how we can collectively chart HathiTrust's future paths. At the 2011 Convention, attendees referred a ballot measure to expand the mission of HathiTrust to the new Board of Governors for action. In Washington, board member Brian Schottlaender <a href="http://www.hathitrust.org/documents/hathitrust-bylaws-revisions-proposed_oct2014.pdf">presented a draft of new language for the Bylaws (Section I - Purpose)</a>, developed in response. The language proposed makes clear that HathiTrust should not be as format-bound as we have been in the past. The original bylaws state that we are building a “digital archive of library materials converted from the print collections of the member institutions.” In the proposed revisions, our purpose would be to collect “digital content of value to scholars and researchers, including a variety of formats and born-digital materials.” There was general support for these edits, though some members asked for further clarification on other points. 
We are finalizing the new text and it will be presented to the members for a vote in the near future.</p><p dir="ltr">Assuming these changes to the bylaws are passed, we will have to think about what it means for HathiTrust to collect the record of human knowledge in “a variety of formats.” Obviously we must pursue partnerships with publishers and other organizations to collect newly published materials in born-digital format. We made a start with that by collecting newly published university press books made openly accessible through the <a href="http://www.knowledgeunlatched.org/about/pilot-project/">Knowledge Unlatched pilot project</a>. But this is only a start, and we must be ready to collect material from other sources. In this regard, the discussions around future funding for scholarly monographs remain very important to monitor.</p><p dir="ltr">In our first several years we did undertake a pilot project that collected images in HathiTrust, and had plans for a pilot for audio materials that we did not complete. During an open discussion period in Washington I asked “How important are non-text formats for HathiTrust?” and the responses varied. &nbsp;No one disputed their importance, but some cautioned on the timing. For certain members they are critical. These members believe that we must better support visual and graphical materials, including those found in the books in our existing collection, as well as materials at-risk or otherwise less accessible in our archives and special collections. Some observed that as a body of materials, the government publications--on which we are so heavily focused--are and have always been multi-format. However, others cautioned that we still have much yet to do with the textual materials we’ve collected, and that there are other types of text collections we haven’t touched, such as newspapers. 
We should, in this view, not lose sight of what we do well and be mindful of the resources required to expand into new formats.</p><p dir="ltr">Making clear choices about what you are not going to do can be powerful. Our success stems in part from our clarity and focus on text over the last six years. We’ve now developed great capacity and expertise in managing re-formatted print/text collections, and I am a strong believer in playing to your strengths. Expanding beyond text might be seen to diminish that focus. Although new format choices have implications for development and would affect our resource allocations, “What formats?” is not the only question. What are we trying to achieve, and what types of future access do we need to envision?</p><p dir="ltr">Of course, books do not exist in a vacuum, and even a text-bound collection must in the future be able to connect its materials with users regardless of their working environment. Works of fiction, poetry, and other creative genres found in HathiTrust can be related to letters, draft manuscripts and other materials in archives around the world. The long-form arguments embodied in monographs are dependent upon those of other books, as well as articles in serials, primary source documents, collections of data, and so on. Virtually anything can be evidence in a scholarly argument, and for two decades now we have seen many experiments in multi-modal scholarship that attempts to make these relationships between argument and evidence manifest and seamlessly available. In what way can we prepare our infrastructure to connect to, if not collect, those related materials? As our friends at OCLC Research have observed, the&nbsp;<a href="http://oclc.org/research/publications/library/2014/oclcresearch-evolving-scholarly-record-2014-overview.html">“evolving scholarly record”</a> has become more heterogeneous, and parts of it are at risk due to fragmentation in our mechanisms of management and preservation. 
We will have to address this format question squarely in the coming year, but we will do so in the context of our overall mission, the services we can build together, and related strategic issues.&nbsp;Earlier this year the Program Steering Committee began outlining some issues related to collecting <a href="psc_policy_briefs#NonTextFormats">non-text formats</a>. This is only a start of the discussion, and this issue is also in the charge of the newly re-charged&nbsp;<a href="http://www.hathitrust.org/wg_collections_charge" style="font-size: 14.4444446563721px; line-height: 1.3em;">Collections Committee</a><span style="font-size: 14.4444446563721px; line-height: 1.3em;">.</span></p><p dir="ltr">It’s a very different world now than when we began, and clearly it’s time for longer-range planning at HathiTrust. In 2008 “mass digitization” was still less than four years old, opinion about its value was mixed, and its future was uncertain. That is the moment we came from, but as we start 2015 we have many new venues in which to work on these problems as a collective. These include, among others, DPLA, <a href="http://www.dpn.org/">the Digital Preservation Network (DPN)</a>, and <a href="http://aptrust.org/">Academic Preservation Trust (APTrust)</a>. These initiatives and others can also transform discovery, preservation, and access to the diverse scholarly products of our researchers and students, especially if we are coordinating our strategies. When we began HathiTrust some commentators doubted that we could be successful, but our success has partially enabled this flourishing ecosystem of digital library infrastructure. Precisely because of our success, HathiTrust has a special obligation to work with others to help bring “coherence” (<a href="http://coherence.clir.org/">to borrow a term</a>) to this environment. 
&nbsp;</p><p dir="ltr">Whatever we do, these issues need to be addressed from multiple perspectives and with the needs of the membership at the center of the discussion. At the Member Meeting, and in private conversations I have had, some representatives have urged that we undertake our future development initiatives in the most inclusive and transparent manner possible without interfering with our agility. These are important and natural concerns. Our governance structures, including <a href="http://www.hathitrust.org/board_of_governors">the Board of Governors</a>, the Program Steering Committee, and various working groups, are providing mechanisms for this. For example, the PSC will work on creating processes for identification and evaluation of proposals for major new technical or service developments. In our startup years we have drawn heavily on the resources of the <a href="http://www.lib.umich.edu/">University of Michigan Library</a>. But in 2013 we launched <a href="http://www.hathitrust.org/zephir">Zephir</a>, developed and operated by the University of California’s <a href="http://www.cdlib.org/">California Digital Library</a>&nbsp;to manage metadata for the repository, and the HathiTrust Research Center is co-located at two of our member institutions. HathiTrust increasingly must stand up on its own and continue to draw upon the expertise of all of its members, enabling our libraries to build and offer their own services based on the HathiTrust collections and platform. Some attendees at the Member Meeting offered ideas aimed at making this possible, such as “microgrants” to fund investigations or research and development beyond the scope of the Research Center. Others expressed their hope to form a strong HathiTrust community, and want to see opportunities for member institutions to share programs or projects they’ve initiated based on the HathiTrust collection and services. 
These are great ideas to explore, and there are others in the full report on the Member Meeting. I welcome more from you now and at any time. HathiTrust is a partnership focused on sharing responsibility for preserving and curating our resources, and your involvement is necessary.</p>
Tue, 18 Nov 2014 23:31:18 +0000 (hathitrust)
What's in your collection?
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/whats-your-collection
<span class='print-link'></span><h2>
Are you familiar with the Collections area of HathiTrust?</h2>
<p>
<a href="http://babel.hathitrust.org/cgi/mb">http://babel.hathitrust.org/cgi/mb</a></p>
<p>
There are currently 940 public collections created by users and library staff members at partner institutions, including several we have <a href="http://babel.hathitrust.org/cgi/mb?a=listcs;colltype=pub#featured">featured</a>. HathiTrust collections provide a way to aggregate digital items related to a common theme, or associated with a given physical collection or location (for instance, the University of Michigan has created a collection of its <a href="http://babel.hathitrust.org/cgi/mb?a=listis;c=30688098">Hatcher Graduate Reference</a> reading room). Items can be added to a collection from HathiTrust full-text search results pages. Once they have been added to a collection, the full-text and bibliographic metadata of items can be searched independently of the larger repository. Items in collections can also be quickly copied to new or existing collections. These features make collections an easy way to refine a set of search results, share batches of items with others, or (as with the Michigan Graduate Reference collection) let staff and users search within a specific collection to find the book containing a particular index or obscure term, which they can then pull from the shelf for more information.</p>
<!--break--><p>
Staff at some of our partner institutions have been talking about how great it would be to have even more high-quality collections to help demonstrate the usefulness of this feature (and be used!). We&#39;d also like to explore how this kind of feature could better support library needs.</p>
<h2>
It&rsquo;s easy to create a collection</h2>
<p>
Once you are logged into HathiTrust (either as a member of a partner institution or using a University of Michigan Friend Account), you can easily create collections when viewing a volume or from full-text search results as shown below:</p>
<p>
<img alt="" src="/sites/www.hathitrust.org/files//CollectionsBlog_CollectionBuilder.png" style="width: 600px; height: 407px; " /></p>
<p>
&nbsp;</p>
<p>
<img alt="" src="/sites/www.hathitrust.org/files//FullTextSearchResults.png" style="width: 600px; height: 347px; " /></p>
<h2>
We can help</h2>
<p>
Because large collections can be somewhat cumbersome to create manually, we can work with you to help build them! To create a custom collection, we need to know the specific items to include. We can work from a list of item identifiers, or from one or more search queries. Item identifiers for large collections, or for collections based on criteria that are not easy to search for, can be obtained using one of the methods below:</p>
<ol>
<li>
HathiTrust&#39;s <a href="http://www.hathitrust.org/hathifiles">tab-delimited metadata files</a>. These files are an inventory of repository holdings, containing a variety of identifiers for volumes (ISBN, LCCN, OCLC, etc.), copyright information, and limited bibliographic metadata for each volume in HathiTrust. A description of the files is available at <a href="http://www.hathitrust.org/hathifiles_description">http://hathitrust.org/hathifiles_description</a>.</li>
<li>
HathiTrust <a href="https://dev.www.lib.umich.edu/hathitrust/data_api">Data API</a>. In addition to retrieving entire volume packages from HathiTrust (including images and OCR), the Data API can be used to find ids for volumes digitized from a particular source. The University of Michigan has built a demonstration application using the Data API that illustrates how this can be done. Please see <a href="http://www.lib.umich.edu/two-over-threehundred">http://www.lib.umich.edu/two-over-threehundred</a>.</li>
</ol>
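<p>
As a rough illustration of the first method, the sketch below pulls item identifiers out of a tab-delimited file by matching on one column. The function name, the column positions, and the sample rows are all assumptions made for this example; check the file description linked above for the actual column layout before adapting it.</p>

```python
import csv
import io


def select_item_ids(tsv_file, id_col, match_col, wanted):
    """Return the identifier from every row whose match_col equals `wanted`.

    id_col and match_col are zero-based column positions. They are
    hypothetical here; take the real positions from the file description.
    """
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    ids = []
    for row in reader:
        # Skip rows too short to contain both columns.
        if len(row) <= max(id_col, match_col):
            continue
        if row[match_col] == wanted:
            ids.append(row[id_col])
    return ids


# Made-up sample data: identifier, rights code, title.
sample = io.StringIO(
    "vol.001\tpd\tTitle A\n"
    "vol.002\tic\tTitle B\n"
    "vol.003\tpd\tTitle C\n"
)

pd_ids = select_item_ids(sample, id_col=0, match_col=1, wanted="pd")
print(pd_ids)  # ['vol.001', 'vol.003']
```

<p>
A list produced this way could be sent to us as the set of item identifiers for a custom collection. For a real file, open it with <code>open(path, newline="", encoding="utf-8")</code> and pass the file object in place of the sample.</p>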
<h2>
Custom collections</h2>
<p>
Here are some examples of collections that have been custom-built. If you haven&rsquo;t yet become familiar with our Collections feature, give it a try. If you already know it and have some great ideas for collections but have had trouble making them, give us a holler and we can point you in the right direction or help you create them.</p>
<ul>
<li>
Collections built from one publication -- United States Congressional Serials Set: <a href="http://babel.hathitrust.org/cgi/mb?a=listis;c=1597493732">http://babel.hathitrust.org/cgi/mb?a=listis;c=1597493732</a>
<ul>
<li>
Given the title, we were able to locate all items associated with that title in HathiTrust and build a collection.</li>
</ul>
</li>
<li>
Collections built from a search term in the HathiTrust catalog -- Ancestry and Genealogy: <a href="http://babel.hathitrust.org/cgi/mb?a=listis;c=332123463">http://babel.hathitrust.org/cgi/mb?a=listis;c=332123463</a>
<ul>
<li>
This collection was based on a catalog search for full view items where &quot;genealogy&quot; occurred anywhere in the bibliographic record. The owner has added items individually since the collection was created.</li>
</ul>
</li>
<li>
Collections built on holdings information in a partner catalog -- UM Hatcher Graduate Reference: <a href="http://babel.hathitrust.org/cgi/mb?a=listis;c=30688098">http://babel.hathitrust.org/cgi/mb?a=listis;c=30688098</a>
<ul>
<li>
The list of ids for this collection was assembled using location data from the University of Michigan Library.</li>
</ul>
</li>
<li>
Collections built from analysis -- English Short Title Catalog: <a href="http://babel.hathitrust.org/cgi/mb?a=listis;c=247770968">http://babel.hathitrust.org/cgi/mb?a=listis;c=247770968</a>
<ul>
<li>
HathiTrust is collaborating with the ESTC to determine which volumes in a candidate set (English-language volumes published before 1800) are both in the ESTC catalog and in HathiTrust. The matching volumes are included in the ESTC collection.</li>
</ul>
</li>
</ul>
<p>
Note that once collection(s) are built, we will transfer ownership to the requester so the collection(s) can be updated and maintained.</p>
<p>
Please contact <a href="mailto:feedback@issues.hathitrust.org">feedback@issues.hathitrust.org</a> with any questions or to get started!</p>
Wed, 06 Jun 2012 19:23:31 +0000 (hathitrust)
When a simple search just won't do
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/when-simple-search-just-won039t-do
<span class='print-link'></span><p>
By Heather Christenson, HathiTrust Communications Working Group</p>
<p>
With over 10 million volumes, and full-text search free from commercial results ranking, HathiTrust is a go-to place for researchers who are serious about exploring the research library collection. Many technologists in our community who follow the <a href="http://www.hathitrust.org/blogs/large-scale-search" title="Large-scale Search Blog">HathiTrust Large-scale Search Blog</a> are aware of the work that has been going on since full-text search went into beta and then live on the HathiTrust site in 2009. We&rsquo;re pleased to report that in 2012 HathiTrust continues to make progress with the implementation of new advanced search features. With the leadership of Tom Burton-West at the University of Michigan, there have been two new feature releases in the past few months.</p>
<p>In February we released the first part of the advanced search interface for HathiTrust full-text search.</p>
<ul>
<li>
Advanced search allows users to combine a full-text search with searches within specific fields such as Title, Author, or Subject. For example, if you want to find out where Charles Dickens used the phrase &quot;the best of times&quot;, you can search for: [All of these words] [Dickens, Charles] in [Author] AND [This exact phrase] [the best of times] in [Just Full Text]</li>
<li>
The advanced search interface also allows users to set limits by publication date, format, or language. Multiple languages or formats can be selected.</li>
</ul>
<p>
We have now released the second phase of advanced search.</p>
<ul>
<li>
Users can now combine up to four different fields connected by the &quot;AND&quot; or &quot;OR&quot; operators, and any limits set are retained if you click the &quot;Revise this advanced search&quot; link on the search results page.</li>
</ul>
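<p>
To make the shape of such a search concrete, here is a small sketch that models a query as up to four field clauses joined by AND or OR, and prints it in the bracket notation used in the example above. The class and function names are hypothetical and exist only for this illustration; this is not HathiTrust code and it does not call the search service.</p>

```python
from dataclasses import dataclass

MAX_CLAUSES = 4  # the interface described above combines up to four fields
OPERATORS = ("AND", "OR")


@dataclass
class Clause:
    match: str  # e.g. "All of these words" or "This exact phrase"
    terms: str  # the words or phrase to look for
    field: str  # e.g. "Author", "Title", "Just Full Text"


def render_query(clauses, operators):
    """Render clauses joined by operators, in bracket notation."""
    if not 1 <= len(clauses) <= MAX_CLAUSES:
        raise ValueError("a query needs between 1 and 4 clauses")
    if len(operators) != len(clauses) - 1:
        raise ValueError("need exactly one operator between each pair of clauses")
    if any(op not in OPERATORS for op in operators):
        raise ValueError("operators must be AND or OR")

    parts = [f"[{c.match}] [{c.terms}] in [{c.field}]" for c in clauses]
    rendered = parts[0]
    for op, part in zip(operators, parts[1:]):
        rendered += f" {op} {part}"
    return rendered


query = render_query(
    [
        Clause("All of these words", "Dickens, Charles", "Author"),
        Clause("This exact phrase", "the best of times", "Just Full Text"),
    ],
    ["AND"],
)
print(query)
# [All of these words] [Dickens, Charles] in [Author] AND [This exact phrase] [the best of times] in [Just Full Text]
```

<p>
Writing a search down this way before entering it in the form can help when a query combines three or four fields and the choice between AND and OR matters.</p>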
<p>
For those moments when a simple search just won&rsquo;t do, we encourage you to give it a try!</p>
<p>
Go to <a href="http://babel.hathitrust.org/cgi/ls?a=page&amp;page=advanced" title="Advanced Full-text Search">Advanced Full-text Search</a>!</p>
Thu, 26 Apr 2012 14:07:57 +0000 (hathitrust)
Ten Million and Counting
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/ten-million-and-counting
<span class='print-link'></span><p>HathiTrust reached a major milestone on January 5, 2012, exceeding 10 million volumes in its digital collections. More than 2.7 million of these volumes are in the public domain, with viewing and downloading options available online. Statistics about the collections and a graph charting growth over time are available below (see also <a href="http://www.hathitrust.org/statistics_visualizations" title="Statistics and Visualizations">Statistics and Visualizations</a>). We have also prepared a timeline noting significant events on our way to 10 million volumes. As of January 5, 2012, 23 of HathiTrust&#39;s 67 partners are depositing content in the repository. Details on contributions by institution can be found in our <a href="http://www.hathitrust.org/updates">monthly updates</a>. See also our <a href="http://www.hathitrust.org/news_publications">News and Publications</a> page for press releases, papers, presentations, and more about HathiTrust over the last several years.</p>
<!--break--><!--break--><p>&nbsp;</p>
<p><strong>Copyright Distribution by Type</strong></p>
<p><img alt="Copyright Distribution by Type" src="/sites/www.hathitrust.org/files//IC_PD_0.jpg" style="height:300px; width:684px" /></p>
<p><strong>Copyright Distribution by Date</strong></p>
<p><img alt="Copyright Distribution by Date" src="/sites/www.hathitrust.org/files//Dates_Copyright.jpg" style="height:363px; width:615px" /></p>
<p>&nbsp;</p>
<p><strong>Volume Distribution by Date</strong></p>
<p><img alt="Volume Distribution by Date" src="/sites/www.hathitrust.org/files//Dates.jpg" style="height:364px; width:648px" /></p>
<p>&nbsp;</p>
<p><strong>Volume Distribution by Language (1)</strong></p>
<p><img alt="Volume Distribution by Language (1)" src="/sites/www.hathitrust.org/files//Languages1_1.jpg" style="height:343px; width:620px" /></p>
<p>&nbsp;</p>
<p><strong>Volume Distribution by Language (2)</strong></p>
<p><img alt="Volume Distribution by Language (2)" src="/sites/www.hathitrust.org/files//Languages2_0.jpg" style="height:375px; width:640px" /></p>
<p><strong>Growth Over Time</strong></p>
<script type="text/javascript" src="//ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"> {"dataSourceUrl":"//docs.google.com/a/umich.edu/spreadsheet/tq?key=0Ag4T93aUS_BTdENtRXI2bWstd2kzR1NXX1BMWDdPMmc&transpose=0&headers=1&range=A1%3AC100&gid=1&pub=1","options":{"displayAnnotations":true,"vAxes":[{"viewWindowMode":"pretty","viewWindow":{}},{"viewWindowMode":"pretty","viewWindow":{}}],"height":371,"width":600,"wmode":"opaque","hasLabelsColumn":true,"hAxis":{"maxAlternations":1},"animation":{"duration":0}},"state":{},"view":"{\"columns\":[0,1,2]}","isDefaultVisualization":true,"chartType":"AnnotatedTimeLine","chartName":"Chart 1"} </script><p><strong>Timeline</strong></p>
<p>January 2008</p>
<ul>
<li>First formal multi-institutional commitments made to building HathiTrust</li>
</ul>
<p>March 2008</p>
<ul>
<li>First instance of HathiTrust repository infrastructure in place in Ann Arbor, Michigan</li>
<li>Storage purchased for second instance of repository in Indianapolis</li>
<li>University of Michigan coordinates site visit by a team from DRAMBORA
<ul>
<li>Results of the DRAMBORA review were published as:</li>
</ul>
</li>
</ul>
<p>Seamus Ross, Andrew McHugh, Perla Innocenti, Raivo Ruusalepp: Investigation of the potential application of the DRAMBORA toolkit in the context of digital libraries to support the assessment of the repository aspects of digital libraries, Glasgow: DELOS NoE, August 2008, ISBN: 2-912335-41-8</p>
<p>April 2008</p>
<ul>
<li>Loading and testing of Google-digitized content from the University of Wisconsin begins</li>
<li>Preparations begin to establish second instance of repository in Indianapolis</li>
</ul>
<p>May 2008</p>
<ul>
<li>Testing of Lucene/Solr begins to provide full-text search across the repository</li>
<li>PageTurner application released with specialized accessible interface, allowing reading and full-text searching of individual volumes in the repository</li>
</ul>
<p>June 2008</p>
<ul>
<li>Lucene/Solr installed on development and production servers</li>
<li><a href="http://babel.hathitrust.org/cgi/mb?a=listcs;colltype=pub">Collection Builder</a> application released</li>
</ul>
<p>July 2008</p>
<ul>
<li>Ingest of content begins from the University of Wisconsin</li>
<li><a href="http://www.hathitrust.org/hathifiles" title="Hathifiles">Tab-delimited metadata files</a> are made available to facilitate local loading of HathiTrust bibliographic records
<ul>
<li>Read more about <a href="http://www.hathitrust.org/data" title="Data Availability and APIs">HathiTrust&nbsp;Data Availability and APIs</a></li>
</ul>
</li>
</ul>
<p>August 2008</p>
<ul>
<li>HathiTrust &ldquo;about&rdquo; website is released, including information about <a href="http://www.hathitrust.org/trac" title="HathiTrust TRAC Documentation">HathiTrust compliance</a> with criteria for Trustworthy Digital Repositories (TRAC) and other documentation</li>
<li>Benchmarking for full-text search indexing begins</li>
</ul>
<p>September 2008</p>
<ul>
<li>Plans initiated to enable distributed development of applications and services by partner institutions
<ul>
<li>Three-pronged strategy: enable access to PageTurner via an API, create a development &lsquo;sandbox&rsquo; for shared development, and develop a public discovery interface for the repository</li>
</ul>
</li>
</ul>
<p>October 2008</p>
<ul>
<li>HathiTrust formally launched, including the institutions of the CIC, the University of California system, and the University of Virginia
<ul>
<li><a href="http://www.hathitrust.org/press_10-13-2008">See the press release</a></li>
</ul>
</li>
<li>Storage is installed at the Indiana site, and an additional 90 TB of storage is installed at both instances, bringing capacity at each site to 190 TB</li>
<li>Public beta full-text search application released, allowing full-text search of 500,000 volumes</li>
</ul>
<p>November 2008</p>
<ul>
<li>Data synchronization between Michigan and Indiana sites is completed and routinized</li>
</ul>
<p>December 2008</p>
<ul>
<li>Agreement concluded with OCLC to create discovery interface for HathiTrust</li>
<li>Indiana site becomes fully operational mirror of storage at Michigan site</li>
</ul>
<p>January 2009</p>
<ul>
<li>Load testing for full-text search begins</li>
</ul>
<p>February 2009</p>
<ul>
<li>Work begins on temporary beta catalog interface for HathiTrust</li>
</ul>
<p>March 2009</p>
<ul>
<li>Redundancy (in Indiana) for Web hosting infrastructure and full-text search indexing is established</li>
<li>Sample datasets containing full-text OCR of repository volumes are made available to researchers</li>
<li>New storage purchased, bringing total capacity at each site to 320 TB</li>
</ul>
<p>April 2009</p>
<ul>
<li>Temporary beta catalog released</li>
<li>Ingest of Google-digitized content from Indiana University and the University of California begins</li>
</ul>
<p>May 2009</p>
<ul>
<li>HathiTrust Research Center and Collaborative Development Environment working groups launched
<ul>
<li>The groups are charged, respectively, with developing specifications for a HathiTrust Research Center and establishing a collaborative development environment for the HathiTrust repository</li>
</ul>
</li>
<li>Alpha version of <a href="http://www.hathitrust.org/data_api" title="HathiTrust Data API">Data API</a> released</li>
<li>Michigan ingests legacy digital collections into the repository to pilot non-Google ingest</li>
</ul>
<p>June 2009</p>
<ul>
<li>California Digital Library begins work on improvements to PageTurner application</li>
<li>A record 379,000 volumes are ingested in June</li>
</ul>
<p>July 2009</p>
<ul>
<li>Working group formed to investigate need for 3rd instance of storage</li>
</ul>
<p>August 2009</p>
<ul>
<li><a href="http://www.hathitrust.org/technical_reports/HathiTrust_DisasterRecovery.pdf" title="HathiTrust Disaster Preparedness report">Report released</a> on HathiTrust Disaster preparedness</li>
<li>HathiTrust releases METS profile version 1.0
<ul>
<li>See <a href="http://www.hathitrust.org/digital_object_specifications" title="Digital Object Specifications">HathiTrust Digital Object Specifications</a></li>
</ul>
</li>
</ul>
<p>September 2009</p>
<ul>
<li>University of Michigan Press opens access to backfile publications in HathiTrust</li>
<li>UM and CDL staff begin collaboration for ingest of Internet Archive-digitized materials</li>
<li>Michigan staff contribute common-grams code to Solr code base</li>
</ul>
<p>October 2009</p>
<ul>
<li>Ingest of content begins from Penn State</li>
<li>Ingest of content begins from UC Santa Cruz and UC San Diego</li>
<li>A record 553,963 volumes are ingested in October</li>
</ul>
<p>November 2009</p>
<ul>
<li>Full-text search released (across 4.6 million volumes)
<ul>
<li>See the <a href="http://www.hathitrust.org/blogs/large-scale-search" title="Full-text Search Blog">Full-text Search Blog</a></li>
</ul>
</li>
</ul>
<p>December 2009</p>
<ul>
<li>Columbia University joins HathiTrust</li>
<li>Center for Research Libraries begins audit of HathiTrust for compliance with TRAC
<ul>
<li>See the <a href="http://www.hathitrust.org/trac">HathiTrust TRAC documentation</a> for information and results.</li>
</ul>
</li>
<li>HathiTrust <a href="http://www.hathitrust.org/bib_api" title="HathiTrust Bibliographic API">Bibliographic API</a> released</li>
<li>HathiTrust begins work to implement Shibboleth
<ul>
<li>View information about <a href="http://www.hathitrust.org/shibboleth" title="Shibboleth">Shibboleth in HathiTrust</a></li>
</ul>
</li>
<li>Redundancy of search index established at Indiana site</li>
</ul>
<p>January 2010</p>
<ul>
<li>Executive Committee approves new pricing model for HathiTrust
<ul>
<li>The new model allows participation by institutions that do not have large amounts of digital content to contribute.&nbsp;<a href="http://www.hathitrust.org/help_new_cost_model">View the new pricing model FAQ</a>.</li>
</ul>
</li>
<li>Storage Working Group submits <a href="http://www.hathitrust.org/working_groups#storage">final report</a> to Executive Committee</li>
</ul>
<p>February 2010</p>
<ul>
<li>Sample of IA-digitized volumes from UC ingested for testing</li>
<li>Ingest of Google-digitized volumes begins from the University of Minnesota</li>
<li>Full-text search index exceeds Solr/Lucene&#39;s limit of 2.1 billion unique terms
<ul>
<li>Lucene core developer Michael McCandless creates patch allowing up to 274 billion. View the <a href="http://www.hathitrust.org/blogs/large-scale-search/too-many-words" title="Too Many Words">full-text search blog post</a>.</li>
</ul>
</li>
</ul>
<p>March 2010</p>
<ul>
<li>UM staff receive samples of locally-digitized materials from several CIC institutions (Iowa, Illinois, Northwestern) to begin working on scalable mechanisms and processes for ingesting locally-digitized content</li>
<li>OCLC begins loading records for HathiTrust volumes into WorldCat</li>
</ul>
<p>April 2010</p>
<ul>
<li>Ingest begins of an initial set of nearly 100,000 IA-digitized volumes from the University of California</li>
</ul>
<p>May 2010</p>
<ul>
<li>New York Public Library joins HathiTrust</li>
<li>HathiTrust passes 6 million total volumes and 1 million volumes in the public domain</li>
<li>Executive Committee launches <a href="http://www.hathitrust.org/wg_communications_charge">Communications Working Group</a></li>
</ul>
<p>June 2010</p>
<ul>
<li>HathiTrust enables authentication via <a href="http://www.hathitrust.org/shibboleth">Shibboleth</a>
<ul>
<li>In the short run, this allows partners to download full PDFs of all public domain materials in the repository and to use the Collections application through a local sign-on. Implementation of Shibboleth paves the way for future partner services, such as expanded access to in-copyright materials.</li>
</ul>
</li>
<li>Full-text search index is mirrored at Indiana site</li>
</ul>
<p>July 2010</p>
<ul>
<li>Yale University Library joins HathiTrust</li>
<li>Strategic Advisory Board launches&nbsp;<a href="http://www.hathitrust.org/wg_collections_charge" title="Collections Committee Charge">Collections Committee</a></li>
<li>Executive Committee launches&nbsp;<a href="http://www.hathitrust.org/wg_usability_charge" title="UX Advisory Group charge">User Experience Advisory Group</a></li>
<li>Collection-building functionality integrated into full-text search</li>
</ul>
<p>August 2010</p>
<ul>
<li>Princeton University Library joins HathiTrust</li>
<li>Ingest of Google- and Internet Archive-digitized volumes from Columbia University begins</li>
<li>HathiTrust adds 160 TB of new storage, bringing total capacity at each site to 475 TB</li>
<li>October 31 deadline announced for joining HathiTrust to participate in &quot;constitutional convention&quot; of partners in 2011</li>
</ul>
<p>September 2010</p>
<ul>
<li>The Triangle Research Libraries Network and Dartmouth College join HathiTrust</li>
<li>Ingest of content begins from New York Public Library and the University of Illinois</li>
</ul>
<p>October 2010</p>
<ul>
<li>HathiTrust announces the 52 partners that will take part in the 2011 Constitutional Convention
<ul>
<li>Newly announced partners include:
<ul>
<li>Baylor University</li>
<li>Emory University</li>
<li>Harvard University Library</li>
<li>Johns Hopkins University</li>
<li>Library of Congress</li>
<li>Massachusetts Institute of Technology</li>
<li>New York University</li>
<li>Stanford University Library</li>
<li>Texas A&amp;M University</li>
<li>Universidad Complutense de Madrid</li>
<li>University of Maryland</li>
<li>University of Pennsylvania</li>
<li>University of Pittsburgh</li>
<li>University of Utah</li>
<li>University of Washington</li>
<li>Utah State University</li>
</ul>
</li>
</ul>
</li>
<li>Image ingest pilot begins
<ul>
<li>The University of Minnesota, Minnesota Historical Society, and Minnesota Digital Library begin working with staff at Michigan to develop a prototype workflow for depositing images and associated metadata into the HathiTrust system for access, storage, and preservation purposes. <a href="http://www.hathitrust.org/mdl_images" title="MDL Images">Read more</a> about the project.</li>
</ul>
</li>
<li>California Digital Library begins work on a new bibliographic data management system for HathiTrust</li>
<li>Discovery Interface Working Group charges <a href="http://www.hathitrust.org/wg_fulltextsearch_charge" title="Full-text Search sub-group">Full-text Search sub-group</a></li>
<li>Ingest begins of content from Princeton University and the University of Chicago</li>
<li>Collaborative Development Environment is released, used actively for development, testing, and release of code for HathiTrust systems</li>
</ul>
<p>November 2010</p>
<ul>
<li>Ingest from Cornell University begins</li>
</ul>
<p>December 2010</p>
<ul>
<li><a href="http://www.hathitrust.org/ingest" title="HathiTrust Ingest">Policy and specifications framework</a> for ingest of locally-digitized materials is finalized</li>
<li>HathiTrust begins working with CIC institutions on ingest of locally-digitized content</li>
</ul>
<p>January 2011</p>
<ul>
<li>OCLC releases WorldCat Local prototype catalog for HathiTrust</li>
<li>HathiTrust ingests nearly 60,000 images and associated metadata from the University of Minnesota and partners</li>
<li>HathiTrust adds support for rights holders to open access to works with Creative Commons licenses
<ul>
<li>The <a href="http://www.brooklynmuseum.org/community/blogosphere/2011/03/11/brooklyn-museum-books-online/">Brooklyn Museum</a>, Society of American Archivists and many others are early adopters. View the rights holder <a href="http://www.hathitrust.org/permissions_agreement" title="Permissions Agreement">Permissions Agreement</a>.</li>
</ul>
</li>
</ul>
<p>February 2011</p>
<ul>
<li>HathiTrust makes datasets of public domain materials available on a large scale
<ul>
<li>See <a href="http://www.hathitrust.org/datasets">HathiTrust Datasets</a> for more information</li>
</ul>
</li>
</ul>
<p>March 2011</p>
<ul>
<li>HathiTrust certified by the Center for Research Libraries as a Trustworthy Digital Repository
<ul>
<li><a href="http://www.hathitrust.org/trac">See HathiTrust&rsquo;s TRAC documentation</a></li>
</ul>
</li>
<li>Ingest from the Library of Congress begins</li>
<li>HathiTrust signs agreement with ProQuest to make the HathiTrust full-text index available via Serials Solutions&#39; Summon service</li>
<li>Executive Committee launches <a href="http://www.hathitrust.org/wg_user-support_charge" title="User Support Working Group charge">User Support Working Group</a></li>
</ul>
<p>April 2011</p>
<ul>
<li>HathiTrust releases new viewing functionality in PageTurner application
<ul>
<li>See the <a href="http://www.hathitrust.org/updates_april2011">Update on April 2011 Activities</a> for details</li>
</ul>
</li>
<li>Ingest from Harvard University begins</li>
<li>HathiTrust concludes first storage replacement cycle, replacing storage purchased in 2007</li>
<li>Planning begins for the HathiTrust Constitutional Convention</li>
</ul>
<p>May 2011</p>
<ul>
<li>HathiTrust begins investigation to identify orphan works in HathiTrust</li>
<li>Ingest of content from University of Virginia begins</li>
</ul>
<p>June 2011</p>
<ul>
<li>Boston University and Lafayette College join HathiTrust</li>
<li>UM announces plans to provide access to orphan works to partner institutions</li>
<li>The <a href="http://www.hathitrust.org/htrc" title="HathiTrust Research Center">HathiTrust Research Center</a> is launched, led by Indiana University and the University of Illinois</li>
<li>HathiTrust begins ingest of materials digitized by Yale University Library</li>
<li>&quot;Perspectives on HathiTrust&quot; blog is launched, with inaugural post on <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-and-discovery">HathiTrust and Discovery</a> by John Wilkin</li>
</ul>
<p>July 2011</p>
<ul>
<li>The University of Notre Dame and University of Florida join HathiTrust</li>
<li>3-year review of HathiTrust is posted on the HathiTrust website and distributed to partners
<ul>
<li>The 3-year review was prepared by Ithaka S+R with oversight by the Strategic Advisory Board in advance of the Constitutional Convention to lay the groundwork for discussions about HathiTrust&rsquo;s future. View the <a href="http://www.hathitrust.org/documents/hathitrust-3year-review-2011.pdf">3-year review</a> and the <a href="http://www.hathitrust.org/constitutional_convention2011">Constitutional Convention information page</a>.</li>
</ul>
</li>
<li>HathiTrust posts the first set of orphan candidate works</li>
<li>HathiTrust releases improvements to the Collections application interface and full-text search
<ul>
<li>Improvements to full-text search include the 2 highest priorities from a <a href="http://www.hathitrust.org/full-text-search-features-and-analysis">full-text search features analysis</a> prepared by the Full-text Search Working Group: the incorporation of bibliographic metadata into the full-text index to allow faceting of results by bibliographic data and improved search results ranking.</li>
</ul>
</li>
<li>First version of partner print holdings database released
<ul>
<li>The holdings database is to act as the basis for the <a href="http://www.hathitrust.org/help_new_cost_model">new pricing model</a> and for expanded access to in-copyright materials for members of partner institutions. See the <a href="http://www.hathitrust.org/updates_july2011#holdings">Update on July 2011 Activities</a> for more information.</li>
</ul>
</li>
<li>The HathiTrust Research Center receives a $600,000 grant from the Sloan Foundation to investigate &ldquo;non-consumptive&rdquo; research
<ul>
<li>The term &ldquo;non-consumptive&rdquo; was first used in the proposed Google Settlement to refer to computational research performed on in-copyright works in such a way that significant reading or &quot;consumption&quot; of the works does not occur.</li>
</ul>
</li>
</ul>
<p>August 2011</p>
<ul>
<li>University of Connecticut joins HathiTrust</li>
<li>Cornell, Duke, Johns Hopkins, Emory University, and the University of California system announce participation in the Orphan Works Project
<ul>
<li>View information about the proposed <a href="http://www.hathitrust.org/authors_guild_lawsuit_information#Details">terms of access</a> to orphan works. See also the <a href="http://www.lib.umich.edu/orphan-works">Orphan Works Project page</a> on the University of Michigan Library website. Note: No orphan works are currently available in HathiTrust&nbsp;(as of January 6, 2012).</li>
</ul>
</li>
<li>Proposal to establish print monographs archive distributed to partners
<ul>
<li>The proposal is submitted by the Collections Committee for the Constitutional Convention. View <a href="http://www.hathitrust.org/constitutional_convention2011_ballot_proposals#proposal1">the final accepted proposal</a> and the <a href="http://www.hathitrust.org/constitutional_convention2011">Constitutional Convention information page</a>.</li>
</ul>
</li>
<li>HathiTrust releases <a href="http://m.hathitrust.org" title="HathiTrust Mobile">mobile interfaces</a> for catalog and PageTurner applications</li>
<li>HathiTrust begins ingest of rare books and incunabula digitized by Universidad Complutense de Madrid</li>
<li>HathiTrust begins working with the University of Pittsburgh and University of Utah on ingest of locally-digitized materials</li>
<li>HathiTrust begins ingest of Utah State University Press backfile publications, to be made available in HathiTrust on an open access basis</li>
<li>HathiTrust begins ingest of Google-digitized volumes from Northwestern University and Purdue University, and Internet Archive-digitized volumes from North Carolina State University</li>
<li>HathiTrust concludes agreements with OCLC and EBSCO to make the HathiTrust full-text index available via their discovery services</li>
</ul>
<p>September 2011</p>
<ul>
<li>The University of Connecticut and University of Missouri join HathiTrust</li>
<li>HathiTrust, Google, and Duke University Press sign agreement to open access to DUP backfile volumes in HathiTrust under Creative Commons licenses</li>
<li>The Authors Guild and others file a lawsuit against HathiTrust alleging copyright infringement
<ul>
<li><a href="http://www.hathitrust.org/authors_guild_lawsuit_information">View information about the lawsuit</a></li>
</ul>
</li>
<li>HathiTrust begins working with the University of Florida and the University of North Carolina-Chapel Hill on ingest of locally-digitized materials</li>
<li>Partners submit final ballot proposals for the Constitutional Convention. 7 are submitted in all.
<ul>
<li>View <a href="http://www.hathitrust.org/constitutional_convention2011_ballot_proposals">the proposals</a> and the <a href="http://www.hathitrust.org/constitutional_convention2011">Constitutional Convention information page</a>.</li>
</ul>
</li>
</ul>
<p>October 2011</p>
<ul>
<li>The University of Miami and University of Arizona join HathiTrust</li>
<li>The Constitutional Convention takes place; 5 out of 7 ballot initiatives are passed
<ul>
<li>View the <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-constitutional-convention-on-record">blog post about the Convention</a>, and the <a href="http://www.hathitrust.org/constitutional_convention2011">Constitutional Convention information page</a> which includes <a href="http://www.hathitrust.org/documents/HathiTrust-ConCon-Notes.pdf">notes from the Convention</a>.</li>
</ul>
</li>
<li>Ingest of Internet Archive-digitized content begins from Duke University and University of North Carolina-Chapel Hill</li>
</ul>
<p>November 2011</p>
<ul>
<li>Boston College joins HathiTrust</li>
<li>The University of California begins offering reprints of UC-digitized public domain materials via HathiTrust</li>
<li>The User Experience Advisory Group releases <a href="http://www.hathitrust.org/personas" title="HathiTrust Personas">HathiTrust User Personas</a></li>
</ul>
<p>January 2012</p>
<ul>
<li>HathiTrust reaches 10 million volumes</li>
</ul>
<p>February 2014</p>
<ul>
<li>HathiTrust reaches 11 million volumes</li>
</ul>
<p>&nbsp;</p>
<p>Posted in Perspectives from HathiTrust on Fri, 06 Jan 2012 08:23:11 +0000 by hathitrust. <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/ten-million-and-counting#comments">Comments</a></p>
<p><strong><a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/personas-understanding-hathitrust-users">Personas: Understanding HathiTrust Users</a></strong></p>
<span class='print-link'></span><p>
By Jenny Emmanuel, HathiTrust User Experience Advisory Group</p>
<p>
The HathiTrust User Experience Advisory Group recently released a set of &ldquo;personas&rdquo; depicting typical users of HathiTrust Digital Library.&nbsp; Personas are aggregate statements that present information about typical users and their needs.&nbsp; They are a commonly used usability method that draws on data from multiple sources, including website analytics, search logs, first-person stories, and researcher observations, which are then aggregated into narratives depicting who HathiTrust users are and how they use the information within HathiTrust.&nbsp;</p>
<!--break--><p>
Personas are typically used throughout the development process so that staff working on the HathiTrust interfaces, staff communicating with users, and librarians can have a shared idea of who uses HathiTrust.&nbsp; With personas, they can easily keep the end user in mind while improving HathiTrust, developing support materials, developing user education programs, or carrying out other work.</p>
<p>
The HathiTrust User Experience Advisory Group worked on the personas for several months, with the additional help of the University of Michigan&rsquo;s User Experience Department.&nbsp; The group gathered information from analytics, anecdotes from HathiTrust partners, various online publications about HathiTrust (blogs, articles, comments, etc.), reports from similar projects, and user feedback to identify major groups who use HathiTrust for research.&nbsp; These data sources led to the identification of seven distinct groups of both academic and non-academic researchers, each of which became the basis of one of the personas. The collected data was then collated for each of these groups and written up as a narrative with a generic use case, a given identity, and a stock image to give each persona a personal touch.&nbsp; Although the personas appear to describe actual people and actual use, each persona is fictional but supported by collected evidence.</p>
<p>
The HathiTrust personas are used to guide further development of HathiTrust. They are also being used by the HathiTrust Communications Working Group in its publicity and educational materials related to HathiTrust.</p>
<p>
To view the personas, see: <a href="http://www.hathitrust.org/personas" title="http://www.hathitrust.org/personas">http://www.hathitrust.org/personas</a>.</p>
<p>Posted in Perspectives from HathiTrust on Fri, 16 Dec 2011 14:46:39 +0000 by hathitrust. <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/personas-understanding-hathitrust-users#comments">Comments</a></p>
<p><strong><a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/library-your-pocket">Is that the library in your pocket?</a></strong></p>
<span class='print-link'></span><p>By Suzanne Chapman, Chair, HathiTrust User Experience Advisory Group</p>
<p>Looking for books to read on your shiny new tablet or other mobile device? This fall we officially released a <a href="http://m.hathitrust.org/" title="HathiTrust Mobile">mobile version</a> of the HathiTrust Digital Library. The mobile site offers mobile-friendly access to key functionality including searching the HathiTrust catalog and reading HathiTrust "Full view" texts. Users from HathiTrust partner institutions can also download these "Full view" texts in PDF or ePub format to allow reading offline. Since the mobile interface is web-based, it works on all platforms and may be viewed either from mobile devices or from desktops and laptops. The interface has special functionality for tablets with two ways to read texts: either in the vertical scrolling format, or in a horizontal flip format. </p>
<p>Please give it a try and <a href="mailto:feedback@issues.hathitrust.org" title="Mail to feedback@issues.hathitrust.org">let us know what you think</a>! </p>
<p><a href="http://m.hathitrust.org/" title="http://m.hathitrust.org/">http://m.hathitrust.org/</a></p>
<p>Many thanks to the University of Michigan Library User Experience Department for designing and developing this exciting new interface.</p>
<p>Posted in Perspectives from HathiTrust on Sat, 03 Dec 2011 20:40:02 +0000 by hathitrust. <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/library-your-pocket#comments">Comments</a></p>
<p><strong><a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-constitutional-convention-on-record">HathiTrust Constitutional Convention on Record</a></strong></p>
<span class='print-link'></span><p>
On October 8-9, 2011, delegates from across the U.S. and around the world gathered in Washington, DC for a landmark event, the <a href="http://www.hathitrust.org/constitutional_convention2011" title="HathiTrust Constitutional Convention">HathiTrust Constitutional Convention</a>. Our goals were to review the work and accomplishments of the now 3-year-old HathiTrust, and to chart its future governance and priorities. Before the group were <a href="http://www.hathitrust.org/constitutional_convention2011_ballot_proposals" title="ballot proposals">seven different ballot proposals</a> that had been submitted by HathiTrust partners ahead of the meeting. On a beautiful autumn weekend, the delegates headed indoors, gathered around tables, and engaged deeply in the proceedings and discussion.</p>
<!--break--><p>
As a result of these proceedings, HathiTrust:</p>
<ul>
<li>
Will establish a governance structure consisting of a Board, a Board Executive Committee, and Board-appointed committees, and will articulate bylaws</li>
<li>
Will formalize a transparent process for inviting, evaluating, ranking, launching and assessing development initiatives</li>
<li>
Will establish a shared print monograph archiving program among the member libraries</li>
<li>
Will expand and enhance access to U.S. federal publications including those issued by GPO and other federal agencies</li>
<li>
Will develop and vet a fee-for-service model to allow contribution of content from non-partner entities</li>
</ul>
<p>
The Convention was also an opportunity to celebrate the achievements of HathiTrust in just three short years: over 60 partners, infrastructure that preserves and makes discoverable close to 10 million volumes, and the HathiTrust Research Center that will enable new forms of research.</p>
<p>
For a full account of the proceedings, please consult the <a href="http://www.hathitrust.org/documents/HathiTrust-ConCon-Notes.pdf" title="HathiTrust Constitutional Convention Notes">official minutes of the Constitutional Convention</a>.</p>
<p>Posted in Perspectives from HathiTrust on Thu, 10 Nov 2011 16:02:38 +0000 by hathitrust. <a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-constitutional-convention-on-record#comments">Comments</a></p>
<p><strong><a href="http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust039s-past-present-and-future">HathiTrust's Past, Present, and Future</a></strong></p>
<span class='print-link'></span><div>
Opening remarks given at the <a href="http://www.hathitrust.org/constitutional_convention2011" title="HathiTrust Constitutional Convention">HathiTrust Constitutional Convention</a>, October 8, 2011 (<a href="http://www.hathitrust.org/documents/HathiTrust-ConCon-Wilkin-201110.pptx" title="HathiTrust's Past, Present, and Future slides">view presentation slides</a>)</div>
<div>
By John Wilkin, HathiTrust Executive Director</div>
<p></p>
<p>
Think back to 2004 and the conversations going on in our community around digitization and the challenge of making big things happen at the intersection of our institutions. Digitization on a grand scale was 10,000 volumes, and we rejected any notion of digitizing a large corpus of materials like US federal government documents for countless reasons. In the years since our 2005 announcement that we were undertaking digitization on a large scale, our community, in collaboration with Google and the Internet Archive, has digitized over half of the collective holdings of ARL libraries. Three years after that announcement, we launched HathiTrust, an organization that facilitates collective action on a grand scale. Seldom has so much in our world changed in such a short time. Together, we have utterly transformed parts of the library landscape.</p>
<p>
My plan today is to talk about HathiTrust&rsquo;s past, present and future. Don&rsquo;t worry&mdash;I won&rsquo;t do a history of HathiTrust. My discussion of the &ldquo;past&rdquo; will be primarily about the organization&rsquo;s early accomplishments, and begins with a review of our Short- and Long-Term Functional Objectives. I&rsquo;ll then talk briefly about a few things in the HathiTrust pipeline, and finally conclude with an overview of some of the larger changes that have taken place since 2008. A point I&rsquo;d like to emphasize now and throughout is that this is a &ldquo;libraries writ large&rdquo; success story. What has happened is something that <em>we</em> <em>accomplished collectively</em>. This is not a story of an external organization&mdash;Google, a government agency, or some external champion&mdash;doing something for us. This is our story, and one that we need to understand and celebrate.</p>
<h2>
<strong>Short- and Long-Term Functional Objectives</strong></h2>
<p>
In those early, heady days of HathiTrust, the first partners established a list of Short- and Long-Term Functional Objectives. These objectives were not meant to encompass all of HathiTrust development, but were a vehicle to articulate goals for a quickly emerging organization, a way to give some initial direction until other mechanisms could create a more nuanced roadmap. We needed to define goals in order to test <em>responsiveness</em> for this new organization.</p>
<h3>
<strong>Short-term</strong></h3>
<ol>
<li>
Page turner mechanism</li>
<li>
Branding (overall initiative; individual libraries)</li>
<li>
Format validation, migration and error-checking</li>
<li>
Development of APIs that will allow partner libraries to access information and integrate it into local systems individually</li>
<li>
Access mechanisms for persons with disabilities</li>
<li>
Public &lsquo;Discovery&rsquo; Interface for HathiTrust</li>
<li>
Ability to publish virtual collections</li>
<li>
Mechanism for direct ingest of non-Google content</li>
</ol>
<h3>
<strong>Long-term</strong></h3>
<ol>
<li>
Compliance with required elements in the Trustworthy Repositories Audit and Certification (TRAC) criteria and checklist</li>
<li>
Robust discovery mechanisms like full-text cross-repository searching</li>
<li>
Development of an open service definition to make it possible for partner libraries to develop other secure access mechanisms and discovery tools</li>
<li>
Support for formats beyond books and journals</li>
<li>
Development of data mining tools for HathiTrust, and use by HathiTrust of analysis tools from other sources</li>
</ol>
<p>
For every one of these functional objectives, HathiTrust has delivered something meaningful to the partnership. It&rsquo;s worth noting that some of these objectives were monumentally difficult and there was absolutely no certainty that we would succeed in all of them. In the end, what we accomplished was the creation of a rich, open system with a nuanced understanding of rights and the ability to deliver various forms of content to different audiences in different ways. All of the content in HathiTrust is discoverable with a superb balance of precision and recall, and the services we offer around the preservation of the content are without peer.</p>
<p>
Although I won&rsquo;t cover the Functional Objectives in detail, I would like to highlight three of the more ambitious accomplishments: our TRAC certification, the full-text cross-repository searching, and the creation of a research center.</p>
<p>
HathiTrust is only the second repository (after Portico) to receive certification by CRL. HathiTrust&rsquo;s process for certification involved countless hours of staff work developing processes and products, and creating and providing documentation. And that is as it should be. Certification is all about accountability and openness, and we can take pride in obtaining it. We are a distinctive type of organization, not analogous to OCLC or Portico, and our organizational distinctiveness tends to confound those who want to see a central office and central staff. It was important to document for CRL the large commitment of staffing across the partnership to help them understand that HathiTrust is not <em>apart</em> <strong><em>from</em></strong> us, but rather a <em>part <strong>of</strong> us</em>&mdash;that HathiTrust is not separable from our institutions. We excelled in the technical components of the review, but CRL has lingering questions about the organization. I believe we too have lingering questions about the organization. We want this effort to be part of us and not separate, and there are few models of how to make that work. This tension between something central and something that we are all a part of will, I believe, be a leitmotif in our meeting over the next three days. I think we&rsquo;ve made good progress and that we&rsquo;ve created a productive and healthy tension. {At this juncture, I&rsquo;d like to pause to introduce Heather Christenson, the chair of the HathiTrust Communications Working Group. We owe this group a great debt of gratitude for showcasing our successes, but this group also highlights the value of the inter-institutional work <em>and the tension it creates</em>.}</p>
<p>
A second grand accomplishment I&rsquo;ll highlight is the creation of a viable full-text search mechanism that works with all of the content in the repository. I hope no one here is so jaded as to think that full-text searching across millions of volumes is a slam-dunk. Many were skeptical, and I can&rsquo;t tell you how many calls I fielded from vendors telling me that what we were attempting was impossible&mdash;or at least impossible without their help. The effort required a large amount of research and testing, and what we learned required deep collaboration with the broader community of developers working on the Apache Solr search engine project. The resulting service is sensitive to the amount of content&mdash;unparalleled in size&mdash;to the hundreds of languages and character sets, and to requirements like phrase searching that reflect the distinctive ways users approach a vast and diverse <em>library</em> collection. Our users can now search over 3 billion words and get results in a split second. Collective work in the partnership has produced faceted results in our full text, and ranking that takes bibliographic information in the full text into account. The functionality that we have today is tremendous, and it provides a foundation for a next generation of search that gives our users access to bibliographic information where needed, and full text where desired.</p>
<p>
The creation of a research center is a very different kind of example and helps underline the value of collective action. Indiana University and the University of Illinois assembled the cyberinfrastructure resources to create a research center supporting uses of the HathiTrust collection. The consolidation of collections and institutional focus made HathiTrust a valuable partner for researchers at those two institutions. It was so valuable that they <em>redirected</em> institutional resources to create the infrastructure and leadership needed for this initiative&mdash;they created the research center at little or no cost to us. How much more compelling it is that the research center comes from faculty leadership (from those who would <em>do</em> the research), drawn to use of this immense library, rather than from us in support of those faculty. Indeed, because of their commitment and credibility, the research center has attracted significant funding from Sloan to deal with problems like security in use of the in-copyright materials, and I think we can expect them to be a magnet for other funding in the future. The research center will soon offer a platform for uses we could imagine but could not otherwise support. We&rsquo;re accomplishing the functional objective of support for research uses of the data in a number of ways, including by distributing public domain data, but the creation of a research center was a significant win for all of us and comes as a result of our working together to create a compelling library resource.</p>
<h2>
<strong>Other accomplishments</strong></h2>
<h3>
<strong>Holdings and the New Cost Model</strong></h3>
<p>
Our accomplishments in other areas are equally impressive, and equally reflective of HathiTrust&rsquo;s role as a community resource. I hope that all of you are familiar with the work done by OCLC Research and Constance Malpas showing how HathiTrust&rsquo;s collection overlaps with those of our libraries. The first results of that work show a median ARL overlap of 19% in June 2009 and 31% in June 2010. The overlap rate was remarkably constant from large to small ARL libraries. That is, by June 2010, nearly every ARL library could depend on finding approximately 31% of its collection online in HathiTrust. The rate of overlap continued to grow; by June 2011 <em>I estimate</em> the overlap rate to have hit a median of about 45%, and I expect it to reach something like 50% early next year. Remarkably, the numbers for non-ARL institutions and particularly the Oberlin Group libraries are even greater. Materials not yet ingested&mdash;materials from partners like Harvard, Virginia, the CIC and Stanford, and from non-partners like Texas&mdash;could increase that number to more than 75%. The breadth of our holdings is so significant that HathiTrust is being used as one of the key resources for the just-announced (Oct. 3, 2011) European serials preservation registry, The Keepers.</p>
<p>
That any one of our libraries could find more than 50% of its collection digitized and online in HathiTrust creates real possibilities, and in this regard HathiTrust&rsquo;s leadership shows vision and commitment. The new cost model, which is based on overlap, is designed to share the burden of archiving in ways that are reflective of the value we derive from the collection. Our institutions share the cost of in-copyright volumes where we hold corresponding print volumes; all members of the partnership share the cost of public domain materials evenly. In order to make that cost model work, we needed a holdings database and are very close to unveiling the first examples of calculations that result from that system.</p>
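<p>
The allocation rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the flat <code>cost_per_volume</code> figure, and the equal split among holders are hypothetical, and the talk does not specify the actual formula or how volumes with no print holder are treated.</p>

```python
from collections import defaultdict


def allocate_costs(partners, pd_volumes, ic_holdings, cost_per_volume):
    """Split annual archiving costs among partners (illustrative only).

    partners        -- list of partner names
    pd_volumes      -- number of public domain volumes
    ic_holdings     -- dict: in-copyright volume id -> set of partners
                       holding the corresponding print volume
    cost_per_volume -- assumed flat annual cost per volume (hypothetical)
    """
    costs = defaultdict(float)

    # Public domain: every partner pays an equal share.
    pd_share = pd_volumes * cost_per_volume / len(partners)
    for partner in partners:
        costs[partner] += pd_share

    # In-copyright: only partners holding the print volume share its cost.
    for volume_id, holders in ic_holdings.items():
        if not holders:
            # Assumption: unheld volumes are skipped here; the talk does
            # not say how their cost is covered.
            continue
        share = cost_per_volume / len(holders)
        for partner in holders:
            costs[partner] += share

    return dict(costs)


example = allocate_costs(
    partners=["A", "B", "C"],
    pd_volumes=300,
    ic_holdings={"v1": {"A"}, "v2": {"A", "B"}},
    cost_per_volume=1.0,
)
```

<p>
In this toy example partner C, which holds no overlapping print, pays only its public domain share, while partner A pays more because it holds print copies of both in-copyright volumes.</p>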
<p>
Collection overlap is an interesting phenomenon, with the various collections showing both important similarities and important differences. Focusing again on ARL institutions as the exemplar, you&rsquo;ll see in the scattergram that we look remarkably similar in the rate of our overlap. However, as one might expect, the overlap profiles for a collection like Harvard&rsquo;s and a collection like Lafayette&rsquo;s are so different that they will mirror each other, with Harvard holding more print corresponding to HathiTrust volumes uniquely, and Lafayette holding more volumes in common with other institutions, with a smaller number of unique volumes. These are the extremes, but all institutions will have distinctive overlap profiles. Here are just a few examples: [SLIDES]. What this means, then, is that each institution&rsquo;s cost will vary a great deal by size, of course, <em>and</em> by the nature of the collection. We&rsquo;re at a point where I can give you a preview of what that will look like.</p>
<p>
Costs are attributed to three elements of our preservation work: the public domain; in-copyright books; and serials. Keep in mind that all partners share the cost of the public domain equally. As of the end of September, we have 2.6 million public domain volumes and 62 partners; thus, the cost of the public domain and open materials comes out to $9,300 per partner. Based on our overlap data to date, the cost for in-copyright books ranges from a low of less than $1,000 per year to a high of about $75,000 per year. I&rsquo;ve masked the institutional names in the data here because it&rsquo;s still a bit early, but these numbers are largely right, and entirely based on holdings data. The high number is Michigan because Michigan&rsquo;s collection is the source for so much digitized content. Institutions with low costs would be institutions like Merced and Lafayette, with smaller collections and sometimes less overlap. Finally, the cost of serials is preliminarily based on holdings at the title level, rather than the volume level. Here are the same institutions arrayed along an X-axis with costs for serials on the Y-axis. The sum of these three costs gives us a low cost of less than $15,000 per year and a high cost of roughly $200,000 per year. Bear in mind that this is a likely reflection of the general shape of 2013 costs, with the bulk of the institutions paying much less than $50,000 per year. As more content comes in, costs go up; as more institutions come on board, costs go down; and as time passes, many elements of cost go down because of declining costs in the technology. So far, this has created a fairly flat picture of cost year-to-year rather than a dramatically increasing cost.</p>
<p>
What I&rsquo;d like to emphasize here is not only a concrete sense of the costs for the partnership&mdash;what they&rsquo;ll be and how we calculate them&mdash;but that we&rsquo;re well down the path to having in place the infrastructure to do this work. That is, we have a collection that represents a broad, common set of needs&mdash;not just public domain works, but in-copyright works that aid us in managing our print collections. We have technology that understands questions of holdings and overlap, which can produce cost calculations and also serve as an access control tool. Although the technology and metadata will benefit from refinement (e.g., our individual serials data could use some work), the partnership now has a good start on something that has tremendous practical value for our institutions individually and collectively.</p>
<p>
At this juncture, and before turning to other accomplishments, I&rsquo;d like to pause to consider one of the bogeymen of the new cost model: some have wondered, &ldquo;what if an institution joins HathiTrust and brings with it one million public domain volumes? Won&rsquo;t that dramatically increase costs in uncontrollable ways?&rdquo; Keep in mind the effect of scale, both of preservation costs <em>and</em> of the number of institutions. The cost for adding one million public domain volumes increases each of our costs by less than $4,000 per year, with a corresponding benefit of access to a phenomenal amount of content. There&rsquo;s nothing in the e-book marketplace that compares to this.</p>
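<p>
The figure of less than $4,000 can be cross-checked against numbers quoted earlier in the talk (2.6 million public domain volumes, 62 partners, $9,300 per partner). The Python sketch below assumes that public domain cost scales linearly with volume count, which the talk does not state; the function names are hypothetical.</p>

```python
# Figures quoted in the talk (as of the end of September 2011).
PD_VOLUMES = 2_600_000        # public domain volumes
PARTNERS = 62                 # partner institutions
PD_COST_PER_PARTNER = 9_300   # dollars per partner per year


def implied_cost_per_volume(volumes, partners, cost_per_partner):
    """Annual cost per volume implied by an even split among partners."""
    return cost_per_partner * partners / volumes


def added_cost_per_partner(new_volumes, partners, cost_per_volume):
    """Extra annual cost per partner when new volumes are split evenly.

    Assumes cost grows linearly with volume count (an assumption,
    not something the talk confirms).
    """
    return new_volumes * cost_per_volume / partners


per_volume = implied_cost_per_volume(PD_VOLUMES, PARTNERS, PD_COST_PER_PARTNER)

# One million new public domain volumes, split among the existing 62
# partners, and among 63 if the contributor joins as a new partner.
added_62 = added_cost_per_partner(1_000_000, PARTNERS, per_volume)
added_63 = added_cost_per_partner(1_000_000, PARTNERS + 1, per_volume)

print(f"implied cost per volume: ${per_volume:.4f}")
print(f"added cost per partner (62 partners): ${added_62:,.2f}")
print(f"added cost per partner (63 partners): ${added_63:,.2f}")
```

<p>
Under these assumptions the implied cost is roughly 22 cents per volume per year, and one million additional volumes add a little under $3,600 per partner per year, which is consistent with the figure given above.</p>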
<h3>
<strong>Publisher relations and publishing work</strong></h3>
<p>
Never once in conceiving HathiTrust did we see this enterprise as being solely about digitized content: we believed that the digitized version of the published record provided an excellent foundation on which to add newly published materials in their original digital formats. To that end, we have set in motion three distinct efforts related to publishing:</p>
<ol>
<li>
Making it possible for rights holders to open access to works.</li>
<li>
Making it possible for publishers to deposit digital master files for archiving and open access.</li>
<li>
Making it possible for publishers to publish directly into HathiTrust.</li>
</ol>
<p>
The second and third initiatives are in their infancy, but all deserve a quick review.</p>
<p>
In the first case, authors and publishers have opened thousands of works in an effort to share them more widely. Several presses, including university presses, and associations like ARL, have already opened substantial bodies of work with no expectation of compensation. They have relied on already extant files in the repository and have granted permissions where possible. Duke University Press recently announced an agreement with HathiTrust and Google, and will apply Creative Commons licenses to its materials, receiving in return digital files (from HathiTrust and with Google&rsquo;s permission).</p>
<p>
Using born-digital materials rather than digitized versions of the books can improve the quality of HathiTrust content and the user experience. One university press is already depositing PDFs of published content. We are in discussions with two academic presses regarding an agreement where, in return for open access to their materials, we will store and provide access to the archival version.</p>
<p>
Finally, the University of Michigan&rsquo;s MPublishing unit is working on a mechanism to publish open access content directly via HathiTrust. By binding together a publishing process informed by archival needs and an access mechanism informed by audience needs, they hope to build a system that makes an archival commitment to readers and libraries without losing the functionality needed for a credible publication. They hope to have the first iteration of this system available next year and to begin sharing their specifications and development process with partners following that.</p>
<h3>
<strong>Uses of in-copyright materials</strong></h3>
<p>
We have made tremendous strides in facilitating lawful uses of in-copyright materials. Particularly in US copyright law, there are clear provisions for uses of in-copyright materials, according to the law&mdash;that is, limitations on the exclusive rights of the owners of copyright. We have legal and moral obligations to our users to provide services for these materials. And there have been important, untested questions that we need to explore as a community. I would like to briefly list work we&rsquo;ve done to support access to in-copyright materials:</p>
<ol>
<li>
We have laid the groundwork for access to in-copyright works by users with print disabilities. Our technology incorporates Shibboleth for inter-institutional authentication, the holdings database as a check of a partner library&rsquo;s purchases, and cooperation with campus offices that provide services to users with print disabilities. We are ready to launch this service, which will provide unparalleled access to millions of works by this small group of users at our institutions. Never before have persons with print disabilities had ready access to libraries of content this large. This will be one of our proudest accomplishments.</li>
<li>
Again using the holdings database and Shibboleth, we will soon be able to provide access to works that meet Section 108 criteria (i.e., that the work is damaged, deteriorating, lost or stolen and is not available on the market at a reasonable price). At the very least, we can make it possible for partners to create print replacements; it is also the case that the DMCA gives us some leeway for digital access to these works. The infrastructure is in place and we will soon use Section 108 provisions in US copyright law to extend access.</li>
<li>
And, famously, we will soon be testing the concept of Fair Use and our ability to serve the imperative of preserving the materials in HathiTrust. How could I give this talk without touching on the suit by the Authors Guild against HathiTrust and several of the partners? Despite the well-documented missteps in our first orphan works identification process, our ability to make these uses under Fair Use and our ability to store the digital copies as part of an overarching preservation strategy are two of the most important principles underlying the HathiTrust effort. The access mechanisms that we have developed (e.g., taking into account holdings of the partner institution and relying on authentication of users) are thoughtful and appropriately conservative. We have taken steps to define lawful uses without antagonism of or disregard for the interests of rights holders. This was an important step for the library community.</li>
</ol>
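<p>
The access checks described in the list above (authentication, a holdings check, and a qualifying condition such as print disability or Section 108 eligibility) can be sketched as a single decision function. This Python sketch is purely illustrative: every class, field and function name is hypothetical, and it is not a description of HathiTrust&rsquo;s actual access-control code or a statement of what the law permits.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    institution: str
    authenticated: bool           # e.g. via inter-institutional login
    print_disabled: bool = False  # as certified by a campus office


@dataclass(frozen=True)
class Volume:
    volume_id: str
    public_domain: bool


def may_view_full_text(user, volume, holdings, section_108_eligible):
    """Return True if this sketch's policy would allow full-text access.

    holdings             -- dict: institution -> set of volume ids for
                            which that institution holds the print
    section_108_eligible -- set of (institution, volume id) pairs whose
                            print copy is damaged, deteriorating, lost
                            or stolen and not available on the market
                            at a reasonable price
    """
    if volume.public_domain:
        return True
    if not user.authenticated:
        return False
    # In-copyright access requires that the user's library holds the print.
    if volume.volume_id not in holdings.get(user.institution, set()):
        return False
    if user.print_disabled:
        return True
    return (user.institution, volume.volume_id) in section_108_eligible


holdings = {"Example U": {"vol1", "vol2"}}
eligible = {("Example U", "vol2")}
reader = User("Example U", authenticated=True)
reader_pd = User("Example U", authenticated=True, print_disabled=True)
in_copyright = Volume("vol1", public_domain=False)
```

<p>
With this sample data, <code>reader_pd</code> may view <code>vol1</code> because the library holds the print, while <code>reader</code> may not, since <code>vol1</code> is not flagged as Section 108 eligible.</p>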
<h2>
<strong>Big issues</strong></h2>
<p>
Creating a 10 million volume digital repository in and of itself changes the library landscape, and these things I&rsquo;ve just discussed do as well, in that they change our sense of who we are and what we&rsquo;re doing: we have had a positive impact on our institutions, our users and on the profession. Additionally, there are several other developments worth considering as we look back at the last several years.</p>
<ul>
<li>
Our institutions are now <strong>pooling resources</strong> in ways we rarely saw in the past. We have pooled resources to solve the digital archiving problem, to address collection building, to perform collection analysis, and I hope we will soon do so to address print monograph storage issues. We have shifted our investments from funding spent in isolation to common pools of funds to solve common problems. Before someone accuses me of being historically myopic and draws the comparison to WorldCat, keep in mind that in HathiTrust our resource pooling <strong>replaces</strong> (rather than enables) local work. WorldCat, by contrast, makes it possible to devote resources in our separate institutions more efficiently.</li>
<li>
We have begun to <strong>mobilize resources and expertise from within the various partner institutions to deal with problems common to us all</strong>, such as copyright determination, digitization of government documents, and the refining of bibliographic information. These problems can&rsquo;t all be met by pooling our resources; instead, we must rely upon our individual institutional resources and perspectives. The diversity of our resources and perspectives improves the quality of our work and so makes us all stronger. (Consider the example of the copyright expertise advisory group for the new grant, which has extraordinary talent, and talent that would not be assembled in one place.) The Copyright Review Management System is a good example of early collaboration, and now IMLS has funded us again for a much more ambitious effort to work on copyright determination for publications from around the world. We have used HathiTrust to galvanize the community to address problems collaboratively. If we can find a way to deal efficiently with metadata remediation&mdash;changes and improvements to our bibliographic records&mdash;this too will surely be done by working within various institutional contexts rather than by pooling resources.</li>
<li>
And, finally, we have begun to <strong>approach the question of fair use in a large and coordinated way</strong>. For some time, libraries have recognized the need for coordinated action on best practices in order to bolster our use under this part of copyright law. A few of our institutions made bold and solitary moves, and the rest of us have tried to learn from the experience. Working together on this question of fair use does, I believe, position us to develop defensible best practices and establish a clear legal precedent. In the lawsuit brought by the Authors Guild, whether or not we win remains to be seen; that we undertook this work collectively is important and a big change.</li>
</ul>
<p>
In each of these cases, we can see new modes of collective action in libraries. Where it makes sense to pool resources, we do; where it makes sense to work together on common problems, we do; and where we need to act collectively to show a unified front, we do. These are important times.</p>
<h2>
<strong>Connecting the dots</strong></h2>
<p>
Let&rsquo;s pause for a moment to put all of this together.</p>
<ul>
<li>
Together, we have built a collection of nearly unparalleled size and richness. With our future work, it will only grow larger and richer.</li>
<li>
We are devoting collective resources to getting a bead on what we actually have here: rights determination is the big example, but we&rsquo;re beginning to see interest in bibliographic remediation, <em>at least</em> for things like government documents.</li>
<li>
We are working to create a record of contemporary publishing <em>within</em> this corpus by working with publishers, and in some cases those publishers that are our libraries, our organizations and our university presses. We are doing that by getting permissions from authors and publishers to open access to materials, by striking deals with presses like the one we just signed with Duke University Press, and, importantly, we will soon be publishing via the repository.</li>
</ul>
<p>
We have charted a path forward for an increasingly comprehensive shared collection, a collection that contains a vast body of open materials, a collection that facilitates lawful uses, and a collection that houses new publishing. This is a collection we can <em>use</em> for many things&mdash;to gain a better understanding of the shape of the published record and our collections, to shape shared storage strategies, to rationalize our collections and to serve our users.</p>
<h2>
<strong>What next for the partnership?</strong></h2>
<p>
The short answer to the question of where the partnership goes next is that it depends entirely on the discussions we will have over the next few days. In 2008 our intention was to get the effort off the ground, and then bring the community together in 2011 to plan next steps with a clearer understanding of what we might accomplish, and that&rsquo;s where we are today. I hope that we leave the Constitutional Convention with a course charted for clearer, more collective governance and strategies for defining future priorities. In the meantime we&mdash;i.e., HathiTrust, our community&mdash;will continue to move HathiTrust forward. We will continue to enhance the systems you see today, providing better full text searching, supporting more functionality through the APIs, and adding more content. The holdings database and the cost model will be fleshed out, and we will all have a better sense of what our costs will be in 2013. These are important things.</p>
<p>
I&rsquo;d like to use this bully pulpit to share my personal opinion, and declare that it&rsquo;s time to beef up the organization. We have made a good start in creating an organization that reflects our collective interests and I feel confident that with the right governance and leadership we can create a stronger HathiTrust <em>without</em> creating a new 501(c)(3) or intensively consolidating staff. To create a large, centralized organization would be to create a HathiTrust divorced from our institutional contexts. This is also an opportunity for me to suggest that it&rsquo;s time for us to look for a full-time executive director. Although I&rsquo;ve enjoyed this work immensely and feel proud of my accomplishments, I believe that a full-time, independent director, a visionary with strong organizational skills, will make it possible for us to build a stronger sense of community and more fruitfully talk to funding agencies, both things that can make HathiTrust all that much more durable. I&rsquo;m not leaving this post today; however, I would like to urge the partnership to strengthen the core of HathiTrust by building a <em>small</em> central staff and hiring a director.</p>
<h2>
<strong>Closing</strong></h2>
<p>
In closing, I&rsquo;d like to return to this theme of the community and working collectively. As we know, so many of the challenges we face are shared challenges. Our metadata are not our metadata in isolation from each other; our collections are not our individual collections in isolation from each other; and many of the baseline services or capabilities we strive to offer are ones that all of us would like to offer in our institutions. The last several years have seen us move markedly in the direction of collective action on collective problems. Indeed, working collectively on collective problems makes it all the more feasible to create distinctive or tailored services for our individual campuses or communities. Whether we call it &ldquo;group scale,&rdquo; as Lorcan Dempsey does, or &ldquo;working globally so that we can better deliver services locally,&rdquo; HathiTrust is a remarkable example of collective action, of our community working together to solve a common problem. Although there are many rough edges and many things to work out, our first steps have been monumentally successful in beginning to change the work we do and the way we do it. This is a tribute to each of you and to your institutions: <em>we</em> did this as a community, and we did it because it made sense. I hope we&rsquo;ll reconvene every few years to ponder where we&rsquo;ve come from and where we go next, and that we will look back on this moment as a powerful example of the changes we can effect for our users and for the profession.</p>
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust039s-past-present-and-future#comments
Perspectives from Hathitrust
Mon, 17 Oct 2011 14:27:17 +0000
hathitrust
744 at http://www.hathitrust.org
HathiTrust and Discovery
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-and-discovery
<p class="blog_author">By John Wilkin, Executive Director, HathiTrust</p>
<p>
It is a core tenet of HathiTrust that preservation cannot take place without access. The coupling of preservation and access is both philosophically and strategically central to HathiTrust&rsquo;s mission, as awareness of the materials in our collections helps to create the value that leads to preservation. And because discovery is integral to access, HathiTrust has worked hard on a multi-pronged strategy for discovery.</p>
<p>Key to this strategy are our ongoing efforts to ensure that HathiTrust content is &ldquo;in the flow&rdquo; of library discovery more generally, as illustrated by our recent agreement to integrate the HathiTrust full text indexes into the Summon discovery service, and our collaboration with OCLC to create a permanent bibliographic catalog for HathiTrust.</p>
<h2 class="p2">
<b>The catalog as a tool for collection management</b></h2>
<p class="p1">
HathiTrust serves two primary constituencies: librarians as collection managers, and scholars and other users of our collections. This may seem like an artificial distinction&mdash;the lines between these two types of users and their discovery methods are often blurred, with bibliographically astute users wanting to look through the lens of the catalog, and reference librarians exhibiting some of the most sophisticated source-intensive research skills. Nevertheless, a central part of the work of libraries, and particularly the partner libraries, is collection management, and HathiTrust has as part of its design (both in its mission and goals) seamless integration into collection management strategies.</p>
<p class="p2">
To best serve librarians as collection managers, a well-designed catalog is a critically important tool. A well-designed catalog for a collection manager always offers bibliographic precision. It allows the librarian to know (and find) exactly what is held, and also how that holding&mdash;that bibliographic instance&mdash;relates to other similar holdings. As we move into large-scale collection management across many of our cooperating libraries, this kind of well-designed catalog will play a critical role.</p>
<p class="p2">
When HathiTrust launched its enterprise, we provided an extremely popular &ldquo;temporary beta&rdquo; catalog based on VuFind. It sported tremendous features like faceted results and the ability to sort results by date and rankings. It was well-received and reliable. At the same time, we announced a partnership with OCLC to build a replacement for this temporary beta, which we expect to launch sometime this year. Why replace the VuFind-based catalog, which works so well? Situating HathiTrust&rsquo;s holdings in the larger OCLC WorldCat database is a tremendous boon to librarians in understanding what we have online, how the collections of the partner institutions relate to each other, and how those online holdings connect to libraries around the world. By managing HathiTrust&rsquo;s records in the same place that other libraries do, we are better positioned to perform collection analysis and to shape future strategies to close gaps. In short, working with OCLC to build the HathiTrust catalog is an important strategy with regard to our collection management goals.</p>
<p class="p2">
But should the creation of an effective catalog with OCLC cause us to abandon other bibliographic discovery strategies? Absolutely not. HathiTrust works in a number of ways to distribute bibliographic information to partners and the world. Our APIs allow libraries to add URLs to their catalogs where their library has a matching record. Our OAI distribution of brief records makes it possible for many libraries and other bibliographically-oriented entities to add records for materials unique to their collections. And the hathifiles, an inventory of HathiTrust holdings now numbering approximately 9 million lines, can help drive institutional processes to identify materials and shape more sophisticated record-oriented strategies. And of course OCLC&rsquo;s efforts to load information about HathiTrust holdings are also a boon for libraries wishing to get records from OCLC. The creation of a catalog is critical, but does not by itself fulfill users&rsquo; needs to find records in other discovery venues.</p>
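<p class="p2">
As a rough illustration of how a line-per-volume inventory such as the hathifiles might drive a local matching process, the Python sketch below scans a tab-delimited inventory and collects the volume identifiers whose OCLC numbers appear in a library&rsquo;s own list. The two-column layout, the column positions and the function name are assumptions made for this example; consult the published hathifiles description for the real field order.</p>

```python
import csv
import io


def match_inventory(inventory_file, local_oclc_numbers, id_col=0, oclc_col=1):
    """Map each locally held OCLC number to matching volume identifiers.

    inventory_file     -- open text file of tab-delimited inventory lines
    local_oclc_numbers -- set of OCLC numbers (as strings) held locally
    id_col, oclc_col   -- assumed column positions (hypothetical defaults)
    """
    matches = {}
    reader = csv.reader(inventory_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= max(id_col, oclc_col):
            continue  # skip short or malformed lines
        # A line may carry several comma-separated OCLC numbers.
        for oclc in row[oclc_col].split(","):
            oclc = oclc.strip()
            if oclc in local_oclc_numbers:
                matches.setdefault(oclc, []).append(row[id_col])
    return matches


# Tiny in-memory stand-in for an inventory file (made-up identifiers).
sample = io.StringIO("vol.001\t12345\nvol.002\t67890,11111\nvol.003\t\n")
found = match_inventory(sample, {"12345", "11111"})
```

<p class="p2">
Reading the file line by line keeps memory use flat, which matters for an inventory of roughly 9 million lines.</p>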
<h2 class="p2">
<b>Full text discovery and support of scholarship</b></h2>
<p class="p1">
HathiTrust&rsquo;s full text strategy is very similar to its bibliographic discovery strategy, though it flips the paradigm a bit. After an extraordinary research and development effort, HathiTrust launched a full text search service in 2010, and ever since then we&rsquo;ve been working to chart a course for a better, more sophisticated service. This summer (2011), we will launch a new full text search service that will incorporate fuller bibliographic information in the full text, use facets, and offer other features such as weighting of results depending on where the results were found in a text. And of course this will only be one more step in a process of continual enhancement.</p>
<p class="p2">
While HathiTrust believes the catalog function must be in OCLC, where libraries already manage their records, we also insist that the full text service must be in HathiTrust, where the materials are managed. Therefore we will focus increasingly on the standalone HathiTrust full text search service as a vehicle for end-user discovery. As such, it will always work to distinguish itself from the services offered by Google and other commercial services by enabling scholars to search for information precisely and exhaustively. Appealing as it is, Google Search&rsquo;s lack of precision and complete recall can be a hindrance to much scholarly work, and here HathiTrust must step up. After all, our collection of content is different from Google&rsquo;s (with our locally-digitized content and content that comes from partnership with other large-scale digitization initiatives), and our academic orientation ensures that our search results are not influenced by a connection with commerce, such as advertising.</p>
<p class="p2">
Just as our OCLC strategy does not end our pursuit of other bibliographic discovery strategies, our decision to mount a robust full text search service in HathiTrust does not eliminate the need to ensure discovery elsewhere. Because so much of our content is in Google Book Search and the Internet Archive, we achieve this goal in part without much additional effort. Still, much of the content in HathiTrust is only accessible in HathiTrust, and so getting in the flow of our users&rsquo; discovery methods (particularly users of academic and research library collections) is very important. By making the HathiTrust indexes searchable in Summon, we begin to accomplish this. Although Summon is the first and best of these services, the marketplace will produce others, and we remain committed to ensuring that our content is discoverable in as many of these services as possible. Negotiations are underway with Summon&rsquo;s competitors, and press releases will follow as we conclude these agreements.</p>
<h2 class="p2">
<b>That which can be found is more likely to be preserved</b></h2>
<p class="p1">
In order to effectively support its preservation mission, HathiTrust must constantly improve the discovery experience and must seek to situate discovery wherever our users search for information. &ldquo;Either/or&rdquo; strategies are bound to fail us. Indeed, we will continue to implement a range of discovery strategies in collaboration with all appropriate partners and in every appropriate location. Our strong connection to scholars will lead us to refine the approaches we take to discovery, and our knowledge of where they seek information will guide the approaches we take to distributing records and making our full text indexes available. By making the information we store as discoverable as possible, we stand the greatest possible chance of having that information found, valued and preserved.</p>
http://www.hathitrust.org/blogs/perspectives-from-hathitrust/hathitrust-and-discovery#comments | Perspectives from Hathitrust | Fri, 24 Jun 2011 12:41:33 +0000 | hathitrust | 588 at http://www.hathitrust.org