Archive-It Partnership. The CDL and the UC Libraries are partnering with Internet Archive’s Archive-It Service. In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, will be transferred to Archive-It. The CDL remains committed to web archiving and is exploring opportunities with Harvard, MIT, Stanford, UCLA, and others to work closely with Archive-It to create an expanded roster of added-value tools and services.

Thank you to the many UC and non-UC archivists and librarians who have partnered with CDL on web archiving activities; we look forward to continuing our collaboration through this next phase of development. CDL would also like to express sincere thanks to and recognition of CDL staff (past and current) who have worked on WAS, including Stephen Abrams, Trisha Cruse, Rondy Epting-Day, Scott Fisher, Erik Hetzner, John Kunze, Rosalie Lack, Tracy Seneca, Marisa Strong, and Ken Weiss. Read the CDLINFO news item to learn more.

Trisha Cruse, UC3 Director, has retired from CDL after 20 years of service in UC, including her previous position at UCSD. Trisha was the original director of the Digital Preservation Program and UC3, and she oversaw the development of our services. More information on her career can be found here: http://www.cdlib.org/cdlinfo/2015/01/12/patricia-cruse-cdl. We wish her the best!

WAS Activity, December 2014

112 archives actively collected

1480 total sites collected

3.6 TB of data collected

WAS Service Description
The Web Archiving Service (WAS) enables librarians, archivists and researchers to capture, curate and preserve websites and web‐published materials. WAS makes it easy to build web archives, with scheduling and other tools to help manage your archive. You control public access to your archives and can configure the appearance and navigation of each archive. We also provide collection development consultation and help desk support for web archiving questions.

Announcing a New Partnership: California Digital Library, UC Libraries, and Internet Archive’s Archive-It Service
http://www.cdlib.org/cdlinfo/2015/01/14/announcing-a-new-partnership-california-digital-library-uc-libraries-and-internet-archives-archive-it-service/
Wed, 14 Jan 2015

The CDL and the UC Libraries are partnering with Internet Archive’s Archive-It Service. In the coming year, CDL’s Web Archiving Service (WAS) collections and all core infrastructure activities, i.e., crawling, indexing, search, display, and storage, will be transferred to Archive-It. The CDL remains committed to web archiving as a fundamental component of its mission to support the acquisition, preservation and dissemination of content. This new partnership will allow the CDL to meet its mission and goals more efficiently and effectively and provide a robust solution for our stakeholders.

Why now?

Eight years after the release of WAS, we found ourselves at a critical juncture. The constantly changing and ever-increasing complexity of the web poses significant challenges to the current web archiving toolset and requires frequent upgrades to stay ahead. It became clear that there was a significant opportunity cost to maintaining WAS, which would not leave us with the capacity to develop new added-value web archiving services, such as tools for researchers, computational analysis of aggregated archival corpora, or work toward integrating web archives with other format types.

Collaboration is the Solution

In 2014, the CDL held a series of meetings with peer institutions to investigate the possibility of collaborating on web archiving solutions. We ultimately came to the conclusion that running the core web archiving infrastructure is not the best use of our limited resources. Instead, enlisting the services of Archive-It was the most efficient solution because it will permit the CDL and its partners to reallocate their local resources to activities through which they can uniquely add stakeholder value to the baseline function provided by Archive-It.

Thus, the CDL is currently exploring opportunities with Harvard, MIT, Stanford, UCLA, and others to work closely with Archive-It to create an expanded roster of added-value tools and services. Our goals are to define technical needs as well as the organizational structure that can ensure creation of new tools and services and make them broadly available across the community.

Thank you!

Thank you to the many UC and non-UC archivists and librarians who have partnered with CDL on web archiving activities; we look forward to continuing our collaboration through this next phase of development.

CDL would also like to express sincere thanks to and recognition of CDL staff (past and current) who have worked on WAS, including Stephen Abrams, Trisha Cruse, Rondy Epting-Day, Scott Fisher, Erik Hetzner, John Kunze, Rosalie Lack, Tracy Seneca, Marisa Strong, Ken Weiss, and Perry Willett.

WAS User Group Meeting. The WAS User Group meeting was held during the Society of American Archivists Annual Meeting (SAA 2014). The meeting was open to all conference attendees for a discussion to explore web archiving issues related to researcher needs. Notes from the discussion are available here: http://bit.ly/WAS2014usergroup.

SAA 2014 Panel. Rosalie chaired a Lightning Talk that focused on access to web archives. The main takeaway was ‘tearing down the silos’. The speakers presented a variety of methods that they are using to integrate access to their web archives with their existing archival and library collections. For example,

Storage and Infrastructure Upgrades. Work continues on migration of WAS storage to the San Diego Supercomputer Center (SDSC). The project completion date is estimated at Oct/Nov 2014. Work also continues on Rails software upgrades for the Curator and Public interfaces.

New Public Content

University of California, San Francisco Web Archive [Public Archives]

UCSF has created an archive of 75 sites to support their mission to identify, collect, preserve, and maintain rare and unique materials to support research and teaching in the history of the health sciences and of UCSF. The oldest site in the collection is from 2007, when UCSF started capturing websites of the University’s administration, schools, the Graduate Division and the UCSF Medical Center, academic departments, administrative units, organized research units, and student organizations.


Storage. Work continues on migration of WAS storage to the San Diego Supercomputer Center (SDSC). We are now moving the content at a rate of approximately 1 TB a day. Once it is all moved (estimated September 2014), the WAS code will be updated to allow for transfer to and retrieval from SDSC. The project completion date is estimated at Oct/Nov 2014.

New partner institution. Lawrence Hall of Science at UC Berkeley has joined WAS!

New public content

Purdue University Web Archive [Public Archives]

Purdue University Archives has created an archive of around 30 sites to support their mission to make available for research the records and papers of enduring value created or received by the University and its employees. Visit: http://webarchives.cdlib.org/a/Purdue.

WAS Activity, June 2014

117 archives actively collected

1535 total sites collected

2.8 TB of data collected


Presentation. Rosalie Lack (WAS Service Manager) will be chairing a lightning talk session at the upcoming Society of American Archivists (SAA) Conference in August, in Washington, DC. The session title is “From Crawling to Walking: Improving Access to Web Archives.” The session will hear from speakers ‘lightning style’ (5 minutes each) about the challenges and solutions to promoting access and discovery of web archives. Speakers include:

John Bence, University Archivist, Emory University

Rick Fitzgerald, Librarian, Library of Congress

Polina Ilieva, Head of Archives & Special Collections, UC San Francisco

Benn Joseph, Manuscript Librarian, Northwestern University

Rosalie Lack, WAS Service Manager, CDL

Anna Perricci, Web Archiving Project Librarian, Columbia University

Meg Tuomala, Electronic Records Archivist, University of North Carolina at Chapel Hill


Storage. The technical team started work on migrating WAS storage to the San Diego Supercomputer Center. The timeline for completion is summer 2014. Once completed, we will be able to realize significant cost savings on storage, which will be passed on to all WAS institutions.

New partner institution. The Food and Agriculture Organization of the United Nations (FAO) has joined us! FAO is headquartered in Rome, Italy. Their first priority will be to archive their more than 600 websites from across the world.

New public content

Three new archives went public: University of Illinois at Urbana-Champaign Web Archives; Environmental Design Archive web site, UC Berkeley; and UCal Web Archive, UC Office of the President.


Infrastructure. Work continues on infrastructure upgrades to search and indexing.

New public content

1. University of Illinois Urbana‐Champaign

World Sustainable Development Web Archive. This archive aims to preserve web content published by Non‐Governmental Organizations that focus on environmental and economic sustainability, with materials from multiple languages and cultural groups. The sites collected have a rich array of documentation, data, images, and media that preserve the diverse perspectives, activities, and practices of sustainability NGOs around the world.

3. New York University Libraries / Tamiment Library (Labor & the Left)

Internet/Cyberspace Democracy. Archives the websites of entities advocating universal, equal, and uncensored access to, and use of, online information; those contesting the notion of information as a commodity; those advocating non‐capitalist and non‐hierarchical models of online functioning and governance, etc.

LGBT Rights. Contains the websites of left and militant organizations and individuals who advocate for the rights of the LGBT community.

Notable Individuals. Contains the websites of notable progressive and radical individuals, principally those whose activity is not subsumed under the scope of any of the other Tamiment Library Web Archives.


Eight New Web Archives Go Live
http://www.cdlib.org/cdlinfo/2013/11/19/eight-new-web-archives-go-live/
Tue, 19 Nov 2013

There are eight new archives from three institutions now publicly available in WAS. The archives include a collection of NGO sites from around the world that focus on environmental and economic sustainability; over 50 NGOs that address local environmental conditions in China, as well as the Shanghai Chronicles back to August 2012; and five new archives from NYU of sites related to Internet/Cyberspace Democracy; left in academia, left ideas and theory, and intellectuals; LGBT rights; notable progressive and radical individuals; and websites for progressive policy/educational organizations, foundations, and research institutes.

1) University of Illinois Urbana-Champaign

World Sustainable Development Web Archive. This archive aims to preserve web content published by Non-Governmental Organizations that focus on environmental and economic sustainability, with materials from multiple languages and cultural groups. The sites collected have a rich array of documentation, data, images, and media that preserve the diverse perspectives, activities, and practices of sustainability NGOs around the world.

3) New York University Libraries / Tamiment Library (Labor & the Left)

Internet/Cyberspace Democracy. Archives the websites of entities advocating universal, equal, and uncensored access to, and use of, online information; those contesting the notion of information as a commodity; those advocating non-capitalist and non-hierarchical models of online functioning and governance, etc.

LGBT Rights. Contains the websites of left and militant organizations and individuals who advocate for the rights of the LGBT community.

Notable Individuals. Contains the websites of notable progressive and radical individuals, principally those whose activity is not subsumed under the scope of any of the other Tamiment Library Web Archives.

3. WAS at Society of American Archivists (SAA) New Orleans, August 11 – 17. Rosalie Lack staffed a booth and hosted a WAS user group meeting at SAA. In a clear indication of the growing interest in web archiving, the SAA Web Archiving Roundtable held its first meeting (learn more: http://webarchivingrt.wordpress.com/), and the only session completely dedicated to web archiving was standing room only! The session was “The Web of sites: Creating Effective Web Archiving Appraisal and Collection Development Policies”; WAS partner institutions (University of Michigan Bentley Historical Library and UC San Francisco) were well represented on the panel. The session was chaired by Nancy Deromedi from University of Michigan Bentley Historical Library; Olga Virakhovskaya, also from the Bentley, and Rachel Taketa from UC San Francisco presented. Jennifer Wright also provided insight into the practices at the Smithsonian Institution Archives. Session overview: http://bit.ly/18TB8wg.

4. OS Wayback Development. CDL joins six other IIPC (International Internet Preservation Consortium) members in overseeing the development of Open Source Wayback software. Erik Hetzner, WAS Technical Lead, attended the IIPC-supported meeting in Paris at the Bibliothèque nationale de France to launch the new effort and plan the development calendar. The OS Wayback is the tool that is used by virtually all libraries/archives doing web archiving (along with WAS). It recreates archived websites, including content, images and navigation, so that end users can view and interact with archived sites. More details about this IIPC project.

New Partner: Smith College

New Public Content: Current Events in China: popular websites, blogs & twitters


Newsnet, Commentary, Blog, and Twitter are the major channels for over 500 million Chinese to access news, voice opinions, and exchange ideas on current important domestic and international issues. Although the exchanges are ephemeral and in flux, they play a vital role in people’s daily lives. The CDL’s Web Archiving Service (WAS) provides a perfect platform to capture and preserve this ephemeral content and fill the gap in collections and services. The “Current Events in China: popular websites, blogs & twitters” archive is an attempt to do just that. The sites were identified by Professor James Tong of Political Science at UCLA and Professor Yong Hu of Journalism at Peking University, and capture started in September 2012. Because of the WAS six-month embargo on archived sites, users can now access content from Feb. 2013 and earlier.