About the Archives Unleashed Project

Archives Unleashed aims to make petabytes of historical internet content accessible to scholars and others interested in researching the recent past. Supported by a grant from the Andrew W. Mellon Foundation, we are developing web archive search and data analysis tools to enable scholars, librarians and archivists to access, share, and investigate recent history since the early days of the World Wide Web.

The three-year Archives Unleashed project has three major thrusts. First, the project will build a software toolkit that applies modern big data analytics infrastructure to scholarly analysis of web archives. Second, the toolkit will be deployed in a cloud-based environment that will provide a one-stop portal for scholars to ingest their collections and execute a number of analyses with the click of a mouse. Finally, datathons will build a cohesive and sustainable user community by bringing the core project team members together with librarians, archivists, and other interested researchers.

Project Team

Principal Investigator

Ian Milligan is associate professor of history at the University of Waterloo. Since 2012, he has been engaged in building tools, infrastructure, and frameworks to facilitate the historical use of web archives. In 2016, he was awarded the Canadian Society for Digital Humanities’ Outstanding Early Career Award.

Co-Principal Investigator

Nick Ruest is the Digital Assets Librarian at York University. He is Co-PI of both the WALK project and the Social Sciences and Humanities Research Council of Canada Insight Grant with Milligan. Ruest is dedicated to building systems to ensure that valuable historical and cultural materials are preserved and made universally accessible. He has been Release Manager for five Islandora releases and four Fedora releases, both of which are open source digital asset management systems. He was previously the Project Director for Islandora CLAW and leader of the Fedora Import-Export Initiative.

Co-Principal Investigator

Jimmy Lin is the David R. Cheriton Chair in the David R. Cheriton School of Computer Science at the University of Waterloo. His research aims to build tools that help users make sense of large amounts of data. He works at the intersection of information retrieval, natural language processing, and databases, with a focus on large-scale distributed algorithms and infrastructure for data analytics.

Project Manager

Samantha Fritz is the project manager for the Archives Unleashed Team. She is an information management professional with a passion for open access, information literacy education, and helping people connect with and make sense of data. Since completing her MLIS, Samantha has worked with researchers who assess the social impact of para sport on perceptions of (dis)ability, as well as intervention models for privacy education among young adults. She is driven to support information and resource dissemination to positively transform the way researchers view, experience, interpret and share information and knowledge. She has worked with organizations such as Ryerson University’s Social Media Lab, the Islandora Foundation, and Dalhousie University Libraries on digitization and data visualization projects.

Postdoctoral Fellow

Ryan Deschamps is a postdoctoral fellow in the University of Waterloo’s Department of History, working under Milligan. He completed his dissertation at the Johnson Shoyama Graduate School of Public Policy (University of Regina), studying the role of social media in policy agendas. His postdoctoral research focuses on the influence of digital information on the interpretation of Canadian historical events. Ryan’s position is funded by the Social Sciences and Humanities Research Council and the David R. Cheriton Chair at the David R. Cheriton School of Computer Science.

Advisory Board

Jefferson Bailey is Director of Web Archiving at Internet Archive. Jefferson joined Internet Archive in Summer 2014 and manages Internet Archive’s web archiving services, including Archive-It, used by over 450 institutions to preserve the web. He also oversees contract domain-scale web archiving services for national libraries and archives around the world, including Library of Congress, NARA, and foreign national libraries. He works closely with partner institutions on technology development, web data research services, educational partnerships, and other programs. He is PI on multiple grants focused on systems interoperability, data-driven research use of web archives, and digital preservation. Prior to Internet Archive, he worked on strategic initiatives, digital collections, and digital preservation at institutions such as Metropolitan New York Library Council, Library of Congress, Brooklyn Public Library, and Frick Art Reference Library and has worked in the archives at NARA, NASA, and Atlantic Records. He is currently Vice Chair of the International Internet Preservation Consortium. He has an MLIS in Archives from University of Pittsburgh and a BA in English from Oberlin College.

Nathalie Casemajor is an Assistant Professor in the Urbanisation Culture Société Research Centre at INRS (Institut national de la recherche scientifique, Montreal). Her work focuses on culture, territories and communities as well as digital culture. She was previously an Assistant Professor at the Université du Québec en Outaouais, a Postdoctoral Fellow at McGill University (Department of Art History and Communications Studies), and a Visiting Scholar at New York University (Department of Media, Culture and Communication).

Robert H. McDonald is the Dean of Libraries at University of Colorado Boulder. His research interests include technology management and integration of lean and agile frameworks, data preservation, learning eco-systems, data cyberinfrastructure, and big data analytics. Robert frequently presents and writes on a variety of topics, and was editor of the E-Content column for EDUCAUSE Review in 2016 – 2017. He is active professionally with a number of national and international organizations and conferences, serving on the HathiTrust Program Steering committee, as the chair for the Digital Preservation Network Heavy Users committee, and as general co-chair for the ACM/IEEE Joint Conference on Digital Libraries in 2013 and 2017.

Matthew Weber is an Associate Professor in the Hubbard School of Journalism and Mass Communication at the University of Minnesota and is the Cowles Endowed Fellow of Media Management. He is Principal Investigator on a National Science Foundation grant that aims to develop new methods and new collaborations for conducting research utilizing Internet Archive data. Weber’s grant works with more than 50 TB of archived Internet data, testing and publishing scripts for transforming archived Internet data into formats that are compatible with existing social science computing packages such as R and SPSS. Weber has related funding from the Democracy Fund, Institute of Library and Information Science and the William T. Grant Foundation.

Michele Weigle is a Professor of Computer Science at Old Dominion University. Her research interests include digital preservation, web science, information visualization, and mobile networking. Since 2012, she has been PI or Co-PI on over $2M in funding for research related to web archiving from NSF, NEH, IMLS, and the Andrew W. Mellon Foundation. Dr. Weigle received her PhD in computer science from the University of North Carolina at Chapel Hill in 2003.

Nicholas Worby is the Government Information and Statistics Librarian as well as the Web Archives Program Coordinator at the University of Toronto. In addition to providing research and instruction support for government information and statistics, he oversees collection development, production crawls and researcher outreach for web archive collections.

Code of Conduct

To foster respectful collaborations, this code of conduct applies to all
Archives Unleashed spaces, including, but not limited to, GitHub, Slack,
Medium, social media platforms and meeting spaces, both online and off.

Anyone who violates this code of conduct may be sanctioned or expelled from
these spaces at the discretion of the Archives Unleashed Project Team.

Our Standards

Examples of behavior that contributes to creating a positive environment
include:

Using welcoming and inclusive language

Being respectful of differing viewpoints and experiences

Gracefully accepting constructive criticism

Focusing on what is best for the community

Showing empathy towards other community members

Examples of unacceptable behavior by participants include:

The use of sexualized language or imagery and unwelcome sexual attention or
advances

Trolling, insulting/derogatory comments, and personal or political attacks

Public or private harassment

Publishing others’ private information, such as a physical or electronic
address, without explicit permission

Other conduct which could reasonably be considered inappropriate in a
professional setting

Our Responsibilities

Project maintainers are responsible for clarifying the standards of acceptable
behavior and are expected to take appropriate and fair corrective action in
response to any instances of unacceptable behavior.

Project maintainers have the right and responsibility to remove, edit, or
reject comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to this Code of Conduct, or to ban temporarily or
permanently any contributor for other behaviors that they deem inappropriate,
threatening, offensive, or harmful.

Scope

This Code of Conduct applies both within project spaces and in public spaces
when an individual is representing the project or its community. Examples of
representing a project or community include using an official project e-mail
address, posting via an official social media account, or acting as an appointed
representative at an online or offline event. Representation of a project may be
further defined and clarified by project maintainers.

Enforcement

Instances of abusive, harassing, or otherwise unacceptable behavior may be
reported by contacting the project team at archivesunleashed@gmail.com. All
complaints will be reviewed and investigated and will result in a response that
is deemed necessary and appropriate to the circumstances. The project team is
obligated to maintain confidentiality with regard to the reporter of an incident.
Further details of specific enforcement policies may be posted separately.

Project maintainers who do not follow or enforce the Code of Conduct in good
faith may face temporary or permanent repercussions as determined by other
members of the project’s leadership.

Attribution

Privacy Policy

This Privacy Policy is effective as of January 2019.

We recognize the importance of and are committed to protecting the privacy of all users. The following privacy policy guides and outlines the Archives Unleashed Project’s (AU) online information practices.

The Archives Unleashed Privacy Policy describes what users can expect from Archives Unleashed as to how information is collected, used and shared. This policy applies to all information collected from or submitted to the Archives Unleashed Project, including the Archives Unleashed Cloud portal located at cloud.archivesunleashed.org and all related sub-domains.

By accessing and using the services and products of the Archives Unleashed Project, users accept the practices outlined in the policy below.

Information Collection and Use

We request the minimum amount of personal information necessary for the operation of Archives Unleashed software. We collect several types of information for various purposes to provide and improve our service to you.

Personal Data

While using our service, we may ask you to provide us with certain personally identifiable information that can be used to contact or identify you.

When anyone follows, watches, or contributes to any of the Archives Unleashed GitHub Repositories, we are able to see usernames, but participation is voluntary and we do not collect any information.

Usage Data

We may also collect information on how our services are accessed and used. In some cases, reports are generated by the applications we’ve subscribed to, such as Slack, Mailchimp and GitHub. These reports include general usage statistics and descriptive data.

General stats that help us understand how people are interacting with our newsletters:

Number of subscribers

Number of opt-outs

Audience growth

Open/click rates

Campaign performance

Email clients used

Locations (general, not specific)

Our public repositories provide insights into the repositories maintained by the Archives Unleashed Project, which help us understand the work being done and who our contributors are.

Tracking & Cookies Data

We use a session cookie to keep you logged in on our service.

Cookies are files with a small amount of data which may include an anonymous unique identifier. They are sent to your browser from a website and stored on your device. The Archives Unleashed Cloud uses a session cookie, which can let you stay logged in between visits to the page.

No personal or identifying information is collected while using cookies, and you can always opt out by changing your browser settings or, permanently, by using a browser plugin. If you do not accept cookies, you may encounter some minor issues when using our service.

Use of Information Collected

In accessing Archives Unleashed tools and services, collected information is used for a variety of purposes:

To allow you to participate in interactive features of our services when you choose to do so, such as our newsletter, Slack channel, or within our GitHub repositories.

To provide user support by responding to inquiries and requests

To provide analysis or valuable information so that we can improve the Service

To monitor the usage of the service

To detect, prevent and address technical issues

Security of Information

We do not sell any user information or provide access to it to third parties. We will NOT share any of your information without consent.

We do not store or have access to your authentication credentials for GitHub or Twitter. Authentication with these applications is used only to log in to the Archives Unleashed Cloud. Archive-It credentials are supplied over HTTPS, and are salted and encrypted.

Links To Other Sites

We are strongly committed to serving and participating in open-source communities, which is why you will find that our services reference and link out to other projects and resources that are not operated by us.

When you click on a third-party link, you will be directed to a site outside of the Archives Unleashed Project. We recommend you review the Privacy Policy for those sites to fully understand their information practices. We have no control over and assume no responsibility for the content, privacy policies or practices of any third-party sites or services.

Changes to this Privacy Policy

Any updates to our Privacy Policy will be posted to this page. We will also let users know via email and/or a prominent notice before the change becomes effective, and we will update the “effective date” at the top of this Privacy Policy. Changes to this Privacy Policy are effective when they are posted on this page.

Contact Us

General questions or concerns about the Archives Unleashed Privacy Policy should be directed to our Project Manager, Samantha Fritz.

Acknowledgements

We would like to acknowledge that this privacy policy is inspired by the work done by: