Heritage Bytes: News and Updates about Open Context

Posted by skansa, 2017-06-07

The most recent data publication in Open Context features 3D models of archaeological features and objects. The Gabii Project publication presents digital content from excavations at the ancient Latin city of Gabii, a neighboring and rival city-state to Rome in the 1st millennium BCE. Rachel Opitz, a collaborator on the project, which is led by Nicola Terrenato and the University of Michigan, shared the news on Facebook and described how this data publication fits into the Gabii Project’s goals:

“Earlier this year the Gabii Project published its first digital volume, A mid-Republican House from Gabii, with the University of Michigan Press. The volume integrates text, interactive 3D content, and the data and media collected during excavation. The data and media are made available open access on the publication website and are connected directly to the publication, creating a rich digital resource for the study of Gabii. We’re now publishing the same data with Open Context. Why reproduce the published data? One of our aims is to make our data more discoverable. By including our data in Open Context, we hope researchers undertaking comparative and thematic studies will readily find our data. Second, because we know ‘lots of copies keeps stuff safe,’ we see reproducing our data across multiple archives as a way to ensure it will remain available. Finally, the Open Context collection represents our first foray into the world of linked open data. Gabii’s database is a relational database in part because we feel this architecture well supports our current data collection workflow, and in part because linked open data wasn’t on our radar in 2009 when the recording system was designed, nor in 2011 when we moved to a web-based platform. We’re excited to be moving our published data out into the wider world of LOD and seeing how this format might facilitate further research using our data.”

The Open Context data publishing process involves not only organizing and validating datasets, but also annotating data with relevant Linked Open Data concepts. This annotation makes the data more accessible and intelligible because it can be discovered and linked to related content from across the web. For example, some of the Gabii content is related to content from the nearby Etruscan site of Poggio Civitate (Murlo). The Open Context team works with data authors to relate terminologies in a given dataset to more widely shared standards. We linked some object types at both Poggio Civitate and Gabii to the Getty Art and Architecture Thesaurus (AAT), a controlled vocabulary widely used by museums and other cultural institutions. The examples below show how reference to the AAT promotes discovery of related comparative materials from these two excavations:

Bucchero: a specific type of dark, glossy pottery common in Etruscan ceramic production.
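The kind of vocabulary alignment described above can be pictured as a simple lookup from project-specific type labels to AAT concept URIs. This is an illustrative sketch, not Open Context's implementation, and the AAT numeric identifier below is a placeholder rather than the real ID for bucchero:

```python
# Illustrative sketch of aligning a project vocabulary with the Getty AAT.
# The AAT numeric ID below is a PLACEHOLDER, not the real identifier for
# bucchero; actual annotations use the published AAT URIs.

AAT_BASE = "http://vocab.getty.edu/aat/"

# Hypothetical mapping from local type labels to AAT concept IDs.
LOCAL_TO_AAT = {
    "Bucchero": "300000000",  # placeholder ID
}

def annotate(local_label):
    """Return a SKOS-style close-match annotation for a local type label,
    or None if the label has no AAT alignment yet."""
    aat_id = LOCAL_TO_AAT.get(local_label)
    if aat_id is None:
        return None
    return {"label": local_label, "skos:closeMatch": AAT_BASE + aat_id}
```

Because both Poggio Civitate and Gabii map their own terms to the same AAT concept, a search on that shared concept can retrieve comparable material from both excavations.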

The Gabii data publication is the first project to display 3D models in Open Context. Excavations are increasingly documenting their work in 3D, so Technology Director Eric Kansa collaborated with Rachel Opitz (an archaeologist with the Gabii team and an expert in photogrammetry, GIS, and other forms of digital documentation) to make the Gabii 3D models usable through common web browsers. Open Context integrated the open-source X3DOM framework to render these models, so users can interact with them using a mouse or touch screen without installing any additional plugins or software. The models have large file sizes, so they are best viewed over a fast internet connection. Finally, the use of widely supported open standards for the models should help improve the longevity of the data. Check out a few examples from the Gabii Project:

Posted by skansa, 2017-05-03

As we announced a few weeks ago, we have launched a new project aimed at building a corpus of osteometric data from zooarchaeological assemblages in the Levant region. This post provides more details about how the project will work and options for submitting data.

First, an Update on the Project Name!

Due to the enthusiastic response we’ve received, including from people working in adjacent regions, we are changing the name of the project to the Biometrical Database of Near East and East Mediterranean Fauna. This title is more inclusive and allows room for logical growth. However, as planned, we’ll prioritize data from the Levant and adjacent regions and expand from there.

Your Data Publishing Options

Option 1: Contribute your (specimen-level) data to the overarching biometrical database project.
The biometrical database will be a single project (called a “data publication”) in Open Context. It will be called the “Biometrical Database of Near East and East Mediterranean Fauna” project. You can see it as working like an edited volume, where each of you is a contributing author to the volume (like a chapter), and Sarah and Justin are the co-editors. The project will have a project description page in Open Context, and all contributed data will be added to this one project. The “sub-project” page will include important background information about your dataset (see below).
A good example of this is the in-progress Archaeology of Mesoamerican Animals project, which has separate sub-projects for each chapter.

How to contribute data for Option 1: For this option, we request only specimens for which you have taken von den Driesch measurements. For those specimens, we need the following data from you, ideally in a spreadsheet (or CSV “dumps” of tables from a relational database): site name, period, date range (in cal. BCE), context information (area, locus, general context description), analyst name, unique specimen number (“ID #”), taxon, element, proximal fusion, distal fusion, and all von den Driesch measurements (in mm). Here are some sample spreadsheets we created to help guide you.
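As an illustration of the requested layout, the sketch below writes specimen records with one row per specimen and one column per field. The header spellings and the measurement columns shown (GL, Bp, Dp, and Bd are von den Driesch abbreviations) are assumptions for this example, not a required template; the sample spreadsheets linked above are authoritative:

```python
import csv
import io

# Illustrative column layout for Option 1 contributions; exact headers and
# which von den Driesch measurement columns you include will vary by dataset.
COLUMNS = [
    "site name", "period", "date range (cal BCE)", "area", "locus",
    "context description", "analyst", "specimen ID", "taxon", "element",
    "proximal fusion", "distal fusion", "GL", "Bp", "Dp", "Bd",
]

def make_template(rows):
    """Write specimen records (dicts keyed by column name) as CSV text,
    one specimen per row; missing values are left blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()
```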

Open Context has very flexible tools for importing data organized in very different table structures. We mainly encounter difficulties when data are “lumped” together in “comments” fields. For example, measurements coded as “Dp: 20.1; Bp: 22.2” in a single cell are too time-consuming and error-prone for us to import. If you’re unsure about your formatting, please share your spreadsheet with us and we can work with you on formatting it for easy import into Open Context. We also ask that you send an overview of the site the bones came from, with information about the excavation and any methodological details you’d like to include. We will work with you over email to include all the appropriate metadata for the project. See an example of this type of project documentation. Please note that we also welcome supporting digital images of drawings or photographs, and in some cases even 3D models. If you choose to contribute such media, make sure the media resources can be clearly and unambiguously associated with specific bone specimens.
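If you do have lumped values like “Dp: 20.1; Bp: 22.2” in a comments field, they can often be split into separate columns before submission. A minimal sketch of that clean-up step, assuming the semicolon-delimited format shown above:

```python
import re

def split_measurements(comment):
    """Split a lumped string like 'Dp: 20.1; Bp: 22.2' into one value per
    measurement code, suitable for separate spreadsheet columns."""
    result = {}
    for part in comment.split(";"):
        # Each part should look like "<code>: <number>"
        match = re.match(r"\s*([A-Za-z]+)\s*:\s*([\d.]+)\s*$", part)
        if match:
            result[match.group(1)] = float(match.group(2))
    return result
```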

Option 2: Publish your own full dataset as a separate project in Open Context and also link it to the biometrical database.
Some of you may have entire datasets that you want to share via Open Context (including specimens beyond those with von den Driesch measurements). That is, you don’t want to go through your data and pull out only the relevant data we request for the biometrical database in Option 1. You’d rather share your full dataset. In this case, you can contact Sarah about publishing your dataset as a stand-alone “data publication” in Open Context, and then having the relevant metrical data from your data publication linked up with the biometrical database project. This is a good option for people who would like to publish and archive a full project, perhaps related to a conventional publication you’re preparing. The data would still be interoperable with the biometrical database project described in Option 1 above, but would be in its own distinct data publication because the data include more information than requested for the biometrical database.

In this example, Max Price has published his full dataset, but certain specimens (those with measurements) will also work with the biometrical database project. For users of the biometrical data, Open Context will automatically include citation information for Max Price’s data, along with that of other data contributors.

A few additional notes:

Please Send Specimen Records only, not “Aggregated Data”
We publish records documenting individual bone specimens. Aggregate bone data (lumped by site, phase, context, taxon, or element) are not useful for data integration. That is, rather than ranges of measurements, we work with the specific individual measurements for each specimen. Publishing individual specimen records provides much more analytic flexibility and offers more potential to support new research.

Copyright Permissions
In keeping with “best practices” for scientific data publishing and archiving, Open Context publishes content under a Creative Commons Attribution or Public Domain license (as you choose). Open Context claims no ownership and requires no transfer of copyright to publish. Creative Commons licenses give Open Context permission to publish and archive data. They also grant reuse permissions to other individuals and programs, provided those reuses properly attribute data contributors with clear citations.

Citation and Attribution for Contributors
Open Context issues DOIs (library backed identifiers commonly used by journals) for each project (analogous to a journal article or book) and for each sub-project (analogous to a chapter in an edited volume) contribution. You will always be clearly identified as the author of any data you contribute. This pertains to your entire dataset, and also to every single specimen you contribute.

Preservation and Archiving
Beyond issuing stable identifiers, Open Context archives published data with the University of California’s central digital repository (the California Digital Library) and other partners, helping to ensure that contributed datasets remain available for the long term.

Thank you! To get started or to ask questions, please email us:
Justin Lev-Tov and Sarah Whitcher Kansa
(jlevtov AT yahoo.com) (sarahkansa AT gmail.com)

Posted by Eric Kansa, 2017-04-20

Endangered Data Week highlights the urgent need to protect public records. Our ongoing collaboration with the Digital Index of North American Archaeology (DINAA) project provides a specific example of why public records matter.

Before we discuss DINAA in detail, first we need to provide some context. The United States has enacted a variety of laws and policies that govern access to public records and the historical and cultural heritage of our Nation. The National Historic Preservation Act of 1966, which begins with “The preservation of our heritage is in the public interest… as a living part of our community life, in order to give a sense of orientation to the American people,” represents one of the key pieces of legislation protecting archaeological and historical heritage in the US.

The National Historic Preservation Act led to a number of administrative processes and offices that regulate how federal actions, especially construction and other development, impact historical and archaeological sites. Many of these processes work at the level of state governments through State Historic Preservation Offices (SHPOs) and through intergovernmental consultation with Tribal Historic Preservation Offices (THPOs). Conservatively, the public invests over $500 million per year to comply with historical and archaeological protection measures required by federal law (Altschul and Patterson 2010). This level of public investment nearly matches the total combined budgets of the IMLS (roughly $240 million in 2015), the NEH (roughly $140 million in 2015), and the NEA (roughly $150 million in 2015), all now threatened with total elimination. These surprising numbers demonstrate archaeology’s relative importance in public cultural heritage investments.

Unfortunately, much of this work and investment goes largely unnoticed. Up to now, decades of investment in managing and protecting America’s archaeological heritage have led to few publicly accessible impacts. Cultural resource management (CRM) largely takes place within relatively opaque bureaucratic processes that regulate construction and development. Some of the secrecy around information created by CRM work is motivated by threats of looting and vandalism, as well as cultural sensitivities related to some archaeological information, especially that with religious significance to modern Native American communities. But more routinely, CRM work takes place via commercial contracts, and many of these contracts place rights restrictions on documentation. CRM work had resulted in an estimated 350,000 reports nationwide as of 2004 (NADB 2004), but because no instrument for cataloging or public access was ever mandated, irreplaceable cultural heritage documentation in these “grey literature” reports languishes, ignored and underappreciated.

Many SHPO offices struggle with little funding and scant information technology support. Moreover, a hostile political climate now sees historical preservation as little more than a regulatory compliance burden. The dedication, professionalism, and commitment these offices bring to protecting and documenting America’s historical landscape often go unnoticed. Thus, obscurity may ultimately undermine the whole point of historical preservation laws.

Why efforts like DINAA have strategic significance

As part of their “behind the scenes” administrative work, SHPOs create and manage inventories of archaeological and historical sites discovered and documented through research mandated by historic protection laws. These inventories reside in electronic databases of various types and formats. Because so much of this information is now digital, it is much more feasible to publicly share and make use of these data, which many experts worked for many years to amass.

DINAA, led by David G. Anderson, Joshua Wells, Stephen Yerka, and the Open Context team (Eric Kansa, Sarah Whitcher Kansa), works to aggregate, publish, and archive inventories of historical and archaeological sites. To do so, DINAA collaborates with state government officials and tribal nations across the United States to properly prepare the data (including redacting ownership, location, and culturally sensitive information, for legal and ethical reasons) and make it accessible to a broad audience. This includes public map visualizations showing site distributions at roughly a 20 x 20 km spatial resolution, a scale well suited to regional and continental displays while still protecting site security. For the same reason, DINAA maintains no sensitive data online; we keep primary data files in offline, encrypted storage.
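The spatial generalization described above can be sketched in a few lines. This is a minimal illustration, not DINAA's actual redaction workflow; it assumes coordinates in a meter-based projection and simply snaps each point to the center of its 20 x 20 km grid cell, so every site in a cell displays at the same generalized location:

```python
# Minimal sketch of coarsening point locations for public display.
# Assumes projected coordinates in meters; the 20 km cell size comes
# from the resolution described above.

CELL_SIZE_M = 20_000  # 20 km

def generalize(x_m, y_m, cell=CELL_SIZE_M):
    """Snap a projected coordinate (meters) to the center of its grid cell,
    so precise site locations are never exposed."""
    cx = (x_m // cell) * cell + cell / 2
    cy = (y_m // cell) * cell + cell / 2
    return cx, cy
```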

We have partnerships in place or in development with about two dozen states at present, and already have integrated information from half a million sites across much of the eastern United States. Public funding from the National Science Foundation (Grants 1623621, 1623644) and the Institute of Museum and Library Services (LG-70-16-0056-16) makes DINAA possible, and in the spirit of DINAA’s public support, anyone can access and use the entire DINAA dataset without any technical or intellectual property restrictions. Increasing numbers of state, tribal, and national professional and resource management groups and organizations endorse and support these efforts.

DINAA is hosted by Open Context, a data publishing platform referenced by both the National Science Foundation and the National Endowment for the Humanities for data management in archaeology. Open Context archives DINAA data with the University of California’s central digital repository and other partners internationally. DINAA brings these site file records together in order to link and index a wide range of reports, publications, museum collections and research databases. It makes accessible, for the first time, data from the millions of archaeological sites and tens of thousands of reports produced by archaeologists over the past two centuries, much of it in recent decades.

By making rich cultural data publicly available, DINAA’s activities help our nation realize the original intent behind historical protection laws. For a large fraction of the United States, DINAA now offers the closest we’ll probably ever come to a comprehensive “census” of America’s population history over the past 14,000+ years. This is inherently interesting and significant because it helps document the historically unique experiences and richly diverse cultural development of peoples in many different societies. By making the data public, visible, and usable (without IP or other restrictions), we can now explore new cross-disciplinary research questions about how people have interacted with their natural environment over vast regions and time horizons. This can give us important new insights into sustainability. Already, zoologists, biogeographers, and other environmental researchers have started to integrate DINAA into their own information management systems.

At the same time, Native American THPO officials can gain better access to information about their ancestral territories. Many Native American nations were forcibly removed from their historic homelands, and these territories often span multiple modern state boundaries. Access to information from across these territories can help under-staffed and under-supported THPO offices protect and manage their historical and sacred landscapes, and facilitate government-to-government consultation.

Furthermore, these data can help publicly highlight threats to America’s cultural landscape. Our research has been documenting the potential impact of climate change and related coastal flooding, which threatens to destroy tens of thousands of archaeological sites, along with their present communities and habitats. In this respect, efforts for better digital preservation and access complement stewardship of the physical record of the past. Publicizing the scope and scale of these threats will be a critical element in building public support for conservation measures.

Projected impacts of climate change-related coastal flooding, developed by Stephen Yerka with the DINAA dataset

Finally, in working to make these data public, DINAA raises new possibilities for wider community collaboration on the curation of these data. Can we create more culturally inclusive systems of information governance, so that classification systems, understandings of cultural appropriateness, and accessibility protocols better meet the needs of Native American communities and tribal nations? How do we improve data quality and organization to better meet the needs of scientists? Can we use such data to protect the archaeological and historical past through more democratic and less bureaucratic mechanisms?

These questions highlight the integral role that public records, including cultural data, should play in forging a more equitable civil society.

References

Altschul and Patterson. 2010. “Trends in Employment and Training in American Archaeology.” In Voices in American Archaeology, edited by Wendy Ashmore, Dorothy T. Lippert, and Barbara J. Mills. ISBN 978-0-932839-39-8.

Posted by skansa, 2017-04-17

For the week of April 17-21, we’re joining a large community-wide effort to raise greater awareness of “endangered data”. In light of all of the other crises in the world, highlighting endangered data may seem silly. After all, given the daily news onslaught of increasing authoritarianism, kleptocracy, war, bigotry, poverty and environmental problems, the fate of abstract electronic databases seems low on the priority list.

However, we argue that safeguarding data is part of safeguarding our civil liberties, civil society, future environment, and broader understanding of our world. This last point is key: data are often integral to how we try to understand the world.

As authoritarianism takes hold, data become increasingly politicized and precarious. Authoritarians attempt to dictate what is and is not true. Truth must conform to the needs of vested interests and ideologies or it will be suppressed. The current administration’s assault on climate science represents a stunning attack on an “Inconvenient Truth” (as Al Gore so aptly named it). Beyond climate science, researchers create data key to understanding social, historical, and governance issues. Like climate science, better understanding in these other domains can threaten powerful and entrenched interests, which is why authoritarians may seek to suppress or corrupt data documenting such topics.

Unfortunately, we don’t really understand the full scope and magnitude of what data may be under threat. We also don’t have a good understanding of what threats may be more immediate and where to prioritize our “data rescue” efforts. But here are some (incomplete) thoughts about what threatens data:

Outright Suppression: Some datasets may be suppressed and destroyed overtly. This is a digital equivalent of burning books or even whole libraries.

Lack of Time: People need time to dedicate their attention to working with data. Badly structured rewards, incentive systems, and other bureaucratic pressures in academic research force many researchers to neglect data. Researchers need the intellectual freedom to devote time to data work, where the rewards are still uneven and uncertain.

Lack of Access: Hiding data away from wider scrutiny makes it easier to delete, alter or corrupt. It also makes it easier to make spurious claims (and harder to refute them).

Analytical Biases: Data need analysis to be interpreted and used. People apply different models and analytic methods that may (or may not) explicitly or implicitly bias understanding of data.

Filter Biases: The past several months have provided a hard education on the problem of “fake news” (propaganda) in the contemporary news media. Even if we manage to preserve some integrity in our data and analyses, we face the steep challenge of communicating our understandings in an overtly hostile and ideologically-charged media environment.

In arguing for the importance of data, we’re not suggesting that data are wholly objective or empirical. Data are never complete, perfect, or absolutely objective. As brilliantly discussed by Cathy O’Neil, data reflect our incomplete and often biased views of the world. Because data, like other forms of knowledge, are imperfect, they need to be part of open conversations and debates in civil society. If we do a better job of making data more open to critique and evaluation from people with a wider variety of perspectives, we can improve both the data themselves and the understandings derived from them.

Over the past several months, we have taken part in “data rescue” events organized across the nation. There is a strong focus on climate data, but our participation involved endangered data from National Park Service websites. Working with Max Ogden and colleagues at the California Digital Library, we safeguarded more than a terabyte of data from a National Park Service database, as well as some 20,000 web pages, especially those that bring US national parks to underrepresented communities (African American, Asian American, Native American, LGBTQ).

As we move forward with Endangered Data Week, we will post more about the need to protect public data, the importance of public data for a healthy civil society, and some of our broader collaborations to make public data better protected and understood.

(Updated 2017-04-19 by E. Kansa to fix typos, add links)

Posted by skansa, 2017-04-13

We are happy to announce the kick-off of a large-scale data integration project, provisionally titled The Biometrical Database of Levantine Fauna. This project’s goal is to build up a massive body of openly-available zooarchaeological data from the Levant, with a specific focus on measurement data, in order to facilitate and improve research and instruction worldwide. This project represents a collaboration among many colleagues located across the globe, who recognize the research and teaching potential of access to large databases of related content. Zooarchaeology is particularly amenable to data sharing because practitioners collect large quantities of data in somewhat more “standardized” formats than seen in other archaeological sub-disciplines. The data will be published in Open Context, where it will be available openly for download and reuse, and linked to related content both in Open Context and across the Web. All data contributors will be clearly cited, both for the overall project and for each individual specimen.

This project is open to anyone collecting primary zooarchaeological (or related) data from sites of any period in the Levant and adjacent areas. Its success relies on broad participation from the zooarchaeology community. If this resource sounds exciting to you for your research, please contribute to it! We are currently reaching out to gauge the level of participation. In late spring 2017, we will send details to interested participants of the kind of data we would like you to submit, as well as instructions on how to prepare datasets for publication with Open Context. This will be an ongoing project, so please get in touch with us if you are interested in participating now or in the future! Please contact Justin Lev-Tov (Project Manager) or Sarah Whitcher Kansa (Open Context Editor) for more information.

Dr. Rowe, a professor at the University of Texas Rio Grande Valley, developed Virtual Valdivia to serve as a central repository for data on ceramic forms from this cultural tradition. At present it contains over 400 records of ceramics from phases VI and VII at the site of Buen Suceso. Each record in the database contains a wealth of information about the context and attributes of the sherd. An example of these records can be seen here.

The Virtual Valdivia Project demonstrates how working with data involves much more than information sharing. Rowe not only published her data with Open Context; thanks to the professional development opportunities offered by the #MSUDAI, she also learned the fundamentals of Web development and programming with JavaScript. These skills enabled Rowe to display dynamic “feeds” of data drawn from Open Context on her own web page, where she customized the presentation of these data, including Spanish-language translations. In effect, this shows how data sharing can not only open new research opportunities but also provide new ways to communicate archaeology globally, with multilingual audiences.
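A dynamic feed along these lines can be built against any JSON interface. The sketch below is hypothetical: the URL and the JSON field names are illustrative assumptions rather than Open Context's actual API (consult Open Context's API documentation for the real endpoints), and the rendering is shown in Python rather than the JavaScript Rowe used:

```python
import json
from urllib.request import urlopen

# Hypothetical feed URL; Open Context's real JSON endpoints and record
# structure are documented on the Open Context site.
FEED_URL = "https://opencontext.org/example-feed.json"

def fetch_records(url=FEED_URL):
    """Download a JSON list of records (network call; not run in this sketch)."""
    with urlopen(url) as resp:
        return json.load(resp)

def render_list(records, label_key="label", href_key="uri"):
    """Turn record dicts into a simple HTML list for embedding in a web page."""
    items = "".join(
        f'<li><a href="{r[href_key]}">{r[label_key]}</a></li>' for r in records
    )
    return "<ul>" + items + "</ul>"
```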

As Dr. Rowe notes, ceramic comparanda are often difficult to access due to barriers of language, publication distribution, or gray literature. This digital database addresses these issues, containing bilingual English-Spanish project descriptions, images, and a wide range of data on individual ceramic sherds. The project’s goal of becoming a repository for data from many Valdivia sites will help address regional questions of ceramic tradition, variation, and social practice.

In completing this demonstration project, Rowe gained familiarity with key technologies and best practices and will be able to incorporate this knowledge into her own teaching. Her project highlights how digital technologies are not only useful tools for solving specific research queries, but can also impact future research design and engagement with archaeological data.

Open Context Project Spotlight

by Hannah Lau

Our first project spotlight is the Oracle Bones in East Asia project, by Katherine Brunson, Zhipeng Li, and Rowan Flad. The project is a collaboration among researchers at Brown University, the Institute of Archaeology of the Chinese Academy of Social Sciences, and Harvard University. Its goal is to create a comprehensive dataset of oracle bones (animal bone artifacts found at archaeological sites spanning the Neolithic and Bronze Ages) that can be used to trace the origins and spread of oracle bone divination rituals in East Asia. The project particularly focuses on uninscribed oracle bones, which have been less systematically published than their inscribed counterparts.

Oracle Bones in East Asia project member Katherine Brunson was recently awarded a grant from the Esherick-Ye Family Foundation to undertake summer fieldwork to analyze, measure and record oracle bones in archaeological collections in China. As the database grows, it will be possible to examine broad spatial and temporal trends in the use of oracle bones in ancient China.

The Oracle Bones in East Asia project exemplifies the many benefits of data sharing! One goal of the project is to make accessible, in multiple languages, data that otherwise could only be examined in person. An example can be seen at this link, where the project team members have clearly defined, in English and Chinese, all the zones they refer to on each bone. This thorough and clear documentation points the way for how archaeologists can work from the bottom up to develop common recording systems that enable broad comparisons across projects. That is, if researchers propose and publish clear “standards” that they have found useful in their work, and clearly demonstrate how they used those standards, others will adopt them, building a body of comparable data. Additionally, publishing oracle bone data and data collection protocols together in an open access format will encourage scholars from around the globe to use the database for their own research on oracle bones and to contribute specimens from their own collections.

Project Spotlights showcase data publications in Open Context that have unique features or exemplary documentation. We will spotlight projects every few weeks to highlight the diversity of data publications and the creative work being shared by authors.

The digital version of the volume includes hundreds of links to the archaeological data produced during the pedestrian survey of 2003-2011, published on the web in Open Context. The dataset in Open Context includes survey units, objects, typologies, phases, and images, each with its own unique and stable URI and citation information. In his introduction to the digital version, Caraher explains that this integration of the original volume with the digital content means that “the reader can now ‘drill down’ into the data through hyperlinked text in a pdf version of the book,” allowing them to “view the various digital archaeological objects that form the basis for the arguments advanced in this book.”

Caraher emphasizes the provisional status of the digital book, as it is an attempt at the “retrofitting of a traditional, analogue text with a layer (literally as well as figuratively) of links to our published digital material.” The PKAP volume represents one of many diverse approaches to integrating conventional (print) publication with related web resources that conventional publication formats cannot accommodate. Moving forward, new publications should consider how they can construct their narratives to integrate seamlessly with supporting data and digital content located across the web. Just as importantly, managers of web resources need to consider how best to build their content and services to support and integrate with synthetic publications.

This week is “Love Your Data Week”. The event organizers hope it will raise awareness of the need to better curate research data in order to encourage more collaboration, transparency, and reproducibility.

However, in the US, “Love Your Data Week” comes during a major political crisis that threatens all of our data. Already, the Trump administration has altered and redacted educational and scientific information related to climate change.

Motivated by this threat, a grassroots “Data Rescue” movement has quickly organized researchers, librarians, software developers, and the interested public. This movement is in a race against time to find, retrieve and archive threatened Federal information before it gets corrupted or destroyed.

Much of the Data Rescue efforts have understandably focused on climate change and other environmental data. However, these represent the tip of the iceberg in terms of need. The Federal government also creates research and educational information relevant to many other social, cultural and historical topics.

Data Rescue for Culture, History, and Social Sciences

For the past several weeks, our team here at Open Context has run Web crawlers and other software to archive some of the “long tail” of Federal information. For example, we’ve focused much of our effort on Web resources created by the National Park Service (NPS). It is through the national parks that many Americans (and international visitors) learn about America’s rich and diverse natural and cultural history. The NPS provides vital educational information describing and documenting that history, including information about the experiences of historically-underrepresented communities.

We worked under the guidance of Jolene Smith and Kate Ellenberger, both experts in public archaeology and history. They prioritized and documented lists of Web resources likely to be threatened by the new Administration. We used these lists to seed a “quick and dirty” Web crawler (not a type of software we have much experience with) that downloaded Web pages, submitted them to the Internet Archive’s Wayback Machine (the world’s leading repository of archived Websites), and then repeated the process with new links discovered in the archived pages. Kate and Jolene also manually downloaded hundreds of resources that the Wayback Machine could not reach.
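The crawl loop described above can be sketched roughly as follows. This is a minimal illustration, not the crawler we actually ran: the Wayback Machine's "Save Page Now" URL pattern is real, but every function name, class name, and parameter below is invented for the example.

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# The Wayback Machine's "Save Page Now" service: requesting
# this URL + a target URL asks the Archive to snapshot the page.
WAYBACK_SAVE = "https://web.archive.org/save/"

class LinkExtractor(HTMLParser):
    """Collect absolute href targets from anchor tags in a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL.
                    self.links.add(urljoin(self.base_url, value))

def crawl_and_archive(seed_urls, allowed_domain, max_pages=100):
    """Breadth-first crawl: fetch each page, ask the Wayback Machine
    to snapshot it, then queue newly discovered same-domain links."""
    queue, seen, archived = deque(seed_urls), set(seed_urls), []
    while queue and len(archived) < max_pages:
        url = queue.popleft()
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                html = resp.read().decode("utf-8", errors="replace")
            # Trigger an Internet Archive snapshot of this page.
            urllib.request.urlopen(WAYBACK_SAVE + url, timeout=60)
            archived.append(url)
        except OSError:
            continue  # skip pages that fail to load or archive
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            # Stay within the target agency's domain.
            if urlparse(link).netloc == allowed_domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return archived
```

In practice a crawler like this also needs politeness delays, robots.txt handling, and retry logic; the sketch only shows the fetch-archive-discover cycle the paragraph describes.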

We’ve been running multiple machines day and night and have successfully archived thousands of Web resources from the National Park Service and other Federal Agencies. Here are just two examples that we saved:

Civil Society and Protecting Knowledge

This weekend, we started to scale up and more broadly coordinate our Data Rescue efforts by participating in the Data Rescue San Francisco Bay Area event hosted at UC Berkeley. Here’s a picture of a room full of software developers who coded crawlers to archive a broad array of data from the Department of Energy, NASA, and other agencies.

The picture illustrates a tremendous groundswell of coding talent volunteering time and software expertise to saving our nation’s scientific knowledge. A committed and engaged public truly “loves their data” and recognizes the key role data plays in understanding.

A Call for Support: We Need Human Expertise

Our experience with Data Rescue highlights the key role that civil society must play in order to ensure the long term survival of knowledge in a digital world. These volunteer coders would not have accomplished much if nonprofit institutions like university libraries and the Internet Archive did not exist. Similarly, our work with Open Context and other Alexandria Archive Institute projects has helped prepare us with the skills, professional networks, and capacity needed to quickly respond to this crisis.

Our job now is to expand our capacity, and especially our capacity to prioritize and document the content that needs to be saved. Jolene Smith and Kate Ellenberger demonstrated the clear need for human expertise to more effectively direct our Data Rescue efforts. Learning from this, we need to hire human experts, particularly graduate students and other researchers and educators with deep domain knowledge about the US government’s role in educating the public about US history, archaeology and culture.

Update Note:

We updated this post after receiving permission from Kate Ellenberger to publicly acknowledge and recognize her for her tremendous efforts and guidance.

The team hacked this data viewer together over a weekend as a proof of concept. In the typical spirit of the digital humanities and digital archaeology, they took a playful approach to exploring the materials, using the HTC Vive SDK to ingest Open Context data as JSON and then place it into a relative 3D space. We particularly appreciated their candour in assessing what worked, and what didn’t, about their project, and their plans for the future. We look forward to seeing their work progress, and hope that this prize will help them move forward. Please explore their project at https://vrcheology.github.io/ .
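The core of the pipeline the team described — pull Open Context records as JSON, then place them in a relative 3D space — might look roughly like this in outline. Open Context does expose JSON representations of its records, but the record fields and the function below are invented for illustration, and the team's actual viewer runs inside the HTC Vive SDK rather than Python.

```python
def to_relative_3d(records, scale=1.0):
    """Map records with lon/lat/elev fields into a local 3D frame
    centered on the collection's centroid, so objects can be placed
    relative to one another rather than at raw world coordinates."""
    xs = [r["lon"] for r in records]
    ys = [r["lat"] for r in records]
    zs = [r.get("elev", 0.0) for r in records]
    # Centroid of the collection becomes the origin of the scene.
    cx, cy, cz = sum(xs) / len(xs), sum(ys) / len(ys), sum(zs) / len(zs)
    return [
        {
            "id": r["id"],
            "pos": (
                (r["lon"] - cx) * scale,
                (r["lat"] - cy) * scale,
                (r.get("elev", 0.0) - cz) * scale,
            ),
        }
        for r in records
    ]
```

Centering on the centroid is one simple way to get the "relative" space the team mentions: the scene stays near the origin regardless of where in the world the finds came from, which keeps coordinates manageable inside a VR engine.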

Congratulations to the team, and thank you to all who participated. Please keep your eyes peeled for next year’s edition of the prize!