This November, The Rosetta Project was awarded access to staff and facilities at Lawrence Berkeley National Lab to develop a wearable version of the Rosetta Disk. The successful proposal, titled “The Rosetta Disk – An Exploration into Very Long-term Archiving”, focused on the need for access to the high-powered microscopes and imaging technology available at the Lab to prepare and evaluate components of a new Rosetta Disk prototype. The user program will give Rosetta Project staff access to the Molecular Foundry, the Advanced Light Source, and the National Center for Electron Microscopy.

The new version of the Rosetta Disk now under development uses a manufacturing process similar to the one used for the first edition: the archive is microscopically formed in nickel and can be read with 1000x magnification or less. The main difference is that the finished archive is about 2 centimeters in diameter, small enough to be worn comfortably on the body. Because the new process is reliable, fast, and less expensive than the one used for the original Rosetta Disk prototypes, this is the first version of the Disk that could potentially meet the long-held goal of broad dissemination, in keeping with the long-term archiving strategy of LOCKSS (“lots of copies keeps stuff safe”).

Although the smaller size of the new disk is an advantage, it imposes a new constraint: there is less surface area for the archive contents. If we keep the information, or “pages”, in the archive at a size that can be read with 1000x magnification, we can fit 1000 or fewer of them on the disk. The original Rosetta Disk holds over 1,500 languages and 13,000 pages of information, so we must include fewer languages, fewer pages per language, or some combination of the two. Yet constraints breed creativity, and we have chosen to meet this challenge by slightly altering the contents that will go on the wearable Rosetta Disk.
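To make the trade-off concrete, here is a small back-of-the-envelope sketch. It only uses the figures above (a budget of about 1000 pages, and roughly 1,500 languages in 13,000 pages on the original disk, both of which the post gives as approximate bounds) and assumes, for simplicity, that every language gets the same number of pages.

```python
# Figures from the post (approximate): about 1000 pages fit on the 2 cm disk
# at 1000x-readable size; the original disk held ~13,000 pages for ~1,500 languages.
PAGE_BUDGET = 1000
ORIGINAL_PAGES = 13_000
ORIGINAL_LANGUAGES = 1_500


def max_languages(page_budget: int, pages_per_language: int) -> int:
    """How many languages fit if every language gets the same number of pages."""
    if pages_per_language <= 0:
        raise ValueError("pages_per_language must be positive")
    return page_budget // pages_per_language


def pages_per_language(page_budget: int, n_languages: int) -> float:
    """Average pages available per language if all n_languages are kept."""
    return page_budget / n_languages


if __name__ == "__main__":
    avg_original = ORIGINAL_PAGES / ORIGINAL_LANGUAGES
    print(f"original disk: about {avg_original:.1f} pages per language")
    print(f"keeping all {ORIGINAL_LANGUAGES} languages: "
          f"{pages_per_language(PAGE_BUDGET, ORIGINAL_LANGUAGES):.2f} pages each")
    for k in (1, 2, 5, 9):
        print(f"{k} page(s) per language -> up to {max_languages(PAGE_BUDGET, k)} languages")
```

Keeping every language from the original collection would leave well under one page each, while keeping the original average depth (about 9 pages) would limit the disk to roughly a hundred languages, which is why some mix of the two is needed.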

The contents will stay true to the original Rosetta Disk in that they will represent many of the world’s human languages. The contents will also be parallel, that is, the same information will appear in each language. There will be two main kinds of content: a parallel text and a parallel vocabulary list. The text we have chosen is the Universal Declaration of Human Rights (“UDHR”), which is available in over 300 languages, and the parallel vocabulary will be Swadesh lists compiled by Long Now’s PanLex Project. We will choose the vocabularies to match the texts as closely as we can.
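One way to picture “matching the vocabularies to the texts” is as a join on language: the UDHR translation anchors each entry, and a Swadesh list is attached wherever one exists for the same language. The sketch below assumes both collections are keyed by a shared language code (ISO 639-3 style codes are used here purely for illustration); in practice, reconciling identifiers between collections may take more work. The `ParallelEntry` type and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ParallelEntry:
    language: str                   # hypothetical shared language code
    udhr_text: str                  # the UDHR translation for this language
    swadesh: Optional[List[str]]    # matching Swadesh list, or None if missing


def build_parallel_entries(udhr: Dict[str, str],
                           swadesh: Dict[str, List[str]]) -> List[ParallelEntry]:
    """Text-driven join: one entry per UDHR language, vocabulary attached if available."""
    return [ParallelEntry(lang, udhr[lang], swadesh.get(lang)) for lang in sorted(udhr)]


def coverage(entries: List[ParallelEntry]) -> float:
    """Fraction of text languages that also have a matching vocabulary list."""
    if not entries:
        return 0.0
    return sum(1 for e in entries if e.swadesh is not None) / len(entries)


if __name__ == "__main__":
    # Placeholder data, not real collection contents.
    udhr = {"eng": "(English UDHR text)", "spa": "(Spanish UDHR text)", "fra": "(French UDHR text)"}
    swadesh = {"eng": ["I", "you", "we"], "spa": ["yo", "tú", "nosotros"]}
    entries = build_parallel_entries(udhr, swadesh)
    print([e.language for e in entries if e.swadesh is None])
    print(f"vocabulary coverage: {coverage(entries):.0%}")
```

Starting from the text side reflects the post's framing, where the UDHR is the primary content and the vocabularies are chosen to follow it.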

In a major departure from how the contents of the original Rosetta collection were assembled, the Universal Declaration and PanLex data are all “born digital”. This gives us a great deal of control over font and font size, but it also means we have to make choices. Our goal is to maximize the amount of language content on the disk while keeping it as legible as possible. This is where access to the Lab’s microscopes and imaging equipment will be especially helpful.
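Why font size matters so much: the number of characters that fit on the disk scales with the inverse square of the character size, so halving the glyph height roughly quadruples capacity. The rough sketch below illustrates this for a 2 cm disk. It assumes, hypothetically, that every glyph occupies a rectangular cell 0.6 times as wide as it is tall, tiled edge to edge with no margins or line spacing; the glyph heights tried are illustrative values, not measurements from the project.

```python
import math


def glyph_capacity(disk_diameter_mm: float, glyph_height_um: float,
                   width_ratio: float = 0.6) -> int:
    """Rough count of glyph cells that fit on a circular disk.

    Hypothetical model: each glyph occupies a rectangular cell glyph_height_um
    tall and width_ratio * glyph_height_um wide, tiled with no margins or
    line spacing. Real page layouts will fit noticeably fewer.
    """
    radius_um = disk_diameter_mm * 1000 / 2
    disk_area_um2 = math.pi * radius_um ** 2
    cell_area_um2 = glyph_height_um * (glyph_height_um * width_ratio)
    return int(disk_area_um2 // cell_area_um2)


if __name__ == "__main__":
    for height in (40, 20, 10):  # hypothetical glyph heights in micrometres
        print(f"{height} um glyphs -> ~{glyph_capacity(20, height):,} glyph cells")
```

The inverse-square relationship is the point here, not the absolute numbers: every step down in font size buys a large gain in content, which is why checking legibility at the smallest sizes under the microscope is so valuable.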

Another advantage of “born digital” material is that we can make the contents of the wearable Rosetta Disk available as open digital data as well as a physical artifact. We hope this will invite all kinds of interesting experiments in the archival longevity of both forms. The Universal Declaration of Human Rights translations we will be using are all available in Unicode, which is a much preferred long-term format, and the PanLex Swadesh lists are now part of the Natural Language Toolkit (NLTK) data collection, available as a corpus for computational tinkering.
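Unicode alone does not guarantee identical bytes for identical-looking text: an accented letter can be stored either as one precomposed code point or as a base letter plus a combining mark. A common archival practice, assumed here rather than described in the post, is to normalize text (NFC in this sketch) before encoding it as UTF-8, and to record a checksum (SHA-256 here) so later copies can be verified. The function names are hypothetical; only Python's standard library is used.

```python
import hashlib
import unicodedata


def to_archival_bytes(text: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8, so equivalent text yields identical bytes."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def fixity_digest(text: str) -> str:
    """SHA-256 checksum of the normalized bytes, for later integrity checks."""
    return hashlib.sha256(to_archival_bytes(text)).hexdigest()


if __name__ == "__main__":
    composed = "Declaraci\u00f3n"      # ó as a single precomposed code point
    decomposed = "Declaracio\u0301n"   # o followed by a combining acute accent
    print(composed == decomposed)                                # False
    print(fixity_digest(composed) == fixity_digest(decomposed))  # True
```

Normalizing first means two copies of the same translation, prepared by different tools, produce the same checksum, which keeps fixity checks meaningful across many distributed copies.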

The 1000x magnification required to read the Rosetta Disk is far below what is possible with the resources of Lawrence Berkeley Lab, which, in addition to a vast array of imaging equipment, operates the most powerful microscope in the world (TEAM I, left). Nonetheless, access to higher-powered equipment will allow us to prepare the content that will go on the disk, evaluate the longevity of the materials we choose, and explore new methods to protect the disk surface from environmental damage as well as from direct contact with human skin (many people, myself included, are sensitive to nickel).

An aspirational goal of the project is to develop long-term relationships with the staff and scientists at the Lab who are interested in exploring new materials and methods for long-term archiving. Some intriguing new possibilities have already emerged from early discussions (hint: think color!). These may allow us to radically change not only how we archive, but also what we are able to archive for the long term. And while new archival technologies are evolving rapidly, the strategies for long-term archiving long articulated by The Rosetta Project, and both explored and practiced in its Rosetta Disk, seem steadfast and applicable to all of them.

Luis Alberto Cardenas Jorge

Why not use the new penny-sized disc that can store 300 TB of information and last 13 billion years?

Michael Kohne

If the objective is broad dissemination (i.e. you’ll be making lots of them), then you can somewhat mitigate the smaller number of pages available by making lots of disks and putting a DIFFERENT selection of languages on each disk. That will allow you to get the larger number of languages into the world, while still using the smaller form factor disk.
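The suggestion above amounts to splitting the full language collection across several disks, each within the page budget. One possible way to do that, a standard bin-packing heuristic (first-fit decreasing) swapped in here for illustration and not something the project has described, is sketched below. The page counts in the example are hypothetical, and a tiny budget is used so the result is easy to read.

```python
from typing import Dict, List


def assign_languages_to_disks(pages_by_language: Dict[str, int],
                              page_budget: int = 1000) -> List[Dict[str, int]]:
    """First-fit decreasing: place each language on the first disk with room.

    Languages needing the most pages are placed first, which tends to use
    fewer disks than an arbitrary order. Returns one dict per disk, mapping
    language -> pages on that disk.
    """
    disks: List[Dict[str, int]] = []
    used: List[int] = []  # pages already used on each disk
    # Largest page counts first; ties broken by name for a stable result.
    for lang, pages in sorted(pages_by_language.items(), key=lambda kv: (-kv[1], kv[0])):
        if pages > page_budget:
            raise ValueError(f"{lang} needs {pages} pages; a disk holds {page_budget}")
        for i, total in enumerate(used):
            if total + pages <= page_budget:
                disks[i][lang] = pages
                used[i] += pages
                break
        else:  # no existing disk has room, so start a new one
            disks.append({lang: pages})
            used.append(pages)
    return disks


if __name__ == "__main__":
    # Hypothetical page counts per language, with a small budget for readability.
    sample = {"lang_a": 6, "lang_b": 5, "lang_c": 4, "lang_d": 3, "lang_e": 2}
    for n, disk in enumerate(assign_languages_to_disks(sample, page_budget=10), start=1):
        print(f"disk {n}: {disk}")
```

Every language lands on exactly one disk, so the full collection exists across the set even though no single disk carries all of it.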

I met Dr. Kazansky yesterday. We are looking to experiment with writing some Rosetta data using the technology, which is pretty cool, but it is still in development. The data density is entirely plausible but theoretical. The question of decodability would need to be worked out (how do you access what is encoded, and ultimately get meaning out of it?). Also, there is currently no reader, so one would need to be developed.