b. Several studies note that as technology develops, new value can be assigned to
records; this is particularly true with Cloud services. For example, Instagram is
used as both a "storage box" of personal photos and a space to share information
about users' identity and activities.12 Should the archival management
system capture and preserve the profile in place at the moment of creation or
transmission of each record? Additional complexities arise when new people
enter the picture. The collaborative nature of social media platforms encourages
the creation of new records (or new representations of existing records) via
linkages, embedded content and comments. "Likes," tags, and participation by
others on photos add new value to those possessions, but such metadata can
easily become obscured in the interface, if not trapped in the application where
it is recorded. The additional information added by others might be considered
as context-of-creation metadata (in the case of collaborative environments such
as Google Drive) or context-of-use metadata, such as "likes" and "shares" in a
social media platform. Both forms of context suggest that archival systems will
need a method to represent the role that a particular user played in modifying or
adding to the core record, that is to say, the original "creation" developed by the
original "author," "creator," or "collector" of a particular work (Bak, Hill, 2015,
pp. 101-161). Archival descriptive records might somehow capture and fix these new
associations as some representation of provenance.13
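The distinction drawn above between context-of-creation and context-of-use metadata can be illustrated with a minimal sketch of a descriptive record that keeps track of who did what to the core record. This is only one possible reading, not a model proposed by the sources cited: the class names, the role labels ("creator," "liker," "commenter"), and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ContextKind(Enum):
    CREATION = "context-of-creation"  # e.g. co-editing a shared document
    USE = "context-of-use"            # e.g. likes, shares, comments


@dataclass(frozen=True)
class Contribution:
    agent: str        # who acted (hypothetical identifier)
    role: str         # role played with respect to the core record
    kind: ContextKind
    timestamp: str    # ISO 8601 string, kept as text for simplicity


@dataclass
class DescribedRecord:
    identifier: str
    creator: str
    contributions: List[Contribution] = field(default_factory=list)

    def add(self, agent: str, role: str, kind: ContextKind, timestamp: str) -> None:
        """Fix one association between an agent and the core record."""
        self.contributions.append(Contribution(agent, role, kind, timestamp))

    def by_kind(self, kind: ContextKind) -> List[Contribution]:
        """Return the contributions belonging to one form of context."""
        return [c for c in self.contributions if c.kind is kind]


# Hypothetical example: a photo posted by one user and used by two others.
photo = DescribedRecord("photo-001", creator="alice")
photo.add("alice", "creator", ContextKind.CREATION, "2016-05-01T10:00:00Z")
photo.add("bob", "liker", ContextKind.USE, "2016-05-01T10:05:00Z")
photo.add("carol", "commenter", ContextKind.USE, "2016-05-02T08:30:00Z")

use_agents = [c.agent for c in photo.by_kind(ContextKind.USE)]
print(use_agents)
```

Keeping the role and the form of context as separate fields is a design choice of the sketch: it lets the same agent appear in both forms of context without conflating them.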
Context is and has always been a fluid entity in time, that is, it changes as time
passes. What is new today is that context has become a fluid entity in space, that
is, it changes as we look at it from a different perspective. For example, a document
stored in Google Drive or a similar Cloud-storage service may be represented as
belonging to one folder for the original creator and a different folder for a
contributor who has been given permission to update the document. Given the collaborative
nature of these tools, it appears that in general the same document belongs to
different folders according to the agent - be it an individual or a system - that
interacts with the document.14 Similarly, social media postings appear at a
particular point in a stream of posts. The specific stream is produced by the
interaction of object metadata with user preferences and choices, and these of
course vary for different users at different times; as users comment on or annotate
that record, evidence about its use accrues alongside the original post. The
consequential question is whether the standards and tools available to archivists will
allow them to preserve both the records and the complex relationships reflecting
their creation and use, which represent a major part of their context. A preliminary
question should be whether archivists agree that such a network of relationships
needs to be preserved. If so, what can be done to help them implement a cohesive set
of archival services that are suitable to the Cloud-based environment in which many
people live their digital lives? Should archivists stick to a static, single perspective
that frames data and metadata once they cross the archival threshold, or should they
adopt a more flexible approach where different perspectives may coexist? What
metadata should be retained? For what purposes?
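The contrast between a single perspective and coexisting perspectives can be made concrete with a small sketch. It assumes, purely for illustration, that the placement of a shared document is recorded per agent. The class, the agent labels, and the folder names are hypothetical and do not reflect how any actual Cloud service stores this information.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedDocument:
    identifier: str
    # Maps each agent to the folder in which that agent sees the document.
    placements: Dict[str, str] = field(default_factory=dict)

    def place(self, agent: str, folder: str) -> None:
        self.placements[agent] = folder

    def folder_for(self, agent: str) -> Optional[str]:
        """Return the folder seen by this agent, or None if unknown."""
        return self.placements.get(agent)


doc = SharedDocument("report-2016")
doc.place("creator", "Projects/Drafts")
doc.place("contributor", "Shared with me")

# A static, single-perspective description keeps only one placement.
single_view = doc.folder_for("creator")

# A more flexible description keeps every perspective side by side.
all_views = dict(doc.placements)
print(single_view, all_views)
```

Under the first option the contributor's view is lost at the archival threshold; under the second, the description grows with every agent that interacts with the document.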
Furthermore, how much metadata is enough? In the digital environment, metadata
associated with or embedded into records may provide relevant information on the
provenance of either the records themselves or the systems in which they reside.
However, if the scope of provenance is broadened to include societal provenance,15
the list of sources from which metadata is drawn needs to be extended to include materials
documenting aspects of both the society at large and the specific communities in
which the records have been created, managed and used.
Linked Data
The most promising model for describing digital resources is RDF (Resource
Description Framework).16 Its very simple design is based on the notion of a triple,
that is, a statement consisting of a subject, a predicate, and an object, describing
some elemental aspects of a resource. RDF is a fundamental component of the
Semantic Web architecture, since, along with other Web technologies, it makes it possible to
publish and interlink structured data that can support semantic queries, i.e., queries
that enable the retrieval of both explicit and implicit information.17 Data published
on the Web according to this architecture are called Linked Data.18 Ontologies
complement and enhance the power of Linked Data, as they are formal
specifications of a shared conceptualization, and act as a cornerstone for defining a
knowledge domain. Tim Berners-Lee established four simple rules for creating
Linked Data:
"1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the
standards (RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things"
(Berners-Lee, 2006).
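The triple model and the rules quoted above can be illustrated with a minimal sketch in plain Python, without any RDF library. Each statement is a (subject, predicate, object) tuple whose terms are HTTP URIs, and the object of one statement can be the subject of another. The namespace http://example.org/, the predicates createdBy and partOf, and the helper functions are hypothetical. Treating partOf as transitive is an assumption made here only to show how a query might return information that is implicit in the data.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

EX = "http://example.org/"  # hypothetical namespace, for illustration only

triples: List[Triple] = [
    (EX + "record/1", EX + "createdBy", EX + "agent/alice"),
    (EX + "record/1", EX + "partOf", EX + "series/A"),
    (EX + "series/A", EX + "partOf", EX + "fonds/X"),
]


def match(store: List[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def ancestors(store: List[Triple], resource: str, predicate: str) -> List[str]:
    """Follow `predicate` links transitively, starting from `resource`."""
    found: List[str] = []
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for _, _, parent in match(store, s=current, p=predicate):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


# Explicit information: what is directly stated about the record.
explicit = match(triples, s=EX + "record/1", p=EX + "partOf")

# Implicit information: the record also belongs to the fonds, although
# no single triple says so.
implicit = ancestors(triples, EX + "record/1", EX + "partOf")
print(explicit)
print(implicit)
```

In a real setting this work would normally be delegated to an RDF store and a SPARQL engine; the sketch only shows why linking statements to one another makes such queries possible.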
It is interesting to note that Linked Data seem to be a perfect fit for the nebula of
data objects mentioned above: statements can be linked to other statements,
archives in liquid times
12 The term "storage box" is used by Odom, Sellen, Harper and Thereska to illustrate how casual users may treat
networked environments as a place to make digital materials accessible across different physical places or
as an alternative place of storage for backup purposes. See William Odom et al., "Lost in Translation:
Understanding the Possession of Digital Things in the Cloud," in Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, Austin, 2012 (New York, NY: ACM, 2012), 781-790.
13 Provenance, understood as a more complete set of information about actions taken in the
origination and subsequent handling of a digital object, can be represented in records complying with the
requirements of the PROV Ontology. See Paolo Missier, Khalid Belhajjame and James Cheney, "The W3C
PROV Family of Specifications for Modelling Provenance Metadata," in Proceedings of the 16th International
Conference on Extending Database Technology, Genoa, 2013 (New York, NY: ACM, 2013), 773.
14 Please note that we are not referring to the case in which a document is assigned to different folders for
records management purposes. We are referring to the fact that a specific document gets a different context
according to the user that interacts with it.
giovanni michetti provenance in the archives: the challenge of the digital
15 Societal provenance is a term used to mean provenance in the broader sociocultural dimension. Records
creation, management, use and preservation are sociocultural phenomena. Therefore, provenance may be
interpreted taking into account the sociocultural dimension as the context in which all actions take place.
16 For more information on RDF, see https://www.w3.org/RDF/.
17 The triples describe resources, so they may be interpreted as metadata, that is, data about data. However, it is
important to highlight that being metadata is not an ontological property, since there is no such thing as
metadata per se. Some data are called metadata, because a special value is assigned to them - they are
recognized as conveying information on some specific dimension considered as being relevant in a given
context. For example, dates are usually considered metadata, because of the relevance of the temporal
dimension. At the same time, dates are data, because they are usually embedded into documents, that is,
they are an integral part of the datum. There is neither antithesis nor contradiction: everything is data. Sometimes
it is called metadata to highlight its special value.
18 RDF is a data architecture, while Linked Data is a way of publishing RDF data.