The systematic large-scale production of digital scientific objects, such as 3D models, requires much more infrastructure than a classical digital archive connected to a workflow manager. The size of the data to be handled, the distribution of expertise, acquisition and production sites, and the complexity of the processes involved require an innovative integrated environment that combines content management and information retrieval (IR) services with centralized knowledge management in order to monitor, manage and document processes and products in a flexible manner.

The 3D-COFORM project aims to advance the state-of-the-art in 3D-digitization and to make 3D-documentation of cultural and other material objects an everyday practice by providing an integrated environment of services. The implementation and research period started in December 2008 with a duration of 48 months (European Community’s FP7 IP n° 231809; 2008-2012).

As part of this project, the 3D-COFORM Repository Infrastructure (RI) was designed and is being implemented as a sustainable, distributed repository for a massive quantity of large digital objects and their metadata, capturing scientific descriptions and monitoring all processes of data generation and processing. The RI is designed to allow different kinds of users (researchers, academics, web users, etc.) to store, manage, query and retrieve any temporary or permanent digital object created in the course of digitizing physical objects, from acquisition through post-processing to the final 3D products. Furthermore, it is designed to support their use and reuse in presentations and in the scientific discourse about the modelled objects and their features, for art conservation, virtual reconstruction or hypothesis building about historical provenance and use. The distinctive innovation of the RI is that it can store the complete processing history of a digital artifact, that is, its digital provenance. It is this feature that allows assessment of the authenticity of the resulting high-quality model, and reasoning about its properties, fidelity and fitness for purpose.

Data acquisition for 3D models may take place in a studio, but more commonly one has to go to the physical objects themselves, which may involve mobile equipment, open-air conditions, lack of network connections and reduced local processing capabilities. Particular processing services may be offered by specialized companies only on their own premises, which results in data distribution. The acquisition is often costly, or even unrepeatable, requiring data replication. Data may be sensitive, or on the terabyte scale, making access rights, location and transfer an issue. Selection of data, manual processes, refinements, re-processing with improved methods, taking a series of measurements, etc. all contribute to the need for centralized, integrated knowledge management in order to cope with defining the ultimate products, reasoning about their properties, and garbage collection.

Figure 1: Overview of the Repository Infrastructure architecture: The clients communicate with the repository via SOAP through a central webservice. The RI dispatches the requests to its components (the MR, OR, and CRI Services). Data transfer is performed directly between clients and OR nodes, as initiated by the OR Service which controls the distributed OR nodes.

The RI offers the aforementioned features through a central entry point, the RI-API webservice, following a highly compatible SOA approach (SOAP). Behind it there is a webservice-extensible Cloud Computing System (CCS) that can interface to other CCSs and Linked Open Data (LoD) based services. The essential internal components of the RI are: (a) the Object Repository (OR), a distributed mass data storage layer, (b) the Metadata Repository (MR), an integrated semantic network layer, (c) the Content Retrieval Indices (CRI) for different search modalities by several content modules, and (d) the Query Manager (QM), which provides a single homogeneous access point to query the other three components. The RI also offers a central thumbnail database and provides a webservice for handling thumbnail management requests over HTTP.
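The role of the QM as a single access point can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `QueryManager` class, the idea of a backend as a plain callable, and the `matched_by_all` helper are hypothetical and do not reflect the actual RI-API or its SOAP interface.

```python
from typing import Callable, Dict, List

# Hypothetical: a backend is any callable that takes a query string and
# returns the identifiers of matching objects. The real MR, OR and CRI
# services are far richer than this.
Backend = Callable[[str], List[str]]


class QueryManager:
    """Illustrative single entry point that fans a query out to all backends."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self._backends = dict(backends)

    def query(self, text: str) -> Dict[str, List[str]]:
        # Ask every registered component and keep the answers per component.
        return {name: backend(text) for name, backend in self._backends.items()}

    def matched_by_all(self, text: str) -> List[str]:
        # Assumed combination rule: keep only objects reported by every component.
        answers = [set(ids) for ids in self.query(text).values()]
        if not answers:
            return []
        return sorted(set.intersection(*answers))


if __name__ == "__main__":
    qm = QueryManager({
        "MR": lambda q: ["obj-1", "obj-2"],   # stand-in for a metadata query
        "OR": lambda q: ["obj-2"],            # stand-in for a file lookup
        "CRI": lambda q: ["obj-2", "obj-3"],  # stand-in for a content-based search
    })
    print(qm.query("bronze statue"))
    print(qm.matched_by_all("bronze statue"))
```

The point of the sketch is only that a client talks to one object and never needs to know how many components sit behind it; how the real QM combines or ranks results is not specified here.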

The Object Repository (OR) connects to a (potentially large) number of distributed OR nodes for the physical data storage. The central OR Service provides access security, data integrity and risk-of-loss control, including metadata backup. It consists of a relational database (ORDB) for data file management, the query manager (QM) for mapping the relational database to the RDF format, and the DT-Controller module for controlling the data transfer between client computers and OR nodes, and between OR nodes (replica management). The distinctive feature of the OR component is that all data transfers are logged for legal reasons: not only are the acquisition and post-processing of digital 3D assets expensive, but high-quality 3D models can also be used to create high-quality physical replicas, e.g., through 3D-printing. It is therefore becoming ever more widely understood that 3D datasets are valuable assets that must be treated carefully, and that their proliferation needs to be faithfully recorded and controlled.
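The combination of transfer logging and replica management described above can be sketched with a small relational model. This is a minimal illustration under stated assumptions, not the ORDB schema: the table names, the column names, the `node:` prefix convention for OR nodes, and the functions `open_ordb`, `record_transfer` and `under_replicated` are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per replica, one row per logged transfer.
SCHEMA = """
CREATE TABLE replica (
    file_id TEXT NOT NULL,
    node    TEXT NOT NULL,
    PRIMARY KEY (file_id, node)
);
CREATE TABLE transfer_log (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    file_id      TEXT NOT NULL,
    source       TEXT NOT NULL,
    target       TEXT NOT NULL,
    requested_by TEXT NOT NULL,
    logged_at    TEXT NOT NULL
);
"""


def open_ordb():
    """Create an in-memory stand-in for the relational database."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def record_transfer(db, file_id, source, target, requested_by):
    """Log a transfer and, if the target is an OR node, register the new replica."""
    with db:  # log entry and replica row are committed in one transaction
        db.execute(
            "INSERT INTO transfer_log (file_id, source, target, requested_by, logged_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (file_id, source, target, requested_by,
             datetime.now(timezone.utc).isoformat()),
        )
        if target.startswith("node:"):  # assumed naming convention for OR nodes
            db.execute("INSERT OR IGNORE INTO replica VALUES (?, ?)", (file_id, target))


def under_replicated(db, minimum=2):
    """Return stored files that have fewer than `minimum` replicas."""
    rows = db.execute(
        "SELECT file_id FROM replica GROUP BY file_id "
        "HAVING COUNT(*) < ? ORDER BY file_id",
        (minimum,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = open_ordb()
    record_transfer(db, "scan-001", "client:laptop", "node:graz", "alice")
    print(under_replicated(db))   # one replica only, so the file is listed
    record_transfer(db, "scan-001", "node:graz", "node:heraklion", "dt-controller")
    print(under_replicated(db))   # two replicas now, so the list is empty
```

Writing the log entry and the replica record in the same transaction means that, in this sketch, a replica can never exist without a matching log entry, which is the property the legal requirement calls for.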

The Metadata Repository (MR) is an RDF triple store that aims at providing a common place to reason on, query, manipulate and export provenance metadata concerning any temporary or permanent object stored in the OR, together with related metadata about the modelled reality. Metadata are recorded in files in the units of creation and physically backed up in the OR together with their related content files, while in the MR a semantic network is built from the integrated metadata information. The MR is based on a homogeneous global schema, an extension of the CIDOC CRM (ISO 21127) that models provenance metadata (CRMdig). It comprises physical object descriptions, annotations and co-reference information, format and compatibility information of 3D models, historical events, and real world objects. All this information is stored in a coherent semantic network that enables useful and complex inferences to support content management and research, including diverse content indexing and retrieval mechanisms for 3D objects. An integral part of the MR is the Annotation and Co-reference Manager (ACoRM). The Annotation Manager links content segments of any kind and dimension (areas defined without modifying the original object) to semantic information capturing the related scientific discourse. The Co-reference Manager allows for collapsing duplicate URIs in the semantic network without losing their provenance. This feature acts as a "mending" mechanism for the semantic network and will contribute significantly to the reasoning performance and to the future connection to a Linked Open Data (LoD) world.
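One way to collapse duplicate URIs without losing their provenance is to leave the stored triples untouched and resolve co-references only when the network is read. The sketch below illustrates that idea with a union-find structure over URIs. It is an assumption about how such a mechanism could work, not a description of ACoRM; the class `CoreferenceIndex`, the function `collapsed_view`, and all `ex:` identifiers are hypothetical.

```python
class CoreferenceIndex:
    """Union-find over URIs; the original triples are never rewritten."""

    def __init__(self):
        self._parent = {}

    def find(self, uri):
        """Return the representative URI for `uri`."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited URI directly at the root.
        while self._parent[uri] != root:
            self._parent[uri], uri = root, self._parent[uri]
        return root

    def declare_same(self, a, b):
        """Record that two URIs denote the same thing."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Arbitrary but deterministic choice: keep the smaller string.
            keep, drop = sorted((ra, rb))
            self._parent[drop] = keep


def collapsed_view(triples, index):
    """Return the triples with subjects and objects replaced by representatives."""
    return {(index.find(s), p, index.find(o)) for (s, p, o) in triples}


if __name__ == "__main__":
    # Two records created independently for the same physical statue.
    triples = {
        ("ex:statue_1", "ex:hasType", "ex:Statue"),
        ("ex:statue_A", "ex:digitizedBy", "ex:scan_event_7"),
    }
    index = CoreferenceIndex()
    index.declare_same("ex:statue_1", "ex:statue_A")

    for triple in sorted(collapsed_view(triples, index)):
        print(triple)
    # The stored triples still carry the URIs each source originally used.
    print(sorted(triples))
```

Because the merge lives in a separate index, a co-reference statement can be inspected or withdrawn later without having to reconstruct which source originally asserted which triple.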

The responsibility for the RI design and implementation is shared between FORTH-ICS (Greece) and CGV, TU-Graz (Austria), while other partners in the project are creating a rich Integrated Visual Browser interface to the RI and a wide spectrum of 3D tools, all integrated via the RI.