We have already indicated how heavily ISO's standardization of the Open Archival Information System (OAIS) in 2002 has weighed on developments in the Titan NPO projects.
The OAIS Reference Model was developed by the Consultative Committee for Space Data Systems (CCSDS) as a contribution to ISO Technical Committee 20, Subcommittee 13. It is a conceptual framework for understanding and applying the concepts needed for the long-term preservation of digital information, including across technological change. This ISO-standardized model serves as a general reference outlining the functions, responsibilities and organization of a system that preserves information (in particular digital data) over the long term and keeps it accessible to identified user communities.
Figure 1: The OAIS Functional Model
- Ingest: the process of bringing objects intended for a digital archive into conformance. The Submission Information Package (SIP) is the digital object to be archived, together with its ancillary metadata. The Archival Information Package (AIP) is generated from the SIP as the final step of the ingest process. The AIP contains all the metadata: descriptive and technical metadata, project information, access and usage rights, and processing history (antivirus scans, extraction mode, etc.). Ingest transfers the producer's data (SIP) to the archive (AIP).
- Archive storage: Following ingest, the AIP is stored and maintained in the archive center and can be retrieved from it. Archive storage includes persistent storage, regular checking of bit stream integrity, and disaster recovery.
- Data Management: This function supports searching and retrieving archived content using descriptive metadata.
- Administration: Refers to day-to-day operations and maintenance of archives and coordination with other functions: archiving, user assistance, implementation and maintenance of policies and processes, etc.
- Access: The interface that allows users to retrieve data from the archive. The information requested by the user is delivered as a Dissemination Information Package (DIP), generated from the AIP stored in the archive center.
- Preservation planning: archives must maintain a continuous, regularly updated digital preservation strategy and must be monitored regularly to detect the risks inherent in this type of activity.
- Common services: IT services that any computer system, such as a digital archive, needs to function: hardware, software, data, processes, agents, feedback for improvements, etc.
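The chain Ingest, Archive storage, Access described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names (`SIP`, `AIP`, `DIP`, `ingest`, `ArchivalStorage`, `access`) and the sample metadata are hypothetical and are not defined by the OAIS standard, which prescribes no implementation. SHA-256 is used here as one possible way to check bit stream integrity.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: content plus producer metadata."""
    content: bytes
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package built from a SIP at the end of ingest."""
    content: bytes
    metadata: dict
    sha256: str  # fixity value recorded at ingest (assumed choice of checksum)
    processing_log: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package delivered to a consumer."""
    content: bytes
    metadata: dict


def ingest(sip: SIP) -> AIP:
    """Ingest: turn a producer's SIP into an AIP with a fixity value and a log."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.content, dict(sip.metadata), digest, ["sha256 computed at ingest"])


class ArchivalStorage:
    """Archive storage: keeps AIPs and re-checks their bit stream integrity."""

    def __init__(self):
        self._aips = {}

    def store(self, aip_id: str, aip: AIP) -> None:
        self._aips[aip_id] = aip

    def retrieve(self, aip_id: str) -> AIP:
        return self._aips[aip_id]

    def verify(self, aip_id: str) -> bool:
        """Recompute the checksum and compare it with the value stored at ingest."""
        aip = self._aips[aip_id]
        return hashlib.sha256(aip.content).hexdigest() == aip.sha256


def access(storage: ArchivalStorage, aip_id: str, wanted_fields: list) -> DIP:
    """Access: derive a DIP from a stored AIP, exposing only the requested metadata."""
    aip = storage.retrieve(aip_id)
    subset = {k: v for k, v in aip.metadata.items() if k in wanted_fields}
    return DIP(aip.content, subset)


storage = ArchivalStorage()
sip = SIP(b"programme master file", {"title": "Evening news", "rights": "internal"})
storage.store("aip-001", ingest(sip))
dip = access(storage, "aip-001", ["title"])
print(storage.verify("aip-001"), dip.metadata)
```

In this sketch the DIP is simply a filtered view of the AIP; a real archive would also transform formats for the designated community.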
For the "modelling" part, the ISO standard recognizes three types of package for representing content (SIP, AIP and DIP) and specifies what must be represented: the Content Information, the Preservation Description Information (the origin, context, identification and integrity of the content), the Packaging Information and the content description. On the other hand, it does not prescribe any concrete format.
Figure 02: OAIS model formats: SIP, AIP and DIP
- SIP: "Submission Information Packages": submission formats for archiving. These are the most complete formats that applications can generate and where objects are defined independently. These SIPs are provided by a 'producer' for import into a system.
- AIP: "Archival Information Package": archive management formats. SIPs are processed by ingest, validation and structuring modules so that their persistence can be managed within an organization's system. That is, AIPs are meant to manage the evolution of the archived content and must be general enough to generate targeted export formats on demand.
- DIP: "Dissemination Information Packages": targeted export formats. These representations are called "exogenous". They are aimed at a particular 'designated community', with a defined overall purpose. EBUCore is a relevant example: it focuses on the needs of broadcasters to exchange exploitable content, including a description of its environment.
- P-DIP "Persistent Dissemination Information Package": a notable special case, where the 'designated community' is another archive system.
The following figure describes how the packaging of data / information should be done:
Figure 3: Packaging: Concepts and Relationships
The "Content Information" and "Preservation Description Information" data are integrated into a container with a description (Packaging Information & ID). This container is described by a "Descriptive Information Package" entity.
The "Descriptive Information Package" data is transmitted to the Data Management entity, where it supports the search, control and retrieval of the data held in the archiving system. It constitutes the content of that entity's database.
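The packaging relationships described above can be made concrete with a small data structure. The sketch below is an assumption-laden illustration, not a format defined by OAIS: the class names, field names and the sample values are hypothetical, and the four fields of the Preservation Description Information simply mirror the origin, context, identification and integrity mentioned earlier.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class PreservationDescriptionInformation:
    provenance: str  # origin of the content
    context: str     # relation of the content to its environment
    reference: str   # identification of the content
    fixity: str      # integrity value, here a SHA-256 checksum (assumed choice)


@dataclass
class InformationPackage:
    package_id: str                # stands in for "Packaging Information & ID"
    content_information: bytes
    pdi: PreservationDescriptionInformation
    descriptive_information: dict  # the part handed to Data Management


class DataManagement:
    """Holds only the descriptive information, keyed by package identifier."""

    def __init__(self):
        self._index = {}

    def register(self, package: InformationPackage) -> None:
        self._index[package.package_id] = package.descriptive_information

    def search(self, key: str, value: str) -> list:
        """Return the identifiers of packages whose description matches."""
        return sorted(pid for pid, d in self._index.items() if d.get(key) == value)


content = b"interview recording"
package = InformationPackage(
    package_id="pkg-42",
    content_information=content,
    pdi=PreservationDescriptionInformation(
        provenance="hypothetical producer",
        context="part of a hypothetical interview series",
        reference="pkg-42",
        fixity=hashlib.sha256(content).hexdigest(),
    ),
    descriptive_information={"subject": "interview", "language": "en"},
)

dm = DataManagement()
dm.register(package)
print(dm.search("subject", "interview"))
```

The design point is the separation: the container keeps content and preservation data together, while Data Management sees only the description it needs for search and retrieval.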
OAIS is an information model that handles both digital and non-digital objects. The model must indeed be able to handle physical objects of the real world as well as the digital representations that describe them (digital objects). This makes it possible to make separate statements about an object, about a document that describes it, and about the links between objects, their representations and their meanings (the signified).
In the OAIS model, the central operation is the creation of an Information Object. The diagram in Figure 04 (a vision specific to the Titan non-profit organization) identifies the object data (bits and bytes), rendered by a representation tool (a specific application) and interpreted at the level of meaning by a Knowledge Base.
In the context of "deep preservation", it is necessary to preserve the data (on suitable media) and the applications that generated them, and finally to create a knowledge base that links the data to their meaning(s). The preservation of data and applications is by no means the object of this project: all the effort is focused on the ability to connect real-world objects with their computer representations and their meanings. The creation of a knowledge base is pivotal.
Figure 04: OAIS: The Titan Vision
A new version of the OAIS was published by ISO in August 2012. This revision brings several modifications:
- risk management is taken into account;
- management of access rights and usage information for archived documents;
- the definition of a reversibility plan (return of archived data) and the ability of the system to ensure the destruction of data under certain conditions;
- finally, the concept of "information property" (or semantic information), which provides the meaning (signified) to associate with the data (signifier).
Figure 05: The OAIS Data Flow Diagram
The flow of data between functional entities of the OAIS is illustrated by this figure. It describes the most important data streams. Administrative data flows, which are typically background activities, are not represented. The data flows associated with common services are implicit in the illustrated functions and are therefore not displayed.
Figure 06: The OAIS Information Object
The Information Object is the basic concept of the OAIS Reference Model: information is a combination of Data and Representation Information. The Information Object is composed of a Data Object that is either physical or digital, and the Representation Information that allows for the full interpretation of the data into meaningful (semantic) information. This model is valid for all the types of information in an OAIS.
The Digital Object is composed of one or more bit sequences. The purpose of the Representation Information Object is to convert the bit sequences into more meaningful information. It does this by describing the format, or data structure concepts, which are to be applied to the bit sequences and that in turn result in more meaningful values such as characters, numbers, pixels, arrays, tables, etc. These common computer data types, aggregations of these data types, and mapping rules which map from the underlying data types to the higher level concepts needed to understand the Digital Object are referred to as the Structure Information of the Representation Information object. These structures are commonly identified by name or by relative position within the associated bit sequences. Structure Information alone is seldom sufficient: even when a Digital Object is interpreted as a sequence of text characters, the language being expressed must also be known. This type of additional required information is referred to as the Semantic Information. It will include special meanings associated with all the elements of the Structure Information, operations that may be performed on each data type, and their interrelationships.
Figure 07: The OAIS Information Representation
Figure 07 emphasizes the fact that Representation Information contains both Structure Information and Semantic Information, although in some implementations the distinction is subjective. It is useful to remember that the Semantic Information associated with parts of some digitally encoded information is independent of the format. For example, the meaning of numbers in a data file is independent of whether they are encoded as scaled integers or as IEEE Reals; the meaning of words in a document is independent of whether the document is Word or PDF.
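The independence of Semantic Information from the encoding can be illustrated with a short Python sketch. Everything named here is an assumption made for the example: the measured quantity, the `divisor` convention for scaled integers, and the dictionary layout are hypothetical and not part of OAIS. The same value is stored once as a scaled 16-bit integer and once as an IEEE 754 single-precision real; only the Structure Information differs.

```python
import struct

# Semantic Information: what the number means (hypothetical example).
SEMANTICS = {"quantity": "air temperature", "unit": "degree Celsius"}

# Structure Information: how to turn the bit sequence into a number.
STRUCTURE_SCALED = {"format": ">h", "divisor": 10}  # big-endian int16, in tenths
STRUCTURE_IEEE = {"format": ">f", "divisor": 1}     # big-endian IEEE 754 single


def interpret(bits: bytes, structure: dict, semantics: dict) -> dict:
    """Apply Structure Information to the bits, then attach the semantics."""
    (raw,) = struct.unpack(structure["format"], bits)
    return {"value": raw / structure["divisor"], **semantics}


scaled_bits = struct.pack(">h", 215)   # 21.5 stored as 215 tenths
ieee_bits = struct.pack(">f", 21.5)    # 21.5 stored as an IEEE real

from_scaled = interpret(scaled_bits, STRUCTURE_SCALED, SEMANTICS)
from_ieee = interpret(ieee_bits, STRUCTURE_IEEE, SEMANTICS)
print(scaled_bits != ieee_bits, from_scaled == from_ieee)
```

The two bit sequences differ, yet both yield the same Information Object once the matching Structure Information and the shared Semantic Information are applied.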
This figure also shows that Representation Information may contain Other Representation Information. This indicates that the taxonomy of Representation Information presented here is far from complete. For example software, algorithms, encryption, written instructions and many other things may be needed to understand the Content Data Object, all of which therefore would be, by definition, Representation Information, yet would not obviously be either Structure or Semantics. Information defining how the Structure and the Semantic Information relate to each other, or software needed to process a database file would be regarded as Other Representation Information.
Structure Information, Semantic Information and Other Representation Information are both sub-types and components of Representation Information. Representation Information is an Information Object that may have its own Data Object and its own Representation Information associated with understanding each Data Object, as shown in a compact form by the ‘interpreted using’ association. The resulting set of objects can be referred to as a Representation Network.
As an example, ISO 9660 describes text as conforming to the ASCII standard, but it does not actually describe how ASCII is to be implemented. It simply references the ASCII standard, which is additional Representation Information needed for a full understanding. The ASCII standard is therefore part of the Representation Network associated with ISO 9660 and needs to be obtained by the OAIS in some form, or the OAIS needs to track the availability of this standard so that it may take appropriate steps in the future to ensure that its ISO 9660 Representation Information is fully understandable.
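A Representation Network can be treated as a directed graph in which each edge reads "interpreted using". The sketch below is a minimal illustration under that assumption; the node `"disc-image"`, the dictionary layout and the function names are hypothetical. It follows the edges from a Data Object to list all the Representation Information required, then reports what the archive does not yet hold.

```python
# Each key is interpreted using the items in its list (hypothetical network).
REPRESENTATION_NETWORK = {
    "disc-image": ["ISO 9660"],
    "ISO 9660": ["ASCII"],
    "ASCII": [],
}


def required_representation_information(start: str, network: dict) -> list:
    """Return everything reachable from `start`, in order of discovery."""
    found = []
    stack = list(network.get(start, []))
    while stack:
        item = stack.pop()
        if item in found:
            continue  # already visited; also protects against cycles
        found.append(item)
        stack.extend(network.get(item, []))
    return found


def missing_representation_information(start: str, network: dict, held: set) -> list:
    """List the required items that the archive does not hold yet."""
    needed = required_representation_information(start, network)
    return [item for item in needed if item not in held]


needed = required_representation_information("disc-image", REPRESENTATION_NETWORK)
missing = missing_representation_information(
    "disc-image", REPRESENTATION_NETWORK, held={"ISO 9660"}
)
print(needed, missing)
```

An archive that holds only the ISO 9660 specification would see the ASCII standard reported as missing, which corresponds to the tracking obligation described above.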
Figure 08: The OAIS Archival Information Package
For the AXIS-CSRM project, it was essential to have a standardized reference schema showing how representations and processes are nested, in order to build a functional model that takes into account the creation, production and publication of content. However, this standard suffers from the absence of a clear distinction between the concepts of data (representation) and information (meaning). Moreover, this vision of a single archive center has to be placed in the world of networks and Open Data, both for users (data that everyone should be able to access, use and share) and for exchanges between heterogeneous systems. It was therefore necessary to design an open architecture that imports and exports content between multiple information systems (or archiving systems) and that can handle the transmission of both the signifier (the flat representation) and the signified.
- ISO 16363: Audit and certification of trustworthy digital repositories (sets out comprehensive metrics for what an archive must do, based on OAIS)
- ISO 16919: Draft (2014): Requirements for bodies providing audit and certification of candidate trustworthy digital repositories (specifies the competencies and requirements on auditing bodies)