This section is extracted from the deliverable D4.1 Initial version of environment information extraction tools [Corubolo, F. et al (2014)], p.21-23
Please refer to the further reading list for the sources cited in [ ].
In our view, the environment is the widest set of information. We consider environment information to include all the entities (DOs, metadata, policies, rights, services, etc.) useful to correctly access, render and use the DO. The definition supports the use of unrelated DOs and conforms to the definition of environment used by [PREMIS (2008)].
In the context of OAIS, the term ‘Representation Information’ and its subcategory ‘Other Representation Information’, together with the Preservation Description Information (PDI), seem to include some of the information we would classify as environment information. However, OAIS takes the point of view of supporting the understanding of the object, and does not qualify the different uses and purposes of the information. We also propose a simpler definition: it does not aim to classify the different types of information (structural, semantic, other, and PDI with its subcategories). Instead, we focus on the user and on the use of the information, from the creation context and eventually from different communities.
PREMIS builds on the OAIS reference model and defines a core set of metadata semantic units that are necessary for preserving DOs. The set is a restricted subset of all the potential metadata and only consists of metadata common to all types of DOs.
The current, published set (PREMIS 2) defines a data model consisting of five entities (see the data model in the figure above). The Object entity allows information about the DO environment to be recorded, amongst other information. The Rights entity covers the information on rights and permissions for the DO. The Events entity covers actions that alter the object whilst in the repository. The Agents entity covers the people, organisations or software services that may have roles in the series of events that alter the DO or in the rights statements. The Intellectual entity allows a collection of digital objects to be treated as a single unit.
Dependency relationships are defined in PREMIS 2 as “when one object requires another to support its function, delivery, or coherence of content. An object may require a font, style sheet, DTD, schema, or other file that is not formally part of the object itself but is necessary to render it.”
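The dependency relationship quoted above can be read as a directed graph: an object points to the fonts, style sheets, schemas or other files it needs, and those may in turn need others. The following is a minimal sketch of that reading, not PREMIS syntax. The class DigitalObject, the field depends_on, the function required_objects and the file names are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical, simplified stand-in for a PREMIS Object."""
    identifier: str
    # Identifiers of the objects this one requires (assumed field name).
    depends_on: list[str] = field(default_factory=list)


def required_objects(start: str, registry: dict[str, DigitalObject]) -> set[str]:
    """Return every object reachable from `start` via dependency links."""
    seen: set[str] = set()
    stack = list(registry[start].depends_on)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        # Objects missing from the registry are reported but not expanded.
        if current in registry:
            stack.extend(registry[current].depends_on)
    return seen


# Illustrative data only: an HTML file that needs a style sheet and a schema,
# where the style sheet itself needs a font.
registry = {
    "report.html": DigitalObject("report.html", ["style.css", "schema.xsd"]),
    "style.css": DigitalObject("style.css", ["font.woff"]),
    "schema.xsd": DigitalObject("schema.xsd"),
    "font.woff": DigitalObject("font.woff"),
}

print(sorted(required_objects("report.html", registry)))
# → ['font.woff', 'schema.xsd', 'style.css']
```

Following the links transitively matters here: the font is never named by the HTML file itself, yet it is still necessary to render it.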
The PREMIS working group investigated the environment information metadata after feedback from its user groups, who found the existing support difficult to use. The group reported its findings in [Dappert, A. (2013)]: for the next version of PREMIS, environment information should be promoted to a first-class entity rather than remain a subordinate element of the DO. (Note: PREMIS 3 has since been released, after PERICLES published deliverable D4.1.) They advocate the use of the Object entity to describe the environment, which allows relationships between different environment entities, as illustrated in the figure below.
This approach neatly supports the PERICLES view of the environment, although PERICLES distinguishes between the general environment and the environment significant for a particular set of purposes (termed the Significant Environment Information for a DO), which is described in the following section of this sub-module.
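To make the distinction between the general environment and the environment significant for a purpose more concrete, the sketch below tags each environment entity with the purposes it is assumed to serve and selects the subset relevant to one purpose. It is only an illustration under our own assumptions: the class EnvironmentEntity, the function significant_environment, the purpose labels and the example entities are hypothetical and are not taken from PERICLES or PREMIS.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentEntity:
    """Hypothetical record for one entity in a DO's environment."""
    name: str
    # Purposes this entity is assumed to matter for (illustrative labels only).
    purposes: frozenset[str]


def significant_environment(environment, purpose):
    """Return the names of the entities tagged as relevant for `purpose`."""
    return [e.name for e in environment if purpose in e.purposes]


# The general environment: everything related to the DO, for any use.
environment = [
    EnvironmentEntity("font file", frozenset({"render"})),
    EnvironmentEntity("licence statement", frozenset({"render", "reuse"})),
    EnvironmentEntity("instrument calibration data", frozenset({"reuse"})),
    EnvironmentEntity("unrelated desktop wallpaper", frozenset()),
]

print(significant_environment(environment, "render"))
# → ['font file', 'licence statement']
print(significant_environment(environment, "reuse"))
# → ['licence statement', 'instrument calibration data']
```

The same general environment yields different significant subsets depending on the purpose, which is the point of the distinction.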
We consider the environment information for a DO to be the widest set of entities that are related to it. By definition, this includes all other DOs, services and information that can relate to the DO, as well as other information from the environment that is useful for any of its possible uses.
We consider this set to be wider than, although related to, the one described in OAIS as Representation Information. Our focus also differs from that of PREMIS 3, as we do not concentrate on software environments. Another important distinction is that, in general, we look at the environment as something defined from a DO upwards, so we see environments as defined on the DO. This differs from the point of view taken in WP5, which takes an ecosystem perspective and looks from the system and institution downwards to the different entities, services, processes, users and digital objects.
Furthermore, we consider that part of the environment information will only be observable in the live environment of creation and use, which is the reason for our choice of a sheer curation approach (described in the next chapter). We regard the user as an important part of the DO environment, and we therefore observe the interaction between users and their communities, the DOs, and the rest of the environment. We think that this perspective will allow us to capture information based on the pragmatic, sometimes neglected aspects of the real requirements for making use of DOs. It will also help us to infer dependencies that are not explicit and to determine relevant information based on real use of the DOs.
As environment information will be a very wide set, it is important to qualify which information is significant, and for what purposes, as introduced in the next section of this sub-module.