This section is extracted from the deliverable D4.1, Initial version of environment information extraction tools [Corubolo, F. et al. (2014)], pp. 40ff.
Please refer to the further reading list for the names referenced in [ ].
Starting from the concept of an entity fitting its niche, we analysed the kinds of information that can be extracted from a digital object (DO) and its environment to improve its chances of remaining useful in the long term.
By adopting the practice of sheer curation, i.e. a situation in which curation activities are integrated into the workflow of the researchers creating or capturing data [see sub-module “Introduction to sheer curation”], we considered how to examine the environment in order to discover and extract the dependencies that different users need for the use and reuse of DOs. Sheer curation closely parallels the situation in biology, where organisms cannot be observed reliably outside their niches without an unavoidable loss of important information.
In our working hypothesis, the DO was the organism, with the binary data representing it considered as its DNA, and the natural environment where this organism lived corresponded to the system environment.
To map their connectedness, a software agent, such as our PERICLES Extraction Tool (find out more in the next sub-module), observed and collected information about interactions between the DO and its immediate surroundings. Such interactions yield a series of observations that can be analysed further to recognise functional dependencies. Depending on the purpose, examples of what may be necessary to make use of the object in the future include system properties (hardware and software configuration), resource usage, implicit dependencies on documentation, and the use of external data, processes and services (including provenance information).
Such information cannot be reliably reconstructed after the DO is archived. It has to be extracted from the “live” system when the user is present, and preserved together with the DO.
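As an illustration of capturing such information from the live system, the following minimal sketch records a few basic system properties and stores them next to the DO. It is not the PERICLES Extraction Tool or its API; the function names, the selection of properties and the `.env.json` sidecar naming are assumptions made for this example only.

```python
import json
import os
import platform
from datetime import datetime, timezone


def capture_environment_snapshot(path):
    """Collect basic system properties for the digital object at `path`.

    Hypothetical helper: the chosen properties are an illustrative subset,
    not a definition of what counts as significant.
    """
    stat = os.stat(path)
    return {
        "object": os.path.basename(path),
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "cpu_count": os.cpu_count(),
        "size_bytes": stat.st_size,
    }


def write_sidecar(path):
    """Store the snapshot next to the object so both can be preserved together."""
    sidecar = path + ".env.json"  # assumed naming convention
    with open(sidecar, "w", encoding="utf-8") as fh:
        json.dump(capture_environment_snapshot(path), fh, indent=2)
    return sidecar
```

A real extraction agent would observe far more (running processes, used libraries, accessed files, peripherals); the point here is only that the snapshot is taken while the environment still exists and is kept with the object.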
As a concrete example, every piece of software has a distinct way of representing and processing DOs, and merely changing the software version, let alone the software itself, can have an impact on the DO’s properties.
By now it should be apparent that the significant environment information (SEI) is a very close analogue of a niche in a biological environment, including the way it is defined through functional analysis, which determines what information is necessary for a particular function of an organism or a DO.
The niche we identify will be constituted only by the information that matters and not by all the available information. For instance, we may collect provenance information with the aim of inferring functional and documentation-related dependencies between DOs, but without the objective of preserving complete provenance information per se (as not everything may be significant). A typical use case that requires the extraction of SEI is the creation of a new environment similar to the old one in such a way that it still fits the significant properties (SP) of the DO, just as the genome of a fossil could be revived under the right laboratory circumstances.
A real-life scenario: software-based art (SBA)
To demonstrate that the biological metaphor introduced above can be applied in practice to describe the digital environment of a digital object, we apply it to the example of preserving a software-based artwork. SBAs are defined as artworks with a software component, which makes their behaviour and appearance highly dependent on their digital environment.
The example here is an SBA that has to be installed in a new digital environment for the purpose of an exhibition. To this end, it is important to ensure that the significant properties of the SBA that are influenced by the conditions of the digital environment are matched by the new system environment. This situation is comparable to that of an organism moved from its niche in the natural biological environment to a newly created biological environment. The niche can be emulated by replicating the original surroundings as precisely as possible with natural flora, fauna and resource settings, or by an artificially modelled niche with substituted components, for example when the organism is moved to a zoo. To ensure its well-being and support its natural behaviour, the SEI of the niche that influences these conditions has to be preserved.
Many components and processes that constitute an ecosystem, biological as well as digital, have a great impact on the significant properties of its entities. For example, the behaviour of an SBA depends on the available system resources, just as an organism depends on food availability: upgrading the system memory can affect the execution speed of the software as much as increasing food availability in the niche affects the foraging rate of an organism. Consequently, to relocate an SBA, as to relocate an organism, one has to determine what the SEI is, i.e. extract it from the original ecosystem so that the digital environment can be emulated accurately (see figure above). Even if the SBA does not depend on other running software, it is affected by the interaction of that software with its digital environment, similar to how organisms affect other living entities by manipulating certain ecosystem features. In the context of emulation, the peripheral in use, its configuration and its driver have to be preserved as well in order to conserve the user’s interaction experience, just as the interaction of an organism with other entities in its biological environment can be affected by a change in a feature of that environment. The graphics drivers, libraries and display settings, such as resolution and contrast, all belong to the SEI because they influence how the SBA is displayed and thus how the viewer perceives it, just as an observer’s picture of an organism is affected by the camera settings. Here the observer is considered part of the ecosystem.
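The idea that only the significant part of the environment has to match can be sketched as a comparison restricted to a chosen set of properties. This is a minimal illustration, not a method taken from the deliverable: the function name, the property names and all values below are hypothetical.

```python
def compare_environments(original, candidate, significant_keys):
    """Return {key: (original_value, candidate_value)} for significant keys that differ.

    Properties outside `significant_keys` are ignored on purpose: they are
    available information, but not part of the niche.
    """
    mismatches = {}
    for key in significant_keys:
        if original.get(key) != candidate.get(key):
            mismatches[key] = (original.get(key), candidate.get(key))
    return mismatches


# Hypothetical environment descriptions of the studio and the exhibition machine.
original = {"resolution": "1024x768", "gpu_driver": "1.2",
            "ram_mb": 2048, "hostname": "studio-pc"}
candidate = {"resolution": "1920x1080", "gpu_driver": "1.2",
             "ram_mb": 8192, "hostname": "gallery-pc"}

# Assumed to be significant for this artwork: what affects its display.
significant = ["resolution", "gpu_driver"]

print(compare_environments(original, candidate, significant))
# {'resolution': ('1024x768', '1920x1080')}
```

The hostname and memory size also differ, but they are not reported because they were not declared significant here. Which properties belong in the list depends on the artwork and the purpose; if execution speed matters, the memory size would have to be included.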
In a biological environment, behaviour patterns of organisms can be observed, for example, during foraging, mating and social interactions. Likewise, after the move to the new environment it is important to validate the behaviour of the moved object in order to test the reconstructed niche components. This can be done by comparing its behaviour patterns with those measured in the original niche.
Patterns for an SBA can be extracted from measurements of system resource usage, external API calls, peripheral calls and process timings, together with the analysis of log messages. Video recordings and documentation of the SBA installation can be used as well. Changes to other ecosystem entities that interact with the SBA, for example a different runtime environment, another version of the programming language installation or a different storage backend, can result in execution bugs that are comparable to the abnormal behaviour of a living being caused by changes in its social interactions with other beings. Information about the development of the SBA is useful for migration, since it can become necessary to recompile new binaries from the source code, depending on the programming language used and the new system environment. This situation is comparable to an organism that is bred or cloned for better adaptation to changed niche conditions, so that the outcome preserves the quality and resilience of the migrated object.
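The comparison of behaviour patterns with those measured in the original niche can be sketched as a check of observed metrics against a recorded baseline. This is only an illustration under stated assumptions: the metric names, the values and the 10% relative tolerance are hypothetical and would have to be chosen per artwork.

```python
def validate_behaviour(baseline, observed, tolerance=0.10):
    """Return the metrics whose observed value deviates from the baseline.

    The result maps each deviating metric to its relative deviation, or to
    the string "missing" if it was not measured in the new environment.
    """
    deviations = {}
    for metric, expected in baseline.items():
        actual = observed.get(metric)
        if actual is None:
            deviations[metric] = "missing"
            continue
        if expected == 0:
            # Avoid division by zero: any non-zero value counts as a deviation.
            relative = 0.0 if actual == 0 else float("inf")
        else:
            relative = abs(actual - expected) / abs(expected)
        if relative > tolerance:
            deviations[metric] = relative
    return deviations


# Hypothetical measurements: the new machine renders each frame twice as fast.
baseline = {"frame_time_ms": 40.0, "peak_memory_mb": 200.0}
observed = {"frame_time_ms": 20.0, "peak_memory_mb": 210.0}

print(validate_behaviour(baseline, observed))
# {'frame_time_ms': 0.5}
```

A deviation is not necessarily a fault, but it flags where the reconstructed niche differs enough that the behaviour of the artwork should be reviewed, for example because faster hardware changes the speed of an animation.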