This is a blog post by Anna-Grit Eggers from the University of Goettingen. Together with Fabio Corubolo from the University of Liverpool, she is the creator of the PERICLES Extraction Tool (PET).
PET is a framework for extracting information from the environments in which digital data is created and used. PET works outside the established metadata perspectives and their firm structures by monitoring and collecting information from the whole system environment while the data is in use. It extracts scenario-related information, which increases the value of the data by enabling new reuse opportunities. Let's explore the background:
Break the chains of metadata standards
Imagine you are a wildlife researcher and someone sends you an image of a really rare Tasmanian dog that walks on its hind paws through the forest. Checking the Exif geolocation metadata makes you certain that the picture really was taken in the Tasmanian forest. What a fascinating behavior of this wild dog! What you don't know is that the dog is tame and trained to perform for a movie. This is Significant Environment Information that you should know for your research!
Environment information enriches the value of your data
You don’t have to be a wildlife researcher to benefit greatly from environment information. Information about your computer system environment supports current data usage and enables future reuse, not only for you but also for the generations of users and researchers who follow you and may investigate completely different scenarios.
Environment information is highly situation dependent and can’t easily be fitted into a scheme. Some of it depends on the data, such as location or heritage information; some is the same for all data you create on a system, such as the hardware used. Some of it is volatile and needs to be collected at the right time, as in the case of current system usage; some remains constant for a long time, as is the case for the list of system drivers in use. But all environment information has something in common: you don’t want to collect it manually!
Significant Environment Information – what to collect
Which information should be extracted from the environment to improve data use and reuse? We are convinced that the answer is highly scenario dependent. In PERICLES we discuss complex graph models to support this decision. Our approach is to map the environment entities, the data, and the extractable information onto a graph that captures use-dependent relations. We then weight these relations to express their level of significance for the intended use. Repeating this step for all possible data uses allows you to calculate which Significant Environment Information should be extracted. Or, for a start, just collect what you feel is, or may become, important for your scenario and data uses. Your experience as a user is an accurate indicator that shouldn’t be underestimated.
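To make the weighting idea a little more concrete, here is a minimal sketch in Python. It is not the PERICLES graph model itself: the data uses, the information items, the weights, and the rule of simply summing weights over all data uses and applying a threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical relations: (data use, environment information, weight in [0, 1]).
# None of these names or numbers come from PERICLES; they only illustrate the idea.
RELATIONS = [
    ("reproduce analysis", "software version", 0.9),
    ("reproduce analysis", "hardware model", 0.4),
    ("verify provenance", "capture location", 0.8),
    ("verify provenance", "software version", 0.3),
    ("verify provenance", "hardware model", 0.1),
]


def significance(relations):
    """Sum the weights of each information item over all data uses."""
    totals = defaultdict(float)
    for _use, info, weight in relations:
        totals[info] += weight
    return dict(totals)


def significant_items(relations, threshold):
    """Return the items whose summed weight reaches the threshold, highest first."""
    totals = significance(relations)
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [info for info, total in ranked if total >= threshold]


if __name__ == "__main__":
    print(significant_items(RELATIONS, threshold=0.6))
```

With these made-up weights, the software version and the capture location pass the threshold, while the hardware model does not. A different aggregation rule, for example taking the maximum weight over all uses, would be just as easy to plug in.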
What can the PERICLES Extraction Tool do for you?
How do you handle this information overflow? What you need is a tool that does the work for you: our PET! It provides a set of configurable Extraction Modules, which define what to collect and how to collect it. Once you have selected and configured the modules that fit your scenario (or chosen a ready-made module profile), the tool starts monitoring the system environment and collects all the significant information, triggered by environment events as they occur. Furthermore, you can include your own customised Extraction Modules, for example to add your favorite metadata extraction tool to the framework, so that all information is collected and saved together.
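The general idea of pluggable modules that are run whenever an environment event occurs can be sketched in a few lines of Python. This is only an illustration of the pattern, not PET's actual API: the class names, the `extract` method, and the `on_event` function are hypothetical.

```python
import os
import platform
from datetime import datetime, timezone


class ExtractionModule:
    """Hypothetical base class: a module knows its name and how to extract."""

    name = "base"

    def extract(self, path):
        raise NotImplementedError


class FileSizeModule(ExtractionModule):
    """Data-dependent information: the size of the file that triggered the event."""

    name = "file_size"

    def extract(self, path):
        return {"bytes": os.path.getsize(path)}


class PlatformModule(ExtractionModule):
    """System-wide information that is the same for every file on this machine."""

    name = "platform"

    def extract(self, path):
        return {"system": platform.system(), "machine": platform.machine()}


def on_event(event, path, modules):
    """Run every configured module for one event and store the results together."""
    record = {
        "event": event,
        "path": path,
        "time": datetime.now(timezone.utc).isoformat(),
    }
    for module in modules:
        record[module.name] = module.extract(path)
    return record
```

A custom module would then be one more subclass added to the list passed to `on_event`, which is roughly what "adding your favorite metadata extraction tool" amounts to in this sketch.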
A deeply scientific example
Our use case partners from the Belgian User Support and Operations Centre (http://www.busoc.be/) supervise the SOLAR module, which is located on the International Space Station. One of the controlling operator’s tasks is to handle module errors, which can be identified by an error code. Which path to follow towards a solution is largely implicit knowledge, based on the operator’s experience, and it drives the operator’s search through the large collection of documents. The issue is documented in a Handover Sheet. If we monitor the changes to this file with PET, it provides us with the exact dates on which the issue occurred and was solved.
The complete knowledge of how to solve the problem is concentrated in the operator. PET can be used to monitor which files are opened and closed while the operator searches for the description of the solution. The tool adds the recorded events to a timeline.
If we look at which files are open at the moment when the solution is documented, we can conclude which files were probably necessary to solve the error. In this way the operator’s knowledge can be captured to help us define dependencies between the anomaly solution and the related documentation.
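The reasoning above boils down to replaying the timeline up to a given moment. Here is a minimal Python sketch under stated assumptions: the record format (timestamp, event, file name) and the file names are invented for illustration and are not PET's actual output format.

```python
from datetime import datetime

# Hypothetical timeline of (ISO timestamp, event, file name) records.
TIMELINE = [
    ("2014-05-01T09:00:00", "open", "doc_a.pdf"),
    ("2014-05-01T09:05:00", "open", "doc_b.pdf"),
    ("2014-05-01T09:10:00", "close", "doc_a.pdf"),
    ("2014-05-01T09:12:00", "open", "doc_c.pdf"),
    ("2014-05-01T09:30:00", "close", "doc_b.pdf"),
]


def open_files_at(timeline, moment):
    """Replay open/close events up to `moment` and return the files still open."""
    moment_dt = datetime.fromisoformat(moment)
    ordered = sorted(timeline, key=lambda rec: datetime.fromisoformat(rec[0]))
    open_files = set()
    for stamp, event, name in ordered:
        if datetime.fromisoformat(stamp) > moment_dt:
            break  # later events cannot affect the state at `moment`
        if event == "open":
            open_files.add(name)
        elif event == "close":
            open_files.discard(name)
    return sorted(open_files)


if __name__ == "__main__":
    # Suppose the Handover Sheet was saved at 09:20 (an invented time).
    print(open_files_at(TIMELINE, "2014-05-01T09:20:00"))
```

In this made-up timeline, `doc_b.pdf` and `doc_c.pdf` are open when the solution is written down, so they are the candidates for documents the solution depends on.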
Try it out now!
PET is now available at https://github.com/pericles-project/pet, with documentation and tutorials. It is Open Source, so feel free to play around with it. As it is still an early version, we would appreciate it if you left us a comment about PET or about the research field of environment information in general. The theoretical background can be explored in detail through iPRES 2014 (slides), where we published a paper, ‘A pragmatic approach to significant environment information collection to support object reuse’ [pdf]. We really wonder what will come up next on this interesting topic! Best wishes, your PET developers.
* Image of the dog: https://creativecommons.org/licenses/by/2.0/ from https://www.flickr.com/photos/63567936@N00/4208364986
* Image of the travel guide: https://creativecommons.org/licenses/by-sa/2.0/ from https://www.flickr.com/photos/15427016@N02/1670083930