State of the art

In the last 15 years, the concept of appraisal has been addressed by a number of research projects in the field of digital preservation.

The InterPARES 2 Project (2002-2007) provides a useful definition and structure for performing appraisal, together with a checklist of factors that should be considered, e.g. value, context, and authenticity. It defines appraisal as follows: “to make appraisal decisions by compiling information about kept records and their context, assessing their value, and determining the feasibility of their preservation; and to monitor appraised records and appraisal decisions to identify any necessary changes to appraisal decisions over time.”

The Paradigm project (2005-2007) lists important characteristics to consider in appraisal. These include the content of an archive; the context of the archive and whether the records have evidential value; the structure of an archive and the extent to which it sheds light on the business, professional or organisational prerogatives of the creator; and technical appraisal, i.e. whether the institution can maintain the digital records in a usable form. It also lists important cost factors.

The DCC Digital Curation Manual Instalment on Appraisal and Selection [Harvey, R. et al (2007)] provides general guidelines on appraisal, based heavily on library practice. It identifies “technical capacity to preserve” as a key appraisal factor. The manual concludes with guidelines on how to develop an institution-specific selection framework.

Automation of appraisal

The University of Illinois (Metrics Based Reappraisal Project, 2014) proposes an iterative, technology-assisted, metric-based approach to appraisal. It takes usage statistics and other business performance measures into account when assigning a value score to the appraised resources, and is particularly aimed at automating the appraisal of emails.
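The general idea of combining usage and business metrics into a single value score can be illustrated with a short sketch. The metric names and weights below are illustrative assumptions for exposition, not the actual model used by the Illinois project.

```python
# Illustrative metric-based appraisal scoring. The metrics and weights
# are hypothetical, chosen only to show the weighted-sum structure.

def appraisal_score(metrics, weights):
    """Combine normalised metrics (each in [0, 1]) into one value score."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)

# Example: score an email collection on three hypothetical metrics.
weights = {"access_frequency": 0.5, "sender_importance": 0.3, "recency": 0.2}
metrics = {"access_frequency": 0.8, "sender_importance": 0.6, "recency": 0.4}
score = appraisal_score(metrics, weights)
# 0.5*0.8 + 0.3*0.6 + 0.2*0.4 = 0.66
```

Resources scoring above a chosen threshold would be retained; the rest would be flagged for review or disposal.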

The Arcomem project [Risse, T., et al (2012)] used the link structure of web pages and the social web as a way of appraising and selecting content to be crawled. A further aspect of its appraisal is mining the social web for information about the trustworthiness and reputation of users.

The PLANETS project developed tools and services for digital preservation [Farquhar, A., Hockx-Yu, H. (2008)], with a focus on experimental evaluation of preservation approaches within a controlled environment [Aitken et al, (2008)]. In particular, the PLANETS test-bed enabled evaluation of automated appraisal processes on large datasets, including an ‘automated characterisation and validation framework for digital objects’.

Similarly, the SCAPE (SCAlable Preservation Environments) project aimed to provide a framework for automated, quality-assured preservation workflows (Edelstein et al, 2011).

Risk management

Appraisal is often focused on characterisation of an object in its current state. However, a further dimension of appraisal is the effect of passing time: events might occur in the future that limit the potential for preserving material. The DCC Digital Curation Manual identifies risk management as increasingly central to discussions of appraisal and selection [Harvey (2006)], allowing risks such as reduced accessibility, interpretability or ability to render material to be weighed against the consequences of those outcomes. Traditional risk analysis is based on risk-impact (mitigation) analysis. This is a process, usually iterative, in which the following sequence of steps is typically taken: identification of risks; assessment of the severity and potential consequences of those risks (such as financial consequences, impact on schedule or technical performance, and so forth); planning for mitigation; and implementation of mitigating actions based on the plan developed. As risks evolve, they are tracked.
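The iterative identify/assess/mitigate/track cycle described above can be sketched as a minimal risk register. The risk names, probabilities and impact values here are illustrative assumptions, and the classic likelihood-times-impact exposure measure is used for ranking.

```python
# Minimal risk-register sketch of the identify/assess/mitigate/track cycle.
# Risks, probabilities and impacts are illustrative, not real estimates.
from dataclasses import dataclass

@dataclass
class Risk:
    name: str
    likelihood: float   # probability of occurrence, in [0, 1]
    impact: float       # severity of consequences, arbitrary cost units
    mitigation: str = ""

    @property
    def exposure(self):
        # Classic risk exposure: likelihood x impact.
        return self.likelihood * self.impact

# Steps 1-3: identify risks, assess severity, plan mitigation.
register = [
    Risk("format obsolescence", likelihood=0.3, impact=100, mitigation="migrate"),
    Risk("storage media failure", likelihood=0.05, impact=500, mitigation="replicate"),
]

# Step 4: track, re-ranking by exposure as estimates evolve over time.
for r in sorted(register, key=lambda r: r.exposure, reverse=True):
    print(f"{r.name}: exposure={r.exposure:.1f}, plan={r.mitigation}")
```

Re-running the assessment step with updated likelihoods as circumstances change is what makes the process iterative.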

The general-purpose project management methodology PRINCE2 specifies a series of steps in building and applying a risk management strategy [Bentley (2010)]. A review of older methodologies for risk management may be found in [Raz and Michael (2001)]. Risk management was brought to the forefront of preservation by the Cornell Library study into file format migration, reported by [Lawrence et al, (2000)].

Many of the essential characteristics of a risk management toolkit were determined by PRISM [Kenney et al, (2002)]. Several existing risk management frameworks are explicitly intended to support preservation activities. These include DRAMBORA, the Digital Repository Audit Method Based on Risk Assessment [McHugh et al, (2008)]; TRAC [Trustworthy Repositories Audit & Certification: Criteria and Checklist (2007)], which includes risk-oriented terms in a checklist of key terms; TIMBUS, or Timeless Business Processes and Services [Vieira et al (2014)]; and the SPOT (Simple Property-Oriented Threat) model [Vermaaten et al (2012)], which focuses on risks to essential properties of digital objects.

Various tools are designed to support risk management in digital preservation planning, such as PLATO [Becker et al (2008)]. A criticism that might be made of many of these tools is that they do not focus explicitly on quantitative models, relying instead on the elicitation of craft knowledge in forecasting. A considerable amount of recent research into risk analysis is available, much of which applies quantitative models in the forecasting of risk. Concretely, [Stamatelatos (2000)] recommends the use of probabilistic risk analysis for the deconstruction and evaluation of risk associated with elements of complex entities. For the analysis of events that have occurred, to ascertain the cause, fault tree analysis may be used; for the analysis of events yet to occur, event tree analysis may be used. [Zheng (2011)] provides a detailed analysis of risk modelling in order to support decision-making in management of product obsolescence, which may straightforwardly be adapted to the purposes of forecasting and managing software obsolescence. Risk analysis may use publicly available resources for informational purposes; for example, [Graf and Gordea (2013)] demonstrate the use of DBpedia data to evaluate file format obsolescence.
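The probabilistic fault tree evaluation mentioned above can be sketched with standard AND/OR gate formulas, under the usual assumption that basic events are independent. The events and probabilities in this example are illustrative only.

```python
# Sketch of probabilistic fault tree evaluation. Basic events are assumed
# independent; the event names and probabilities are illustrative.
from math import prod

def and_gate(probs):
    # AND gate: all inputs must fail, so probabilities multiply.
    return prod(probs)

def or_gate(probs):
    # OR gate: any input fails, i.e. the complement of all surviving.
    return 1 - prod(1 - p for p in probs)

# Top event: an object becomes unrenderable if its format is obsolete
# AND no migration tool exists, OR its metadata is lost.
p_format_obsolete = 0.2
p_no_migration_tool = 0.5
p_metadata_lost = 0.01
p_top = or_gate([and_gate([p_format_obsolete, p_no_migration_tool]),
                 p_metadata_lost])
# = 1 - (1 - 0.2*0.5) * (1 - 0.01) = 0.109
```

Event tree analysis works in the other direction, enumerating the possible outcome branches that follow from an initiating event and assigning probabilities to each branch.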

[Falcao (2010)] provides a qualitative approach to risk analysis of software, with a number of detailed worked examples, which makes it a valuable reference for this area.