Content-based appraisal

A variety of sources provide evidence for content-based appraisal activities. Policy documents are shared within organisations as guidelines for particular aspects of content acquisition. At a coarser granularity, broad aspects of collection policy are shared publicly. For example, an archive may publicly focus on collections relevant to a broad theme or subject, such as ‘20th-century Scottish artists’ or ‘Charles Rennie Mackintosh’. Internally, the archive may work from a lengthy policy document (a collection strategy) that identifies individuals of specific interest within that broad mandate and guides particular appraisal decisions. Such policy documents are commercially sensitive and may remain confidential for that reason.

Existing material within a collection is a key data source for content-based appraisal, since it allows the substance of the present collection to be characterised. When a candidate item is under consideration for addition to a collection, several of the appraisal criteria refer to characteristics of the existing collection. Does the item replicate material already held? Alternatively, does it complement material held, or themes within the collection? Knowledge of existing holdings is therefore a prerequisite for some aspects of content-based appraisal, and rich, appropriate characterisation of those holdings is an important challenge. Furthermore, certain appraisal criteria require not only that current holdings are characterised, but also that material held by other institutions is considered: one such criterion is Replaceability, the ease with which an item can be replaced.
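The question of whether a candidate item replicates material already held can be partly illustrated for digital holdings. The following is a minimal sketch, assuming that holdings are available as files under a directory and that 'replication' is narrowed to byte-identical content; the function names (`sha256_of`, `index_holdings`, `find_replicas`) are hypothetical and are not drawn from any tool discussed here. Near-duplicates, differing formats of the same work, and complementary material are outside what this approach can detect.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_holdings(root: Path) -> dict[str, list[Path]]:
    """Map each content digest to the held files that carry it."""
    index: dict[str, list[Path]] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index.setdefault(sha256_of(path), []).append(path)
    return index


def find_replicas(candidate: Path, holdings_index: dict[str, list[Path]]) -> list[Path]:
    """Return held files that are byte-identical to the candidate item."""
    return holdings_index.get(sha256_of(candidate), [])
```

An empty result from `find_replicas` indicates only that no exact copy was found under the indexed directory; it says nothing about material held by other institutions.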

Content-based appraisal may be supported wholly or in part by automated processes. An example of partial automation is a tool that helps the archivist to characterise collections rapidly according to specific criteria.

When large data collections are supplied to an archivist for appraisal on a storage medium such as a DVD-ROM or hard drive, the file system is often poorly organised. The same is true of large paper collections, but paper collections are essentially mono-dimensional, allowing the archivist to ‘flip through’ the material rapidly and gain a sense of the shape of the collection. The hierarchical nature of a file system makes appraisal difficult; furthermore, the context and dependencies of digital objects may not be immediately clear [Ross (2012)]. Archivists may therefore use tools intended for digital forensics to evaluate digital collections rapidly; see, for example, [Kirschenbaum, M. et al. (2010)]; [Lee, C. et al. (2012)]; [John, J. L. (2012)].
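To give a sense of what a rapid, automated overview of a file system might look like, the following is a minimal sketch that summarises a directory tree by file extension and by nesting depth. It is an illustration only and is not a substitute for the digital forensics tools cited above; the function name `summarise_tree` and the choice of summary fields are assumptions made for this example.

```python
from collections import Counter
from pathlib import Path


def summarise_tree(root: Path) -> dict:
    """Summarise a directory tree: file counts by extension and by depth."""
    by_extension = Counter()
    by_depth = Counter()
    total_bytes = 0
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Files without an extension are grouped under "(none)".
        by_extension[path.suffix.lower() or "(none)"] += 1
        # Depth is the number of directories between root and the file.
        by_depth[len(path.relative_to(root).parts) - 1] += 1
        total_bytes += path.stat().st_size
    return {
        "files": sum(by_extension.values()),
        "bytes": total_bytes,
        "by_extension": dict(by_extension),
        "by_depth": dict(by_depth),
    }
```

A summary of this kind offers a rough digital counterpart to ‘flipping through’ a paper collection: it shows which file types dominate and how deeply material is nested, but it does not recover the context or dependencies of individual objects.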

[Figure] The above prototype is implemented in the JavaScript visualisation library D3. Right: Geographical metadata about material within a collection can reflect collection policy. This visualisation, implemented in R using geo-indexing based on resources taken from the GeoNames project, displays the geographical origin of artists within a collection; in this case, it highlights the prevalence of British and, to a lesser extent, US artists within the collection corpus (Github, 2015).

Wholly automated content-based appraisal is likely to be useful in scenarios in which ‘preservation-focused maintenance actions’ are indicated. It is particularly appropriate where periodic reappraisal of material on the basis of new evidence is warranted. One such scenario results from semantic change: as the meaning and use of terms shift over time, material must be re-indexed periodically so that it better reflects contemporary classification categories, enabling search and browse to function more effectively across a collection. Such an indexing process may be wholly automated.
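The re-indexing step can be sketched in a few lines. The example below assumes that each record carries a set of subject terms and that a curated mapping from outdated terms to their contemporary equivalents is available; both the data structures and the function name `reindex` are hypothetical, and the sample terms are invented for illustration. Detecting semantic change in the first place is a separate problem that this sketch does not address.

```python
def reindex(records, term_map):
    """Build an inverted index (term -> record ids), replacing outdated
    terms with their contemporary equivalents where a mapping exists."""
    index = {}
    for record_id, terms in records.items():
        for term in terms:
            # Terms absent from the mapping are kept unchanged.
            current = term_map.get(term, term)
            index.setdefault(current, set()).add(record_id)
    return index


# Hypothetical records and mapping, for illustration only.
records = {
    "item-1": {"wireless"},
    "item-2": {"radio", "painting"},
}
term_map = {"wireless": "radio"}
index = reindex(records, term_map)
```

Because the original terms remain in `records`, the index can be rebuilt whenever the mapping is revised, which suits the periodic reappraisal described above.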