Using PeriCoDe – Training Module

PeriCoDe/SALIC is designed for machine learning researchers and developers who want to build their own software on top of PeriCoDe’s functionality. Unlike the other tool described in this module (PROPheT), PeriCoDe does not have a GUI (graphical user interface): it is a toolbox of functions that are called from software written by the user.

To help the user get started, a basic example implementation is included in the form of a ‘wrapper’ (wrapper.m). It provides a minimal, working setup of PeriCoDe, and it can be modified and used as the basis for the user’s own software.

To get this minimal version working, the user needs to provide three data sets for SALIC:

Training data set – this is a collection of annotated images specific to the user’s annotation requirements, for example, ImageCLEF 2012 (http://imageclef.org/2012/photo-flickr). Since this data set is annotated, there will be two folders or directories associated with it: the first containing the images, and the second containing the labels (or tags or annotations).
Pool data set – this is the much larger data set derived from an external source (e.g., social media). For example, wrapper.m specifies the MIRFLICKR data set (http://press.liacs.nl/mirflickr/). This data set may contain social media tags or annotations (stored in a separate directory from the images), but these may not be consistent with those supplied for the training set.
Test data set – this is a subset of the same collection of annotated images used for the training set, ‘held out’ and unseen during the training process so that it can be used to test the performance of the image annotation/classification.

In addition, two files must be created for each data set: one listing the image filenames and one listing the tag filenames (in wrapper.m these are named ‘img_Files.mat’ and ‘tag_files.mat’ respectively).
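The two list files can be generated rather than written by hand. The following is a minimal sketch only: the tag folder path, the output location, the ‘*.jpg’ and ‘*.txt’ extensions, and the variable names img_files and tag_files stored inside the .mat files are all assumptions made for illustration. Check how wrapper.m loads these files and adjust the names and layout to match before relying on it.

```matlab
% Sketch: build the image and tag filename lists for one data set.
% ASSUMPTIONS (not taken from wrapper.m): tag_folder, out_folder, the file
% extensions, and the variable names img_files / tag_files are hypothetical.
img_folder = './data/datasets/imageclef/images/';  % path used in the wrapper.m example
tag_folder = './data/datasets/imageclef/tags/';    % hypothetical tag folder
out_folder = '.';                                  % hypothetical output location

if exist(img_folder, 'dir') ~= 7 || exist(tag_folder, 'dir') ~= 7
    error('Image or tag folder not found; edit the paths above.');
end

% List matching entries and drop any sub-directories.
img_listing = dir(fullfile(img_folder, '*.jpg'));
tag_listing = dir(fullfile(tag_folder, '*.txt'));
img_listing = img_listing(~[img_listing.isdir]);
tag_listing = tag_listing(~[tag_listing.isdir]);

if isempty(img_listing) || isempty(tag_listing)
    error('No matching files found; check the folders and extensions.');
end

% Column cell arrays of filenames, sorted so both lists share one order.
img_files = sort({img_listing.name}');
tag_files = sort({tag_listing.name}');

save(fullfile(out_folder, 'img_Files.mat'), 'img_files');
save(fullfile(out_folder, 'tag_files.mat'), 'tag_files');
fprintf('Listed %d images and %d tag files.\n', numel(img_files), numel(tag_files));
```

The same script would be run once per data set (training, pool and test), changing the three folder variables each time.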

With the data in place, wrapper.m needs to be edited so that it points to the locations of these files and directories. Ensure that the relevant paths are set correctly (they can be found on lines 21-25 of wrapper.m), for example:

prms.train_img_folder = './data/datasets/imageclef/images/';

This setting should point to the training set of images, here located in ‘./data/datasets/imageclef/images/’; if the images are stored elsewhere, change the value to match the desired directory. Similarly, the parameters on lines 14-17 should give the names and locations of the data sets used by SALIC, and the parameters for visual features (lines 29-30) and for textual/annotation features (the number of dimensions for PCA and the location of the Vocabulary and PCA files; lines 33-34) should also be checked.
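Before running the wrapper, it can help to confirm that the configured folders actually exist. The sketch below is illustrative only: prms.train_img_folder is the one field taken from wrapper.m, and treating every field whose name ends in ‘_folder’ as a directory path is an assumed convention, not something the toolbox is known to guarantee.

```matlab
% Sketch: report configured folders that do not exist on disk.
% ASSUMPTION: only prms.train_img_folder comes from wrapper.m; the rule
% "field name ends in _folder => directory path" is hypothetical.
prms = struct();
prms.train_img_folder = './data/datasets/imageclef/images/';

names = fieldnames(prms);
missing = {};
for k = 1:numel(names)
    name = names{k};
    value = prms.(name);
    is_folder_field = ischar(value) && ~isempty(regexp(name, '_folder$', 'once'));
    % exist(..., 'dir') returns 7 when the path is an existing folder.
    if is_folder_field && exist(value, 'dir') ~= 7
        missing{end+1} = sprintf('%s = %s', name, value); %#ok<AGROW>
    end
end

if isempty(missing)
    disp('All configured folders were found.');
else
    % fprintf repeats the format once per entry in the list.
    fprintf('Missing folder: %s\n', missing{:});
end
```

In practice the check would be placed in wrapper.m after the prms fields have been set, instead of redefining prms as done here.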

With wrapper.m configured for the local environment in which SALIC is running, the program can now be run. wrapper.m first trains the model on the training set, then iteratively adds new examples from the pool data to the training set, retrains the model and evaluates its performance on the test set; the results are displayed as a plot.
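The train, select, retrain and evaluate cycle described above can be illustrated with a generic pool-based active learning loop. This is not SALIC’s algorithm and does not call any PeriCoDe function: the synthetic 2-D data, the nearest-centroid classifier and the “smallest margin” selection rule are stand-ins chosen so that the sketch runs in base MATLAB. It also reads the true labels of the selected pool samples, whereas SALIC works with social media tags from the pool.

```matlab
% Sketch: generic pool-based active learning on synthetic data (NOT SALIC).
rng(0);
n_per_class = 200;
X = [randn(n_per_class, 2) - 1.5; randn(n_per_class, 2) + 1.5];
y = [zeros(n_per_class, 1); ones(n_per_class, 1)];

% Start with 5 labelled samples per class; split the rest into test and pool.
train_idx = [1:5, n_per_class + (1:5)];
rest = setdiff(1:numel(y), train_idx);
rest = rest(randperm(numel(rest)));
test_idx = rest(1:100);
pool_idx = rest(101:end);

n_rounds = 10;   % number of selection rounds
batch = 5;       % samples moved from the pool per round
accuracy = zeros(n_rounds + 1, 1);

for r = 1:(n_rounds + 1)
    % (Re)train: one centroid per class from the current training set.
    c0 = mean(X(train_idx(y(train_idx) == 0), :), 1);
    c1 = mean(X(train_idx(y(train_idx) == 1), :), 1);

    % Evaluate on the held-out test set (class 1 if closer to c1).
    d0 = sum(bsxfun(@minus, X(test_idx, :), c0).^2, 2);
    d1 = sum(bsxfun(@minus, X(test_idx, :), c1).^2, 2);
    accuracy(r) = mean((d1 < d0) == y(test_idx));

    if r <= n_rounds
        % Select the pool samples the classifier is least sure about,
        % i.e. those nearly equidistant from both centroids.
        p0 = sum(bsxfun(@minus, X(pool_idx, :), c0).^2, 2);
        p1 = sum(bsxfun(@minus, X(pool_idx, :), c1).^2, 2);
        [~, order] = sort(abs(p0 - p1));
        picked = order(1:batch);
        train_idx = [train_idx, pool_idx(picked)]; %#ok<AGROW>
        pool_idx(picked) = [];
    end
end

plot(0:n_rounds, accuracy, '-o');
xlabel('Selection round');
ylabel('Test accuracy');
```

The plot has the same shape as the one wrapper.m produces (test performance against the number of selection rounds), but the numbers here say nothing about SALIC itself.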

Once this basic implementation of SALIC is working, the user can modify the wrapper.m script to provide different outputs or custom functions, and to test and explore the active learning approach on the user’s own annotated image data set.

When working with SALIC, users are asked to cite it as follows:

E. Chatzilari, S. Nikolopoulos, Y. Kompatsiaris and J. Kittler, “SALIC: Social Active Learning for Image Classification,” in IEEE Transactions on Multimedia, vol. 18, no. 8, pp. 1488-1503, Aug. 2016.
doi: 10.1109/TMM.2016.2565440
URL: http://dx.doi.org/10.1109/TMM.2016.2565440

Users may also contribute to SALIC on GitHub:

https://github.com/MKLab-ITI/salic