Title
Identifying differences between metadata in files and copying metadata between files

Detailed description

Original TIFF files are stored as master files and JPEG files are produced from them as access copies.

Curators manually modify the access JPEG files to store additional metadata, such as a description of what is in the image.

An automated solution is required to transfer the additional metadata from the JPEG to the TIFF.

I initially investigated using Apache Tika 1.3 to extract header information from the files, in order to create lists of the conflicts and differences between the files' metadata. Tika could not identify all the fields and could not transfer fields between files.
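The comparison step itself does not depend on which tool extracts the fields. The following is a minimal sketch, assuming the metadata of each file has already been extracted into a plain name-to-value dictionary; the function name, field names and values are hypothetical and are not taken from the test files.

```python
def compare_metadata(jpeg_meta, tiff_meta):
    """Classify fields by how they differ between the JPEG and the TIFF."""
    # Fields the curators added to the JPEG that the TIFF lacks.
    only_in_jpeg = {k: v for k, v in jpeg_meta.items() if k not in tiff_meta}
    # Fields present only in the master TIFF.
    only_in_tiff = {k: v for k, v in tiff_meta.items() if k not in jpeg_meta}
    # Fields present in both files but with different values.
    conflicts = {
        k: (jpeg_meta[k], tiff_meta[k])
        for k in jpeg_meta.keys() & tiff_meta.keys()
        if jpeg_meta[k] != tiff_meta[k]
    }
    return {
        "only_in_jpeg": only_in_jpeg,
        "only_in_tiff": only_in_tiff,
        "conflicts": conflicts,
    }


# Hypothetical field names and values, for illustration only.
jpeg_meta = {"ImageWidth": "2000", "Description": "Harbour at dusk", "Software": "Editor A"}
tiff_meta = {"ImageWidth": "2000", "Software": "Scanner B", "Compression": "Uncompressed"}
report = compare_metadata(jpeg_meta, tiff_meta)
```

In this sketch the "only_in_jpeg" entries are the candidates for transfer to the TIFF, while the "conflicts" entries would need a decision about which file takes precedence.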

I then tried a current snapshot of Apache Commons-Imaging (1.0-SNAPSHOT), which was able to attach meaningful names to all the metadata fields. Using this library I was able to add additional metadata fields to a TIFF. However, the size of the main test file increased from 8MB to 10.5MB when the metadata was added, and the cause of this increase needs to be investigated. 8MB was the uncompressed size, which matches the size expected from the image dimensions. This method of rewriting the TIFF may be error-prone, so the actual image data should be checked after the rewrite.
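The "expected size from the image dimensions" check can be written down as simple arithmetic. This is a sketch under stated assumptions: the function name is hypothetical, the defaults assume 8-bit RGB, and the 2000 x 1400 dimensions are illustrative only, since the dimensions of the real test file are not recorded here.

```python
def expected_uncompressed_bytes(width, height, samples_per_pixel=3, bits_per_sample=8):
    """Size of the raw pixel data for an uncompressed image, in bytes."""
    # Each row of uncompressed data is padded up to a whole number of bytes.
    row_bytes = (width * samples_per_pixel * bits_per_sample + 7) // 8
    return row_bytes * height


# Hypothetical dimensions: 2000 x 1400 pixels, 8-bit RGB.
pixel_bytes = expected_uncompressed_bytes(2000, 1400)
print(pixel_bytes / (1024 * 1024))  # size in MB of the pixel data alone
```

Comparing this figure with the file size before and after the rewrite gives a rough measure of how much of the file is not pixel data, which may help narrow down where the extra 2.5MB comes from.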

Exiftool can add metadata to files without altering the actual image data, and is available as a binary for Windows. Experiments with "exiftool -tagsFromFile" showed that tags can be transferred with this method, but the exact layout of the tags needs to be investigated. Perhaps extracting the XML for the JPEG, leaving only the fields of interest in the XML, and then using -tagsFromFile to insert those tags into the TIFF might work.
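A scripted version of the -tagsFromFile experiment might look like the sketch below. It assumes exiftool is on the PATH; the helper names are hypothetical, and the tag names in the usage comment (XMP:Description, IPTC:Caption-Abstract) are only guesses at the fields the curators edit, not a confirmed list.

```python
import shutil
import subprocess


def build_copy_command(jpeg_path, tiff_path, tags):
    """Build an exiftool command that copies the named tags from JPEG to TIFF."""
    cmd = ["exiftool", "-tagsFromFile", jpeg_path]
    # Each tag to copy is passed as its own "-GROUP:TagName" argument.
    cmd += ["-" + tag for tag in tags]
    cmd.append(tiff_path)
    return cmd


def copy_tags(jpeg_path, tiff_path, tags):
    """Run the copy and return exiftool's console output."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool was not found on the PATH")
    result = subprocess.run(
        build_copy_command(jpeg_path, tiff_path, tags),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example (assumed tag names; requires exiftool and real files):
# copy_tags("access.jpg", "master.tif", ["XMP:Description", "IPTC:Caption-Abstract"])
```

By default exiftool keeps the unmodified file alongside the rewritten one with "_original" appended to its name, which is a useful safety net while the tag layout is still being investigated.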

Exiftool XML extraction (e.g. exiftool -X file.tif) gives a long and detailed output for the file, more detailed than the output of Tika, Commons-Imaging or Exiftool's own default output.
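The XML output can be reduced to a flat list of fields for comparison. The sketch below assumes the output is RDF/XML with one rdf:Description element per file and that the tag group can be read from the second-to-last segment of each element's namespace URI. The embedded sample is a hand-written, heavily shortened approximation of the real output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hand-written, shortened approximation of "exiftool -X file.tif" output.
SAMPLE = """<?xml version='1.0' encoding='UTF-8'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
<rdf:Description rdf:about='file.tif'
  xmlns:System='http://ns.exiftool.ca/File/System/1.0/'
  xmlns:IFD0='http://ns.exiftool.ca/EXIF/IFD0/1.0/'>
 <System:FileName>file.tif</System:FileName>
 <IFD0:ImageWidth>2000</IFD0:ImageWidth>
</rdf:Description>
</rdf:RDF>
"""


def parse_exiftool_xml(xml_text):
    """Return a {"Group:TagName": value} dictionary from exiftool -X output."""
    root = ET.fromstring(xml_text)
    fields = {}
    for description in root.iter("{%s}Description" % RDF_NS):
        for element in description:
            if not element.tag.startswith("{"):
                continue  # skip elements without a namespace
            # ElementTree tags look like "{namespace-uri}LocalName".
            namespace, _, name = element.tag[1:].partition("}")
            # Assumption: the group is the second-to-last URI segment.
            group = namespace.rstrip("/").split("/")[-2]
            fields[group + ":" + name] = (element.text or "").strip()
    return fields


fields = parse_exiftool_xml(SAMPLE)
```

A dictionary in this form could be produced for both the JPEG and the TIFF and then compared field by field to find the entries that exist only in the JPEG.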