Semi-Automatic Transcription Tool for Ancient Manuscripts

In this work, we investigate various techniques from the fields of shape analysis and image processing in order to construct a semi-automatic transcription tool for ancient manuscripts. First, we design a shape matching procedure using shape contexts, introduced in [1], and exploit this procedure to compute different distances between two arbitrary shapes/words. Then, we use Fischer discrimination to combine these distances in a single similarity measure and use it to naturally represent the words on a similarity graph. Finally, we investigate an unsupervised clustering analysis on this graph to create groups of semantically similar words and propose an uncertainty measure associated with the attribution of one word to a group. The clusters together with the uncertainty measure form the core of the semi-automatic transcription tool, that we test on a dataset of 42 words. The average classification accuracy achieved with this technique on this dataset is of 86%, which is quiet satisfying. This tool allows to reduce the actual number of words we need to type to transcript a document of 70%.