Cuneiform Character Mining for Semantic Tablet Matching

Fig. 1

Hundreds of thousands of cuneiform tablets have been excavated in modern times. The tablets are important historical artifacts documenting the language, history and culture of the ancient Near East. Together they constitute one of the largest and most comprehensive bodies of ancient texts. Translating such a tablet is a difficult task, manageable only through arduous study of already translated tablets in the same language.

Currently, we are developing a similarity metric for cuneiform characters represented as collections of spline paths. The metric has to be fast and robust, because cuneiform is a handwritten script in which instances of the same character show small differences in the position and shape of their wedges.

We are evaluating three different methods to approach the problem of cuneiform character recognition. The first approach simplifies a character, transforms it into a graph, and applies graph matching algorithms. Our second approach decomposes a character into two stages of features: first the positions of the wedges, then the shapes of the wedges [Fig 1]. The similarity of two characters is the geometric distance between these features. The third approach approximates each character with a point cloud and finds an optimal match using the iterative closest point (ICP) algorithm.
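To make the third approach concrete, the following is a minimal sketch of ICP on 2D point clouds. It is not the project's implementation: the function names (`icp_distance`, `best_rigid_transform`), the fixed iteration count and the use of the mean residual as the dissimilarity score are assumptions made for illustration, and sampling points from the spline paths is taken as already done.

```python
import math

def centroid(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def nearest(p, cloud):
    # Brute-force nearest neighbour; a spatial index would be used at scale.
    return min(cloud, key=lambda q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def best_rigid_transform(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    cs, cd = centroid(src), centroid(dst)
    s_dot = s_cross = 0.0
    for (sx, sy), (dx, dy) in zip(src, dst):
        ax, ay = sx - cs[0], sy - cs[1]
        bx, by = dx - cd[0], dy - cd[1]
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)  # closed-form optimum in 2D
    c, s = math.cos(theta), math.sin(theta)
    tx = cd[0] - (c * cs[0] - s * cs[1])
    ty = cd[1] - (s * cs[0] + c * cs[1])
    return theta, tx, ty

def apply_transform(points, theta, tx, ty):
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_distance(src, dst, iterations=20):
    """Align src to dst with ICP and return the mean residual distance."""
    current = list(src)
    for _ in range(iterations):
        matched = [nearest(p, dst) for p in current]       # correspondence step
        theta, tx, ty = best_rigid_transform(current, matched)
        current = apply_transform(current, theta, tx, ty)  # alignment step
    return sum(math.dist(p, nearest(p, dst)) for p in current) / len(current)

# Hypothetical sample points standing in for a character outline.
shape = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
moved = apply_transform(shape, 0.1, 0.3, -0.2)  # same shape, slightly displaced
line = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
```

In this sketch a displaced copy of a shape aligns back onto the original with a residual near zero, while a differently shaped cloud keeps a larger residual. ICP only converges to a local optimum, so a real system would likely need a reasonable initial alignment.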

We find that our second approach, the two-stage decomposition, gives the best results: it achieves 95% precision at 10% recall. That is, when 10% of the characters with a given label are to be retrieved, more than 95% of the result set carries the correct label ([Fig 2], results labeled “Assigned”).
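The "precision at a given recall" figure can be illustrated with a short sketch. It assumes the retrieved characters are ranked by decreasing similarity to a query; the function name `precision_at_recall` and the labels used in the example are hypothetical and not taken from the project's data.

```python
import math

def precision_at_recall(ranked_labels, query_label, recall):
    """Precision of the shortest ranked prefix that reaches the given recall."""
    total_relevant = sum(1 for label in ranked_labels if label == query_label)
    if total_relevant == 0:
        raise ValueError("no item carries the query label")
    # Number of correct items needed; the small epsilon guards against
    # floating-point noise such as 0.3 * 10 == 3.0000000000000004.
    needed = max(1, math.ceil(recall * total_relevant - 1e-9))
    hits = 0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            if hits == needed:
                return hits / rank

# Hypothetical ranking for a query labeled "AN" (most similar first).
ranking = ["AN", "AN", "KA", "AN", "KA", "AN"]
print(precision_at_recall(ranking, "AN", 0.5))   # 1.0
print(precision_at_recall(ranking, "AN", 0.75))  # 0.75
```

Here half of the four "AN" characters are found within the first two results, so precision at 50% recall is 1.0; reaching 75% recall requires four results, one of which is wrong.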

Our next intermediate goal is to develop a database of cuneiform tablets and labeled characters. We expect that such a database, together with methods to match and query its characters quickly, will enable, among other things, keyword search for tablets, the suggestion of character labels on untranslated tablets, and the application of methods from computational linguistics to a cuneiform corpus.