What is frog on hand?

frog on hand is the name of our on-line handwriting recognition project. The name is an acronym for "freiburg recognition of on-line handwriting". Several researchers at the University of Freiburg have contributed to developing it (please refer to the honors section).

This web page aims to give a concise description of the recognition system. It provides explanations, visualizations and links into the literature.

The recognition system frog on hand can be roughly divided into four main modules; these are described in the sections below.

Figure 1: On-line data can be seen as a sequence of x-y-coordinates. In this example green points correspond to coordinates that are sampled when the pen touches the surface. Red points are sampled while the pen is lifted.

"What is on-line handwriting recognition?" and "Where to get data?"

Handwriting recognition (HWR) is the task of transforming a language represented in its spatial form of graphical marks into its symbolic representation.

On-line HWR, in contrast to off-line HWR, refers to the situation where recognition is performed concurrently with the writing process.

This distinction has consequences for further differentiations between on-line and off-line HWR. Among others, these pertain to important aspects like data acquisition, data representation, recognition methods and typical applications.

So, what is specific about on-line handwriting data? Again, here is a rather complicated, but precise definition:

On-line handwriting data are a dynamic, digitized representation of a (digital) pen movement, generally describing sequential information about position, velocity, acceleration or even pen angles as a function of time.

Often, simply a sequence p=[(x1,y1),...,(xN,yN)] of x-y-coordinates (xi,yi) is used to describe on-line handwriting data (see figures 1 and 3).

Contrary to on-line HWR, off-line HWR is usually performed some time after the writing has been created. In this context, data is usually represented as a pixel matrix, e.g. as a scanned image (see figure 2). Typical off-line applications are the automatic recognition of postal address fields and bank checks.

Our specific research is dealing with on-line handwriting recognition.

The largest and most popular non-proprietary database of on-line handwriting is the UNIPEN database. Its current release comprises about 60,000 lower case characters, 30,000 upper case characters, 15,000 numerals, 85,000 words and 15,000 free text segments.

Figure 5 [click for a larger view]: Features 1 ("normalized x coordinate") and 2 ("normalized y coordinate") are plotted according to their value in the x-y-plane. Feature 3 ("tangent slope angle") is illustrated in green by the direction of the tangent.

Feature selection

frog on hand employs a local feature representation. Depending on the application, the following features are computed either from the re-sampled or from the original writing:

a normalized representation of the (horizontal) x-coordinates

a normalized representation of the (vertical) y-coordinates

the tangent slope angle of the coordinates

Thus, after feature selection we have a sequence of three-dimensional feature vectors. Figure 5 shows an example feature sequence. A more detailed description of the feature extraction can be found in the publications [Bahlmann and Burkhardt, 2004] and [Simon, 2002].
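As an illustration, the three features might be computed along the following lines. The normalization scheme used here (mean-centering and scaling by the vertical extent) is an illustrative assumption; the exact scheme is given in the cited publications.

```python
import math

def extract_features(points):
    """Turn a pen trajectory [(x1, y1), ..., (xN, yN)] into a sequence
    of 3-D feature vectors: normalized x, normalized y, tangent slope
    angle. The normalization shown here is an illustrative choice."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    span = (max(ys) - min(ys)) or 1.0       # guard against a flat stroke
    feats = []
    for i in range(len(points)):
        # tangent slope angle estimated from the neighboring sample points
        j0, j1 = max(i - 1, 0), min(i + 1, len(points) - 1)
        theta = math.atan2(ys[j1] - ys[j0], xs[j1] - xs[j0])
        feats.append(((xs[i] - mx) / span, (ys[i] - my) / span, theta))
    return feats
```

Applied to a trajectory, this yields one three-dimensional feature vector per sample point, matching the sequence illustrated in figure 5.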

Classification

Pre-processing and feature selection have transformed the handwriting data (i.e. a sequence of x-y-coordinates) into a sequence of three-dimensional feature vectors (as illustrated in figure 5). For recognition, this feature sequence is presented to a pattern classifier.

We have developed two complementary classifier approaches, each with its individual strong points. In order to cope with temporal variations, both make use of the dynamic time warping (DTW) concept (also known as elastic matching). However, they pursue quite different classification philosophies.

The first approach, CSDTW, follows a generative, statistical philosophy. The second approach uses a discriminative method, as it is based on the support vector machine (SVM). We have developed a specifically adjusted SVM kernel, the Gaussian DTW kernel; the classification method is thus called SVM-GDTW.

Figure 6 [click for a larger view]: DTW aims to identify (or align) corresponding sample points in two vector sequences. Found identifications are shown by the green dotted lines in the upper figure. Note the "non-linear alignment" with the 3rd, 5th and 6th red sample points. The correspondences found can also be illustrated through entries in a matrix, shown below. In this respect, an entry at element (i,j) means that sample i of sequence 1 is identified with sample j of sequence 2. This matrix is called the Viterbi matrix, and the optimal alignment the Viterbi path.

Figure 7 [click for a larger view]: The Viterbi alignment of an (unknown) test pattern "b" and a reference pattern "h". The rectangular area shows the Viterbi matrix. (White areas indicate regions that were pruned by a beam search criterion.) The Viterbi path is illustrated by the sketched line: it traverses all aligned sample point pairs. Corresponding points in the "b", the "h" and the Viterbi path are coded in the same color. The reader can see apparent temporal distortions in the middle (cyan) and end (red) regions of the Viterbi alignment. The Viterbi distance "D" is 10.78 for this alignment.

Figure 8 [click for a larger view]: A reference model of an "h". The reader can see a connected sequence of dots, each one with an additional line attached. Recall that a reference model is a sequence of Gaussian PDFs: each dot illustrates the mean of a Gaussian, and each line illustrates the first eigenvector and eigenvalue of the covariance matrix by its direction and length, respectively.

Movie 1 [click for a larger view]: Demonstration of a CSDTW classification, available in the following formats:

Classification with CSDTW

CSDTW (cluster generative statistical dynamic time warping) is our extension of the dynamic time warping (DTW) classification. CSDTW supplements DTW with a holistic combination of cluster analysis and probabilistic/statistical modeling.

What is DTW?

DTW is a concept that allows an elastic match of two sequences. Figure 6 gives a graphical illustration of an example elastic match: given two vector sequences (in the following called the "test" and the "reference" sequence), DTW aims to identify (or align) corresponding sample points in the test and reference sequence, as shown with the two instances of the "3". In this respect, the matching is not a linear point-by-point matching; it is elastic.

In DTW classification, the recognition of a character is performed with the help of the so-called Viterbi algorithm. It can be summarized by the following two main aspects:

A resulting distance, the Viterbi distance, is computed by summing up (or averaging) the Euclidean distances between the feature vector pairs defined by the Viterbi alignment.

The classification itself then follows a minimum-distance philosophy: the Viterbi distance of the test sequence to every element in a set of reference sequence templates is computed, and the class of the nearest template is chosen.
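The two aspects above can be sketched as follows: a minimal dynamic-programming DTW distance with the standard diagonal, horizontal and vertical transitions, followed by nearest-template classification. This is an illustrative sketch on raw templates; the actual system uses the statistical reference models described below.

```python
import math

def dtw_distance(seq1, seq2):
    """Viterbi (DTW) distance between two feature-vector sequences:
    fill the alignment matrix by dynamic programming and sum the
    Euclidean distances along the optimal alignment path."""
    n, m = len(seq1), len(seq2)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq1[i - 1], seq2[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def classify(test, references):
    """Minimum-distance classification: return the label of the
    reference template with the smallest Viterbi distance.
    `references` maps a class label to a template sequence."""
    return min(references, key=lambda ref: dtw_distance(test, references[ref]))
```

The non-linear alignment of figure 6 corresponds to the horizontal and vertical transitions in the minimum above; a purely diagonal path would be a linear point-by-point match.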

What is CSDTW, frog on hand's extension of DTW?

CSDTW uses a generalization of the DTW distance as the underlying distance measure. Additionally, it aims to treat writing variations by holistically combining cluster analysis and statistical sequence modeling.

To be more precise, the cluster analysis is employed with the aim of identifying different character "shapes" or "styles" (called allographs in HWR research) in the training set of characters (see figure 11 for different styles of the character "b").

Based on the clusters identified, statistical sequence reference models are estimated. Each of those corresponds to a sequence of probability density functions (PDFs), one PDF for every re-sampled point. Simple multivariate Gaussians are used (see figure 8) as PDF basis functions. In this respect, CSDTW shares common ideas with hidden Markov modeling (HMM).

Contrary to previous attempts of combining clustering and statistical sequence modeling, CSDTW uses a single feature space and a closely related dissimilarity measure for the cluster analysis and the statistical modeling.

A special treatment is necessary for the statistical modeling, as one of the features, the tangent slope angle, is a directional variable. We cope with this problem by modeling the feature space with a specifically adapted PDF, the multivariate semi-wrapped Gaussian distribution (see [Bahlmann, 2004]).
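To illustrate the idea of wrapping a Gaussian around the circle for a directional variable, here is a one-dimensional sketch. The semi-wrapped multivariate form of [Bahlmann, 2004] wraps only the angular dimension; truncating the wrap sum at a few terms is an approximation that is accurate when sigma is well below 2*pi.

```python
import math

def wrapped_gaussian_pdf(theta, mu, sigma, k_max=3):
    """One-dimensional wrapped Gaussian density on the circle:
    the ordinary Gaussian is 'wrapped' by summing copies shifted
    by multiples of 2*pi, so the density is periodic in theta."""
    total = 0.0
    for k in range(-k_max, k_max + 1):
        z = (theta - mu + 2.0 * math.pi * k) / sigma
        total += math.exp(-0.5 * z * z)
    return total / (sigma * math.sqrt(2.0 * math.pi))
```

Unlike an ordinary Gaussian, this density assigns the same value to an angle and to the same angle shifted by a full turn, which is exactly the property needed for the tangent slope angle feature.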

Figure 9 [click for a larger view]: An SVM classifier with a linear kernel for a simple classification in a 2D feature space. The discriminating "hyper-surface" (which is a line in 2D) is located in the center of the black area, the support vectors are plotted as circles.

Figure 10 [click for a larger view]: An SVM that discriminates "b" and "h". The list-box at the lower left lists all support vectors. Here, for each support vector the label and the weight (a_i) are given. To the right, the features of this support vector are sketched.

Movie 2: An enhanced animation of figure 10. The following formats are available:

Classification with SVM-GDTW

Our current "hot" topic is the incorporation of discriminative power into HWR. We address this issue by using support vector machine (SVM) classifiers. In this respect, the aim is to develop an SVM kernel adapted to sequential data; the result so far is the Gaussian dynamic time warping (GDTW) kernel.

What is an SVM?

An SVM classifier discriminates two classes of feature vectors by generating hyper-surfaces in the feature space that are "optimal" in a specific sense: the hyper-surface obtained by the SVM optimization is guaranteed to have the maximum distance (margin) to the "nearest" training examples, the support vectors. This principle is sketched in figure 9 for a two-dimensional, linearly separable example. Chih-Jen Lin provides an instructive, interactive Java demo for two-dimensional data on his WWW page.

An important property of SVMs is that they do not operate directly on the feature vectors P_i or P_j, but instead on kernel evaluations K(P_i, P_j) of them. E.g., the classification of a test pattern T is performed by evaluating a weighted sum of kernel evaluations of T with each support vector S_i,

y = sum_i( a_i * K(T, S_i) )

and assigning the "positive" or "negative" class for T depending on the sign of y.

What is special about frog on hand's SVM-GDTW?

Usual SVM kernels, however, are designed to deal with data embedded in a vector space of fixed dimension. Since on-line handwriting data is not of fixed dimension, but of a variable-length sequential form, SVMs cannot be applied to HWR with the usual methods.

We have addressed this issue by developing an appropriate SVM kernel that is defined for sequential data: the Gaussian dynamic time warping (GDTW) kernel (thus the acronym SVM-GDTW). In figure 10 and movie 2 a few typical support vectors and kernel evaluations with our GDTW kernel are illustrated.
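The GDTW idea, replacing the distance term inside a Gaussian RBF kernel by a DTW distance so that the kernel is defined for variable-length sequences, can be sketched as follows. The simple DTW variant and the width parameter gamma are illustrative choices, not the exact configuration of the published system.

```python
import math

def dtw_distance(seq1, seq2):
    """Dynamic-programming DTW distance between two vector sequences
    (same scheme as in the CSDTW section)."""
    n, m = len(seq1), len(seq2)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(seq1[i - 1], seq2[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

def gdtw_kernel(p, q, gamma=0.1):
    """Gaussian DTW kernel: a Gaussian RBF kernel whose distance term
    is the DTW distance, so variable-length sequences are admissible."""
    return math.exp(-gamma * dtw_distance(p, q))

def svm_decision(test, support_vectors, weights, gamma=0.1):
    """Evaluate y = sum_i a_i * K(T, S_i); the sign of y selects the class."""
    return sum(a * gdtw_kernel(test, s, gamma)
               for a, s in zip(weights, support_vectors))
```

Identical sequences give a kernel value of 1, and the value decays toward 0 as the DTW distance grows, mirroring the behavior of an ordinary Gaussian RBF kernel on fixed-dimensional vectors.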

Figure 11 [click for a larger view]: A dissimilarity dendrogram of a clustering of five characters "b". Each leaf in the tree represents the character pattern positioned below it; a node denotes a merge step of two particular clusters in the agglomerative hierarchical clustering algorithm. The level of a node on the ordinate corresponds to the cluster dissimilarity D at the time of merging. A concrete choice of D_max determines the granularity of the final clustering.

CSDTW training

Generating allograph prototypes by hierarchical clustering

For each character class an agglomerative hierarchical clustering algorithm identifies a set of clusters. Figure 11 shows an example of the so-called dendrogram of the hierarchical clustering.

As a result of the clustering, each cluster is given by

a "cluster representative" and

a set of "cluster elements".

The clustering algorithm uses a dissimilarity measure that is closely related to the one used for classification and parameter estimation. Thus, clusters found in the cluster space correspond to well-formed clusters in the classifier space.
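The clustering step can be sketched as plain agglomerative clustering with a granularity threshold D_max, as in figure 11. Complete linkage is assumed here for illustration; the linkage and the DTW-related dissimilarity actually used by CSDTW are described in the publications.

```python
def agglomerative_clustering(items, dist, d_max):
    """Agglomerative hierarchical clustering: start with singleton
    clusters, repeatedly merge the closest pair, and stop when the
    smallest inter-cluster dissimilarity exceeds d_max. `dist` can be
    any dissimilarity on the items, e.g. a DTW distance."""
    clusters = [[i] for i in range(len(items))]

    def cluster_dist(a, b):      # complete linkage: worst pair distance
        return max(dist(items[i], items[j]) for i in a for j in b)

    while len(clusters) > 1:
        pairs = [(cluster_dist(a, b), ia, ib)
                 for ia, a in enumerate(clusters)
                 for ib, b in enumerate(clusters) if ia < ib]
        d, ia, ib = min(pairs)
        if d > d_max:            # granularity threshold D_max reached
            break
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return clusters
```

Each merge corresponds to one node of the dendrogram in figure 11, at height d; cutting the tree at d_max yields the final allograph clusters.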

Estimating the statistical model parameters by Viterbi training

For each cluster, the parameters of the PDF sequences are trained by iterative Viterbi training. The initialization problem is solved in a data-driven fashion: the parameters of the PDFs, which are represented by multivariate Gaussians, are initialized as follows:

the mean is set to the sample point of the corresponding cluster center

the covariance is set to the identity matrix

This configuration is the initial "model". The iterative training alternates between the following two steps:

1. Sample point correspondences in the cluster are detected by Viterbi alignments of the current model with all cluster members.

2. The PDF parameters of the new model (mean and covariance) are re-estimated with respect to the point correspondences detected in step 1. This model becomes the "new" model for the next iteration.
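The alternation of alignment and re-estimation can be sketched as follows. For brevity, only the state means are re-estimated here (covariances are kept fixed); CSDTW re-estimates the covariances analogously.

```python
import math

def dtw_path(model_means, sample):
    """Align a sample sequence to the model means by DTW and return the
    Viterbi path as a list of (model_state, sample_index) pairs."""
    n, m = len(model_means), len(sample)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(model_means[i - 1], sample[j - 1])
            D[i][j] = d + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # backtrack the optimal alignment from (n, m)
    path, i, j = [], n, m
    while i >= 1 and j >= 1:
        path.append((i - 1, j - 1))
        moves = [(D[i - 1][j - 1], i - 1, j - 1),
                 (D[i - 1][j], i - 1, j),
                 (D[i][j - 1], i, j - 1)]
        _, i, j = min(moves)
    return list(reversed(path))

def viterbi_training(init_means, members, iterations=5):
    """Iterative Viterbi training sketch: alternately (1) align all
    cluster members to the current model and (2) re-estimate each
    state's mean from the sample points aligned to it."""
    means = [tuple(m) for m in init_means]
    for _ in range(iterations):
        sums = [[0.0] * len(means[0]) for _ in means]
        counts = [0] * len(means)
        for seq in members:
            for state, idx in dtw_path(means, seq):
                counts[state] += 1
                for d in range(len(means[0])):
                    sums[state][d] += seq[idx][d]
        means = [tuple(s / c for s in sums_k) if c else means[k]
                 for k, (sums_k, c) in enumerate(zip(sums, counts))]
    return means
```

With means initialized from the cluster representative (as described above), each iteration pulls the model states toward the average of the sample points aligned to them.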

SVM-GDTW training

The SVM training is performed with the standard sequential minimal optimization (SMO) algorithm. We do not have to care about the special form of the sequential data, since the training does not operate directly on the feature vectors P_i or P_j, but solely on kernel evaluations K(P_i, P_j).

Using the approach described in the sections above, the implementation is an integrated character recognition plug-in that runs inside both Qtopia and OPIE. The character label recognized by the plug-in is handed over to the active application (e.g. date book, address book, to-do list, etc.). Movie 4 gives an impression. Contrary to the implementation in our development environment, a fixed-point data representation is used on the PDA, since the StrongARM processor lacks built-in floating-point arithmetic.
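To illustrate the fixed-point idea, here is a sketch in the common Q16.16 format, where a real number is stored as an integer scaled by 2^16. The Q16.16 format is an illustrative choice; the exact representation used on the PDA is not specified here.

```python
# Q16.16 fixed point: a real value x is stored as round(x * 2**16)
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to its Q16.16 integer representation."""
    return int(round(x * ONE))

def fixed_mul(a, b):
    """Multiply two Q16.16 numbers: the raw product carries 32
    fractional bits, so shift back by FRAC_BITS."""
    return (a * b) >> FRAC_BITS

def to_float(a):
    """Convert a Q16.16 integer back to a float (for inspection)."""
    return a / ONE
```

All arithmetic then runs on plain integers, which is exactly what a processor without a floating-point unit executes efficiently.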

On the iPAQ, the classification of a character takes about 0.3 seconds on average.

However, due to the rich representation of the reference models, the models occupy a rather large amount of flash memory. Together with the program, ~2.5 MByte are required :(

Feel free to test the plug-in on your PDA. A binary (packaged as an ipk; both Linux Compaq iPAQ and Sharp Zaurus are supported) is available from our feed.