Introduction

It was about time to update the content of this article. The original article was written in 2004, and the code was updated one year later, even though that updated version was not available for download from this page. In short, this software is able to recognize mouse gestures, and was originally inspired by this article.

Even though I took inspiration from the article posted by Konstantin Boukreev, this application is quite different (different programming language, gesture management and neural network, at the very least), although it uses a similar graphical interface (I found it amusing!) and the same logic behind the computation of gesture features.

This update contains a lot of changes (to be honest, the code has been completely rewritten): it provides easier integration and a more flexible, extensible implementation of the neural network; moreover, the solution file is for Visual Studio 2008. More information is provided below.

NOTE: If you are not familiar with artificial neural networks (ANNs), please have a look at this page, since explaining the subject in detail goes beyond the scope of this article.

Details

GestureRecognizer: An executable, providing a possible implementation and integration of the libraries described below

Test: A console application used for testing purposes

Regarding the first project, it contains a feed-forward neural network implementation (composed of layers of neurons), some typical activation functions to be assigned to the neurons, an implementation of the backpropagation algorithm and the PerformanceMonitor class, used to compute statistics about the recognition performance of the system. The network implementation assumes that input features are already normalized (in other words, there is no input layer dedicated to such a task).
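To give an idea of what such a network looks like internally, here is a minimal sketch of a fully connected layer with a sigmoid activation. It is purely illustrative (the class and member names are not the library's actual API) and it omits the backpropagation trainer.

```csharp
using System;

// Illustrative sketch only: a single fully connected layer of neurons with a
// sigmoid activation. The real library exposes configurable activation
// functions and a backpropagation trainer; this is not its actual code.
public class SketchLayer
{
    private readonly double[,] weights;   // [neuron, input]
    private readonly double[] biases;

    public SketchLayer(int inputCount, int neuronCount, Random rng)
    {
        weights = new double[neuronCount, inputCount];
        biases = new double[neuronCount];
        for (int n = 0; n < neuronCount; n++)
        {
            biases[n] = rng.NextDouble() - 0.5;
            for (int i = 0; i < inputCount; i++)
                weights[n, i] = rng.NextDouble() - 0.5;
        }
    }

    // Weighted sum of the (already normalized) inputs, squashed by a sigmoid.
    public double[] Compute(double[] inputs)
    {
        double[] outputs = new double[biases.Length];
        for (int n = 0; n < biases.Length; n++)
        {
            double sum = biases[n];
            for (int i = 0; i < inputs.Length; i++)
                sum += weights[n, i] * inputs[i];
            outputs[n] = 1.0 / (1.0 + Math.Exp(-sum));
        }
        return outputs;
    }
}
```

A feed-forward network is then just a chain of such layers, where the output of one layer is fed as the input of the next.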

The second library contains the implementation of the Gesture class, which is mainly a sorted collection of two-dimensional points, and the GestureSet class (a collection of gestures, used to store the training, test and validation sets used by the neural network), both providing load/save capabilities (files are saved in XML format).

In addition to the above classes, there are some helper functions used to manipulate the geometry of the gestures.
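As a rough illustration of the load/save capability, the snippet below sketches a point-list gesture persisted with XmlSerializer. The type and member names are hypothetical; the actual Gesture and GestureSet classes and their XML layout may differ.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical sketch of a gesture as a sorted list of 2D points with
// XML load/save; not the library's actual Gesture/GestureSet implementation.
public class SketchPoint
{
    public double X;
    public double Y;
}

public class SketchGesture
{
    public string Name;
    public List<SketchPoint> Points = new List<SketchPoint>();

    public void Save(string path)
    {
        var serializer = new XmlSerializer(typeof(SketchGesture));
        using (var stream = File.Create(path))
            serializer.Serialize(stream, this);
    }

    public static SketchGesture Load(string path)
    {
        var serializer = new XmlSerializer(typeof(SketchGesture));
        using (var stream = File.OpenRead(path))
            return (SketchGesture)serializer.Deserialize(stream);
    }
}
```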

The main difference between this version and the previous one is the ability to extract a variable number of features given the same number of gesture points; such a task is achieved by adding or removing points from the gesture, trying to minimize the deformation of the gesture (refer to the ExtractFeatures(int count) function in the Gesture.cs file).
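One common way to obtain a fixed number of points from an arbitrary stroke is to resample it uniformly along its length, as sketched below. This only illustrates the general idea; it is not the code of ExtractFeatures(int count), which adds or removes points while minimizing deformation.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hedged sketch: resample a stroke to 'count' points spaced evenly along its
// length, so the same gesture can feed networks with different input sizes.
public static class ResampleSketch
{
    public static List<PointF> Resample(IList<PointF> input, int count)
    {
        var pts = new List<PointF>(input);          // working copy
        double pathLength = 0;
        for (int i = 1; i < pts.Count; i++)
            pathLength += Distance(pts[i - 1], pts[i]);

        double interval = pathLength / (count - 1);
        double covered = 0;
        var result = new List<PointF> { pts[0] };

        for (int i = 1; i < pts.Count; i++)
        {
            double d = Distance(pts[i - 1], pts[i]);
            if (result.Count < count && d > 0 && covered + d >= interval)
            {
                // Interpolate a new point on the current segment.
                double t = (interval - covered) / d;
                var q = new PointF(
                    (float)(pts[i - 1].X + t * (pts[i].X - pts[i - 1].X)),
                    (float)(pts[i - 1].Y + t * (pts[i].Y - pts[i - 1].Y)));
                result.Add(q);
                pts.Insert(i, q);                    // q starts the next segment
                covered = 0;
            }
            else
            {
                covered += d;
            }
        }
        while (result.Count < count)                 // guard against rounding
            result.Add(pts[pts.Count - 1]);
        return result;
    }

    private static double Distance(PointF a, PointF b)
    {
        double dx = a.X - b.X, dy = a.Y - b.Y;
        return Math.Sqrt(dx * dx + dy * dy);
    }
}
```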

The executable project is probably the most interesting, since it gives an immediate visual feedback to the user.

Just like the previous version, the classifier is composed of a set of competing neural networks, each one used to recognize just one gesture; the main difference lies in the ability to customize the morphology of the net, choosing the number of inputs, layers, neurons per layer and the type of activation functions (refer to the menu entry Neural net -> Neural Network).

In this version, gesture set creation can be performed manually or automatically; moreover, it is possible to choose different 'seeds' during automatic creation. Once the gesture sets are created, it is possible to start the training phase (Neural net -> Training) and, at the end of that, try out the recognition capabilities of the system by hand (Neural net -> Verify) or by using the performance monitor (Neural net -> Performances). Another new feature is the possibility to set a confidence threshold for each neural network, which can improve the recognition abilities of the system.
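Conceptually, the classification step with per-network confidence thresholds can be pictured as in the sketch below; the type and member names are hypothetical and do not reflect the application's actual API.

```csharp
using System;
using System.Collections.Generic;

// Illustrative sketch: one dedicated network per gesture, each with its own
// confidence threshold. GestureNetwork and its members are hypothetical names.
public class GestureNetwork
{
    public string GestureName;
    public double Threshold;                 // per-network confidence threshold
    public Func<double[], double> Compute;   // returns the network's confidence

    public GestureNetwork(string name, double threshold, Func<double[], double> compute)
    {
        GestureName = name;
        Threshold = threshold;
        Compute = compute;
    }
}

public static class SketchClassifier
{
    // Pick the gesture whose network responds most strongly, but only if that
    // response also clears the network's own threshold; null = not recognized.
    public static string Classify(double[] features, IEnumerable<GestureNetwork> networks)
    {
        string best = null;
        double bestScore = double.MinValue;
        foreach (var net in networks)
        {
            double score = net.Compute(features);
            if (score >= net.Threshold && score > bestScore)
            {
                bestScore = score;
                best = net.GestureName;
            }
        }
        return best;
    }
}
```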

Gestures, gesture sets and neural networks can be saved and loaded as XML files in the APP_PATH\Gestures\ folder; the application uses the following convention (GESTURE is a placeholder for the gesture name):

APP_PATH\Gestures\GESTURE.xml: File used to store the path of the gesture, training set, test set and neural network files

APP_PATH\Gestures\GESTURE_S.xml: File containing gestures used as training set

APP_PATH\Gestures\GESTURE_VS.xml: File containing gestures used as test set

APP_PATH\Gestures\GESTURE_NN.xml: File containing neural network data

About the Neural Network

In the previous version of this article, I wrote that it could be interesting to compare the performance of a classifier based on auto-associators against one based on multilayer perceptrons; well, the good news is that the current version can deal with this: when the number of neural network inputs is the same as the number of outputs, the training and verify phases behave differently, making it possible to identify which classifier performs better.
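The conceptual difference between the two modes can be sketched as follows (illustrative only, not the application's code): in MLP mode the single output neuron is read as the confidence, while in auto-associator mode the confidence is derived from how well the network reconstructs its own input.

```csharp
using System;

// Illustrative only: how a confidence value could be read from each setup.
public static class ConfidenceSketch
{
    // MLP mode: a single output neuron, whose value is the confidence.
    public static double MlpConfidence(double[] output)
    {
        return output[0];
    }

    // Auto-associator mode: as many outputs as inputs; the better the network
    // reconstructs its own input, the higher the confidence.
    public static double AutoAssociatorConfidence(double[] input, double[] output)
    {
        double error = 0;
        for (int i = 0; i < input.Length; i++)
        {
            double diff = input[i] - output[i];
            error += diff * diff;
        }
        double mse = error / input.Length;
        return 1.0 - Math.Min(1.0, mse);   // map reconstruction error to [0,1]
    }
}
```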

Conclusions

Two libraries were provided, which can be used to implement a custom gesture recognition system; the application contained in the solution shows not only a possible implementation, but also serves as a simple program which can be used to develop and test custom recognition systems.


About the Author

I got my Master's Degree in Computer Science (Engineering) at the University of Siena (Italy), but I'm from Rieti (a small town near Rome).
My hobbies are RPGs, MMORPGs, programming and 3D graphics.
At the moment I'm employed at Apex s.r.l. (Modena, Italy) as a senior software developer, working on a WPF/WCF project in Rome.

Sir, will you please explain how the sin/cos angles are calculated when the input is given?
Is a vector-to-matrix conversion done here, or something else?
Please also show where this calculation is done.
Thank you.

Features are computed using basic geometric rules, between pairs of consecutive points: consider the line defined by two points, compute the line slope (equivalent to the tangent of the angle between the line and the X axis), compute the angle and retrieve the cos/sin components.
See Gesture.cs, Line 4 (ExtractFeatures).
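For reference, the computation described above boils down to something like the following sketch (not the literal code of ExtractFeatures): for each pair of consecutive points, take the angle of the segment with the X axis and emit its cosine and sine, which are already in the [-1;1] range.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Sketch of the computation described above, not the literal ExtractFeatures code.
public static class FeatureSketch
{
    public static double[] ComputeFeatures(IList<PointF> points)
    {
        var features = new List<double>();
        for (int i = 1; i < points.Count; i++)
        {
            // Angle of the segment (points[i-1] -> points[i]) with the X axis.
            double dx = points[i].X - points[i - 1].X;
            double dy = points[i].Y - points[i - 1].Y;
            double angle = Math.Atan2(dy, dx);

            // The cos/sin components are already normalized in [-1;1].
            features.Add(Math.Cos(angle));
            features.Add(Math.Sin(angle));
        }
        return features.ToArray();
    }
}
```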

I read your code and article and learned a lot, but I have some questions.
1. In your ANN solution, every gesture has its own ANN; you compute every ANN and select the best. This approach is more like the ones using the HMM method, not the classical ANN approach where all models share a single ANN with multiple outputs. Can you explain why?
2. In your demo app, the gestures in the Gestures folder don't contain any _nn.xml files. Moreover, after creating my own new gesture and training it, there seems to be no _nn.xml file either, which makes the app work poorly. I hope you can upgrade your demo app and provide full gesture files, so I can see the full power of your app.
Thanks

1. When I designed this project, I wanted to minimize the time required to update the system in case a gesture was added or removed.
Consider what happens if a single ANN is used: every time a gesture is added or removed, the entire network must be re-trained and validated; depending on the number of gestures and the size of the training sets, this process can take quite a lot of time.
Consider now a single ANN per gesture: if a gesture is removed, no action is needed; if a gesture is added, it is possible to either train just the new gesture's ANN or train all the ANNs in parallel. Moreover, the ANN used to recognize a single gesture is simpler than an ANN used to discriminate among the entire gesture set.

2. Regarding the missing ANN files in the demo: the complete demo was too big to be uploaded to CodeProject.
I created an archive with a possible ANN configuration (MLP), so you can play with it. Performance is not that good, since I used few inputs (10 points per gesture) and didn't tune everything. You can download the archive from this link.

I have typically used a 2-layer setup (hidden and output) with at least 24 inputs (remember that features are sine/cosine couples, so it is better to use an even number of them, and half the number of inputs is more or less a measure of the resolution of the gesture), 12 nodes in the hidden layer (half the number of inputs) and 1 or 24 outputs (1 output = MLP mode, same number of outputs as inputs = auto-associator mode). Regarding the activation function, I tend to prefer a Sigmoid or Bipolar Sigmoid for the hidden layer. Note that when using an auto-associator setup, the output layer nodes require a function able to provide values in the [-1;1] range (Bipolar Sigmoid or Hyperbolic Tangent), while for MLP you are free to use whatever function you prefer, though it is typically better to use a Sigmoid.
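Summarizing the two setups above as configuration data (NetworkConfig is a hypothetical type introduced here purely for illustration, not the library's API):

```csharp
// Illustrative only: the two setups described above expressed as configuration data.
public class NetworkConfig
{
    public int Inputs;
    public int HiddenNeurons;
    public int Outputs;
    public string HiddenActivation;
    public string OutputActivation;
}

public static class SuggestedConfigs
{
    // MLP mode: 24 inputs (12 sine/cosine couples), 12 hidden neurons, 1 output.
    public static NetworkConfig Mlp()
    {
        return new NetworkConfig
        {
            Inputs = 24,
            HiddenNeurons = 12,
            Outputs = 1,
            HiddenActivation = "BipolarSigmoid",
            OutputActivation = "Sigmoid"
        };
    }

    // Auto-associator mode: as many outputs as inputs; the output activation
    // must produce values in the [-1;1] range.
    public static NetworkConfig AutoAssociator()
    {
        return new NetworkConfig
        {
            Inputs = 24,
            HiddenNeurons = 12,
            Outputs = 24,
            HiddenActivation = "BipolarSigmoid",
            OutputActivation = "HyperbolicTangent"
        };
    }
}
```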

It is quite important to provide good training and test sets, so I usually produce 10 samples by hand and then auto-generate more samples using the hand-drawn gestures as seeds (usually 100 auto-generated gestures per sample). I use the same approach to generate the validation set.
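One plausible way to auto-generate samples from a hand-drawn seed is to jitter each point by a small random offset, as sketched below; the application's actual generator may use a different strategy.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hedged sketch: derive extra samples from a hand-drawn seed gesture by adding
// a small random offset to each point; the app's actual generator may differ.
public static class SampleGenerator
{
    public static List<List<PointF>> GenerateSamples(IList<PointF> seed, int count,
                                                     double maxJitter, Random rng)
    {
        var samples = new List<List<PointF>>();
        for (int s = 0; s < count; s++)
        {
            var sample = new List<PointF>();
            foreach (var p in seed)
            {
                float dx = (float)((rng.NextDouble() * 2 - 1) * maxJitter);
                float dy = (float)((rng.NextDouble() * 2 - 1) * maxJitter);
                sample.Add(new PointF(p.X + dx, p.Y + dy));
            }
            samples.Add(sample);
        }
        return samples;
    }
}
```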

When training your network, you can decide whether or not to use negative samples; note that in an MLP configuration they are necessary, while in auto-associator mode they may simply help the training phase.

Regarding the values of epochs, error threshold, momentum and learning rate, you can first try the default values, and tune them if you see that the training is stuck in a local minimum (the error function stops decreasing too early and does not reach the expected error limit).

Thank you for the support; I have one more question.
Is it possible to not take rotation into account? I mean, detect the same result whether I write "M" or "W"? Is there something in the code I could change to achieve this?

Well, I find this difficult with the current way features are computed.
At the moment the neural network inputs are the relative sine/cosine couples between consecutive points. This means that the NNs take into account both the rotation and the orientation of the gesture (i.e. a vertical line starting from the top is different from one starting from the bottom). If you need to recognize rotated paths as well, you need to change the way features are computed... the really hard part is to define which features you should use (in other words, you need to determine a feature set which is invariant to rotation).
A possible starting point, but this is a wild guess, could be to compute the center of the gesture and retrieve the angles between the center and every point of the gesture... if you consider a square and rotate it by 45°, the angles between its center and its corners in the two cases are offset by 45°, so there is a clear relationship between the 'straight' and rotated square that could be used by an NN to match the shapes. Unfortunately, this holds true only if the orientation of the path is the same, otherwise the sequence of angles is reversed (consider a square drawn clockwise and another counter-clockwise) and the correlation is less apparent. So you still need to take that into account...

Anyway, you can change the code in this article and define your own algorithm to extract features, since there is a specific function for this; just make sure to normalize the inputs in [-1;1].
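As a starting point for such a custom extractor, the sketch below computes the centre-angle features described above (centroid of the gesture, then the angle from the centroid to each point, emitted as sine/cosine couples so the values stay in [-1;1]). As noted, this alone does not handle reversed drawing direction, and some further normalization (for example subtracting the first angle) would probably be needed.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Sketch of the centre-angle idea above; illustrative only.
public static class CenterAngleSketch
{
    public static double[] CenterAngleFeatures(IList<PointF> points)
    {
        // Centroid of the gesture.
        double cx = 0, cy = 0;
        foreach (var p in points) { cx += p.X; cy += p.Y; }
        cx /= points.Count;
        cy /= points.Count;

        // Angle from the centroid to each point, as sine/cosine couples.
        var features = new List<double>();
        foreach (var p in points)
        {
            double angle = Math.Atan2(p.Y - cy, p.X - cx);
            features.Add(Math.Cos(angle));
            features.Add(Math.Sin(angle));
        }
        return features.ToArray();
    }
}
```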

I cannot re-upload the exact set I uploaded before, since I cannot access the files at the moment (I've moved from my previous apartment recently). I will try to re-create a meaningful set and upload it tomorrow... I hope...

I used different shapes for gestures (i.e. left arrow, right arrow, CW circle, CW square, CCW circle, CCW square, hourglass, Z, M and Bolt), and used a simple MLP to try to recognize them. Included is a relatively large training set (1111 samples per gesture, 11 created manually, the rest auto-generated).

The zip file contains the SRC version of the software provided by this article, and the gesture files (including the trained NNs).

I've been experimenting with the solution and I was wondering if you'd tried much in the way of reducing the number of points for each gesture in order to improve performance?

I think that, ultimately, the further you break the stroke down from a series of points into specific features such as lines and curves, the better your neural network will perform at recognizing those features. If this simple feature minimization proves useful, then perhaps it would be worthwhile to decompose each gesture into individual line segments and curves, but I haven't gotten far enough into the neural network code to see what is really happening.

For example, calling the MinimizePoints function below with a minimum delta of 1.0 on all your gestures (ideally on mouseUp event) can significantly reduce the number of points on simple strokes such as <> / ^ etc.

Thanks for sharing your suggestions!
As you can see, I had a function to minimize the complexity of gestures which worked much like yours, but had a different purpose.
Instead of simplifying the gesture, I needed to modify it to fit the neural network inputs; to achieve that, the function was able to increase the number of points (splitting long segments into pairs of shorter ones) or decrease it (merging pairs of short segments into longer ones).
This way I could even test whether (and verify that) a simpler representation leads, as one would expect, to better recognition performance.

As you stated, this approach works quite well with simple line-based gestures, but can become troublesome for curved ones (especially with self-intersecting shapes). I even thought about a better parametrization (maybe using splines), but in that case it could be hard to find a good feature representation (using relative angles is quite simple, given points, but what about spline coefficients?).

If you want to experiment with it a bit, try using a neural network with few inputs (10 inputs means that just 5 points will be used); you will see performance increase drastically with line-shaped gestures, and decrease (for example) with squares and circles (a circle composed of 4 points can easily be mistaken for a square).

About the 3.5 porting, I was considering the idea of re-writing everything from scratch, using WPF instead of WinForms... I hope to find the time to do that... it could be a nice exercise.

Thanks for your explanation; I was having trouble recognizing any detailed shapes because my minimized shapes were being further reduced to match the size of the neural network.

I had checked out your method for expanding or shrinking the point list to match the neural network, but the difference is that mine basically breaks the gesture down into line segments. A gesture like > would ideally just need to contain 3 points, connecting the two outer points to the middle point at an angle.

To do this, we iterate over the line and generate an ideal midpoint for every point along that line (from its neighbours i-1 and i+1). Wherever the distance between the actual point (i) and the ideal midpoint is less than the minDelta parameter, we know that we can safely remove that point without causing a loss of information.
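For completeness, here is a hedged reconstruction of that idea (the commenter's original MinimizePoints code is not reproduced here): a point is dropped whenever it lies within minDelta of the midpoint of its two neighbours.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

// Hedged reconstruction of the idea described above; not the commenter's code.
public static class StrokeMinimizer
{
    public static List<PointF> MinimizePoints(IList<PointF> points, double minDelta)
    {
        var result = new List<PointF>(points);
        int i = 1;
        while (i < result.Count - 1)
        {
            PointF prev = result[i - 1];
            PointF next = result[i + 1];
            // Ideal midpoint of the neighbours (i-1) and (i+1).
            var mid = new PointF((prev.X + next.X) / 2f, (prev.Y + next.Y) / 2f);
            double dx = result[i].X - mid.X;
            double dy = result[i].Y - mid.Y;
            if (Math.Sqrt(dx * dx + dy * dy) < minDelta)
                result.RemoveAt(i);   // nearly collinear: dropping it loses no information
            else
                i++;
        }
        return result;
    }
}
```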

For the 3.5 porting, all I had to do was right-click the project and target the .NET 3.5 Framework so I could use LINQ and extension methods.

I'm thinking that this might work well as the basis for a Visual Studio 2008 mouse gestures add-in, which you might see on CodeProject soon =)

Hi, I've been looking over your source code, and it is very interesting and neatly made. Great job, man! Anyway, I was going to ask you about some errors that Program.cs from the Test project throws at me. The lines in question are: