Recognition of Handwritten Gestures

Did you play the famous Black & White by Lionhead Studios? Remember the magic glyph system, where you could cast a spell by simply drawing a symbol in the air? Or if you did not play that
game, imagine being able to control your game by drawing. Draw a snowflake - and your heroes are protected by an impenetrable Wall Of Ice. Draw another magical symbol - and you summon a mighty demon
from The Lower Planes. No menus or other unrealistic interface gimmicks, what you see is what you get. Or do you really think that actual mages employ pop-up menus in their spellcasting?

Just a few strokes

So how do you do it? How do you teach a computer to recognize human handwriting? Well, the general task is a very complex one, one that sometimes confounds even us humans (recall that note your doctor asked you to bring to the pharmacist!). But we don't want to recognize each and every handwritten note under the Sun; we simply want to recognize a few simple figures (also known as gestures) of our own design. This is a way, way simpler task, a task that can actually be handled by a computer in real time!

First of all we need to create and store the master templates for our gestures. We'll choose the simplest possible way: as a sequence of strokes, each of which is a simple sequence of 2D points.
Something like this:
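A minimal sketch (the type names PointF, Stroke and Gesture are of my choosing; any equivalent definitions will do):

    #include <vector>

    // A 2D point; whether the coordinates are screen or world units doesn't matter.
    struct PointF
    {
        float x, y;
    };

    typedef std::vector<PointF> Stroke;    // one continuous pen/mouse motion
    typedef std::vector<Stroke> Gesture;   // a gesture is a sequence of strokes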

(Note that I used STL containers that store actual values, as opposed to pointers. This is done only for the purpose of exposition. In actual production code you should probably use containers of pointers; this will save you a lot of time otherwise wasted on redundant copying.)

Then we can write a simple program to input the gesture with a mouse and store it in a database (I'll leave it to you as a home exercise.)

But now we run into the following problem: some people write small, some people write big; some people write slowly and carefully, some draw with fast sweeping motions; some people will write in the center of the screen, some somewhere in a corner. The end result would look pretty much the same to us - a gesture is a gesture is a gesture - but to the computer these would look like completely unrelated sets of coordinates. What does this mean? It means we need to normalize the strokes.

Making things look normal

The normalization proceeds in three stages:

First, we need to scale the gesture to a predetermined size, say 1 unit.

Second, we need to put the individual points of the gesture at a uniform distance from each other, so that it doesn't matter how fast or slowly the gesture was drawn.

Finally, we need to center the gesture around (0,0).

The first step is pretty straightforward: we just find the dimensions of the gesture and then scale accordingly. For instance, it can go like this:
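(A sketch; scaleGesture() is a name of my choosing, and the one real decision here is to divide by the larger of the two dimensions, so that the gesture's proportions are preserved.)

    #include <algorithm>

    // Scale the gesture so that its larger dimension becomes exactly 1 unit.
    void scaleGesture(Gesture &gesture)
    {
        float minX = 1e30f, minY = 1e30f, maxX = -1e30f, maxY = -1e30f;

        // Find the bounding box of the whole gesture.
        for (const Stroke &stroke : gesture)
            for (const PointF &p : stroke)
            {
                minX = std::min(minX, p.x);  maxX = std::max(maxX, p.x);
                minY = std::min(minY, p.y);  maxY = std::max(maxY, p.y);
            }

        const float size = std::max(maxX - minX, maxY - minY);
        if (size <= 0.0f)
            return;  // degenerate gesture (a single point): nothing to scale

        for (Stroke &stroke : gesture)
            for (PointF &p : stroke)
            {
                p.x /= size;
                p.y /= size;
            }
    }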

Now we move to a somewhat tricky part. We need to go through each stroke and figure out how close its points are. If they crowd too close, we need to space them out. If they stand too far apart,
we need to insert new ones. At the end we want each stroke to contain a fixed number of points (say, 32), all equally spaced apart. For this we need to measure the length of the stroke, then walk
along it, marking (adding a point to a new stroke) every 1/31 of the length. Here's the code:
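(One possible arrangement of the walk; the name resampleStroke() and the bookkeeping details are mine, but any equivalent interpolation along the stroke will do.)

    #include <cmath>

    static const size_t POINTS_PER_STROKE = 32;

    // Resample a stroke into POINTS_PER_STROKE points, equally spaced along it.
    Stroke resampleStroke(const Stroke &stroke)
    {
        if (stroke.size() < 2)
            return stroke;  // nothing to walk along

        // Measure the total length of the polyline.
        float length = 0.0f;
        for (size_t i = 1; i < stroke.size(); ++i)
            length += std::hypot(stroke[i].x - stroke[i - 1].x,
                                 stroke[i].y - stroke[i - 1].y);

        const float step = length / (POINTS_PER_STROKE - 1);  // 1/31 of the length
        if (step <= 0.0f)
            return Stroke(POINTS_PER_STROKE, stroke.front());  // degenerate stroke

        Stroke result;
        result.push_back(stroke.front());
        float walked = 0.0f;  // distance covered since the last emitted point
        for (size_t i = 1; i < stroke.size(); ++i)
        {
            PointF prev    = stroke[i - 1];
            float  segment = std::hypot(stroke[i].x - prev.x, stroke[i].y - prev.y);
            while (walked + segment >= step && result.size() < POINTS_PER_STROKE)
            {
                // Interpolate a new point (step - walked) units into this segment.
                const float  t = (step - walked) / segment;
                const PointF q = { prev.x + t * (stroke[i].x - prev.x),
                                   prev.y + t * (stroke[i].y - prev.y) };
                result.push_back(q);
                segment -= step - walked;
                walked = 0.0f;
                prev = q;
            }
            walked += segment;
        }
        // Floating-point error can leave us one point short; pad with the endpoint.
        while (result.size() < POINTS_PER_STROKE)
            result.push_back(stroke.back());
        return result;
    }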

Finally, we want to center the gestures at the origin (0,0). This is very easy: we calculate the center of the gesture using the standard arithmetic-mean technique (add up all coordinates and
divide the sum by the number of points). Then we move each point of the gesture by (-centerX,-centerY). This will, obviously, bring the center right into the origin.
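In code (again, the function name is just a name):

    // Translate the gesture so that its arithmetic mean lands at the origin.
    void centerGesture(Gesture &gesture)
    {
        float  sumX = 0.0f, sumY = 0.0f;
        size_t count = 0;
        for (const Stroke &stroke : gesture)
            for (const PointF &p : stroke)
            {
                sumX += p.x;
                sumY += p.y;
                ++count;
            }
        if (count == 0)
            return;

        const float centerX = sumX / count, centerY = sumY / count;
        for (Stroke &stroke : gesture)
            for (PointF &p : stroke)
            {
                p.x -= centerX;
                p.y -= centerY;
            }
    }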

Now that we have our gestures nicely scaled, centered and arranged in a fixed number of points, we can do our main magic trick: shape matching.

Shape Up!

How would we like our magic shape matcher to work? It'd be nice if it compared the gesture drawn by the user to one of the stored gestures and returned a score: say, 1 for a perfect match, a number slightly less than 1 for a similar shape, and a small or even negative number for a very different shape. We can then compare the input to each of the gestures in our database, pick the one with the highest score, and if that score is high enough, we say "That's what it is!"

OK, it's easy to say, but how do you actually do it? Well, look closer at the requirements for the shape matcher: do they remind you of something? No? How about the good old dot product? If you give it two normalized vectors, it'll return 1 if the vectors are exactly the same, a value slightly less than 1 for vectors that point in more-or-less the same direction, and a low or even negative number for vectors that point in wildly different (or opposite) directions. Amazingly enough, it works just as well for shape matching! The only difference is that our vectors are not 2- or 3-dimensional, but 2*N-dimensional, where N is the number of points in the gesture. Otherwise it's the same familiar x1*x2 + y1*y2 + ... (Professional statisticians call this formula correlation; now you know it's the good old dot product in disguise.)

So how exactly do you use the dot product to compare shapes? You simply unroll each gesture into an array of numbers and manipulate these arrays as if they were the familiar geometrical vectors. (What if the arrays turn out to have different dimensions? Well, then you know it is not a good match: differing dimensions mean a different number of points, and since every stroke is resampled to exactly 32 points, a different number of points means a different number of strokes in the corresponding gestures, which allows us to reject the match right away. This is why we made each stroke precisely 32 points long.)
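A sketch of how this might look (calculateGestureDotProduct() is the name the note below refers to; the score wrapper and its normalization by the self dot products are my framing of the matcher described above):

    #include <cmath>

    // Unroll both gestures into flat coordinate "vectors" and take their dot product.
    // Assumes the gestures have the same number of strokes, each 32 points long.
    float calculateGestureDotProduct(const Gesture &a, const Gesture &b)
    {
        float dot = 0.0f;
        for (size_t s = 0; s < a.size(); ++s)
            for (size_t i = 0; i < a[s].size(); ++i)
                dot += a[s][i].x * b[s][i].x + a[s][i].y * b[s][i].y;
        return dot;
    }

    // Score in (-1, 1]: 1 for a perfect match, close to 1 for a similar shape,
    // small or negative for a very different one. Dividing by the square root
    // of the self dot products normalizes both vectors to unit length.
    float gestureMatchScore(const Gesture &input, const Gesture &tmpl)
    {
        if (input.size() != tmpl.size())
            return -1.0f;  // different stroke counts: reject right away

        const float ii = calculateGestureDotProduct(input, input);
        const float tt = calculateGestureDotProduct(tmpl, tmpl);
        if (ii <= 0.0f || tt <= 0.0f)
            return -1.0f;  // degenerate gesture

        return calculateGestureDotProduct(input, tmpl) / std::sqrt(ii * tt);
    }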

Note: In real life, we don't have to compute calculateGestureDotProduct() of each template gesture with itself on every match. We can easily pre-calculate and store these values.
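For instance, a database entry might look like this (a hypothetical layout):

    #include <string>

    // One entry of the gesture database. selfDot is computed once, when the
    // template is added, and reused on every match.
    struct GestureTemplate
    {
        std::string name;      // e.g. "wall_of_ice"
        Gesture     gesture;   // normalized master template
        float       selfDot;   // calculateGestureDotProduct(gesture, gesture)
    };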

Conclusion

This is about it. The rest is database management, user interface and other maintenance tasks that you can easily provide yourself (and which will vary greatly from game to game anyway). If you wish to play around with an already working gesture recognizer, you can download one.

The algorithm is fairly simple and flexible; it can even be easily trained to recognize individual letters and digits. However, it is by no means universal: it will not recognize cursive handwriting, it'll get confused by symbols with a variable number of strokes, such as "t", and sometimes it'll even have a hard time telling a circle from a square (this is because a circle drawn starting from the top and a circle drawn starting from the bottom look like entirely different shapes to the computer). Such is the price of simplicity.

[This spatial dependency can probably be overcome by sorting the gesture-vectors into clockwise or counter-clockwise order from a specific position, such as from top-right. Overcoming
variable-stroke gestures such as "t" would require more work, but a spatial sort can at least improve reliability on symmetric shapes. - Ed.]

Nevertheless, I believe that this technique can be used to make your game interface more immersive, more intuitive and overall more fun. Which is always a Good Thing(TM).