Since IBM acquired Datacap two years ago, there has been an enormous effort to globalize Datacap Taskmaster Capture, the intelligent document input solution, expanding its reach from English language capture to the world’s languages. With every new release, Taskmaster adds new languages, and today it supports nearly 30 languages worldwide.

Released in August, Datacap Taskmaster Capture v8.1 has added Cyrillic and Simplified Chinese to its list of supported languages. What makes these additions significant is that neither uses the standard 26-letter Latin alphabet.

Cyrillic script can be traced back to the Greek uncial script (with additions from the Glagolitic alphabet, in case you were curious). Cyrillic is one of the most-used writing systems in the world. It is the basis of alphabets used in all of Russia, as well as Serbia and Bulgaria. As of 2011, nearly 252 million people in Europe and Asia use it as the official alphabet for their national languages. About half of those people are in Russia.

Simplified Chinese characters are standardized characters that the government of the People’s Republic of China has promoted for use in printing since the 1950s. Today, these characters are officially used in mainland China and Singapore. Simplified character forms were created by decreasing the number of strokes and standardizing the forms of a sizable proportion of traditional Chinese characters. Is it an important language? Let’s see, only about, oh, 1,349,313,700 people are using Simplified Chinese in print today.

Between the Latin alphabet, Cyrillic and Simplified Chinese, Datacap Taskmaster Capture can recognize the writing systems used by 51% of the world’s population, spread across every continent. Even at this writing, new languages are being added. The process of globalization is fascinating because it involves more than characters. Not only does Taskmaster need to accommodate the wide variety of symbols that represent the world’s many languages, but people around the world also use different formats for such things as dates, telephone numbers, and currencies. This means that all the internal rules for formatting, and for the automated validations that Taskmaster performs to test the accuracy of data, also need to be updated and adapted to each language.
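To make the point about locale-specific validation concrete, here is a minimal Python sketch of how a date check might vary by locale. It is not Taskmaster's actual rule set or API: the `DATE_FORMATS` table, the locale codes, and the `parse_date` helper are all hypothetical names chosen for illustration, and the patterns shown are only one common convention for each locale.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical per-locale date patterns (illustrative assumptions,
# not taken from Taskmaster). Real locales often allow several forms.
DATE_FORMATS = {
    "en-US": "%m/%d/%Y",  # month/day/year
    "ru-RU": "%d.%m.%Y",  # day.month.year
    "zh-CN": "%Y-%m-%d",  # year-month-day
}


def parse_date(text: str, locale_code: str) -> Optional[date]:
    """Return the parsed date if `text` is valid for the locale, else None."""
    fmt = DATE_FORMATS.get(locale_code)
    if fmt is None:
        # Unknown locale: no rule available, so treat the value as invalid.
        return None
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        # Wrong layout or an impossible date (for example, month 31).
        return None
```

The same captured string can pass under one locale and fail under another: `parse_date("31.08.2012", "ru-RU")` yields a date, while `parse_date("31.08.2012", "en-US")` returns `None`. Telephone numbers and currency amounts would need their own per-locale rules along the same lines.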