
Transliteration the easy way – Microsoft Transliteration Utility

If you are lucky enough 🙂 to have not one but two alphabets in daily use, a regular programming task will be transliteration: the conversion of text from one script (alphabet) to another.

In Serbia, we use both the Latin and the Cyrillic alphabet (and Serbian Cyrillic is not the same as Russian Cyrillic), so converting text from one to the other and back is a common task.
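
To make the task concrete, here is a minimal sketch of what a hand-written conversion for Serbian might look like. It is written in Python purely for illustration and has nothing to do with MTU; the names CYR_TO_LAT, LAT_TO_CYR, cyrillic_to_latin and latin_to_cyrillic are made up for this example. It assumes the standard 30-letter Serbian alphabet, in which three Cyrillic letters (љ, њ, џ) correspond to the Latin digraphs lj, nj and dž, so the Latin-to-Cyrillic direction tries two-letter matches before single letters.

```python
# Serbian Cyrillic -> Latin, lowercase only; uppercase is derived in the functions.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Reverse table: Latin letter or digraph -> Cyrillic letter.
LAT_TO_CYR = {lat: cyr for cyr, lat in CYR_TO_LAT.items()}


def cyrillic_to_latin(text):
    """Convert Serbian Cyrillic to Latin; other characters pass through."""
    out = []
    for ch in text:
        lower = ch.lower()
        lat = CYR_TO_LAT.get(lower)
        if lat is None:
            out.append(ch)                # not a Serbian Cyrillic letter
        elif ch != lower:
            out.append(lat.capitalize())  # uppercase input: Љ -> Lj
        else:
            out.append(lat)
    return "".join(out)


def latin_to_cyrillic(text):
    """Convert Serbian Latin to Cyrillic, matching digraphs greedily."""
    out = []
    i = 0
    while i < len(text):
        for size in (2, 1):               # try lj, nj, dž before single letters
            chunk = text[i:i + size]
            if len(chunk) < size:
                continue                  # not enough characters left
            cyr = LAT_TO_CYR.get(chunk.lower())
            if cyr is not None:
                out.append(cyr.upper() if chunk[0].isupper() else cyr)
                i += size
                break
        else:
            out.append(text[i])           # no mapping: copy unchanged
            i += 1
    return "".join(out)


print(cyrillic_to_latin("Љубав и њива"))    # Ljubav i njiva
print(latin_to_cyrillic("Ljubav i njiva"))  # Љубав и њива
```

Even this small sketch has known gaps. An all-caps Cyrillic word comes out with mixed-case digraphs (ЉУБАВ becomes LjUBAV), and the greedy digraph match is wrong for words such as "injekcija" or "nadživeti", where n+j and d+ž are separate letters. Handling cases like these takes extra rules or exception lists on top of the lookup table.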

This is not a complicated requirement, and you can easily write the necessary procedures yourself; however, there is a better way:

Microsoft Transliteration Utility (MTU) is a little-known but very useful tool for exactly that purpose. It can transliterate text typed into a text box, or convert one file into another.

There is a set of predefined transliterations:

Serbian Cyrillic to Latin / Serbian Latin to Cyrillic

Bosnian Cyrillic to Latin / Bosnian Latin to Cyrillic

Hangul to Romanization

Inuktitut to Romanization / Romanization to Inuktitut

Malayalam to Romanization / Romanization to Malayalam

You are not limited to the above set; you can easily create your own transliterations using the Module Development Console:

[Image: Module Development Console]

By creating a simple text file, you can use the full power of MTU’s parsing engine: definitions of input and output characters, transliteration rules, and definitions of new states for the transliteration state machine.

And that is not all: you can even use MTU programmatically (but please check the EULA before any commercial use):

Add a reference to MSTranslitTools.DLL (it can be found in %programfiles%\Microsoft Transliteration Utility)

Add using System.NaturalLanguage.Tools;

The current transliteration files (.tms) can be found in %CommonProgramFiles%\Transliteration\Modules\Microsoft

Hey Dejan…
Nice Blog you have here
And thanks for transliteration, it really helped me…
I have one more question about it.

I could ask it in Serbian, but not everyone would understand 🙂

Anyway, do you know if there is a way to import, for example, ‘Serbian Cyrillic to Latin.tms’ and ‘Serbian Latin to Cyrillic.tms’ into Visual Studio, so that the compiled program doesn’t need these files in the same folder as the .exe file?

Right now the line TransliteratorSpecification.FromSpecificationFile("Serbian Latin to Cyrillic.tms") won’t work unless this file is next to the .exe, in the same folder.
And I want this file to be part of the application itself, so that I don’t need anything except the .exe file.
I can make a folder in VS and add these files there, but how do I then connect them with this code?

Hi Dejan, your blog is a revelation (the biking part too:)!
I am looking for a Cyrillic->Latin transliteration tool ready to be used on Vista 64. Can someone help, please? Adaptation to a Serbian keyboard would also be of help. Thanks

Well, nothing, I guess, but the installation requirements do not specifically mention Vista 64 (only XP and Windows 2003), and sometimes these things don’t work on Vista, and I don’t know how to fix them (nor do I have the time :D).
I guess I’d like someone to tell me it will work before I install it. And – don’t laugh – I don’t even know if I have Microsoft .NET Framework v1.1 installed (also a requirement).
I’ll try your input keyboard link, thanks.