Confessions of an Oracle Database Junkie - Arup Nanda
The opinions expressed here are mine and mine alone. They may not necessarily reflect those of my employers and customers - both past and present. The comments left by the reviewers are theirs alone and may not reflect my opinion, whether implied or not. None of the advice is warranted to be free of errors and omissions. Please use at your own risk and after thorough testing in your environment.

Tuesday, October 04, 2011

Unicode Migration Assistant for Oracle

If you wanted to convert a database created in the default character set to a multibyte character set, there were two basic approaches - the safe export/import and the not-for-the-faint-of-heart alter database convert internal. In either case you had to follow a string of preparatory activities first, such as running csscan to check for values that would not convert cleanly.

There is a new tool from Oracle that makes the process far simpler - the Database Migration Assistant for Unicode (DMU). It's a GUI tool that you install on the client. A server-side API (installed via a patch) does all the heavy lifting, while the client GUI provides an intuitive interface with the steps pretty much laid out for you. But that is not the main strength of the tool. There are two primary differentiators.

When you do have a bad character, what can you really do? You can truncate part of the data. But how do you know how much to truncate? Truncate aggressively and you may shave off a chunk and lose valuable data; be miserly and you risk leaving the bad data in place. This tool shows the data in a separate window, allowing you to correct only the affected data; nothing less, nothing more.
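To make the "nothing less, nothing more" idea concrete, here is a minimal Python sketch that reports exactly which byte ranges of a value fail to decode. It is only an illustration of the principle, not how the tool works internally; the function name invalid_spans and the assumption that the target encoding is UTF-8 are mine.

```python
def invalid_spans(raw: bytes, encoding: str = "utf-8"):
    """Return (start, end) byte offsets of every undecodable run in raw.

    Hypothetical helper for illustration only; it is not part of any
    Oracle tool or API.
    """
    spans = []
    pos = 0
    while pos < len(raw):
        try:
            # If the rest of the value decodes cleanly, we are done.
            raw[pos:].decode(encoding)
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice, so shift
            # them back to offsets in the full value.
            spans.append((pos + err.start, pos + err.end))
            pos += err.end
    return spans


sample = b"abc\xffdef\xfe"
print(invalid_spans(sample))
```

Knowing the exact offsets means only those bytes need to be repaired or removed, rather than chopping off everything from the first bad character onward.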

When users copy and paste data from some Unicode-compliant system into Oracle, e.g. from MS Word into a VARCHAR2 field in the database, the characters may look garbled; but interpreted in the proper character set they become meaningful. This tool lets you view the data in many character sets to identify which one was used to create it in the first place. After that it's a simple matter to reproduce those characters in the proper character set.
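The same trick of viewing one value under several character sets can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's implementation; the function name decode_candidates, the candidate list and the sample bytes (a Word-style string with curly quotes, assumed to have been stored as Windows-1252) are my own assumptions.

```python
def decode_candidates(raw: bytes, encodings):
    """Decode raw under each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding. Hypothetical helper
    for illustration only.
    """
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# Curly quotes around "resume" with accented e's, as pasted from a word
# processor and stored as Windows-1252 bytes (an assumed scenario).
original = "\u201cr\u00e9sum\u00e9\u201d"
stored = original.encode("cp1252")

views = decode_candidates(stored, ["utf-8", "latin-1", "cp1252"])
for name, text in views.items():
    print(name, repr(text))
```

Here the bytes are rejected outright as UTF-8, decode to control characters in place of the quotes under latin-1, and come back intact only under cp1252, which is the kind of side-by-side comparison that points to the character set the data was originally created in.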