Transporting Oracle CHARs over a dblink

At work, we’ve got two Oracle databases and a Java web application. One of the tasks of the web application is to copy tables over from one database to the other. Last week we ran into a funny problem which turns out to be an Oracle bug.

I’ve posted a question about this last week on stackoverflow.com but it seems that not many people are affected by this particular problem, or at least not many people know the answer to this particular little problem. We set out to tackle this and we did.

The Setup We have two databases, and want to transfer a complete table from the remote database to the local database, using a database link and a “create table as select”. For both databases, we can find the version and used character set with a few simple queries:

select * from v$version where banner like ‘Oracle%';

select value from NLS_DATABASE_PARAMETERS
where parameter=’NLS_CHARACTERSET';

The results in our setup are:

Version

Character set

Remote:

Oracle 10g Release 10.2.0.3.0

WE8ISO8859P1

Local:

Oracle 9i Release 9.2.0.6.0

AL32UTF8

As you see, we transfer data between different versions of Oracle, and different character sets. Add the CHAR datatype into the mix, and an annoying little bug pops up.

The Test Data On the remote database, we create a table and fill it with some fake data, like so:

Everthing seems to be ok, and selecting the new data on the “bar” table seems to work. But there’s a snag. When using JDBC to connect to the database, you might notice that the CHARBYTE column is padded with 2 spaces. Why? First we look at what happened to our data types:

select column_name, data_type,
data_length as bytes,
char_length as chars
from user_tab_columns
where table_name like ‘BAR’

COLUMN_NAME

DATA_TYPE

BYTES

CHARS

CHARBYTE

CHAR

3

3

CHARCHAR

CHAR

4

1

VCHARBYTE

VARCHAR2

3

3

VCHARCHAR

VARCHAR2

4

1

You may notice something funny going on here. As mentioned earlier, the remote database uses the WE8ISO8859P1 character set, which uses 1 byte to store a character. The local database uses the AL32UTF8 character set, which uses 3 to 4 bytes to store a character. So in order to copy characters over Oracle reserves 4 bytes per character (see CHARCHAR and VCHARCHAR columns).

For the data types we defined the size in bytes in, Oracle reserves 3 bytes for some reason. This is strange, because the AL32UTF8 character set has some 4 byte characters in it which never can be stored in those columns. Suppose Oracle says “but you defined length in bytes”, it still doesn’t make sense. Why would I not choose to do conversion, but still reserve 3 times as much space for each byte? But wait, that’s not all.

Lets look at the length of the datatypes in characters (which makes much more sense). When we define the length of the original column in characters, oracle converts the data properly, as you see for the “CHARCHAR” and “VCHARCHAR” columns. But, for the columns we defined size in bytes, Oracle screws up. Not only the the number of bytes gets scaled up, but the number of characters also gets scaled up, and your content is padded with spaces. It seems as if Oracle has given us a 3 character column with the WE8ISO8859P1 character set in a AL32UTF8 database, but I did not do tests to prove this.

Oracle decided to handle the datatypes differently depending on how you defined it’s length. Although I find that very obscure, the only difference in behaviour I would expect is that if I define size in bytes, it should stay that size in bytes in the new character set, because I was “planning to store bytes”.

So defining the size of character-based types in characters results in predictable behaviour. Defining the size of character-based types in bytes makes Oracle go loopy.

Conclusion:
Stay clear of the CHAR datatype, unless you really have to. If you want to use the CHAR type, make sure you always specify the size in characters, not in bytes. Not only will it prevent you from having this problem, it will also make your database scripts more concise and readable.

P.S.
Make sure you clean up both databases and the link:drop table bar;
drop database link foo_link;

Share:

Related

This entry was posted on Tuesday, November 4th, 2008 at 22:56 \22\Nov\UTC +0000 and is filed under Software. You can follow any responses to this entry through the RSS 2.0 feed.
You can leave a response, or trackback from your own site.