Links

Tags

Recent tweets

Full reverse: A REVERSE function that handles unicode

Background

This week a co-worker converting an Oracle application to DB2 contacted me and requested a REVERSE function.

REVERS() is a function which reverses strings by reordering all the characters in reverse order.

For example:

'Hello World!'

becomes

'!dlroW olleH'.

Not a terribly useful function within an application, generally speaking.

However reversing strings can reduce contention on index pages when inserting rows into a table in ascending order.

At any rate, I have learned over the years that conversions are much more successful when one does not demand the customer to redesign whatever it is they have been doing for reasons that may elude me.

REVERSE written in SQL

It is fairly trivial to write a function in SQL PL which walks an input strings and spits it out in reverse order.

However, such a function, if written in inline SQL PL will be effectively limited to non DPF scenarios since inline SQL PL executes on the coordinator.

In fact my coworker had written such a beast and found it to be too slow for that reason.

Writing a function in compiled SQL PL won't solve the DPF problem and would make it likely even slower.

So the first question is:Can the function be written as SQL without the need for BEGIN.. END.Here is one possible solution using recursion:

CREATE OR REPLACE FUNCTION REVERSE(INSTR VARCHAR(4000))
RETURNS VARCHAR(4000) SPECIFIC REVERSE
DETERMINISTIC NO EXTERNAL ACTION CONTAINS SQL
RETURN WITH rec(pos, res) AS (VALUES (1, CAST('' AS VARCHAR(4000)))
UNION ALL
SELECT pos + 1, SUBSTR(INSTR, pos , 1) || res FROM rec WHERE pos <= LENGTH(INSTR) AND pos < 5000)
SELECT res FROM rec WHERE pos > LENGTH(INSTR);
/
VALUES reverse('Hello World!');
ERROR near line 1:
SQL1585N A temporary table could not be created because there is no available
system temporary table space that has a compatible page size.

OK, we need to have system temp >4KB to hold teh temp table for the recursion.

You may have noted that I named the C function ReverseSBCP for "Single Byte Code Page".The problem is that the function, as implemented, only works if we use a single byte code page.In a Unicode database, which is default in DB2, this would only work for ASCII characters as we can easily see when we add some German umlaute:

REVERSE written in C with Unicode support

In Unicode UTF-8 which is what is being used in a DB2 Unicode database for the VARCHAR data type every character is between one and 4 bytes long.
The first few bits of the first byte encode how many more bytes are needed to encode the entire character. That means our implementation will look like this: