It also provides a sample C++ client that tests the U2CP and CP2U
conversion functions.

Overview

Some time ago, I was put in a very new and strange situation. My company
built a new financial site based on an old skeleton site. It was necessary to
transfer a lot of data from the old database (Access 97) to the new one
(Microsoft SQL 7).

Which was the problem? - all the data from the old DB was written in Greek
code page and the new DB was to be written in Unicode. Another request from the
customer was to have online data in both sites. That means that when someone
introduces code page data in a table, this information has to be translated
automatically into Unicode data (in the other table) and vice versa.

I lost 4 days to find in Microsoft SQL transact language, a few, very simple
functions that make translations between code page and Unicode characters.
Unfortunately I didn't find them... Microsoft SQL server supports both Unicode
and code page formats, but not translation functions between them.

I was sure I would find on the Internet some very simple translation
functions or programs. After another 2 days, I gave up. I found only the
Microsoft API functions - WideCharToMultiByte and
MultiByteToWideChar - and a very huge program that uses them to
translate files.

That's why I decided to make my own functions.

Details

The WideCharToMultiByte and MultiByteToWideChar are
well documented on the MSDN. I made two simple wrappers for these functions,
with some default parameters:

bool CDialogDlg::U2CP(
LPCWSTR pUSourceData,
char pOutData[],
UINT giUDestinationCodePage,
char pErrorData[]
)
// pUSourceData = This is Unicode multibyte string// that has to be translated // pOutData = In this variable will be put the code page translated string// giUDestinationCodePage= Integer that must// contain the code page number (1253 for Greek)// pErrorData = A string that eventually contains an error.bool CDialogDlg::CP2U(
LPCSTR pSourceData,
WCHAR pOutData[],
UINT giSourceCodePage,
char pErrorData[]
)
// pUSourceData = This is code page input// string that has to be translated// pOutData = In this multibyte variable// will be put the unicode translated string// giSourceCodePage = Integer that must contain// the code page number (1253 for Greek)// pErrorData = A string that eventually contains an error.

I kept from Microsoft sample an utility function that is used to allocate
memory:

Now, if you want to test the functions, just try the C++ Dialog
client:

Change the keyboard settings in the needed language (for example, Greek) and
input some words/characters on the first edit box. It is possible to adjust the
settings of the edit box to see the needed code page characters correctly (by
default you will see some ASCII characters, but this is correct - just copy and
paste to Word to see that).

On the right edit box you have to enter the code page (1253 for Greek). Push
the CP2U button. Because the C++ edit boxes don’t know Unicode, you will
obtain some strange characters in the Unicode edit box (“± » Ζ ± ”).

To see again the code page data in the third button, push the U2CP
button.

Of course, it is possible to use these functions in many situations:

Directly from C++, in order to manipulate strings from/to files or DB.

Encapsulate the functions in a COM component und use it from other programs
or language platforms.

More...

To use them directly in the Microsoft SQL transact language, you need to
encapsulate the functions in two C extended stored procedures. For some
deployment reasons, I made 2 functions: xp_u2cp and
xp_cp2u_web.

The Microsoft SQL server gives the user, the possibility to make his own
functions. To build one:

Build an Extended stored procedure Microsoft Visual C++ project.

In the wizard, give to the function the same name like the project name.

After completing the wizard, copy the XP_U2CP.dll over to your SQL
Server \Binn directory.

Will find the function in the master database in the Extended Stored
Procedures group. Give proper rights if it is necessary!

You may drop the extended stored procedure by using the SQL command:

sp_dropextendedproc 'xp_u2cp'

You may release the DLL from the server (to delete or replace the file), by
using the SQL command:

DBCC xp_u2cp(FREE)

Access the functions from Query Analyzer or from one stored procedure:

Master.dbo.xp_u2cp

A. The Unicode to code page translation.

In the picture is a general script that proves the function:

In order to use it, you have to use the SQL stored procedure:

Exec spDu2cp N'ατηενσ ι νιψοσια', 1253

spDu2cp stored procedure is a wrapper for the
xp_u2cp extended procedure.

First parameter is nvarchar, the Unicode string and the second
is the code page. The stored procedure will print the varchar
result. In the spDu2cp stored procedure, is used the
xp_u2cp extended procedure with some settings.

About the Unicode to code page extended stored procedure parameters:

The extended procedure must be used in this way:

exec master.dbo.xp_u2cp @param1, @param2 OUTPUT, @cp
@param1= must be varbinary(8000).
This will contain the Unicode characters in hexadecimals format.
The CAST translation is needed because the extended
procedure parameters don’t know multibyte characters.
set @param1 = CAST(@u_value AS varbinary(8000))
@param2 = must be varchar(4000).
This output variable will contain the code page translated string.
@cp = must be a int.
This parameter contains the needed code page
(1253 – Greek, for example).

B. The code page to Unicode translation.

In the picture is a general script that proves the function:

In order to use it, you have to use the SQL stored procedure:

Exec spDcp2u 'gica in code page', 1253

spDcp2u stored procedure is a wrapper for the
xp_cp2u_web extended procedure.

First parameter is varchar string and the second is the code
page. The stored procedure will print the nvarchar result. In the
spDcp2u stored procedure is used, thexp_cp2u_web extended procedure with some settings.

About the code page to Unicode extended stored procedure parameters:

The extended procedure must be used in this way:

exec master.dbo.xp_cp2u_web @param1, @param2 OUTPUT, @cp
@param1= must be varchar (2000).
This will contain the code page characters
@param2 = must be varbinary(4000).
This output variable will contain the code page
translated string in hexadecimal format.
In order to use it, this must be translated in nvarchar Unicode format.
The CAST translation is needed because the
extended procedure parameters don’t know multibyte characters.
set @param1 = CAST(@u_value AS varbinary(8000))
@cp = must be a int.
This parameter contains the needed code page
(1253 – Greek, for example).

These functions have to get their data and put them directly in/from Unicode
(nvarchar)/ codepage (varchar) data tables. The string
variables in example are only for testing. Microsoft Query Analyzer SQL editor
knows only Unicode, so you cannot give a real codepage string parameter to these
functions (the editor converts the input string into Unicode format).

Steps made inside the C extended procedure:

I looked at each parameter received and I made some tests regarding their
type and dimension:

Access the functions from Query Analyzer or from one stored procedure:

Master.dbo. xp_u2cp
Master.dbo. xp_cp2u_web

Make the wrappers stored procedure for these extended procedures with
spDcp2u.sql and spDu2cp.sql SQL transaction scripts in Query
Analyzer.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Share

About the Author

I make programming for over 4 years and extensive experience in C++, ASP, Pascal, MFC, COM+, ATL, TCP/IP, HTTP protocols, XML, XSL, SOAP and SQL.
For the last 2 years i working extensively at the background of financial sites (databases, n tier architecture).

How to retrieve unicode data from database using CRecordset. DB si SQL Server. Data in the database is a string containing romanian characters like tz. In my VC++ 6 app i get either the translated character t or some other character with code 0xDE.