How do I convert a Matlab string to a wchar_t string in a C-Mex file under Linux?

I am trying to access files in a C-Mex function and want to support Unicode characters in the file names as well. Under Windows the conversion seems to be trivial: just copy the bytes of the mxChar vector to a wchar_t vector and append an L'\0'. Both mxChar and wchar_t are actually UINT16 there.

Under Linux, wchar_t has 4 bytes, so a specific conversion is needed. I thought the C function mbstowcs would do the conversion, but I can test it on Windows only: there, mbstowcs stops after the first character of the Matlab string.

On Jan 18, 5:28 pm, "Jan Simon" <matlab.THIS_Y...@nMINUSsimon.de>
wrote:
> Dear readers,
>
> How do I convert a Matlab string to a wchar_t string in a C-Mex file under Linux?
>
> I am trying to access files in a C-Mex function and want to support Unicode characters in the file names as well. Under Windows the conversion seems to be trivial: just copy the bytes of the mxChar vector to a wchar_t vector and append an L'\0'. Both mxChar and wchar_t are actually UINT16 there.
>
> Under Linux, wchar_t has 4 bytes, so a specific conversion is needed. I thought the C function mbstowcs would do the conversion, but I can test it on Windows only: there, mbstowcs stops after the first character of the Matlab string.
>
> The documentation of the Mex-API function mxArrayToString states that "it supports multibyte character sets", but the output is a char*. Would this be correct then:
> wchar_t *WStr;
> char *Str;
> Str = mxArrayToString(prhs[0]); // prhs[0] is a Matlab string
> mwSize Len = mxGetNumberOfElements(prhs[0]);
> WStr = (wchar_t *) mxMalloc((Len + 1) * sizeof(wchar_t));
> mbstowcs(WStr, Str, Len); // of course with checking the output
> WStr[Len] = L'\0'; // terminator
>
> Or is there a direct way to convert mxChar to wchar_t strings?
>
> Google finds just a few hits for "mxChar wchar Matlab", e.g.: http://www.mathworks.se/matlabcentral/newsreader/view_thread/236507
> It was great that TMW decided to use 2-byte CHARs. But the documentation of at least Matlab 5.3 to 2009a does not really explain how mxArrayToString handles the 2nd byte.
>
> Thanks, Jan

The mxArrayToString() documentation is ludicrous. If I recall
correctly, it says something like "supports multi-byte strings" and
that's it, not a single example of a UTF-16 string being converted
from an mxArray to a C-string. Since the function returns a char* I've
always assumed it converts the string to UTF-8, but mentioning that in
the documentation would be very helpful.

If you want a solution that can handle all Unicode encoding cases you
throw at it (think UTF-16 surrogate pairs etc.) then I'd suggest using
the ICU library (http://site.icu-project.org/). It is open source and
very reliable when it comes to all things Unicode.

If you don't want to use an external library, a solution that will
work in most cases is to copy the characters yourself instead of using
mxArrayToString().
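
A minimal sketch of such a manual copy, under some assumptions: the helper name utf16_units_to_wchar is hypothetical, and uint16_t stands in for mxChar so that the snippet compiles without mex.h. In a MEX file the source pointer would presumably come from mxGetChars(prhs[0]) and the length from mxGetNumberOfElements(prhs[0]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper (not part of the MEX API): widen n UTF-16 code
 * units into a wchar_t buffer, one unit per wide character.
 * dst must have room for n + 1 elements. */
void utf16_units_to_wchar(const uint16_t *src, size_t n, wchar_t *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    for (i = 0; i < n; i++) {
        dst[i] = (wchar_t)src[i];   /* value-preserving widening */
    }
    dst[n] = L'\0';                 /* terminator */
}
```

Each 16-bit unit is widened by value, so the same loop works whether wchar_t has 2 bytes (Windows) or 4 bytes (Linux).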

This doesn't handle UTF-16 characters that lie outside of the Basic
Multilingual Plane (these are encoded as surrogate pairs). You can
find people's solutions to handling those if you google for it, but
once again, if you're serious about supporting all such cases, I'd
point you back to the ICU library.
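
For illustration only, a hand-rolled decoder that combines surrogate pairs could look roughly like this. It is a sketch, not a replacement for ICU: the name utf16_to_wchar32 is hypothetical, uint16_t again stands in for mxChar, wchar_t is assumed to hold at least 32 bits (as on Linux), and unpaired surrogates are simply passed through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: decode n UTF-16 code units into wchar_t code points.
 * Assumes a wchar_t of at least 32 bits; dst needs room for n + 1 elements.
 * Returns the number of wide characters written, excluding the terminator. */
size_t utf16_to_wchar32(const uint16_t *src, size_t n, wchar_t *dst)
{
    size_t i = 0, out = 0;

    assert(src != NULL && dst != NULL);
    assert(sizeof(wchar_t) >= 4);

    while (i < n) {
        uint32_t c = src[i++];

        /* High surrogate followed by a low surrogate: combine the pair. */
        if (c >= 0xD800 && c <= 0xDBFF && i < n &&
            src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (src[i++] - 0xDC00);
        }
        dst[out++] = (wchar_t)c;    /* unpaired surrogates are copied as-is */
    }
    dst[out] = L'\0';
    return out;
}
```

The output can be shorter than the input, because every surrogate pair collapses into a single wide character.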

"Jan Simon" wrote in message <ih7mpc$q2q$1@fred.mathworks.com>...
> Dear readers,
>
> Bump. Are Unicode file names too rare to catch anyone's interest?
>
> I've found the undocumented mxArrayToString_UTF16, but so far I have not gotten it to run.
>
> Kind regards, Jan

If you look at the libmex.dll file, you will see hundreds of other similar functions, most of them undocumented. Following is the R2010a version. In particular, you will find:

Thanks for the helpful answers!
The ICU library seems to be the safe solution.
The undocumented functions are only partially helpful due to the "un".

I'll send an enhancement request to TMW. If they were clever enough to choose ushort16 characters before Microsoft, Apple and the Linux community started the ridiculously inconsistent 1-, 2-, 4-byte WCHAR implementations, it would be very nice if they also offered an interface to access the values of mxChar variables. The NATIVE2UNICODE function is a good start, but a documented Mex interface is needed as well.
