Some Unicode characters (big endian) are not handled properly at debug/run time, though we can type those characters in the editor (VS 2015 Community Edition)

Question

I have some Unicode characters in a file. I was using VS 2005, but it did not support all Unicode characters, so I moved to Visual Studio 2015 Community Edition. There it is okay: we can type all the Unicode characters manually in the editor (the file is saved as Unicode big endian).
(I store these characters in an array, wchar_t arr[500] or allocated with new. Some Unicode characters cannot be assigned to single elements of that array, while others can, as described below.)

𠀐亙𠀃𠀃亙亙𠀐𠀐Val𪛕𨕥

I stored the same characters above in a file. I can type them manually in the VS 2015 editor, but at debug time some of the characters (𠀐, 𠀃, 𪛕, 𨕥) give a wrong result, so I cannot use those characters for verification (if ... else).

eg: if (chArr[for_count] == L'𠀐') // always gives a wrong result

The other characters work without problems. (I have also already called the _wsetlocale function.)

At the same time, I want to write/print those problem characters to a file after the verification. So I would like to know about any compiler update, editor update, or new VS version that would let me move forward successfully.
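A minimal sketch of why the comparison in the question cannot match, assuming Windows's 16-bit wchar_t: a supplementary-plane character such as 𠀐 occupies two consecutive array elements (a surrogate pair), so no single element ever equals L'𠀐'. The example uses char16_t so it behaves the same on every platform; the function name is illustrative, not part of any API.

```cpp
#include <cassert>
#include <cstddef>

// U+20010 (𠀐) is stored as the surrogate pair 0xD840, 0xDC10,
// i.e. TWO 16-bit units.  To test for it, compare two units at a time.
bool matchesPairAt(const char16_t* arr, std::size_t i, const char16_t* pair)
{
    // `pair` points at the two units of one supplementary character,
    // e.g. u"𠀐" (which is the array {0xD840, 0xDC10, 0}).
    return arr[i] == pair[0] && arr[i + 1] == pair[1];
}
```

On Windows, the same idea works with wchar_t arrays, since wchar_t there is also a 16-bit UTF-16 unit.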

The warning is telling you that the character takes up more than one wchar_t entry in order to fit. If you don't see this warning then you must have disabled warnings; this is a level 3 warning, so it will be visible by default.

So this tells you that the string has length 3: the first two elements are one character and the last element is the null terminator. If you then take the surrogate pair, the high surrogate being 0xD840 and the low surrogate being 0xDC10, and convert it into the proper code point, then we get U+20010, which is 𠀐.
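The conversion just described can be written out in a few lines; this is the standard UTF-16 decoding formula, and the function name is mine, not from any library:

```cpp
#include <cassert>
#include <cstdint>

// Combine a high surrogate (0xD800-0xDBFF) and a low surrogate
// (0xDC00-0xDFFF) into the code point they encode.
std::uint32_t fromSurrogatePair(std::uint16_t hi, std::uint16_t lo)
{
    return 0x10000u + (std::uint32_t(hi - 0xD800u) << 10) + (lo - 0xDC00u);
}
```

For the pair in question, 0xD840/0xDC10, this yields 0x20010.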

I suggest you read up properly on UTF-16 and surrogates, and actually look up where these characters are before you carry on trying to work on text like this.

This is a signature. Any samples given are not meant to have error checking or show best practices. They are meant to just illustrate a point. I may also give inefficient code or introduce some problems to discourage copy/paste coding. This is because the
major point of my posts is to aid in the learning process.

Thank you for the reply. I had been wondering why there is a '\0' at the end; now I understand the exact meaning. I had not imagined that much. I thought of it another way: that whether it is LE or BE, the compiler handles it in the background and the user never sees what happens there, i.e. more than one byte is still presented as a single character. That is also very tedious inner work the compiler does.

From your explanation I realize how much is going on behind this. I really appreciate your knowledge and experience. I understand this is a very difficult task.

So my plan is to make a string, and then more strings into a string array; after taking the elements as strings, compare them. I think it may fail, but I will try. Once I reach a result I will post it here.
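The compare-as-strings plan described above can be sketched like this. It works precisely because string comparison looks at all code units, so surrogate pairs never need to be split; std::u16string is used for portability (std::wstring behaves the same on Windows), and containsChar is an illustrative name:

```cpp
#include <cassert>
#include <string>

// True if `text` contains the character (or substring) `ch`,
// including supplementary characters stored as surrogate pairs.
bool containsChar(const std::u16string& text, const std::u16string& ch)
{
    return text.find(ch) != std::u16string::npos;
}
```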

My aim is to make it simple, even if it is difficult in the background.

1. Unicode array (LE/BE/UTF-8) <--- opened from a Unicode file.

2. Change the array elements to my own Unicode values (LE/BE/UTF-8).

3. Sometimes repeat step 2 with other standard Unicode values, initialized manually.
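The three steps above can be sketched as follows. This is only a sketch under assumptions: the file is UTF-16LE, the file name and function names are mine, and a real loader would also check/strip the BOM (0xFEFF) and byte-swap when the file is big endian:

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Step 1: read a UTF-16LE file into an array of 16-bit units.
// Surrogate pairs come in as two consecutive units, unchanged.
std::vector<char16_t> readUtf16(const char* path)
{
    std::vector<char16_t> units;
    if (std::FILE* f = std::fopen(path, "rb")) {
        char16_t u;
        while (std::fread(&u, sizeof u, 1, f) == 1)
            units.push_back(u);
        std::fclose(f);
    }
    return units;
}

// Step 2/3: after editing elements of the vector, write it back.
void writeUtf16(const char* path, const std::vector<char16_t>& units)
{
    if (std::FILE* f = std::fopen(path, "wb")) {
        std::fwrite(units.data(), sizeof(char16_t), units.size(), f);
        std::fclose(f);
    }
}
```

Because the data round-trips as raw 16-bit units, the supplementary characters survive reading, editing, and writing exactly.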

Thank you for the reply; this is also valuable. I really appreciate it, because I did not know this and was struggling with the elements of a CStringW object because of the multi-unit characters. In ASCII it is easy, and Unicode would also be easy if it did not involve LE/BE.

Both replies were helpful for me. From the first I learned in practice about the byte length in LE. I had not thought about that before; something was hidden from me.

And now, from you, it becomes even easier for me. Let me try it practically.

TextElementEnumerator: I have seen it for the first time. I think it will work better.

Worth the time :)
