Topic: UTF8 decoding with false exceptions

I've experienced two different issues with IceChat and finally found that both of them have the same root cause. I connect to a custom UTF-8 IRC server. The problems were the following:

1. Sometimes I didn't get all the users in the channel when I logged in.
2. I got the users, BUT not all of them had a full name in the Nicklist.

First I started with the debug window and found that the server sent all the info about the nicks, but part of the WHO answers had been decoded as "Windows-1252". I discovered that in the "strData = enc.GetString(readBuffer)" line GetString threw a DecoderFallbackException, and because of that the rest of the undecoded readBuffer was forced to be decoded as Windows-1252. So the nicks with wrongly decoded characters did not appear in the NickList. The strange thing about this problem is that GetString threw the exception for characters THAT had previously been DECODED correctly....

I examined UTF8Encoding enc and found that the object had been initialized to throw an exception when it tries to decode invalid bytes (the second constructor argument, throwOnInvalidBytes):

UTF8Encoding enc = new UTF8Encoding(false, true);

So because I found these exceptions to be false, I modified the code:

UTF8Encoding enc = new UTF8Encoding(false, false);

After this the problems disappeared and the nicks are decoded correctly. I know that decoding mistakes could now go unnoticed, but the problems are solved.
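To make the difference between the two constructor calls concrete, here is a minimal sketch. The byte values and the class name are made up for illustration and are not taken from IceChat; the trailing lone D0 byte simply stands in for "some invalid input". With the second argument set to true the strict instance should throw, while the lenient one substitutes U+FFFD and carries on.

```csharp
using System;
using System.Text;

class ThrowOnInvalidBytesDemo
{
    static void Main()
    {
        // 0xD0 0xBC is the valid UTF-8 encoding of U+043C. The trailing 0xD0
        // is a lead byte with no continuation byte, so it is invalid on its own.
        byte[] data = { 0xD0, 0xBC, 0xD0 };

        var strict = new UTF8Encoding(false, true);   // throwOnInvalidBytes = true
        var lenient = new UTF8Encoding(false, false); // throwOnInvalidBytes = false

        try
        {
            strict.GetString(data);
            Console.WriteLine("strict: decoded without error");
        }
        catch (DecoderFallbackException)
        {
            Console.WriteLine("strict: DecoderFallbackException");
        }

        // The lenient instance replaces the invalid byte instead of throwing.
        string text = lenient.GetString(data);
        bool replaced = text.IndexOf('\uFFFD') >= 0;
        Console.WriteLine("lenient: replacement character present = " + replaced);
    }
}
```

Note that the lenient variant hides the error rather than fixing it: whatever made the bytes invalid is still there, it just shows up as a replacement character.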

Have you had these kinds of problems on a UTF-8 server? What I'd like to try is to compile the source with MonoDevelop and see what the situation is on Linux with Mono when exception throwing is enabled in enc.

I compiled the source with Mono, and it does absolutely the same thing on Linux too.

When I disable the exception throwing, many of the problems are gone:

UTF8Encoding enc = new UTF8Encoding(false, false);

I had more time to test, and unfortunately there are still invalidly decoded messages even if I disable the exception throwing, though not as many as before. And those are common characters too, so it should be able to decode them.

Have you got any idea what the problem with the decoding could be in this case?

strData = Encoding.GetEncoding("utf-8").GetString(readBuffer);

This seems to just replace the characters with <?> chars, but the process continues. Looking at the error, it seems that it errors out on certain bytes, like D0, BC, but I am not sure whether the error is truly the problem or not. But at least most of the text is decoded as UTF-8 this way.

Well, the decoder error still occurs, but at least it continues on as best it can. I am not sure whether the data being sent is invalid or the method of decoding is the problem. I have found another UTF-8 server, and I am going to test on it and compare some results.
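One situation that would produce errors on otherwise valid bytes such as D0 BC is a multi-byte character being split across two socket reads, so that each read is decoded on its own. Whether this is what actually happens in IceChat is not established in this thread; the sketch below only illustrates the mechanism. The buffer contents and the DecodeChunk helper are hypothetical. It contrasts GetString on a single read with a Decoder obtained from GetDecoder(), which keeps an incomplete sequence between calls.

```csharp
using System;
using System.Text;

class SplitSequenceDemo
{
    static void Main()
    {
        // Hypothetical scenario: the two bytes of one character (D0 BC)
        // arrive in two separate reads.
        byte[] firstRead = { 0x68, 0x69, 0xD0 }; // "hi" followed by a lead byte
        byte[] secondRead = { 0xBC };            // the matching continuation byte

        var utf8 = new UTF8Encoding(false, true);

        // Decoding the first read on its own treats the lone lead byte as invalid.
        try
        {
            utf8.GetString(firstRead);
            Console.WriteLine("GetString on first read: no error");
        }
        catch (DecoderFallbackException)
        {
            Console.WriteLine("GetString on first read: DecoderFallbackException");
        }

        // A Decoder remembers the incomplete sequence until the next call.
        Decoder decoder = utf8.GetDecoder();
        string decoded = DecodeChunk(decoder, firstRead) + DecodeChunk(decoder, secondRead);
        Console.WriteLine("stateful decode matches: " + (decoded == "hi\u043C"));
    }

    // Hypothetical helper: decodes one chunk while preserving decoder state.
    static string DecodeChunk(Decoder decoder, byte[] chunk)
    {
        char[] chars = new char[decoder.GetCharCount(chunk, 0, chunk.Length)];
        int written = decoder.GetChars(chunk, 0, chunk.Length, chars, 0);
        return new string(chars, 0, written);
    }
}
```

If split sequences do turn out to be the cause, a stateful Decoder per connection would keep strict error checking without producing false exceptions at read boundaries.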

But if you could send me some login information for your server, I can test there as well.

It looks like the IRC connection is not activated for freshly registered users on the server I connect to. The main client for this server is a web-based Java client... so other IRC clients are restricted in this way. I will send you the account in a few days.

If I see it correctly, the auto-detection of the encoding tries only 2 encodings if enabled. It gives UTF-8 a try, and when that throws an exception, the code continues with Windows-1252. This is a bit strange to me.
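The two-step detection described here can be sketched roughly as follows. This is not IceChat's actual code: DecodeWithFallback and the sample bytes are hypothetical, and ISO-8859-1 is used as a stand-in for Windows-1252 because newer .NET runtimes only provide code page 1252 after registering CodePagesEncodingProvider.

```csharp
using System;
using System.Text;

class AutoDetectSketch
{
    // Hypothetical helper: try strict UTF-8 first, then fall back
    // to a single-byte encoding if the bytes are not valid UTF-8.
    static string DecodeWithFallback(byte[] data, Encoding fallback)
    {
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(data);
        }
    }

    static void Main()
    {
        // Stand-in for Windows-1252 (an assumption made for portability).
        Encoding fallback = Encoding.GetEncoding("iso-8859-1");

        byte[] validUtf8 = { 0xD0, 0xBC }; // U+043C encoded as UTF-8
        byte[] notUtf8 = { 0xE9 };         // U+00E9 in ISO-8859-1 / Windows-1252

        Console.WriteLine(DecodeWithFallback(validUtf8, fallback) == "\u043C");
        Console.WriteLine(DecodeWithFallback(notUtf8, fallback) == "\u00E9");
    }
}
```

A scheme like this decodes the whole buffer with the fallback as soon as one byte is rejected, so a single false exception turns every non-ASCII character in that buffer into the wrong text.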

And because the auto-detection is enabled by default, every packet is parsed with those 2 encodings, so putting the UTF-8 decoding in the catch will break the connection immediately: