US-ASCII (again)

Martin Wallgren

Greenhorn

Posts: 6

posted 8 years ago

Hi all,

I've read this forum for a while and it helped me a lot when I took the SCJP exam in November. I've been working on the SCJD off and on for the last few weeks, and after all the reading on this forum I finally decided to register. I'm treating the exam as a full learning experience and I'm testing out all the crazy ideas I have in my head (I'm guessing I'll have rewritten all the important classes quite a few times by the time I'm done).

Here's my question.

I'm currently deciding how to approach the all-too-familiar character encoding issue. I've read some threads about it in the forum, and here are my thoughts so far.

When I'm converting a String to byte[] for writing to the db file this is what I had planned.

My alternative to this is, at the moment, using

That line is obviously shorter than my loop above. The motivation for the loop is that it gives me an IllegalArgumentException if there are any illegal characters in the String, and the exception allows me to give the user some feedback about the issue.

The easier one-line conversion adds to the code's simplicity (those junior programmers aren't the brightest stars in the sky), but it will store any faulty characters as ?, and just ignoring them feels wrong straight down to the bone.
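To make the trade-off concrete, here is a minimal sketch of the two approaches described above. It is not the code from the original post: the class and method names are hypothetical, and it assumes Java 7 or later for `StandardCharsets` (on older JDKs, `getBytes("US-ASCII")` behaves the same way but declares a checked `UnsupportedEncodingException`).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are illustrative only.
final class AsciiConversion {

    // Strict variant: walk the characters and reject anything outside
    // the 7-bit US-ASCII range, so the caller can report it to the user.
    static byte[] toAsciiStrict(String text) {
        byte[] bytes = new byte[text.length()];
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                throw new IllegalArgumentException(
                        "Non-ASCII character '" + c + "' at index " + i);
            }
            bytes[i] = (byte) c;
        }
        return bytes;
    }

    // Lenient variant: the one-line conversion. Characters that cannot be
    // mapped to US-ASCII are silently replaced with '?'.
    static byte[] toAsciiLenient(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String input = "caf\u00e9"; // ends with a non-ASCII character

        // The lenient conversion succeeds but loses the last character.
        System.out.println(
                new String(toAsciiLenient(input), StandardCharsets.US_ASCII));

        // The strict conversion refuses the same input.
        try {
            toAsciiStrict(input);
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The strict variant costs a few extra lines, but it is the only one of the two that lets the caller know something was lost.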

In the real world it's obviously unacceptable to restrict text to US-ASCII, so I'm guessing that this is something stipulated by the SCJD exercise.

Either all input is US-ASCII (in which case there will be no problems with the conversion) or it's not (it sounds as if that's the case here). In the latter case there is no meaningful content-preserving transformation using either of these methods. The way to store non-ASCII content in an ASCII database would be to encode the data using base-64 or a similar encoding that transforms binary data to ASCII. That can easily be reversed for display purposes, but it would no longer be easy to use SQL operations (like comparisons) on the stored data.
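As a rough illustration of the base-64 idea, the sketch below round-trips a non-ASCII string through an ASCII-only representation. The class and method names are hypothetical, UTF-8 is an assumed choice for the intermediate bytes, and `java.util.Base64` requires Java 8 or later (older JDKs would need a third-party codec).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class; names are illustrative only.
final class Base64Storage {

    // Encode arbitrary text as an ASCII-only string:
    // text -> UTF-8 bytes -> base-64 characters.
    static String encodeForAsciiField(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return Base64.getEncoder().encodeToString(utf8);
    }

    // Reverse the transformation for display purposes.
    static String decodeFromAsciiField(String stored) {
        byte[] utf8 = Base64.getDecoder().decode(stored);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Grand H\u00f4tel";
        String stored = encodeForAsciiField(original);
        System.out.println(stored);
        System.out.println(decodeFromAsciiField(stored).equals(original));
    }
}
```

Note that the stored form is roughly a third longer than the raw bytes, which matters for fixed-width fields, and that comparing or sorting the stored values no longer reflects the order of the original text.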

Martin Wallgren

Greenhorn

Posts: 6

posted 8 years ago

Originally posted by Ulf Dittmer: in the latter case there is no meaningful content-preserving transformation using either of these methods.