Importing Special Characters

My app operates in English using the standard character set but occasionally includes foreign words requiring characters from other character sets (Latin, Greek, etc. -- no double-byte stuff).

We are expecting to be able to create the string in Word and then paste it into the browser. The results vary, but it never fully succeeds. In some instances the last character of a string of randomly selected special characters does not get saved; in others, none of the string gets saved.

I have tried pasting sometimes into the htmlarea control and sometimes into the html textarea itself, and then switching back and forth to see what gets pasted or saved, etc., and no combination seems to work reliably.

But it is not a complete failure because, once at least, most of the characters I wanted were saved.

I have tried with both this.killWordOnPaste = false and this.killWordOnPaste = true.

Change History (5)

What character set are you using on the page that Xinha is loaded on, and what character set is the form set to use, and what character set are you using in Word?

I'd recommend using utf-8 encoded Unicode all the way. If you can, try that and see if it helps. It does mean doublebytes for some of those previously singlebytes, but it's going to be more reliable.

Unfortunately some of the language files for Xinha (inherited from htmlarea) are not utf-8; this will be changing once the new L10n system is in place (and old localisations get replaced). It's just too finicky to work with multiple character sets in my opinion, simply due to the different encodings and the lack of a real ability to translate characters between them.

Gotta admit I don't know what character set I am using -- whatever is the default, I guess. But I know that's not helpful. How do I tell? Is there a setting on my webpage or in htmlarea?
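One rough way to check what the page declares: in the browser console, document.characterSet (document.charset in older Internet Explorer) reports the encoding the browser actually used. The sketch below only inspects the meta tags in a page's source. findDeclaredCharset is a made-up helper name, not part of Xinha or htmlarea, and it ignores the HTTP Content-Type header, which can override the meta tag.

```javascript
// Hypothetical helper (not part of Xinha/htmlarea): returns the charset
// declared in a page's <meta> tags, or null if none is declared.
function findDeclaredCharset(html) {
  // Covers both <meta charset="utf-8"> and
  // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
  var match = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(html);
  return match ? match[1].toLowerCase() : null;
}

// Example: check the source of the page that hosts the editor.
var page = '<html><head><meta http-equiv="Content-Type" ' +
           'content="text/html; charset=ISO-8859-1"></head></html>';
console.log(findDeclaredCharset(page));
```

If this comes back null, the browser is falling back to a default or to whatever the server sends in its headers, which would be consistent with results that vary from paste to paste.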

When importing from Word, e.g. the following character ?? from Latin Extended-A, it sometimes gets saved surrounded by font tags and sometimes not (sometimes only a portion of the opening tag is saved and nothing else, not even anything representing the character). But here it sticks, so I'm not sure what I have to do to get it to work in my app.

I think you'll find this is a character set issue. Can you try with the example (latest nightly would be best) to see if the problem is duplicated? If so, please attach an example Word doc you are copy-pasting from.

This drove me crazy when I was adapting htmlarea 3 to be a WYSIWYG for a course management system (Jones Standard) and I finally fixed it. I had tried everything, including using hex codes, etc., but no matter what the characters got through.

This is what worked for me in htmlarea3:

First open htmlarea.js in Notepad and save it as UTF-8.

Next, within the function HTMLArea.htmlEncode = function(str), add a search and replace for every character you need, using the actual character, not the hex code, which doesn't work for more than the most common symbols. A very few characters may appear as rectangles; they will not be encoded. Everything else will become htmlentities and stay that way. Make sure that you don't use "gi" in your regular expression, because any entities that are case sensitive will change to uppercase.
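A minimal sketch of that kind of search and replace, assuming the file is saved as UTF-8. This is not the actual htmlarea.js source: encodeSpecialChars and SPECIAL_CHAR_ENTITIES are names made up for illustration, and the character list is only a sample to be replaced with whatever your content needs.

```javascript
// Hypothetical stand-in for the edited htmlEncode function; save as UTF-8.
// Each pair is [literal character, HTML entity]. Sample list only.
var SPECIAL_CHAR_ENTITIES = [
  ["é", "&eacute;"],
  ["É", "&Eacute;"],
  ["α", "&alpha;"],
  ["Α", "&Alpha;"],   // Greek capital alpha, not Latin "A"
  ["œ", "&oelig;"],   // Latin Extended-A
  ["Š", "&Scaron;"]   // Latin Extended-A
];

function encodeSpecialChars(str) {
  // Encode markup characters first, so the "&" that starts each entity
  // added below is not encoded a second time.
  str = str.replace(/&/g, "&amp;")
           .replace(/</g, "&lt;")
           .replace(/>/g, "&gt;")
           .replace(/\x22/g, "&quot;");
  for (var i = 0; i < SPECIAL_CHAR_ENTITIES.length; i++) {
    var pair = SPECIAL_CHAR_ENTITIES[i];
    // "g" only, no "i": with "i" the pattern for a lowercase letter would
    // also match its uppercase form and emit the wrong-case entity.
    // (None of the sample characters are regex metacharacters.)
    str = str.replace(new RegExp(pair[0], "g"), pair[1]);
  }
  return str;
}

console.log(encodeSpecialChars("café, Électricité, α & Š"));
```

Keeping upper and lower case as separate entries is what lets entities like &eacute; and &Eacute; stay distinct.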