Tuesday, November 30, 2010

Java Applet CP-1252 Linux

Update: 1/7/2011

My solution is right in the JavaDoc. I take in a bunch of bytes, but never specify what encoding they're in. Java makes an assumption about the encoding depending on the platform.

Even worse, I initialized the SimpleDoc(rawCmds.getBytes(), docFlavor, docAttr); without specifying what encoding the bytes were in. This needed to change to new SimpleDoc(rawCmds.getBytes(charset), docFlavor, docAttr);
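A minimal sketch of the difference (a standalone example, not jZebra's actual code): getBytes() with no argument uses the platform default charset, while getBytes(charset) is deterministic everywhere.

```java
import java.nio.charset.Charset;

public class EncodingDemo {
    public static void main(String[] args) throws Exception {
        String rawCmds = "\u00DA"; // U+00DA, which is the single byte 0xDA in Cp1252

        // Platform-dependent: uses the JVM's default charset, so the same
        // code can produce different bytes on Windows vs. Linux.
        byte[] platformBytes = rawCmds.getBytes();

        // Explicit: always produces the single byte 0xDA.
        byte[] cp1252Bytes = rawCmds.getBytes("Cp1252");

        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("platform length: " + platformBytes.length);
        System.out.println("explicit length: " + cp1252Bytes.length); // 1
        System.out.println("explicit byte:   0x"
                + Integer.toHexString(cp1252Bytes[0] & 0xFF).toUpperCase()); // 0xDA
    }
}
```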

.. I need your help to solve this: jZebra doesn't seem to be able to print extended ASCII (chars 129-255) on Linux, but it works fine on Windows. My guess is that jZebra only works with the ISO-8859-1 character encoding, which is supported by Windows, while Linux uses UTF-8. If I print a vertical line using an extended ASCII character, jZebra produces weird characters on paper. So, how do I solve this?

It took me a few hours to investigate, but in short, you are not using ISO-8859-1 encoding (if you look up the chart, your characters do not appear in ISO-8859-1). You are actually using CP-1252, which is often mistaken for ISO-8859-1.
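One quick way to see the difference (an illustrative sketch): bytes in the 0x80-0x9F range are undefined control codes in ISO-8859-1 but printable characters in CP-1252, so decoding the same byte with each charset gives different results.

```java
import java.nio.charset.Charset;

public class CharsetCompare {
    public static void main(String[] args) {
        byte[] data = { (byte) 0x93 }; // a "smart quote" byte in CP-1252

        String asCp1252 = new String(data, Charset.forName("windows-1252"));
        String asLatin1 = new String(data, Charset.forName("ISO-8859-1"));

        // CP-1252 maps 0x93 to U+201C (left double quotation mark);
        // ISO-8859-1 maps it to the invisible control character U+0093.
        System.out.printf("CP-1252: U+%04X%n", (int) asCp1252.charAt(0));
        System.out.printf("Latin-1: U+%04X%n", (int) asLatin1.charAt(0));
    }
}
```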

Since Linux Kernel 2.6.x, it seems the handling of CP-1252 characters has changed. Based on the linked article, there may be a command-line setting that fixes it. I'd be happy to issue this command via a jZebra parameter if you can prove to me that it works.

As far as Java's default encoding is concerned, it's platform dependent. Conservatively, one would need to specify -Dfile.encoding=Cp1252 at the command line, but of course that is not available via the web browser.

The best recommendation I could find was from Edward Grech (taken from the JVM™ Tool Interface documentation), where he recommends creating an ENVIRONMENT VARIABLE called "JAVA_TOOL_OPTIONS" and setting it to "-Dfile.encoding=Cp1252", which the JVM should pick up each time it starts.
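For example, on Linux or Mac this can be set in the shell before the browser is launched (how the variable reaches the applet's JVM depends on how the browser is started):

```shell
# Linux/Mac: set for the current shell session
export JAVA_TOOL_OPTIONS="-Dfile.encoding=Cp1252"

# Windows (Command Prompt) equivalent:
#   set JAVA_TOOL_OPTIONS=-Dfile.encoding=Cp1252

# Any JVM started afterwards should report on stderr:
#   Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=Cp1252
```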

Last but not least, you can try using Unicode values directly from the CP-1252 chart instead of letting the browser convert them.

-Tres

Some Additional Information (copied from their respective sites):

CP-1252

Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages.

It is very common to mislabel Windows-1252 text data with the charset label ISO-8859-1. Many web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 characters in order to accommodate such mislabeling but it is not standard behaviour and care should be taken to avoid generating these characters in ISO-8859-1 labeled content. However, the draft HTML 5 specification requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.[1]

LINUX + CMAP

The character 0xA1 in cp437 is an accented vowel, which is not correct for this code in latin1. So cmap is informing the console driver to react as if the character request were for 0xAD. The console driver goes into the unimap (straight-to-font) and reads the unicode at position 0xAD. This happens to be U+00a1, the inverted exclamation mark. Next stop is the font, where the glyph for U+00a1 has to be picked up. In the end, we had a request for 0xA1 but we did not get the character at that position in cp437; we got the inverted exclamation mark for the position 0xA1 in latin1. Our cp437 is behaving like a latin1 font thanks to the cmap.

JAVA DEFAULT ENCODING

Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.

By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err: Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8

-Tres

Eko,

How are you? I spent some more time on this, and I think I figured it out. I'm pretty certain it's a character encoding issue, and I think I found a way to override it. The approach was to force UTF-8 encoding on Windows to reproduce the results seen in Linux.

jZebra 1.0.9 and higher

//applet.setEncoding("UTF-8");
applet.setEncoding("Cp1252");

// Three ways to append character 218 (0xDA, a box-drawing corner in CP437):
applet.append("\xDA");                   // JavaScript hex escape
applet.append(String.fromCharCode(218)); // JavaScript
applet.append(chr(218));                 // in languages that provide chr(), e.g. PHP

Essentially, I'm re-encoding the string data each time it is passed. This seems to work for single characters when viewing the output print files in DOS. Switching to UTF-8 breaks the data similarly to what is seen in Linux.
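A sketch of why UTF-8 "breaks" the data: characters above 127 become multi-byte sequences in UTF-8, so a raw printer that interprets each byte as one character receives two bytes where it expects one.

```java
public class Utf8Break {
    public static void main(String[] args) throws Exception {
        String s = "\u00DA"; // character 218

        byte[] cp1252 = s.getBytes("Cp1252"); // one byte:  0xDA
        byte[] utf8   = s.getBytes("UTF-8");  // two bytes: 0xC3 0x9A

        // A raw/ESC-style printer treats each byte as one character,
        // so the UTF-8 pair prints as two unrelated glyphs on paper.
        System.out.println("Cp1252 length: " + cp1252.length); // 1
        System.out.println("UTF-8 length:  " + utf8.length);   // 2
    }
}
```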

I've updated the software to version 1.0.9 and it is available for immediate download on the jZebra site.