Learn About Unicode

Here is a very brief overview for those confused by ANSI and Unicode. You will
find more detailed descriptions with a google.com search.

ANSI (Single Byte)

ANSI is normally a single-byte encoding, where 256 character codes (0..255)
define all available characters for a language. For most single languages, one
table of 256 characters (the 128 ASCII codes plus 128 extended codes) can hold
all the characters needed.
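As a quick illustration, here is a minimal Python sketch (Python is used only for illustration and is not part of FAR). It assumes Python 3, where "cp1252" is the codec name for Windows code page 1252 (Western European), and shows that each character becomes exactly one byte.

```python
# Windows code page 1252 (Western European) is a single-byte code page,
# available in Python as the "cp1252" codec.
text = "café"
encoded = text.encode("cp1252")

print(encoded)                           # b'caf\xe9'
print(len(text), len(encoded))           # 4 4  (one byte per character)
print(encoded.decode("cp1252") == text)  # True: decoding restores the text
```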

ANSI (Double Byte)

Japanese, Chinese and Korean have far more than 256 characters, so these
languages use a mixture of single-byte and double-byte character codes. Here
the lower codes (0..127) are the English (ASCII) characters. Some of the
extended codes (128..255) are lead bytes that link you into other 256-character
tables. In a double-byte character, the first byte (the lead byte) selects
which 256-character table to use, while the second byte is an index into that
table.
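The lead-byte scheme can be sketched in Python using code page 932 (Japanese), which Python exposes as the "cp932" codec. The helper names is_lead_byte and split_chars are hypothetical, written only for this example. The lead-byte ranges used (0x81..0x9F and 0xE0..0xFC) apply to code page 932; other double-byte code pages use different ranges.

```python
def is_lead_byte(b: int) -> bool:
    """True if b starts a double-byte character in code page 932 (Japanese)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC

def split_chars(data: bytes) -> list:
    """Split code page 932 bytes into per-character chunks of 1 or 2 bytes."""
    chars, i = [], 0
    while i < len(data):
        size = 2 if is_lead_byte(data[i]) else 1  # lead byte + trail byte
        chars.append(data[i:i + size])
        i += size
    return chars

encoded = "Aあ".encode("cp932")  # 'A' (ASCII) followed by Japanese hiragana 'a'
print(encoded.hex(" "))          # 41 82 a0
print(split_chars(encoded))      # [b'A', b'\x82\xa0']
```

Note that scanning must start at a character boundary: a trail byte taken on its own can look like a lead byte or an ASCII character, which is one reason double-byte text is harder to process than single-byte text.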

Code Pages

256 character codes are not enough to represent the characters of all
languages. To get around this problem, Windows uses different character tables
(Code Pages) for different language groups. The first 128 (ASCII) character
codes are common to all Code Pages and contain non-printable control characters
and English-language characters. The extended character codes (128..255) map to
different characters in different code pages. As described above, the extended
codes of the Japanese, Chinese and Korean code pages include lead bytes that
point to additional character tables, which lets a code page support more than
256 characters using double-byte codes.
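The following Python sketch decodes the same extended byte (0xE9) with three Windows code pages and checks that the ASCII range decodes identically in all of them. "cp1252", "cp1251" and "cp1253" are Python's codec names for the Western European, Cyrillic and Greek code pages; the characters are printed by code point and Unicode name so the output is readable on any console.

```python
import unicodedata

# Python codec names for three Windows code pages.
code_pages = {"cp1252": "Western European", "cp1251": "Cyrillic", "cp1253": "Greek"}

# The same extended byte means a different character in each code page.
for cp, name in code_pages.items():
    ch = bytes([0xE9]).decode(cp)
    print(f"{cp} ({name}): 0xE9 -> U+{ord(ch):04X} {unicodedata.name(ch)}")
# cp1252 (Western European): 0xE9 -> U+00E9 LATIN SMALL LETTER E WITH ACUTE
# cp1251 (Cyrillic): 0xE9 -> U+0439 CYRILLIC SMALL LETTER SHORT I
# cp1253 (Greek): 0xE9 -> U+03B9 GREEK SMALL LETTER IOTA

# The first 128 codes (ASCII) decode identically in every one of them.
ascii_range = bytes(range(128))
print(all(ascii_range.decode(cp) == ascii_range.decode("ascii") for cp in code_pages))  # True
```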

In Windows 2000/XP/2003 you can set the Windows language (code page) used by
non-Unicode programs via Control Panel > Regional and Language Options. Some
characters may not display correctly if the current font does not support the
current code page.
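If you need to check the current code page from a script, a sketch like the one below should work, assuming Python 3. On Windows it calls the Win32 function GetACP, which returns the active ANSI code page number; on other systems it falls back to Python's locale.getpreferredencoding, which reports the locale encoding rather than a Windows code page. The function name current_ansi_code_page is hypothetical.

```python
import locale
import sys

def current_ansi_code_page() -> str:
    """Return the encoding used for non-Unicode text, e.g. 'cp1252' on Western Windows."""
    if sys.platform == "win32":
        import ctypes
        # GetACP() returns the active ANSI code page number, e.g. 1252.
        return f"cp{ctypes.windll.kernel32.GetACP()}"
    # Not Windows: report the locale's preferred encoding instead.
    return locale.getpreferredencoding(False)

print(current_ansi_code_page())
```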

Unicode

Windows Unicode (UTF-16) uses 2 bytes to represent each of the commonly used
characters. 2 bytes (16 bits, 256 x 256 = 65,536 codes) provide enough
character codes to represent all the most common world characters; rarer
characters beyond this range are stored as a pair of 16-bit codes (4 bytes).
UTF-32 uses 4 bytes for every character; however, most Windows applications,
such as MS Help 2, work in UTF-16. With Unicode you don't need to change the
system Code Page to view documents in different languages, and a single
document can contain a mixture of languages if the application allows it.
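To see these byte counts, this Python sketch encodes a few characters in UTF-16 and UTF-32. It uses the little-endian codecs ("utf-16-le", "utf-32-le") so that no signature bytes are added to the output. The last sample, U+1F600, lies beyond the first 65,536 codes and so needs a pair of 16-bit codes in UTF-16.

```python
samples = ["A", "é", "日", "\U0001F600"]  # U+1F600 is an emoji beyond the first 65,536 codes

for ch in samples:
    utf16 = ch.encode("utf-16-le")  # little-endian, no signature bytes
    utf32 = ch.encode("utf-32-le")
    print(f"U+{ord(ch):04X}: UTF-16 {len(utf16)} bytes, UTF-32 {len(utf32)} bytes")
# U+0041: UTF-16 2 bytes, UTF-32 4 bytes
# U+00E9: UTF-16 2 bytes, UTF-32 4 bytes
# U+65E5: UTF-16 2 bytes, UTF-32 4 bytes
# U+1F600: UTF-16 4 bytes, UTF-32 4 bytes
```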

UTF-8 Unicode contains a mixture of single-byte and multi-byte characters.
The ASCII characters (0..127) keep their single-byte codes; every other
character starts with a lead byte in the range 0xC2..0xF4, which marks the
start of a multi-byte code, followed by one to three continuation bytes
(0x80..0xBF). Using up to 4 bytes per character, UTF-8 can represent every
Unicode character. Because plain ASCII text is unchanged, documents encoded in
UTF-8 can often be used by legacy software and hardware where Unicode (UTF-16)
cannot.
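The lead-byte rule can be expressed as a small Python function. utf8_sequence_length is a hypothetical helper written for this example: it returns how many bytes a UTF-8 sequence occupies, judging only from its first byte, and rejects continuation bytes and byte values that never appear in valid UTF-8.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the length in bytes of the UTF-8 sequence starting with `lead`."""
    if lead <= 0x7F:
        return 1  # 0xxxxxxx: an ASCII character, unchanged
    if 0xC2 <= lead <= 0xDF:
        return 2  # 110xxxxx: lead byte of a 2-byte sequence
    if 0xE0 <= lead <= 0xEF:
        return 3  # 1110xxxx: lead byte of a 3-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4  # 11110xxx: lead byte of a 4-byte sequence
    # 0x80..0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5..0xFF never occur.
    raise ValueError(f"0x{lead:02X} is not a UTF-8 lead byte")

for ch in ["A", "é", "日", "\U0001F600"]:
    data = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {data.hex(' ')} -> {utf8_sequence_length(data[0])} byte(s)")
# U+0041: 41 -> 1 byte(s)
# U+00E9: c3 a9 -> 2 byte(s)
# U+65E5: e6 97 a5 -> 3 byte(s)
# U+1F600: f0 9f 98 80 -> 4 byte(s)
```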

Unicode UTF-16 and UTF-8 are now fully supported by Windows 2000 and XP.
Although the future is Unicode, Windows will continue to support ANSI and
Code Pages for legacy applications.

Unicode Files

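Windows recognizes a Unicode file primarily by its file signature, the lead
bytes known as the byte order mark (BOM): EF BB BF for UTF-8, FF FE for UTF-16
little-endian and FE FF for UTF-16 big-endian. UTF stands for Universal
Character Set Transformation Format.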
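A simplified signature check might look like this Python sketch. detect_bom is a hypothetical helper that compares the first bytes of a file with the byte order marks defined in Python's codecs module; it does not model any other heuristics Windows may apply to files that have no signature.

```python
import codecs
from typing import Optional

# Longest signatures first: the UTF-32 LE mark (FF FE 00 00) begins with
# the UTF-16 LE mark (FF FE), so it must be tested before it.
SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by the file signature, or None if there is none."""
    for bom, encoding in SIGNATURES:
        if data.startswith(bom):
            return encoding
    return None

print(detect_bom(b"\xef\xbb\xbfHello"))   # utf-8
print(detect_bom(b"\xff\xfeH\x00i\x00"))  # utf-16-le
print(detect_bom(b"Plain ANSI text"))     # None
```

A file without a signature returns None; in that case the text may be ANSI in some code page, or UTF-8 saved without a signature, and the signature alone cannot tell which.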

Most FAR H2 editors and Microsoft editors (Notepad, MS FrontPage, MS Word)
under Windows 2000 and XP let you change a file's encoding, provided the
Windows default language (code page) matches the language of the file (see
Code Pages above). This is usually done via the File > Save As dialog:

1. Open the file in a FAR Hx? editor or in Windows 2000/XP Notepad.
2. Open the File > Save As dialog.
3. Select the new encoding.
4. Click the Save button.
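The same re-save can also be scripted. In the Python sketch below, convert_encoding is a hypothetical helper and the file names in the commented example are placeholders. It decodes a file with its original code page and writes it back in a new encoding. As with the editors, the source code page must match the file's language, otherwise extended characters are garbled or rejected. Python's "utf-8-sig" codec writes a UTF-8 signature, and "utf-16" writes a UTF-16 signature in the machine's byte order.

```python
def convert_encoding(src: str, dst: str, src_encoding: str, dst_encoding: str) -> None:
    """Re-save the text file src as dst using a different encoding."""
    # newline="" keeps the original line endings (e.g. CRLF) unchanged.
    with open(src, encoding=src_encoding, newline="") as f:
        text = f.read()  # may raise UnicodeDecodeError if the bytes are invalid in src_encoding
    with open(dst, "w", encoding=dst_encoding, newline="") as f:
        f.write(text)

# Example (placeholder file names): convert a Western European ANSI file
# to UTF-8 with a signature.
# convert_encoding("topic.htm", "topic-utf8.htm", "cp1252", "utf-8-sig")
```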

The following FAR windows display an encoding setting at the bottom of the
File > Save As dialog (as MS Notepad does in Windows 2000 and XP): the FAR
H2 Project Editor, the Toc & Index Editor, and some special Hx? editors
available from the H2 Project Editor.

Tip: If you select the correct file encoding when you create a project, all
other associated Hx? project files you create will also use that encoding. If
you change the encoding of the HxC project file in the H2 Project Editor, FAR
will ask whether it should also change the encoding of all associated .Hx?
project files when you perform a File Save.