On Mon, Apr 19, 2004 at 21:11:01 +0300, Shlomo Yona wrote:
> Hello,
>
> I have text which is encoded in UTF8.
> The text contains various unicode characters.
> Whenever I try to run sort on the text I get an ordering of
> the tokens in the text which is not the lexicographic
> ordering of Hebrew characters.
>
> What I think is happening is that the sort doesn't see the
> unicode characters but instead it sees bytes and therefore
> sorts according to plain ASCII lexicographical order.
NAME
    locale - Perl pragma to use and avoid POSIX locales for built-in
    operations

SYNOPSIS
    @x = sort @y;      # ASCII sorting order
    {
        use locale;
        @x = sort @y;  # Locale-defined sorting order
    }
    @x = sort @y;      # ASCII sorting order again
See perldoc locale and perldoc perllocale, and set your LANG (or
LC_COLLATE / LC_ALL) environment variable, methinks. IIRC there's
something like the 'he_IL.UTF-8' locale on most GNU operating systems;
'locale -a' lists the ones that are actually installed.
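If you would rather pick the locale inside the script than rely on the
environment, something along these lines might do. It is only a sketch:
sort_in_locale is a name made up for this mail, and 'he_IL.UTF-8' is
assumed to be installed on your box (the code just warns and keeps the
current collation if it is not).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort a list using the collation rules of $locale,
# then put the previous collation setting back.
sub sort_in_locale {
    my ($locale, @words) = @_;

    my $old = setlocale(LC_COLLATE);           # query the current setting
    my $ok  = setlocale(LC_COLLATE, $locale);  # undef if not installed
    warn "locale $locale not available, keeping current collation\n"
        unless defined $ok;

    my @sorted;
    {
        use locale;                            # sort now honours LC_COLLATE
        @sorted = sort @words;
    }

    setlocale(LC_COLLATE, $old) if defined $old;
    return @sorted;
}

# The locale name is an assumption; check 'locale -a' for the real one.
print "$_\n" for sort_in_locale('he_IL.UTF-8', qw(gimel alef bet));
```

How well 'use locale' copes with UTF-8 locales depends on the perl
version, so check perllocale for the perl you actually run.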
Alternatively you can probably 'use utf8' for literals if your perl is
old, and make sure the data you read in is (using PerlIO layers, etc.)
indeed decoded into UTF-8 character strings inside perl. A recent perl
should then compare characters rather than bytes and work as you
expect, unless you 'use bytes'.
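To make the "indeed UTF8 strings in perl" part concrete, here is a rough
sketch using the Encode module. The three words are sample data I made
up (gimel-mem-lamed, alef-bet-alef, bet-yod-tav), written as raw bytes
to stand in for what you get from a handle with no encoding layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for three Hebrew words (made-up sample data), as they
# would arrive from a filehandle that has no encoding layer on it.
my @bytes = (
    "\xd7\x92\xd7\x9e\xd7\x9c",
    "\xd7\x90\xd7\x91\xd7\x90",
    "\xd7\x91\xd7\x99\xd7\xaa",
);

# Decode so that perl sees characters instead of bytes.
my @chars = map { decode('UTF-8', $_) } @bytes;

# The same word is 6 bytes long but only 3 characters long.
printf "bytes: %d, characters: %d\n",
    length($bytes[0]), length($chars[0]);

# A plain sort (no 'use locale', no 'use bytes') compares by code point.
my @sorted = sort @chars;

# Encode again on the way out so the terminal receives UTF-8.
print encode('UTF-8', $_), "\n" for @sorted;
```

Putting a layer on the handle, e.g. binmode(STDIN, ':encoding(UTF-8)'),
does the decode step for you. Note that this gives code point order, not
a proper linguistic collation; that is what the locale route is for.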
Ciao!
--
() Yuval Kogman <nothingmuch at woobling.org> 0xEBD27418 perl hacker &
/\ kung foo master: /me whallops greyface with a fnord: neeyah!!!!!!!