Can anybody give me a hint where I can find tokenizers for French and/or
English text? Even rather simple scripts (e.g. Perl) would be helpful!
(Please don't recommend scripts that split on whitespace only, though.)