On Thu, 29 Jan 2004, Yuval Yaari wrote:
> And this time Gabor's going to come, so we'll have some stuff to eat! Yay!!!
> Just kidding. I'll try to remember to bring some "burekas" too.
>
> Shlomo, if you're reading this and you'll have time, I would love to
> hear about:
> - On the fly change of encoding (ASCII <-> UTF-8)
> - On the fly change of "encoding" (logic <-> visual) (I must have such
> Apache handler...)
> - Do we have to write the same regular expression twice (or more) just
> because of encoding issues (i.e: one for ASCII and one for unicode)???
> - Any problems with Perl & Hebrew (Does using Hebrew strings as hash
> keys work, for example?) ?
> - Hebrew scalar names. Some of us do not know English you insensitive clod!
> - How to spell our names in Hebrew?
I'll address all this and more while demonstrating a hack
system built in a few hours scattered over one week, which
automatically downloads news from a Hebrew online source,
extracts the text, tokenizes it and converts it to some XML
format for further processing by other software.
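
To give a flavor of the tokenize-and-convert step, here is a
minimal sketch, not the actual system. It assumes the input
arrives as UTF-8 bytes; the <text>/<w> element names and the
sample sentence are made up for illustration. It also touches
two of the questions above: one regular expression is enough
once the text is decoded, and Hebrew strings work as hash
keys.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: two Hebrew words ("shalom", "olam") written
# with \x{...} escapes so this file itself stays plain ASCII.
my $shalom = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";
my $olam   = "\x{05E2}\x{05D5}\x{05DC}\x{05DD}";

# Assumption: the downloaded page reaches us as UTF-8 bytes.
my $bytes = encode('UTF-8', "$shalom $olam, $shalom!");

# Decode once at the boundary; from here on we handle characters.
my $text = decode('UTF-8', $bytes);

# One regex is enough: on decoded text \w matches Hebrew letters too.
my @tokens = $text =~ /(\w+)/g;

# Hebrew strings are ordinary hash keys.
my %count;
$count{$_}++ for @tokens;

# Build a made-up XML format, one <w> element per token.
# (\w tokens cannot contain & or <, so no escaping is needed here.)
my $xml = "<text>\n";
$xml .= "  <w>$_</w>\n" for @tokens;
$xml .= "</text>\n";

# Encode again when the data leaves the program.
binmode(STDOUT, ':raw');
print encode('UTF-8', $xml);
printf "tokens: %d, distinct: %d\n", scalar @tokens, scalar keys %count;
```

Decoding on input and encoding on output keeps everything in
between free of encoding-specific code; a different source
encoding would only change the name passed to decode().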
Encoding issues, character sets and other technical
questions (such as whether I can even edit Unicode text in
my text editor) will serve as the base of the lecture, while
the system will be used as a frame that combines all these
small topics into one useful real-life example.
Hopefully the lecture will also give me a fresh look at the
system and at some of the decisions in it, especially since
it was built in a hurry.
--
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/