Cibu_johnnt@3com.com wrote:
<snip>
> As I understand Unicode, it is trying to represent a text in its deep
> structure and it is the job of the font to convert that deep structure
> to surface elements or actual glyphs of the text. This is what exactly
> transliteration also trying to do (at least in case of Indic scripts).
> Finding out the rules to do this conversion is the core of both. What
> is being remaining is, assigning code numbers in case of Unicode or
> assigning corresponding Latin character sequence in case of
> transliteration. Both are reasonably trivial. So my questions are:
>
> 1. Is my theory correct ? If not, in which way ?
> 2. Are these rules for conversion between deep structure to surface
> structure documented somewhere, in case of Malayalam ?

Using the Unicode encodings of the Indic scripts, it is fairly easy to
implement a function that transliterates from one Indic script to
another. Transliterating from the Indic scripts to Roman is not too
difficult either. In some situations, however, transliteration becomes
extremely awkward. Converting from Roman back to the Indic scripts, for
example, is quite difficult.
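
As a rough illustration of why the Indic-to-Indic case is easy: the
Unicode blocks for these scripts follow a common ISCII-derived layout,
so a given letter usually sits at the same offset within each block.
The sketch below (Python, my own example and not taken from any
particular implementation) shifts code points from one block to
another. The names BLOCK_BASE and transliterate are made up for
illustration. Not every slot is assigned in every script, so the sketch
assumes it is acceptable to leave a character unchanged when the target
slot is unassigned; a real converter would need per-script exception
tables.

```python
import unicodedata

# Base code points of the Unicode blocks for nine Indic scripts.
# Each block is 0x80 code points wide and shares a common layout.
BLOCK_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gurmukhi": 0x0A00,
    "gujarati": 0x0A80,
    "oriya": 0x0B00,
    "tamil": 0x0B80,
    "telugu": 0x0C00,
    "kannada": 0x0C80,
    "malayalam": 0x0D00,
}
BLOCK_SIZE = 0x80


def transliterate(text, source, target):
    """Shift characters from the source script block to the target block.

    Assumption: when the target slot is unassigned, the original
    character is kept rather than emitting an invalid code point.
    """
    src = BLOCK_BASE[source]
    dst = BLOCK_BASE[target]
    out = []
    for ch in text:
        cp = ord(ch)
        if src <= cp < src + BLOCK_SIZE:
            candidate = chr(cp - src + dst)
            # "Cn" is the general category of unassigned code points.
            if unicodedata.category(candidate) == "Cn":
                out.append(ch)
            else:
                out.append(candidate)
        else:
            # Characters outside the source block pass through untouched.
            out.append(ch)
    return "".join(out)
```

For example, transliterate("\u0915\u092e\u0932", "devanagari",
"malayalam") maps Devanagari KA, MA, LA to the Malayalam letters at the
same offsets.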

The real issue here is the separation of the visual elements from the
raw data. It sounds as if you want to display text on the screen, and
accept keyboard input, for data that has been pushed through the
transliteration process. That means you need to track where the encoded
character boundaries fall and map them to the Roman syllables, with
correct cursor movement, which is not a simple task. This function
should be kept separate from the internal text and incorporated into
whatever rendering engine you use.
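
To make the boundary-tracking problem concrete, here is a minimal
sketch in Python (again my own illustration, not how any existing
product works). It romanizes a string while recording, for each Roman
unit, which range of encoded characters produced it, so that a cursor
position in the Roman text can be snapped back to a boundary in the
stored text. The tables cover only a few Malayalam characters and the
Roman spellings are arbitrary; romanize_with_map and
source_index_for_cursor are hypothetical names. It treats each
consonant plus vowel as one unit and does not attempt real syllable or
conjunct handling.

```python
# Tiny illustrative tables; a real table would cover the whole block.
CONSONANTS = {"\u0d15": "k", "\u0d24": "t", "\u0d2e": "m", "\u0d32": "l"}
VOWEL_SIGNS = {"\u0d3e": "aa", "\u0d3f": "i", "\u0d41": "u"}
INDEPENDENT_VOWELS = {"\u0d05": "a", "\u0d07": "i"}
VIRAMA = "\u0d4d"


def romanize_with_map(text):
    """Return (roman, spans).

    Each span is (roman_start, roman_end, src_start, src_end) and ties
    one Roman unit to the encoded characters it came from.
    """
    pieces = []
    spans = []
    pos = 0  # length of the Roman output so far
    i = 0
    while i < len(text):
        start = i
        ch = text[i]
        i += 1
        if ch in CONSONANTS:
            piece = CONSONANTS[ch]
            if i < len(text) and text[i] == VIRAMA:
                i += 1  # virama suppresses the inherent vowel
            elif i < len(text) and text[i] in VOWEL_SIGNS:
                piece += VOWEL_SIGNS[text[i]]
                i += 1
            else:
                piece += "a"  # inherent vowel
        elif ch in INDEPENDENT_VOWELS:
            piece = INDEPENDENT_VOWELS[ch]
        else:
            piece = ch  # anything unknown passes through unchanged
        pieces.append(piece)
        spans.append((pos, pos + len(piece), start, i))
        pos += len(piece)
    return "".join(pieces), spans


def source_index_for_cursor(spans, roman_offset):
    """Snap a cursor offset in the Roman text to a source boundary."""
    for r_start, r_end, s_start, _s_end in spans:
        if r_start <= roman_offset < r_end:
            return s_start
    # Past the last unit: place the cursor at the end of the source.
    return spans[-1][3] if spans else 0
```

With the input "\u0d15\u0d2e\u0d32\u0d4d" the Roman text is "kamal",
and a cursor placed in the middle of "ma" snaps to the start of the
encoded MA. Even in this toy form, one Roman unit can cover one or two
encoded characters, which is where the cursor bookkeeping comes from.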

However you look at it, this is beyond what the Unicode Standard
provides. If you want character encodings, Unicode provides them. For
the visual elements and the behavior of your cursor keys, CDAC's
implementation in Leap is probably the de facto standard you should
follow. I believe this is well documented in their manuals.