L2/03-264
Date/Time: Sun Aug 17 04:00:18 EDT 2003
Contact: Mike Meir (mike@gateseven.co.uk)
Subject: Comment on Public Review Issue 9. "Bengali Reph and Ya-Phalaa"
Mike Meir
Director, Gate Seven
Introduction
Paul Nelson's public review document "Bengali Script: Formation of the
Reph and use of the ZERO WIDTH JOINER and ZERO WIDTH NON-JOINER" makes
proposals for resolving an ambiguity which exists in Bengali script
processing regarding the representation of the plain Unicode string Ra
Virama Ya, which needs to be presented in text as both the common grapheme
Reph_Ya, and the less common grapheme Ra_Yaphala.
The two representations are textually distinct: the first is the normal
representation for the Bengali conjunct consonant Ra_Virama_Ya, often found
in Sanskrit loan words, whereas the second normally indicates a shift in
the sound of the vowel of the grapheme in foreign loan words. For example,
the standard Bengali pronunciation of the letter Ra is Ro, and if the
Yaphala follows the Ra and is placed before the Vowel Sign Aa, the
pronunciation shifts to the English a, as in "rat".
While in this sense the Yaphala is perhaps functioning to the reader more
as a nukta, the textual convention is established that where a Yaphala is
found in text in positions in which it is in principle allowable for a
conjunct of Ya to be present -- i.e., where there is a preceding consonant
-- the grapheme is considered to be decomposable into its constituent parts
for the purposes of sorting. Thus both Reph_Ya and Ra_Yaphala would sort
in the same position.
Nevertheless, they are not in any sense interchangeable. To replace one
with the other would lead to the perception of a spelling mistake.
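The requirement that the two forms differ as plain text yet sort in the
same position can be sketched as follows. This is only an illustration: the
marked sequence and the sort_key helper are assumptions introduced here, and
the sketch assumes that joiner controls are simply ignored when sorting.

```python
# Bengali letters and the two joiner controls, by code point.
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

def sort_key(text: str) -> str:
    """Hypothetical sort key: drop the joiner controls before comparing."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")

plain = RA + VIRAMA + YA          # plain Ra Virama Ya
marked = RA + VIRAMA + ZWJ + YA   # one possible marked form (an assumption)

print(plain != marked)                      # distinct as stored text
print(sort_key(plain) == sort_key(marked))  # identical for sorting
```

Whatever marking convention is finally adopted, a key of this kind keeps the
two spellings together in a sorted list while leaving them distinct in the
stored text.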
It is therefore important that plain text can distinguish these forms,
which is the objective of Paul Nelson's proposal. However, I have
reservations about his proposed solution, which I feel is unnecessarily
complex, because it follows from the unnecessary reordering of initial
Ra Virama strings in graphemes.
Reph Behaviour in Bengali is not the same as in Devanagari
The current rendering behaviour for Reph (the common half form of Ra), at
any rate in Microsoft's shaping engine, follows the rendering behaviour
defined for Reph in Devanagari script in "Consonant Ra Rules", p.217 of
Version 3 of the Unicode Standard.
In Devanagari, it is necessary to move the Reph (more accurately, Ra
Virama) to the end of the syllable, since it migrates to the last element
of the grapheme, which may be a post-base VowelSign such as Aa.
However, in Bengali script, Reph belongs on and to the first element of
conjunct consonants, as is clear from examination of movable type
containing ligatures including Reph.
Similarly, Reph attaches to KhandaTa (the ligated half-form of Ta) as the
first element of a grapheme, and may find itself separated from the
subsequent element by a preceding reorderant vowel sign, which in Bengali
is placed after the Virama or half form, not necessarily at the beginning
of the grapheme.
In view of this behaviour (more accurately, lack of behaviour), there is
in fact no need to reorder Ra Virama in the course of rendering Bengali
text, and to do so simply makes things more complex, ultimately for the
person who has to enter the text.
Negative consequences of reordering -- Paul Nelson's proposal
A consequence of reordering is the problem in distinguishing between the
forms Reph_Ya and Ra_Yaphala using ZWJ and ZWNJ.
If the Reph is re-ordered, this occurs before conjunct formation, so in
order to allow the formation of the Ra_Yaphala grapheme, the Ra Virama
reordering has to be blocked before it can occur. The positions of the
Virama and the ZWNJ or ZWJ have to be reversed compared to their normal
syntax:
* Ra Virama Ya -> Reph_Ya as a ligature, if available
* Ra ZWNJ Virama Ya -> Ra Yaphala
to which are added
* Ra Virama ZWJ Ya -> Reph Ya, Reph not reordered (though one would have
thought it already had been, unless what is intended is actually Ya
followed by an explicit Reph)
* Ra Virama ZWNJ Ya -> Ra Virama Ya
This seems to me to be overcomplicated for dealing with the situation; it
modifies the normal syntax for the ZWJ/ZWNJ characters, and is introduced
to deal with the consequences of the reordering process, which is in itself
actually not necessary.
It brings typists too close to the workings of the rendering engine, which
they should in general be protected from.
Typists have to remember a special case for a not-uncommon situation.
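Written out as code points, the four sequences listed above make the
special case visible: only the Ra_Yaphala sequence places the joiner before
the Virama. The following is a sketch for illustration only; the table name
and the code_points helper are hypothetical, and the result strings simply
restate the list.

```python
# Bengali letters and the two joiner controls, by code point.
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

# Sequence -> intended result, restating the four cases listed above.
NELSON_PROPOSAL = {
    RA + VIRAMA + YA: "Reph_Ya (ligature if available)",
    RA + ZWNJ + VIRAMA + YA: "Ra Yaphala",
    RA + VIRAMA + ZWJ + YA: "Reph Ya, Reph not reordered",
    RA + VIRAMA + ZWNJ + YA: "Ra Virama Ya",
}

def code_points(text: str) -> str:
    """Format a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

for sequence, result in NELSON_PROPOSAL.items():
    print(f"{code_points(sequence):<28} -> {result}")
```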
Paul's analysis of the situation is, I think, also inaccurate, in that he
regards the Yaphala as being, in effect, a grapheme. In Unicode it is not,
although the shaping engine in effect treats it as one, since it is regarded
as a post-base form, normally present in fonts as a glyph. Thus he seeks to
separate off the virama to allow it to interact unambiguously with the Ya
on the right. But Yaphala in a grapheme is a presentation form of Ya, not a
Ya which has been modified by a Virama.
The Virama which seems to be "attached" to the Yaphala in the grapheme
Ra_Yaphala has actually acted on the Ra to make it half; so whether the
half Ra looks like Ra or Reph is largely irrelevant to the real situation.
The Virama in the grapheme always operates on the preceding consonant to
make it half; it does not act on the consonant to the right to modify its
shape.
Alternative Proposal
A conventional normative solution needs to be arrived at, which is, of
course, Paul Nelson's aim. Provided we do not reorder, the most
straightforward way of doing this is as follows:
* Ra Virama Ya -> Reph_Ya. This may or may not be a ligature glyph in practice.
* Ra Virama ZWJ Ya -> Ra_Yaphala, giving typists a consistent interface
to the system, by allowing them to generate a lesser-used form using the
normal character used for that purpose. While the use of this convention
would exclude the possibility of entering an explicit Reph, it is difficult
practically to conceive of a situation in which anyone would wish to do
this.
* Ra Virama ZWNJ Ya -> Ra_Virama_Ya, following the normal convention.
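The three rules above amount to a simple lookup on the joiner that follows
the Virama. A minimal sketch, in which the classify function and the
"not covered" fallback are hypothetical names used only for illustration:

```python
# Bengali letters and the two joiner controls, by code point.
RA, VIRAMA, YA = "\u09B0", "\u09CD", "\u09AF"
ZWJ, ZWNJ = "\u200D", "\u200C"

# Sequence -> form, restating the three rules listed above.
ALTERNATIVE_PROPOSAL = {
    RA + VIRAMA + YA: "Reph_Ya",
    RA + VIRAMA + ZWJ + YA: "Ra_Yaphala",
    RA + VIRAMA + ZWNJ + YA: "Ra_Virama_Ya",
}

def classify(text: str) -> str:
    """Return the form assigned to a Ra ... Ya sequence, if it is covered."""
    return ALTERNATIVE_PROPOSAL.get(text, "not covered")

print(classify(RA + VIRAMA + YA))
print(classify(RA + VIRAMA + ZWJ + YA))
print(classify(RA + VIRAMA + ZWNJ + YA))
```

In every case the joiner, if present, follows the Virama, so the typist
never has to change the order of the characters.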
The default Unicode text shaping is not affected by these proposals, so
existing text is not broken by them.
These proposals are in accordance with the standard ordering for ZWJ/ZWNJ.
They do not, however, follow the normal "control of conjunct formation"
rules, so they require a note in the Bengali section.
They are easy for a typist to understand: an alternative but less common
form is generated by the use of the ZWJ in the normal order.
As they stand, they exclude the possibility of specifying a Ra_Yaphala
ligature or a Ra glyph followed by a Yaphala glyph, which would represent a
halfRa followed by Ya in the second grapheme form, and would in other
circumstances use the ZWJ formulation.
The Ra_Yaphala ligature could in practice be excluded from consideration,
on the basis that there is no historical or typographical justification
for it. Yaphala after Ra is only used in representing non-Sanskrit loan
words in Bengali, as far as I know, and as such is better represented by a
wiggly line Yaphala, in accordance with the general usage of this form in
loan-word situations.
But we could use some awful formulation such as Ra Virama ZWJ ZWJ Ya if we
really need to allow the distinction of Ra_Yaphala ligature forms from Ra
Yaphala forms.
Negative consequences of Not-Re-Ordering
The first and foremost consequence would be the breaking of all current
Unicode Bengali script processing engines.
The second would be the need to recreate OpenType fonts to apply the Reph
from the left. This would be a nuisance, but a one-time nuisance in each
case.
The third would be the need to deal differently with instances where
VowelSign Ii interacts with Reph. But in fact, Vowel Sign I is just as
likely to have problems with clashes with Reph, and it is roughly five
times more common.
Chandrabindu and Reph could get into clashes if they exist in the same
grapheme and are applied from opposite ends of the grapheme.
I am advised by Dr Ketaki Dysan that this only ever happens in the case
that Bengali script is used in the transliteration of French, and then not
commonly. Such rare cases could easily be resolved by constructing
ligatures for the specific graphemes, and including them in specialist
fonts.