Mark Davis wrote:
<quote>
2. Specialized BIDI. Force a consistent order on URLs, using a
higher-level protocol on top of the UBA.
. . .
B. There are actually two variants of this:
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.
(a) is a simpler approach technically, since the generated plaintext can
have a single direction associated with the label separators. It can be
implemented in display and cut/paste by having LRMs around each label that
contains an RTL character or no LTR characters.
</quote>
I think that adding LRMs around RTL labels would not be enough if the
context is RTL. Consider the following URL, where uppercase letters stand
for RTL characters:
http://12-34.ABC.DEF.567
and let us represent LRM by @.
Mark's variant (a) results in adding LRMs as follows:
http://@12-34@.@ABC@.@DEF@.@567@
In an RTL context, this will be displayed as:
@567@.@FED@.@CBAhttp://@12-34@.@
In order to get the consistent LTR display order, we need to add LRE/PDF
around the URL as follows (where [ represents LRE and ^ represents PDF):
[http://@12-34@.@ABC@.@DEF@.@567@^
which will be displayed as follows, independently of the context:
[http://@12-34@.@CBA@.@FED@.@567@^
This can be simplified: an LRM is needed only before (not around) a label,
and only when that label follows an RTL label and is either itself an RTL
label or contains no LTR characters. The result is as follows:
[http://12-34.ABC.@DEF.@567^
This is not overly complex to do. I know; I have written code for it.
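For illustration only, the simplified rule could be sketched in Python
roughly as below. This is not the code mentioned above. The names
mark_host, has_rtl and has_ltr are hypothetical, and the sketch assumes
that an "RTL label" means a label containing at least one character of
bidi class R or AL, that labels are separated by a plain ".", and that the
scheme is passed in separately from the host.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def has_rtl(label: str) -> bool:
    """True if the label contains a strong RTL character (class R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)


def has_ltr(label: str) -> bool:
    """True if the label contains a strong LTR character (class L)."""
    return any(unicodedata.bidirectional(ch) == "L" for ch in label)


def mark_host(scheme: str, host: str) -> str:
    """Wrap scheme + host in LRE/PDF and add LRMs per the simplified rule.

    An LRM goes before a label only when the previous label is RTL and the
    current label is RTL or contains no LTR characters (assumed reading).
    """
    marked = []
    prev_rtl = False
    for label in host.split("."):
        needs_mark = has_rtl(label) or not has_ltr(label)
        marked.append((LRM if prev_rtl and needs_mark else "") + label)
        prev_rtl = has_rtl(label)
    return LRE + scheme + ".".join(marked) + PDF
```

With Hebrew letters standing in for ABC and DEF, calling
mark_host("http://", "12-34.\u05d0\u05d1\u05d2.\u05d3\u05d4\u05d5.567")
places the LRMs only before the third and fourth labels, matching the
simplified example above.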
Shalom (Regards), Mati
Bidi Architect
Globalization Center Of Competency - Bidirectional Scripts
IBM Israel
Phone: +972 2 5888802 Fax: +972 2 5870333 Mobile: +972 52 2554160
From: Mark Davis ☕ <mark@macchiato.com>
To: Shawn Steele <Shawn.Steele@microsoft.com>, Adil Allawi <adil@diwan.com>,
"public-iri@w3.org" <public-iri@w3.org>, "bidi@unicode.org"
<bidi@unicode.org>, Murray Sargent <murrays@exchange.microsoft.com>,
"aharon@google.com" <aharon@google.com>, Nasser Kettani
<Nasser.Kettani@microsoft.com>
Date: 28/05/2010 01:25
Subject: [bidi] Re: Special ordering for BIDI URLs
Sent by: bidi-bounce@unicode.org
A few comments on various issues.
1. Market Forces. Make it possible for URLs (actually IRIs) to be
completely RTL
A. Shawn raised the issue of .html. As I think about it, there are a
couple of ways to deal with this. First, even today servers don't need
to use those suffixes: http://unicode.org/reports/ doesn't contain a
.html. Second, we could establish equivalences so that some Hebrew and
Arabic-script suffixes take the place of those.
2. Specialized BIDI. Force a consistent order on URLs, using a
higher-level protocol on top of the UBA.
A. The proponents of specialized reordering really need to come up with a
good story for how to deal with the security and interoperability issues
presented by plaintext applications and by applications that do not
implement the new URL ordering.
B. There are actually two variants of this:
a. have the consistent order be LTR.
b. have the consistent order be the paragraph direction.
(a) is a simpler approach technically, since the generated plaintext can
have a single direction associated with the label separators. It can be
implemented in display and cut/paste by having LRMs around each label that
contains an RTL character or no LTR characters.
While for users this may not be quite as natural, the most important
feature is having a predictable ordering (the ordering of labels in URLs
is already somewhat screwy, since the domain name is Little-Endian, and
the rest is Big-Endian).
3. New Characters (Adil's proposal).
While an interesting proposal, the problems would be:
- introducing security risks with the new characters.
- a significant change to the UBA - and even extremely minor changes have
  caused enough problems that the UTC has grown quite leery of rocking the
  boat.
- it takes at least a couple of years to get characters accepted by both
  Unicode and ISO.
- none of the old URL-aware software would handle the new URLs (a problem
  also for the LRM approach).
Mark