Wow.
Optimality Theory applied to orthography.
(I had that in the back of my mind, but that is because OT is hard-wired
into my brain. I had never actually seen an example.)
A violation of *DIGRAPHS implies a violation of 1P1G (as I understand
them), so the two overlap; you might want
*MANY-(PHONEMES-)TO-ONE(-GRAPHEME) and *ONE-TO-MANY instead.
*UNICODE I'd replace with a partial order on the available graphemes (in
OT terms: a harmonic scale), or with several constraints for grapheme
families, such as (taking your example): e > ɛ > é > è > ë > ae. (Part of
that order follows from *DIACRITICS and *ONE-TO-MANY.)
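
To make "part of that order follows" concrete, here is a minimal Python
sketch. Everything in it is my own assumption, not something from the
thread: the function names are made up, diacritics are counted as
combining marks after Unicode NFD decomposition, *ONE-TO-MANY is counted
as extra base letters for a single phoneme, and "follows from" is read as
plain dominance (no more violations on any constraint, fewer on at least
one).

```python
import unicodedata

# Candidate spellings for the single phoneme /ɛ/ (taken from the thread).
CANDIDATES = ["e", "ɛ", "é", "è", "ë", "ae"]

def diacritics(grapheme):
    """*DIACRITICS (assumed reading): count combining marks after NFD."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    return sum(1 for ch in decomposed if unicodedata.combining(ch))

def one_to_many(grapheme):
    """*ONE-TO-MANY (assumed reading): extra base letters beyond the first."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    base = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return max(0, len(base) - 1)

def profile(grapheme):
    """Violation counts as a tuple: (*DIACRITICS, *ONE-TO-MANY)."""
    return (diacritics(grapheme), one_to_many(grapheme))

def dominates(a, b):
    """True if a is no worse than b everywhere and better somewhere."""
    pa, pb = profile(a), profile(b)
    return pa != pb and all(x <= y for x, y in zip(pa, pb))

# Every ordered pair (better, worse) that the two constraints decide alone.
pairs = [(a, b) for a in CANDIDATES for b in CANDIDATES if dominates(a, b)]

for better, worse in pairs:
    print(f"{better} > {worse}")
```

Under these assumptions, e and ɛ each come out above é, è, ë and ae, and
nothing else is decided: e vs. ɛ, é vs. ae, and the order among the
accented letters would all need further constraints or a stipulated scale.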
I am unsure about FAITH-PHON. It just sounds too vague. Maybe it should
mark impossible (uninterpretable) phoneme-grapheme combinations.
However, this sketch doesn't cover much of what natlang orthographies do
with Latin letters. I tried Italian ci/chi/ca/cia (where the value of "c"
is locally ambiguous) and had lots of ideas, but no conclusive results so
far.
On 14.01.2014 03:54, David Peterson wrote:
> Honestly, when it comes to romanization, I think the whole thing can best be conceptualized with an Optimality Theory tableau, where your constraints are things like FAITH-PHON (be maximally faithful to the phonology of the language), *DIACRITICS (avoid diacritics), *DIGRAPHS (avoid digraphs), 1P1G (1 phoneme = 1 glyph), *UNICODE (avoid non-ASCII characters), etc. An online program that had all these constraints in it would be best; then you could arrange them as you like and give it candidates, and it would tell you what the optimally romanized form would be.
>
> Like, taking e~ɛ, let's take the latter. Here are some candidates for ɛ:
>
> ɛ
> e
> é
> è
> ë
> ae
>
> If you start ranking these constraints based on whatever preferences you have, you'll see how some of these will be eliminated over others. For example, if *DIGRAPHS is ranked highly, "ae" will lose out early. If *DIACRITICS is ranked highly, though, "e", "ɛ" and "ae" will pull through over the others. If you allow for diacritics but take recognizability as a highly ranked constraint, I think both "e" and "è" would clearly win out over "ë" for [ɛ]. Unfortunately, for that constraint to work, we'd need a pretty thorough survey of natural language romanization systems, but I'm pretty sure "è" is always going to be a more optimal candidate for [ɛ] than "ë" unless something is blocking it (e.g. a tone language that marks low tones).
>
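
The program described above could start out as something like the
following Python sketch. It is only a sketch under assumptions of mine:
the function names and the CONSTRAINTS table are made up, only the three
constraints that can be counted mechanically are included, and FAITH-PHON
and recognizability are left out because they would need data that nobody
in this thread has defined yet. Evaluation is the usual strict-ranking
filter: go through the constraints from highest to lowest, and at each
step keep only the candidates with the fewest violations.

```python
import unicodedata

def star_diacritics(grapheme):
    """*DIACRITICS: count combining marks after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    return sum(1 for ch in decomposed if unicodedata.combining(ch))

def star_digraphs(grapheme):
    """*DIGRAPHS: count base letters beyond the first."""
    decomposed = unicodedata.normalize("NFD", grapheme)
    base = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return max(0, len(base) - 1)

def star_unicode(grapheme):
    """*UNICODE: count non-ASCII characters in the composed (NFC) form."""
    composed = unicodedata.normalize("NFC", grapheme)
    return sum(1 for ch in composed if ord(ch) > 127)

# Hypothetical constraint table; the names follow the thread.
CONSTRAINTS = {
    "*DIACRITICS": star_diacritics,
    "*DIGRAPHS": star_digraphs,
    "*UNICODE": star_unicode,
}

def evaluate(candidates, ranking):
    """Return the candidates that survive a strict constraint ranking.

    `ranking` lists constraint names from highest to lowest ranked.
    More than one survivor means the ranking does not decide between them.
    """
    survivors = list(candidates)
    for name in ranking:
        violations = CONSTRAINTS[name]
        fewest = min(violations(g) for g in survivors)
        survivors = [g for g in survivors if violations(g) == fewest]
    return survivors

CANDIDATES = ["ɛ", "e", "é", "è", "ë", "ae"]

print(evaluate(CANDIDATES, ["*DIACRITICS"]))
print(evaluate(CANDIDATES, ["*DIGRAPHS", "*DIACRITICS", "*UNICODE"]))
```

With only *DIACRITICS in the ranking, "ɛ", "e" and "ae" survive, as in the
quoted message; adding *DIGRAPHS and *UNICODE leaves plain "e". A real
FAITH-PHON would presumably have to block "e" when the language also has
/e/, which is exactly the part this sketch does not attempt.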