Why not just give a Unicode code point to every conceivable combination of letter and accent mark? It seems like sooner or later they'll all get used. Once I saw LATIN SMALL LETTER N WITH UMLAUT (did I say it right?) in a graffito.
I'm being serious here, because of the problems with Arabic transliteration and Lithuanian.
Have we enough codepoints left? How many codepoints have we?
65534? 16777214? 4294967294?
My mnemonic for 2^32: "Lady of eloquence, your sensitive verses inspire my misguided effort."
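The mnemonic works by counting the letters in each word, one digit per word. A quick sketch to check it, assuming punctuation is ignored and no word is ten letters or longer (the function name `decode_mnemonic` is just made up for this example):

```python
import re

def decode_mnemonic(sentence: str) -> int:
    """Turn each word into one digit: its letter count.

    Assumes every word has 1 to 9 letters; a longer word would
    contribute two digits and break the scheme.
    """
    words = re.findall(r"[A-Za-z]+", sentence)  # drop commas and the full stop
    return int("".join(str(len(word)) for word in words))

mnemonic = "Lady of eloquence, your sensitive verses inspire my misguided effort."
value = decode_mnemonic(mnemonic)
print(value)           # 4294967296
print(value == 2**32)  # True
```

Note that this is 2^32 itself; the last figure in my list of guesses above is two less than that.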