Given a string of text containing a foreign word, we need to (a) identify the language of the foreign word, and (b) explain how to render the word (either in the foreign language or in the local language).

“Language switching” can occur in the middle of a sentence. There is a “dominant” language and a “code-switched” language. The code-switched language may span multiple tokens, a single token, or (as in the case of MP3) part of a token.

A more interesting example: I said #### to him (where #### is a string of Japanese characters). We need to use xml:lang="ja" for the #### and xml:lang="en" for the English part.
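
The Japanese example above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, the wrapper element is assumed to be the `lang` element of SSML 1.1 (the notes themselves only mention the xml:lang attribute), and the script test is a rough Unicode-range check that cannot tell Japanese kanji from Chinese characters.

```python
from itertools import groupby
from xml.sax.saxutils import escape


def is_japanese(ch: str) -> bool:
    """Rough script test (assumption): Hiragana, Katakana, or CJK ideographs.

    CJK ideographs are shared with Chinese, so this check alone cannot
    identify the language; it only detects a change of script.
    """
    code = ord(ch)
    return (
        0x3040 <= code <= 0x309F      # Hiragana
        or 0x30A0 <= code <= 0x30FF   # Katakana
        or 0x4E00 <= code <= 0x9FFF   # CJK Unified Ideographs
    )


def mark_language_switches(text: str, matrix_lang: str = "en",
                           switched_lang: str = "ja") -> str:
    """Wrap each run of Japanese characters in an element carrying xml:lang."""
    parts = []
    # groupby splits the text into maximal runs with the same script flag.
    for switched, chars in groupby(text, key=is_japanese):
        chunk = escape("".join(chars))
        if switched:
            parts.append(f'<lang xml:lang="{switched_lang}">{chunk}</lang>')
        else:
            parts.append(chunk)
    return f'<speak xml:lang="{matrix_lang}">{"".join(parts)}</speak>'


print(mark_language_switches("I said こんにちは to him."))
```

Because the runs are found character by character, the same sketch also marks a switch that covers only part of a token.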

We need an identifier for the output language (which may be different from Japanese).

You may come across (1) an entirely different language, (2) a language that you might know, or (3) a language that the TTS can attempt to pronounce.

If the target language appears only in a limited way, we can place it into the lexicon. Otherwise we may need:

Matrix language target language

Script script

Render render

Just insert a « another language » mark.
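
The two options discussed above (a lexicon entry for limited cases, otherwise a plain "another language" mark) can be sketched as follows. Everything here is hypothetical: the lexicon contents, the respelling, the mark string, and the function name are illustrative assumptions, not something the group agreed on.

```python
# Hypothetical lexicon: foreign words that occur often enough to be listed,
# mapped to an illustrative respelling in the local language.
LEXICON = {
    "déjà vu": "day-zhah voo",
}

# Hypothetical placeholder for the "another language" mark.
OTHER_LANGUAGE_MARK = "[another language]"


def render_foreign_word(word: str, lexicon: dict) -> str:
    """Return the lexicon rendering if one exists, else the generic mark."""
    entry = lexicon.get(word.lower())
    if entry is not None:
        return entry
    return OTHER_LANGUAGE_MARK


print(render_foreign_word("Déjà vu", LEXICON))
print(render_foreign_word("こんにちは", LEXICON))
```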

Summary: This is a hard problem. We want to separate scripting and rendering, but it’s not clear that this is necessary. We have a lot of things to work out.