PHONEMES is a program which converts
the orthographic representation (the "spelling") of the input to ModelTalker
into its phonemic representation (its "dictionary pronunciation"). For instance,
PHONEMES converts the word "name" into NN EI MM, symbols which represent the
three phonemes in the word "name". PHONEMES does not use an actual computer
dictionary in order to perform its task. Instead, it goes through a complex
series of context-sensitive rules (of the sort routinely used by generative phonologists)
which instruct the program how to convert the letter sequence into the correct
sequence of phonemes. Since the spelling of English words is quite complex and
irregular, PHONEMES requires a few thousand rules to accomplish this task successfully.
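
As a rough illustration of what a context-sensitive letter-to-phoneme rule looks like, here is a minimal Python sketch. It is not the PHONEMES rule set or its rule notation: the names Rule, RULES, and to_phonemes are hypothetical, and the four toy rules are invented for this example. Only the symbols NN EI MM for "name" come from the description above.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    left: str            # regex that the text before the target must end with
    target: str          # letters rewritten by this rule
    right: str           # regex that the text after the target must start with
    phonemes: List[str]  # output symbols (an empty list means "silent")


CONSONANT = "[bcdfghjklmnpqrstvwxz]"

# Hypothetical toy rules, tried in order from most to least specific.
RULES = [
    Rule("", "a", CONSONANT + "e$", ["EI"]),  # "a" before consonant + final "e"
    Rule(CONSONANT, "e", "$", []),            # silent final "e" after a consonant
    Rule("", "n", "", ["NN"]),
    Rule("", "m", "", ["MM"]),
]


def to_phonemes(word: str) -> List[str]:
    """Scan left to right, applying the first rule whose context matches."""
    word = word.lower()
    out: List[str] = []
    i = 0
    while i < len(word):
        for rule in RULES:
            end = i + len(rule.target)
            if (word.startswith(rule.target, i)
                    and re.search(rule.left + "$", word[:i])
                    and re.match(rule.right, word[end:])):
                out.extend(rule.phonemes)
                i = end
                break
        else:
            # No rule covers this letter in this context.
            raise ValueError(f"no rule matches {word[i]!r} in {word!r}")
    return out


print(to_phonemes("name"))
```

The point of the sketch is that the same letter can map to different phonemes (or to nothing) depending on its neighbours, which is why a realistic rule set for English grows to thousands of rules.
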
These rules are divided into eleven sets or "levels", which PHONEMES executes
in a prescribed sequence. The first seven levels perform a thorough morphological
analysis of the input word. Of these, the first three levels detect the presence
of suffixes; the next three determine whether the word is a compound; and the
last level looks for common prefixes. The next two levels perform the actual
letter-to-phoneme conversion, resulting in a preliminary phonemic representation.
The last two levels convert this initial representation into a more detailed
phonetic representation, dividing the phonemes into syllables, calculating the
stress pattern of the word, applying allophonic rules (determining, for instance,
which occurrences of /t/, /p/, and /k/ should be aspirated), and reducing unstressed
vowels to schwas. Each level has an "exceptions dictionary" which allows exceptions
to the general patterns to be fixed.

In the past year, we have been working
intensively to improve the rule sets so that ModelTalker pronounces more words
correctly. We do this by running PHONEMES on the contents of
a massive online dictionary (with more than 100,000 words) and comparing the
output of PHONEMES with the correct pronunciations of these words (as recorded
in the dictionary). We examine the output for patterns in the errors that PHONEMES
makes, and these patterns form the basis of new rules added to the PHONEMES rule sets.
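
The comparison step described above could be sketched roughly as follows. This is only an assumed outline in Python, not the tooling actually used: find_errors and tally_substitutions are hypothetical names, the reference dictionary is modelled as a plain mapping from word to phoneme list, and "XX" is a placeholder symbol standing in for a wrong phoneme.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Phonemes = List[str]


def find_errors(
    reference: Dict[str, Phonemes],
    transcribe: Callable[[str], Phonemes],
) -> Dict[str, Tuple[Phonemes, Phonemes]]:
    """Return {word: (predicted, expected)} for every word transcribed wrongly."""
    errors = {}
    for word, expected in reference.items():
        predicted = transcribe(word)
        if predicted != expected:
            errors[word] = (predicted, expected)
    return errors


def tally_substitutions(
    errors: Dict[str, Tuple[Phonemes, Phonemes]],
) -> Counter:
    """Count (predicted, expected) symbol pairs that differ at the same position.

    Transcriptions of different lengths are counted under one catch-all key,
    since aligning them properly is beyond this sketch.
    """
    counts: Counter = Counter()
    for predicted, expected in errors.values():
        if len(predicted) != len(expected):
            counts[("<length mismatch>", "<length mismatch>")] += 1
            continue
        for p, e in zip(predicted, expected):
            if p != e:
                counts[(p, e)] += 1
    return counts


# Toy data: a two-word "dictionary" and a stand-in transcriber with one mistake.
reference = {"name": ["NN", "EI", "MM"], "mane": ["MM", "EI", "NN"]}
fake_output = {"name": ["NN", "EI", "MM"], "mane": ["MM", "XX", "NN"]}

errors = find_errors(reference, fake_output.__getitem__)
for pair, count in tally_substitutions(errors).most_common():
    print(pair, count)
```

Sorting the tally by frequency is one simple way to surface recurring error patterns; deciding which new rule addresses a pattern remains a manual, linguistic judgment.
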
In the last year, this process has let us correct the PHONEMES transcription of almost
10,000 words and improve the transcription of thousands more.

You might be wondering: instead of using such a complex program, why not simply
look up each word in a dictionary in the computer's memory? First of all, our
program takes up less computer memory than a dictionary. The program needs to
store only a few thousand rules, compared to the hundreds of thousands of words
recorded in a dictionary. A second reason is that the English language is constantly
growing. New words are added to the language every year; English has the capacity
for an infinite number of words. No dictionary could possibly hold all these
words, and a program that depends on a computerized dictionary will fail badly
when given a new word. In contrast, our program has the capacity to take a reasonable
guess at the pronunciation of a word it has never encountered before.