Despite the literacy problems associated with traditional English orthography (T.O.), many linguists have sought to justify it as a highly optimal system for English word families. They advocate curricula based on this morphographemic concept. In order to quantify the morphographemic optimality of T.O., i.e., the degree to which word families retain the base spelling, a simple algorithm was applied to the derived and inflected forms of 100 bases. A relative optimality percentage was determined for each form, each family, and the corpus as a whole. Simultaneously, T.O., which was determined to be 95 percent optimal, was compared with a more phonemically reliable orthography, which was found to have a higher (97 percent) basic optimality. Finally, for purposes of determining the gradated difficulty of subject matter, the word families were ranked according to their optimality.

Introduction.

It is ... noteworthy but not too surprising that English orthography, despite its often-cited inconsistencies, comes remarkably close to being an optimal orthographic system for English. (Chomsky & Halle, 1968, p.49)

Problem.

How close is remarkably close? What would an optimal orthographic system for English look like? In order to answer these questions, especially as they relate to the teaching of English, consider what this influential aside from The Sound Pattern of English assumes. The authors presuppose at least a perceived problem with traditional English orthography (T.O.). Otherwise, Chomsky and Halle would not consider the optimality of T.O. to be noteworthy. If T.O. were obviously optimal, it would not sometimes be called a serious "obstacle to literacy acquisition" (Carney, 1995, p.xvi). Studies about the difficulties for writer and reader abound. According to Carney,

Such a view has been [often] stated. Ever since English spelling settled down in the seventeenth and eighteenth centuries, the consensus seems to have been that the conventions we have inherited are ill-suited ... yet well-educated natives seem to cope with [T.O.], though only after a heavy investment of time and effort. (p.xviii)

Anecdotes of variability beg the question: just what is orthographic optimality? Chomsky and Halle state that an ideal orthography has one representation for each lexical entry (p.49). Others suggest that an optimal orthography uses one grapheme (i.e., letter) to signify one phoneme (i.e., a sound that distinguishes one word from another). The difference between these criteria reflects, to some extent, an emphasis on reading on one hand and a writing emphasis on the other. In short, definitions of optimal orthography differ, let alone how T.O. measures up.

Background literature.

A benchmark for the optimal spelling of English is available in Eastern Europe, where we find an active orthographic continuum. The Russian spelling system, for example, cannot be read "by a purely sequential, phonic method: it requires a combination of the phonic and look-and-say methods" (Knowles, 1988, p.9). This is the morphemic end of the spectrum. It retains the integrity of morphemes (i.e., meaningful, minimal linguistic units, namely words) at the expense of one-to-one, sound-to-spelling, spelling-to-sound correspondences. The other end of the spectrum, characterized by near-100% phonemic integrity, is represented by the Serbo-Croatian orthography. In Serbo-Croatian, phonemes reign supreme: there is no such concept as the integrity of the morpheme (Knowles, p.15). Between the Serbo-Croatian and Russian orthographies lies Byelorussian. Rather than maintaining morphemic integrity, this system partly overrides morphemes with assistance from a system that spells according to pronunciation. For instance, the letter o is pronounced /o/ until a stress shift renders a pronunciation of /a/; then the spelling also shifts to a. Yet Byelorussian has adopted this principle only for vowels, not consonants. Knowles reports claims that this alphabetic system has helped improve literacy in Byelorussian (p.9). He concludes:

In the Slavonic languages a spectrum of spelling systems exists, from the predominantly morphophonemic (Russian) to the predominantly phonemic (Serbo-Croat); there is no representative of the English 'antisystem'! (p.16)

The optimality of this so-called English 'antisystem' can be systematically analyzed using theoretical assumptions underlying any point along this orthographic spectrum. Perhaps the best-known systematic analysis of any kind was performed by Hanna, Hanna, Hodes and Rudorf (1966). In order to determine how closely T.O. approximates the alphabetic principle, these Stanford University linguists incorporated a linguistically based research design into a computer program, thru which they fed 17,000 different words.

Their work, published as Phoneme Grapheme Correspondences as Cues to Spelling Improvement, began with the sound of the words as represented by phonetic respellings. Then, by devising rules, they attempted to spell those words correctly. To summarize, they found that 90 percent of the correspondences the program found between phonemes and graphemes were correct. However, fewer than 50 percent of the words they analyzed could be spelled correctly on the basis of phonological principles. Nevertheless, Carney states, while the 50 percent figure suffers from under- and overstatement, "this 50 percent success rate of correctly spelt words is probably too generous for the rules as they stand" (p.94). Despite 308 rules and 88 exception (i.e., set-aside) words, this analysis suggests that T.O. is 50% optimal on a phonemic sound-to-letter basis. Hanna, et al., admit that, when other phonological factors are not taken into consideration, T.O.'s phoneme-grapheme relationships only inconclusively approximate the alphabetic principle (p.39).

More recent research, with an eye toward speech synthesis, has emphasized the spelling-to-sound optimality of T.O. Ainsworth's algorithm (1973) stands out among those devised to account for English spelling with basic correspondence rules. Just as success for Hanna, et al, is correct spelling, success for Ainsworth's algorithm is the intelligibility of the synthesized speech output (Carney, p.260). Ainsworth has no set-aside table of irregular words and uses 159 correspondence rules - although a quarter of these rules have to do with single words or morphemes. While Carney cautions that such an algorithm cannot be quoted as an unqualified index of the optimality of T.O., Ainsworth's results are suggestive:

Listeners judged the comprehensibility of the synthetic speech output. The best results came from the more experienced listeners who were used to ... synthetic speech. The best of these identified 90 percent of synthesized words correctly; the poorest listeners could only manage 50 percent (Carney, p.266).

In other words, Ainsworth made 50 to 90 percent of words in a text identifiable using an algorithm of 159 correspondence rules. Therefore, in terms of one-to-one, spelling-to-sound correspondences, Ainsworth's results suggest an optimality of approximately 70 percent, with a practical margin of error of plus or minus 20 percent.

In terms of basic one-to-one correspondences, then, if one were to average the success rates and, thus, the phonemic optimality results of Hanna, et al, and Ainsworth, then the optimality average of 50 and 70 percent, or 60 percent, could be an approximation.
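The arithmetic behind these rough figures can be retraced in a few lines. The sketch below is only an illustration: the function name is hypothetical, and treating the midpoint of Ainsworth's 50 to 90 percent range as a single estimate is the simplification already made in the text, not a statistical result.

```python
def midpoint_and_margin(low, high):
    """Return the midpoint of a range and the half-width around it."""
    return (low + high) / 2, (high - low) / 2

# Ainsworth: poorest and best listeners identified 50 and 90 percent of words.
ainsworth_estimate, ainsworth_margin = midpoint_and_margin(50, 90)

# Hanna et al.: about 50 percent of words spelled correctly by rule.
hanna_estimate = 50

# Simple unweighted average of the two estimates.
combined_estimate = (hanna_estimate + ainsworth_estimate) / 2

print(ainsworth_estimate, ainsworth_margin, combined_estimate)
```

The average is unweighted; the two studies measure different directions (sound-to-spelling and spelling-to-sound), so the combined figure is best read as a loose approximation.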

Both analyses are based on surface or self-evident phonemic principles. Beneath the surface, however, are morphophonemic patterns, which have been explored by researchers since the 1960s. Venezky (1967), who defines T.O. as a phonemically based system that maintains morphemic identity whenever possible, provides word pairs as evidence of these patterns: labor/ laborious, rigor/ rigorous, and curious/ curiosity - altho curiosity fails to maintain the morphemic identity of its base form (curious). McDonald (1970) suggests "it is more valuable to have an orthography which protects the obvious visual similarity in word families than one which obliterates such relationships in favor of broad phonetic accuracy" (p.325).

"Making efficient reading easier" is the target of widely cited morphophonemic pedagogist C. Chomsky (1970, p.292), who advocates the close correspondence of T.O. and underlying abstract forms rather than their phonetic realizations. While she may be faulted for not seeking to make all forms of reading easier, her word pair samples such as nation/national and courage/ courageous appear to make efficient reading easier by "permitting immediate direct identification of the lexical item, without requiring the reader to abstract away from irrelevant phonetic information" (p.289). Yet other orthographers counter that, tho these morphophonemic theories are valid on their face, a lack of reader cognitive awareness of these patterns may make the issue moot. Indeed, Chomsky expresses concern when she asks: "Does [this abstract lexical representation] have a psychological reality for language users, [i.e.,] is it based on something a reader can honestly be said to know?" (p.295). Her own reply - "it seems to me [that it does]" - is hardly persuasive, betraying a lack of available hard evidence in 1970.

Among the first to note specific flaws in morphographemic theory were Simon and Simon (1973), who argue that there are too few word pairs of this type to be useful and that such analogies will often lead to misspellings (e.g., remember-remembrerance; proceed-proceedure) (cited in Marsh, Friedman, Welch & Desberg, 1980, p.353). Frith (1980) points out that, tho learners do use such analogies and rules when spelling novel words,

linguistic rules are complex and of a large and unknown number [and often] known by hindsight only. For instance, one could theoretically know how to spell nation (rather than nashen) because of the morphological relationship to native; on the other hand, one probably only knows of the relationship because one can spell nation. Moreover, relationships [often] give misleading cues. For instance, pronunciation might be spelled pronounciation as it relates to pronounce; spatial might be spelled spacial as it relates to space, and deceit might be spelled deceipt as it relates to deception. (p.504)

Moreover, Baker (1980) tested the orthographic cognizance of students. He found that, in terms of derivationally related words, "the overall tendency is against preserving these particular visual relationships, suggesting little support for this function of English spelling" (p.58).

Even so, morphophonemic theory has informed much of the orthographic literature, and for good reason. Baker gives one such good reason, citing Jarvella and Snodgrass' (1974) demonstration that subjects find it easier to make judgments of meaning-relatedness when pairs of written words are barely different from one another, as in revise-revision, than when they are not, as in divide-division (p.53). Morphophonemic reasoning merits systematic analysis, but no one has yet attempted a quantifiable analysis in the manner of Ainsworth and Hanna, et al.

Purpose and Rationale.

The main reason for developing and applying an optimality algorithm is to quantify the degree to which T.O. retains the base spelling in word families. A second reason is to compare the optimality of T.O. to an external reference (in this case, a simplified English spelling system called Sound-spel, or S.S.). The third is to provide teachers and curricula designers with a quick, logistical way to determine the gradated difficulty of word pairs as well as word families.

Regarding the first reason: though there has been little statistical proof to date of the morphophonemic optimality of T.O., the dearth of quantifiable analysis has not prevented provocative declarations like those of Chomsky and Halle. The following are representative:

There is no valid reason ... for claiming that the current orthography should be anything in particular other than what it is. (Venezky, p.122)

In short, advocates of T.O., basing their analysis on a great deal of observation and much anecdotal evidence, have made insightful claims, but unverified ones.

Regarding the second reason for analysis: systematically unsubstantiated claims regarding the morphemic costs of using a spelling system with greater phonemic reliability appear thru-out the literature. For instance:

It is clear that a broad phonetic orthography [such as proceed/ procejure] would be more difficult for native adult speakers to read. (McDonald, p.323)

[These phonetic variations] need not be represented in the lexical spelling of words, and indeed, underlying similarities which are real in the language would be lost ... if these differences were to be represented on the lexical level. (Chomsky, p.292)

It is not at all true that any kind of "regularized" English orthography (however inconsistent or rigorous it may be in application) is in any sense an improvement on what we already have ... (McDonald, p.325)

Yet the rationale for a reliably phonemic orthography is simple. Cummings (1988) states that the phonemic is first among competing aspects of orthography. Among the aspects or demands made of orthography, "the first, the phonetic, urges that sounds be spelled regularly from word to word. This ... [stems] from the invention of alphabetic writing in ancient times" (p.461). According to Marsh, et al, "simple invariant and reversible spelling to sound correspondences [provide the learner with] an algorithm for decoding and encoding printed words" (p.351). Therefore, one could hope for an orthography that is optimally phonemic and also optimally morphemic.

Regarding the third reason: increased recognition of word forms and the ties that bind them together can only serve the process of reading for meaning. Toward this end, an awareness of which word pairs and families are simpler can aid teachers as they plan their literacy strategy. In spite of evidence to the contrary, Smith and Baker (1976) remind us that, given an appropriate level of content, even "linguistically unsophisticated [language learners] can squeeze a huge amount of information out of a word's spelling" (cited in Cummings, p.32).

Method.

Subject matter. The analyzed base word forms are taken from Basic Reading Vocabularies (Harris and Jacobson, 1982). The authors describe their work as a comprehensive professional reference, based on a computerized analysis of eight reading series. All but one were published after 1979. In particular, their

Frequency List provides the rank, the [base] word representing the [inflectionally] merged entry, and the frequency. Words with the same frequency [are] assigned the same rank. If, for example, five words are tied for rank 151, they are all given the rank of 151 and the next word is ranked 156. Within the tie, the words are listed alphabetically. (p.6)
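The tie-handling rule that Harris and Jacobson describe can be sketched as follows. This is a minimal illustration, not their program: the function name and the toy frequency counts are hypothetical, chosen only to show tied words sharing a rank and the next rank skipping ahead.

```python
def rank_with_ties(frequencies):
    """Rank words by descending frequency; tied words share a rank,
    are listed alphabetically, and the following rank skips ahead."""
    ordered = sorted(frequencies.items(), key=lambda item: (-item[1], item[0]))
    ranked = []
    for position, (word, freq) in enumerate(ordered, start=1):
        if ranked and ranked[-1][2] == freq:
            rank = ranked[-1][0]   # same frequency: reuse the previous rank
        else:
            rank = position        # new frequency: rank equals list position
        ranked.append((rank, word, freq))
    return ranked

# Hypothetical counts, for illustration only.
toy_counts = {"the": 900, "of": 500, "and": 500, "a": 500, "to": 100}
toy_ranking = rank_with_ties(toy_counts)
for rank, word, freq in toy_ranking:
    print(rank, word, freq)
```

In the toy list, three words tie for rank 2 and the next word is ranked 5, mirroring the quoted case of five words tied at 151 followed by rank 156.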

While a representative sample could have included 1000 or 10,000 words, such a quantity goes well beyond the scope of this work. Even so, a representative word sample is necessary for the sake of reliability and general application. Dewey says that " ... in any short list of commonest words, short and Anglo Saxon words predominate. The result is that analytic data based on commonest words only will give, inevitably, a seriously distorted portrait of English as a whole" (cited in Fries, 1965, p.7). As a compromise between wide distribution and high frequency, 100 words were chosen, or every 25th word from number 25 (have) to number 2500 (caution).

The Harris and Jacobson corpus includes words without inflected or derived forms (such as me with a rank of 50). It was decided that in such cases, the word would be replaced by the next word (such as like with a rank of 51) with inflected or derived forms. As opposed to like, which serves as the basis of inflected (e.g., liking) and derived (e.g., liken) forms, me serves as the basis of neither. Thus, comparison is impossible and, so, me is moot and excluded.

Another type of excluded word is referred to by Chomsky and Halle, who disregard exceptional word pairs such as I/we because "given the grammar of English, if we delete reference to the item we, there is no way to predict the phonetic form of the plural of I" (pp.11-12). In the same manner that such word pairs are dismissed, for the purposes of this analysis it was decided that an exceptional word pair is any pair in which the inflected/derived form does not retain the first letter of the base word. For instance, be/been is an acceptable subject for study, whereas be/were is unacceptable because were fails to retain a semblance of the base form. It was decided that in each such case of word families marked by an exceptional form, the base word (such as be with a rank of 25) would be replaced by the next base word (such as have with a rank of 26).

The word forms subject to analysis were the inflected and derived forms; excluded from analysis were inflected forms of derivatives. In other words, whiting and whiten, an inflected and a derived form of white, respectively, were included; whitens and whitener, an inflected and a derived form of whiten, respectively, were excluded. This parameter is due in part to the limited scope of this work and to Cummings' timely suggestion that "[more distantly related word forms] are less interesting to orthographers than sets" (p.46).

In each set, i.e., word family, the word forms are listed generally in alphabetical order and order of length. Cummings says the distinctions between inflection and derivation are problematic. Therefore, suffice it to say that if orthographers have trouble distinguishing between inflection and derivation, the man or woman on the street can hardly be expected to make the distinction. This analysis does not try.

For the sake of reference and comparison, each word and word form was respelled using Sound-spel, then listed beside the T.O. spellings in a parallel column. A second look at the words containing the /iy/ phoneme demonstrates that, for the purposes of this analysis, Sound-spel (S.S.) is more phonemically reliable: keen, kee, skee, deseet, feeld, peepl, teem, leev, raveen, beleev, cheez, leeg, debree. Carney refers to the summary logic of S.S. when he talks of its representation of the traditional English long vowel sounds with a so-called silent-e: "the moving forward of the marker for long vowels (biet, not bite) is [not] startling, since the digraph is familiar ... from open syllables such as lie, toe, and due" (p.478). See Appendix B for more details.

Optimality algorithm and procedure.

An algorithm was developed to determine the optimality of English spelling in terms of its morphophonemic, or rather, its morphographemic basis. As with the assumptions pertaining to a formula, the shift to the term morphographemic is based predominantly on the anecdotal strategies of Chomsky among others. Chomsky's morphophonemic logic betrays its dependence on graphic or visual appearance with her use of the word pair anxious/ anxiety. As opposed to pairs such as critic/ criticism and nation/ national, anxious and anxiety share no readily apparent morphemes, altho Chomsky posits the sequence anxi as a shared underlying lexical spelling. The question is, why posit anxi rather than anx when neither is especially lexical nor morphemic? The circularity of her response that " ... this common item is recognized by the language user as a common item" (p.290) suggests that Chomsky posits anxi, which is not more common than anx, because it consists of an additional grapheme. Chomsky's actual emphasis, then, is morphographemic girth or length; so is the emphasis of the algorithm.

Regarding the algorithm proper, Chomsky's nation/ national word pair is assumed to be 100% optimal. Any change is quantifiably for the worse. In lieu of more involved algorithms for determining a quantifiable optimality, it was decided that the percent of base word (letters) retained by a derived or inflected form is equal to the morphographemic optimality of that form. For instance, the spelling of the word national does not disturb the integrity of the six letters that constitute the morpheme nation. In order to obtain a specific percentage, the six morphemic letters retained in national are divided by the six morphemic letters of nation for a resulting figure of 1, or 100 percent. When a letter in the morpheme is changed, or when a letter is added into or subtracted from the morpheme, morphemic integrity is disturbed, which reduces optimality, as in the case of use and usable. In transition from use, usable disturbs (i.e., subtracts) one of the three letters; thus, the undisturbed two letters are divided by the three letters of the base for the resulting figure of .66, or 66%. Finally, given each form's optimality, an average for the word family was determined. An average was taken for the corpus as a whole.
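As a rough illustration of the procedure just described, the following Python sketch scores a form by locating the stretch of the form that matches the base with the fewest letter changes, additions, or subtractions, and then dividing the undisturbed base letters by the length of the base. The matching step is an assumption made for illustration (a standard approximate substring match, swapped in for the hand count used in this analysis), and all function names are hypothetical.

```python
def disturbed_letters(base, form):
    """Fewest changed, added, or subtracted letters needed to locate
    `base` somewhere inside `form` (affixes on either side are free)."""
    # Row 0: an empty base matches anywhere in the form at no cost.
    prev = [0] * (len(form) + 1)
    for i, b in enumerate(base, start=1):
        # Column 0: the first i base letters are all subtracted.
        curr = [i] + [0] * len(form)
        for j, f in enumerate(form, start=1):
            curr[j] = min(
                prev[j - 1] + (b != f),  # letter retained (0) or changed (1)
                prev[j] + 1,             # base letter subtracted
                curr[j - 1] + 1,         # extra letter added into the base
            )
        prev = curr
    return min(prev)

def form_optimality(base, form):
    """Share of base letters left undisturbed in the form (0.0 to 1.0)."""
    return (len(base) - disturbed_letters(base, form)) / len(base)

def family_optimality(base, forms):
    """Average optimality over the inflected and derived forms of a base."""
    return sum(form_optimality(base, form) for form in forms) / len(forms)

voice_forms = ["voiced", "voices", "voicing", "voiceless", "invoice"]
print(form_optimality("nation", "national"))
print(form_optimality("use", "usable"))
print(family_optimality("voice", voice_forms))
```

Under these assumptions the sketch gives 100 percent for nation/national, two of three letters for use/usable, and 96 percent for the voice family; judgment calls made by hand for other families may differ from this mechanical count.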

Results.

Optimality of Traditional Orthography (T.O.)

The morphographemic optimality of T.O. was found to be 95 percent, which was determined by averaging the optimality percentages of 100 word families. An example of one such word family, drawn from Appendix A, appears in Table I.

Comparison with Sound-spel (S.S.). By comparison, the optimality of Sound-spel was found to be 97 percent. See Table II for an example of three such word families as they appear beside their traditionally spelled T.O. counterparts in Appendix A. In the first sample, T.O. is more optimal overall. In the second sample, T.O. and S.S. are equally optimal. In the third sample, S.S. is more optimal on the whole.

Table II. Three sample comparisons of T.O. and S.S.

frq. 225
T.O.            hear    heard    hears    hearer    hearing    family
T.O. optimal%   -       100      100      100       100        100
S.S. optimal%   -       75       100      100       100        94
S.S.            heer    herd     heers    heerer    heering

frq. 250
T.O.            high    higher   highly   highest   highness   family
T.O. optimal%   -       100      100      100       100        100
S.S. optimal%   -       100      100      100       100        100
S.S.            hie     hieer    hiely    hieest    hienes

frq. 275
T.O.            voice   voiced   voices   voicing   voiceless   invoice   family
T.O. optimal%   -       100      100      80        100         100       96
S.S. optimal%   -       100      100      100       100         100       100
S.S.            vois    voist    voises   voising   voisles     invois

Note: S.S. = Sound-spel, trade name of the orthography introduced by Rondthaler and Lias in Dictionary of Simplified American Spelling (1986).

Word family ranking.

For purposes of determining the gradated difficulty of subject matter, the traditionally spelled bases were ranked according to the average optimality of their related forms, in descending order from 100 percent. For the optimality rank of these word families, see Table III, which also breaks down optimality ties between families - such as the 53 families with 100 percent optimality - according to frequency.

Note: frq. = frequency rank; T.O. = traditional orthography; opt. = morphographemic optimality of word family associated with that base. Average = morphographemic optimality of the corpus as a whole.
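The ranking step can be sketched in a few lines of Python. The three families and their figures are those shown in Table II; the tuple layout and the choice to list the more frequent base first within a tie are assumptions made for illustration.

```python
# (base, frequency rank, family optimality in percent), taken from Table II.
families = [
    ("voice", 275, 96),
    ("high", 250, 100),
    ("hear", 225, 100),
]

# Descending optimality; ties broken by frequency rank (lower number first).
ranked_families = sorted(families, key=lambda fam: (-fam[2], fam[1]))

for base, frq, opt in ranked_families:
    print(base, frq, opt)
```

With these three families, hear and high tie at 100 percent and are ordered by frequency rank, followed by voice at 96 percent.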

Discussion.

Because T.O. is based on the alphabetic principle, the nature of phoneme alteration, such as the change from receive to reception, suggests that many pairs (and, thus, families) can never reach a morphographemic optimum - unless much of the current phonemic correspondence of T.O. is reduced in favor of drastic word-sign oriented measures. For example, an overly zealous morphographemic spelling of reception relative to receive would be receiption or even receivtion. Be that as it may, a morphographemic compromise is struck by Sound-spel, a spelling system that has greater phonemic reliability and was found to be better able to protect morphemes.

Appropriately enough, in his reply to an account of how his words have been used to justify the claim that T.O. is so optimal that it cannot be improved, Chomsky writes,

I'm surprised to learn that the work that Morris Halle and I did on English phonology is being used in [that way]... It has no such implications ... I cannot imagine that anyone doubts that ... we could easily design a spelling system for English that would be much easier for everyone to use... (personal communication, July 26, 1994)

Recommendations.

In order to better "enrich the pupil's vocabulary so as to enable him to construct... the [patterns of regularity based on word relationships]" (Chomsky, p.302), particularly in the stages of reading for meaning, the optimality of word pairs and word families should be considered during text selection and manipulation. In other words, since a word pair like courage/ courageous is more obviously related than guide/guidance, instructors should be advised to include and emphasize more obviously related pairs - all other things being equal.

Suggested research.

Orthographic value (and weight) is in the eye of the orthographer. In short, someone else may decide that specific generalities or facets of T.O. are valuable and worth factoring into a new algorithm. Among the facets that have suggested themselves during this analysis - and perhaps should be considered in designing more complex versions of this optimality algorithm - are the following:

* The ability to distinguish homophones without recourse to context: how valuable is it to the reader? Cummings regards a distinguishing of homophones thru orthographic means as an "advantage for readers [but] a disadvantage for spellers, in that it provides them with one more slight but important contrast to keep straight" (p.42). That being said, what are the quantifiable drawbacks to the speller?

* The predictability of phonemic alternations and rules. If the predictability of having a value of /s/ before is 75 percent, would such a morphographemic switch be counted as .25 change? What if its predictability is 95 percent? Does that mean the weightiness is .05? Or is there a point short of 100 percent that such changes may be considered null and moot? Does this relativist theory then call into question the value and weightiness of other letter changes? Moreover, should predictable rules be somehow factored into an equation or formula? Cummings, for instance, posits a rule for deleting silent final-e, as follows: "with very few holdouts, a silent final-e that marks a long vowel that heads a Vce# string is deleted whenever a suffix is added that starts with a vowel" (p.155). Given Carney's reminder that a spelling rule should be "easy to state and understand" (p.76), what weight if any should be given such a rule?

* The length of each word in a word pair or word family given the benefits of shorter content length. What if greater letter quantity in a spelling means that a letter disturbance between base and form is less distracting and thus the family tie is more evident? What are the tradeoffs between short and long spellings of particular base words?

* By the same token, while this analysis has emphasized the reading aspect of literacy, a writing emphasis might give rise to different values and weights. For example, if a pupil is encoding rather than decoding, the interruption of a short morphographeme (such as lay/laid) may be more memorable and more optimal than the interruption of a longer morphographeme (such as exclaim/exclamation). Thus, the incentives for short spellings in cases where the morphographeme is interrupted would dovetail with the aforementioned incentives for shorter spellings overall.

* The number of letters into a word in which a disturbance occurs: if a grapheme is changed, added, or subtracted in the second position of the word, is that somehow weightier than a disturbance which occurs in the final position of the word?

* The spelling system of other languages: how does spelling elsewhere compare with T.O. in terms of morphographemic rates of optimality?

Summary.

Despite the literacy problems associated with traditional orthography (T.O.), linguists have sought to justify T.O. as a near optimal system for English word pairs and families. In order to quantify the morphographemic optimality of T.O., a simple algorithm was applied to the inflected and derived forms of 100 words. An optimality percentage was determined for each form, each family, and the corpus as a whole. At the same time, T.O., which was determined to be 95 percent optimal, was compared with a more phonemically reliable spelling system called Sound-spel, which was found to have an optimality rating of 97 percent. Finally, in order to determine the gradated difficulty of the sample families, the base words - representing the optimality of their extended families - were ranked in descending order.