December 06, 2004

Lexemes and word forms

Language Log readers who are sharp of eye and typographically on the
ball (the sort of readers who can tell one font from another, and
thus tend to refer to Dan Rather's embarrassing Microsoft Word-processed
Texas Air National Guard memos as "forged" rather than "of disputed
authenticity") will have noticed that I sometimes cite words that I
mention in a post by putting them in italics (like this), but then
sometimes I put them in bold italics (like this). I should
have explained this notational convention long ago. I did actually touch
on it accidentally in another context once (in
this post), but you could be forgiven for having
overlooked it, since that post was primarily about trademark law.
But anyway, my usage does not display random variation in font style
selection. There is a semantics to it. The needed explanation follows.

. . . when you hear crown you have
your crucial piece of evidence. The preterite of crown is
crowned, so the line And crown thy good with brotherhood
cannot be a preterite.

Why the first occurrence of "crown" in italics, the second in bold
italics, and then "crowned" in italics again? The answer is that the
font style distinction is systematically used to reflect a conceptual
distinction: word forms are being distinguished from lexemes.

A lexeme is a word in roughly the sense that would correspond to a
dictionary entry. Lexeme names are given in bold italics. The point
about "crown", for example, is that as a transitive verb it would get one
entry despite the existence of four different shapes in which it appears:
crown, crowns, crowned, crowning. These
different shapes spell out word forms that belong to the verb lexeme
crown. In a big and detailed dictionary they would all be
listed in the single entry for crown. (In shorter
dictionaries you would just be expected to know that the word forms for a
regular verb like crown would be crown,
crowns, crowned, crowning, the word forms for a
regular verb like walk would be walk, walks,
walked, walking, and so on: they list the lexemes, you are
meant to know the grammar.)

There would be another lexeme in the dictionary for "crown", of
course: a noun lexeme crown. Its word forms would be the
plain singular crown, the plain plural crowns, the genitive
singular crown's, and the genitive plural crowns'.
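The relation between a lexeme and the shapes of its word forms can be pictured as a small data structure. What follows is only a minimal sketch in Python, not anything from the original post: the names Lexeme, regular_verb, regular_noun, and lexemes_for are hypothetical, and the "regular" patterns are deliberately naive (they ignore spelling adjustments such as the doubled consonant in stopped).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lexeme:
    """A dictionary-entry-style word (hypothetical illustration)."""
    name: str       # the name under which the lexeme would be listed
    category: str   # "verb" or "noun"
    shapes: tuple   # the shapes that spell out its word forms


def regular_verb(base):
    # Assumed naive pattern for a regular verb: four shapes.
    return Lexeme(base, "verb", (base, base + "s", base + "ed", base + "ing"))


def regular_noun(base):
    # Plain singular, plain plural, genitive singular, genitive plural.
    return Lexeme(base, "noun", (base, base + "s", base + "'s", base + "s'"))


def lexemes_for(shape, lexicon):
    # Return every lexeme in the lexicon that has a word form with this shape.
    return [lexeme for lexeme in lexicon if shape in lexeme.shapes]


crown_verb = regular_verb("crown")
crown_noun = regular_noun("crown")
lexicon = [crown_verb, crown_noun]

print(crown_verb.shapes)
print(crown_noun.shapes)
print([lx.category for lx in lexemes_for("crowned", lexicon)])
print([lx.category for lx in lexemes_for("crowns", lexicon)])
```

In this sketch the shape crowned belongs only to the verb lexeme, while the shape crowns is shared by the verb and the noun, which is exactly why a shape on its own does not tell you which lexeme is meant.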

How could "one of the few points on which the sages of writing agree"
possibly be that "it is good to avoid them" when to utter the very thought
you need the adjective good? How could William Zinsser possibly be
serious in saying that most adjectives are "unnecessary" when he couldn't
finish his sentence without the adjective unnecessary?

Here I actually mean the adjective lexeme good, which has
the word forms good, better, and best. Using
better would count as using the adjective good,
though in its comparative inflectional form. But the next adjective I
mentioned was unnecessary, which does not inflect for comparison:
there is no *unnecessarier or *unnecessariest. I thought
it would look distractingly odd to put just good in bold
italics. I therefore didn't add that pedantic detail. Nothing about
its inflectional forms was relevant to what I was saying. But in general,
whenever there could be confusion about whether I meant a word form or a
lexeme, I will use the distinction in font styles, and always in the same
way.

For words that have only one shape the distinction between lexemes and
word forms makes no sense (for a language that truly has no inflection at
all, one wouldn't draw the distinction), so the minimum number of word
forms for a lexeme would be two. That minimum is represented in English
by verbs such as must and ought, which are
modal verbs with no preterite (inflected past tense). The shapes of the
two word forms of must are must (present tense
neutral) and mustn't (present tense negative).

Which English lexeme holds the record for most word forms? The
answer is be. The absolute minimum number of
separate word forms it has (assuming no distinct word forms that have the
same shape, but counting the informal-style negative variants as word
forms) is 12: am, are, aren't, been,
be, being, is, isn't, was,
wasn't, were, weren't.
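The two counts just given can be checked mechanically. This is a small Python sketch under the same assumption stated above (each distinct shape is counted once, and the informal negative variants are included); the dictionary name SHAPES and the function count_shapes are hypothetical, and only the two lexemes discussed here are listed.

```python
# Shapes of word forms for the two lexemes discussed in the text.
SHAPES = {
    "must": ["must", "mustn't"],
    "be": ["am", "are", "aren't", "been", "be", "being",
           "is", "isn't", "was", "wasn't", "were", "weren't"],
}


def count_shapes(lexeme_name):
    # Count distinct shapes, so a shape listed twice is not counted twice.
    return len(set(SHAPES[lexeme_name]))


for name in SHAPES:
    print(name, count_shapes(name))

# Among the lexemes listed here, find the ones with the fewest and most shapes.
fewest = min(SHAPES, key=count_shapes)
most = max(SHAPES, key=count_shapes)
print(fewest, most)
```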

In some languages (Sanskrit, for example) the number of word forms
for a verb lexeme is in the high hundreds, and for some others (Turkish,
for example) it is certainly in the thousands.