A word is a unit of language that carries meaning and consists of one or more morphemes which are linked more or less tightly together. Typically a word will consist of a root or stem and zero or more affixes. Words can be combined to create phrases, clauses and sentences. A word consisting of two or more stems joined together is called a compound.

Latin written without any word breaks in the Codex Claromontanus

Difficulty in defining the term

The precise definition of a word depends on the language in question, and the dividing line between words and phrases is not always clear. In most writing systems, a word is marked out in the text by interword separation: either spaces or dedicated word dividers, such as those used to write Amharic. In other languages, such as Chinese and Japanese, and in many ancient languages, such as Sanskrit, word boundaries are not shown in writing.
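As a rough illustration of how much interword separation simplifies matters, the following Python sketch splits text on whitespace and on the Ethiopic wordspace character (U+1361). The function name and the sample strings are assumptions made for this example; the third sample uses Latin placeholder letters around the divider rather than real Amharic text.

```python
import re

# U+1361 ETHIOPIC WORDSPACE, the word divider used in Ethiopic script.
ETHIOPIC_WORDSPACE = "\u1361"


def split_on_separators(text):
    """Split text on whitespace or the Ethiopic wordspace; drop empty pieces."""
    pieces = re.split(rf"[\s{ETHIOPIC_WORDSPACE}]+", text)
    return [piece for piece in pieces if piece]


# Text with spaces splits into its written words.
print(split_on_separators("words can be combined"))
# Placeholder letters joined by the Ethiopic wordspace split the same way.
print(split_on_separators("abc" + ETHIOPIC_WORDSPACE + "def"))
# Text written without separators comes back as one undivided piece.
print(split_on_separators("wordscanbecombined"))
```

For text written without separators, the function returns the whole string unchanged, which is exactly the situation in which other criteria for word boundaries are needed.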

Even in writing systems that use interword separation, word boundaries are not always clear. For example, even though ice cream is written as two words, it is a single compound, because it cannot be separated by another morpheme or rephrased as iced cream or cream of ice. Likewise, a proper noun is a word, however long it is. A space may not even be the main morpheme boundary in a word: New Yorker is a compound of New York and -er, not of New and Yorker. In English, many common words have historically progressed from being written as two separate words (e.g. to day) to hyphenated (to-day) to a single word (today). This process is still ongoing, as in the common spelling of all right as alright.

Words in different classes of languages

In synthetic languages, a single word stem (for example, love) may have a number of different forms (for example, loves, loving, and loved). However, these are not usually considered to be different words, but different forms of the same word. In these languages, words may be considered to be constructed from a number of morphemes (such as love and -s).

In polysynthetic languages, the number of morphemes per word can become so large that the word performs the same grammatical role as a phrase or clause in less synthetic languages (for example, in Yupik, angyaghllangyugtuq means "he wants to acquire a big boat"). These large-construction words are still single words, because they contain only one content word; the other morphemes are grammatical bound morphemes, which cannot stand alone.

Matters seem easier for analytic languages. For these languages, a word usually consists of only a root morpheme, which is often single-syllable. However, it is common even in those languages to combine roots into a compound stem.

Complexity of word boundaries in speech

In spoken language, distinguishing individual words is even more complex: short words are often run together, and long words are often broken up. Spoken French has some of the features of a polysynthetic language: je ne le sais pas ("I do not know it") tends towards /ʒənələsepa/. As the majority of the world's languages are not written, the scientific determination of word boundaries becomes important.

Determining word boundaries

There are five ways to determine where the word boundaries of spoken language should be placed:

Potential pause

A speaker is told to repeat a given sentence slowly, allowing for pauses. The speaker will tend to insert pauses at the word boundaries. However, this method is not foolproof: the speaker could easily break up polysyllabic words.

Indivisibility

A speaker is told to say a sentence out loud, and then is told to say the sentence again with extra words added to it. Thus, I have lived in this village for ten years might become I and my family have lived in this little village for about ten or so years. These extra words will tend to be added in the word boundaries of the original sentence. However, some languages have infixes, which are put inside a word. Similarly, some have separable affixes; in the German sentence "Ich komme gut zu Hause an," the verb ankommen is separated.
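The indivisibility test can be loosely simulated on text. The Python sketch below treats both utterances as unsegmented strings (lower-cased, spaces removed) and uses the standard-library `difflib.SequenceMatcher` to find where material was inserted. The function names are assumptions made for this example, and a character-level alignment is only a stand-in for what a linguist would observe in speech.

```python
from difflib import SequenceMatcher


def insertion_points(original, expanded):
    """Return the offsets in `original` at which `expanded` has extra material."""
    matcher = SequenceMatcher(None, original, expanded, autojunk=False)
    return [i1 for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag == "insert"]


def split_at(text, points):
    """Cut `text` at each offset in `points` (assumed to be sorted)."""
    pieces, start = [], 0
    for point in points:
        pieces.append(text[start:point])
        start = point
    pieces.append(text[start:])
    return pieces


# The two sentences from the paragraph above, written without spaces.
original = "ihavelivedinthisvillagefortenyears"
expanded = "iandmyfamilyhavelivedinthislittlevillageforabouttenorsoyears"

points = insertion_points(original, expanded)
print(split_at(original, points))
# prints ['i', 'havelivedinthis', 'villagefor', 'ten', 'years']
```

Only the boundaries where something was actually inserted are recovered, so a single expanded sentence leaves stretches such as "havelivedinthis" undivided. Further expansions of the same sentence would be needed to find the remaining boundaries.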

Minimal free forms

This concept was proposed by Leonard Bloomfield. A word is thought of as the smallest meaningful unit of speech that can stand by itself. This correlates phonemes (units of sound) to lexemes (units of meaning). However, some written words are not minimal free forms, as they make no sense by themselves (for example, the and of).

Phonetic boundaries

Some languages have particular rules of pronunciation that make it easy to spot where a word boundary should be. For example, in a language that regularly stresses the last syllable of a word (like Hebrew), a word boundary is likely to fall after each stressed syllable. Another example can be seen in a language that has vowel harmony (like Turkish): the vowels within a given word share the same quality, so a word boundary is likely to occur whenever the vowel quality changes. However, not all languages have such convenient phonetic rules, and even those that do have occasional exceptions.
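Both phonetic cues can be sketched as simple heuristics in Python. The example assumes the speech has already been divided into syllables, that stress is given as a boolean flag, and that vowel quality is reduced to the front/back distinction of Turkish vowels. The function names, the romanized Hebrew syllables and the Turkish sample are illustrative assumptions, not a description of any established tool.

```python
# Turkish vowels grouped by backness (lower-case letters only).
FRONT_VOWELS = set("eiöü")
BACK_VOWELS = set("aıou")


def split_after_stress(syllables):
    """Close a word after every stressed syllable (final-stress heuristic)."""
    words, current = [], []
    for text, stressed in syllables:
        current.append(text)
        if stressed:
            words.append("".join(current))
            current = []
    if current:  # trailing syllables with no stress form a last word
        words.append("".join(current))
    return words


def vowel_class(syllable):
    """Return 'front' or 'back' for the first vowel found, or None."""
    for char in syllable:
        if char in FRONT_VOWELS:
            return "front"
        if char in BACK_VOWELS:
            return "back"
    return None


def split_on_harmony_change(syllables):
    """Start a new word whenever the vowel class differs from the previous one."""
    words, current, current_class = [], [], None
    for syllable in syllables:
        cls = vowel_class(syllable)
        if current and cls and current_class and cls != current_class:
            words.append("".join(current))
            current = []
        current.append(syllable)
        if cls:
            current_class = cls
    if current:
        words.append("".join(current))
    return words


# Romanized Hebrew with final stress: sha-LOM cha-ve-RIM.
hebrew = [("sha", False), ("lom", True), ("cha", False), ("ve", False), ("rim", True)]
print(split_after_stress(hebrew))  # prints ['shalom', 'chaverim']

# Turkish: evler (front vowels) followed by kapılar (back vowels).
print(split_on_harmony_change(["ev", "ler", "ka", "pı", "lar"]))  # prints ['evler', 'kapılar']
```

The harmony heuristic cannot detect a boundary between two adjacent words whose vowels happen to share the same class, and both heuristics fail on words that are exceptions to the rule, which mirrors the limitation noted above.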

Semantic units

Much like the minimal free forms approach above, this method breaks down a sentence into its smallest semantic units. However, language often contains words that have little semantic value (and often play a more grammatical role), as well as semantic units that are compound words.

In practice, linguists apply a mixture of all these methods to determine the word boundaries of any given sentence. Even with the careful application of these methods, the exact definition of a word is often still elusive.