Eskimo Words for Snow

As I was reading about global warming and its effect on the Arctic and the people who live there, I couldn’t help bumping into some words in West Greenlandic. This is the main Inuit language spoken in Greenland. The people who actually speak it call it ‘Kalaallisut’. In June 2009 it was made the official language of the Greenlandic autonomous territory.

For example, I read about Sermersuaq. This is the Northern Hemisphere’s widest ‘tidewater glacier’: one that begins on land but terminates in water. It stretches 90 kilometers across!

This glacier is also called the Humboldt Glacier, but with all due respect to Humboldt, I’d rather call this magnificent, intensely forbidding realm by the name used by the people who can manage to live there! And so, I’d rather use the Kalaallisut word: Sermersuaq.

Anyway, after seeing words like Sermersuaq, Kangerdlugssuaq, and so on, I started wondering about a famous urban legend.

You’ve probably heard that the Eskimos have lots of words for snow. And maybe you’ve heard other people say “no, that’s not true”.

But the whole dispute starts seeming rather silly when you find out that the Eskimo — or more precisely, the speakers of Kalaallisut — have a word for “once again they tried to build a giant radio station, but it was apparently only on the drawing board.” It’s

It’s about Inuktitut, which is the collective name for a group of Inuit languages spoken in Eastern Canada. You can see them on the map: they’re called Qikiqtaaluk nigiani, Nunavimmiutitut, and Nunatsiavummiutut.

This language is very different from English: it’s polysynthetic, meaning that words can be composed of many pieces.

For example, verbs can be singular, dual, or plural:

takujunga — I see
takujuguk — we two see
takujugut — we several see

But instead of using words like “because”, “if” or “whether”, they use different suffixes:

takugama — because I see
takugunnuk — if we two see
takungmangaatta — whether we several see

The object of the verb can be attached as a suffix:

takujagit — I see you
takujara — I see him
takugakku — because I see him
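
To make the pattern in these examples concrete, here is a minimal Python sketch that simply glues the stem taku- onto the suffixes quoted above. The names `STEM`, `SUFFIXES` and `inflect` are made up for illustration, and the table holds only the nine forms listed in this post. Real Inuktitut suffixation involves sound changes at morpheme boundaries that plain concatenation ignores, so treat this as a toy, not a description of the grammar.

```python
# Toy illustration only: the stem and suffixes are copied from the
# examples in the post; nothing here models real Inuktitut sound rules.
STEM = "taku"  # 'see'

# English gloss -> suffix (hypothetical lookup table for illustration)
SUFFIXES = {
    "I see": "junga",
    "we two see": "juguk",
    "we several see": "jugut",
    "because I see": "gama",
    "if we two see": "gunnuk",
    "whether we several see": "ngmangaatta",
    "I see you": "jagit",
    "I see him": "jara",
    "because I see him": "gakku",
}


def inflect(stem, gloss):
    """Attach the suffix for `gloss` to `stem` by plain concatenation."""
    return stem + SUFFIXES[gloss]


if __name__ == "__main__":
    for gloss in SUFFIXES:
        print(f"{inflect(STEM, gloss):18} {gloss}")
```

The point is just that where English strings together separate words ("because", "I", "see", "him"), the examples here pack the same information into one suffix on the verb.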

There are also suffixes that turn verbs to nouns, and suffixes that turn nouns to verbs… and you can even use both in a single complicated word!

There are also ways to indicate whether something is stationary or moving, expected or unexpected:

tavva! — Here it is! (stationary and expected)
avva! — There it is over there! (mobile and unexpected)

There are ways to add spatial information:

tavvani — at this (expected) spot
maangat — from this (unexpected) area
tappaunga — to that (expected) area up there
kanuuna — through that (unexpected) spot down there

And all this is just scratching the surface! Words can easily become huge, and in a typical written text, only a minority of words are ever repeated.

Whether despite this or because of it, I think we must admit that the Inuit do have a marvelously terse way of describing lots of concepts related to snow and ice. For example, here’s a word list taken from Fortescue’s text on Kalaallisut:

I’m not sure how much people should worry about endangered languages: it’s fairly far down my list of worries, given how quickly (geologically speaking) people make up new languages. There are a lot of good reasons for poor people to learn big successful languages and, over a few generations, abandon their own. I can’t blame anyone for that! But there certainly is interesting information embedded in languages which gets lost when they disappear, especially if they haven’t been properly recorded.

There are about 50,000 speakers of Inuit languages in Greenland, 35,000 in Canada, and 7,000 in Alaska. The fact that Nunavut became a separate territory of Canada can’t hurt. Its area is about that of Western Europe! And I’m not just saying that because the Mercator projection exaggerates its size.

However, only about 33,000 people live there!

There’s also Nunavik, a large region of Quebec that may become its own self-governing territory. It’s as big as California!

Also, as I mentioned, Kalaallisut (or ‘West Greenlandic’) has become the sole official language of Greenland! According to Wikipedia:

This has made Greenlandic a unique example of an indigenous language of the Americas that serves exclusively as an official language of a semi-independent country, yet it is still considered to be in a “vulnerable” state by the UNESCO Red Book of Language Endangerment. The country has a 100% literacy rate.

I think it’s cool that an “indigenous language of the Americas” has become the official language of what I’d normally call a European country! Those Inuit sure get around.

I know that people have measured diversity of vocabulary. There’s a 1944 book that’s occasionally cited in the literature on biodiversity measurement, called The Statistical Study of Literary Vocabulary, by a statistician by the splendid name of Udny Yule. But I’ve never laid my hands on a copy.

In my post here on diversity, Jane Shevtsov mentioned that things get more interesting when a dynamical element is introduced, so that we’re not just studying the diversity of a static population. Of course, language is always drifting around. Moreover, it’s a hopeless task to look for a general rule saying when two languages are “the same” or “different”; there will always be borderline cases.

There are similar issues in epidemiology. Some pathogens evolve fast (many orders of magnitude faster than, say, mammals), and the dominant strains drift around “genetic space” in the same way that dialects drift around “linguistic space”.

Funny, this is twice in one day that I have run into a reference to Udny Yule, after never having heard of him before. I was reading a fun paper on power laws (http://arxiv.org/abs/cond-mat/0412004), and now this!

I think what people really worry about with the loss of languages is the disappearance of environmental knowledge (e.g. medicinal plants) and the culture that goes with it. Dan Everett and Nick Evans have a lot to say about this. Another worry is whether cultural products on the level of the Iliad and the Odyssey are being tossed en masse onto the scrapheap – these are generally acknowledged to be the first peaks of European culture before there was Europe, rarely equaled and never convincingly surpassed in the following 2700 years or so of putative civilization, and were produced by a culture perhaps not so very different from many of the ones now being annihilated. How do we know that there isn’t lots of stuff of comparable quality out there being destroyed? We don’t.

I was in a bit of a hurry when I wrote my last comment, so maybe it was a bit unfocused. Now that I found some more time, I dug up a more focused article on the “myth” of many Eskimo (is Eskimo considered politically incorrect?) words for snow and its relation to the Sapir-Whorf hypothesis.

Thanks for the references, Todd! I’ve seen some of these discussions, but far from all, so I’ll check these out. The part that hadn’t really sunk in with me before is that the very concept of ‘word’ in Inuit languages is problematic—or at least should be, in my opinion.

… is Eskimo considered politically incorrect?

It’s not in Alaska, where it’s a useful word, since it includes speakers of both Yupik and Inuit languages. But ‘Eskimo’ has fallen out of favor in Canada and to some extent in Greenland. In central Canada the people call themselves Inuinnaq, in eastern Canada they call themselves Inuit, and in Greenland they call themselves Greenlanders or, in their own language, Kalaallit.

Of course the main reason I’m avoiding the word ‘Eskimo’ here has nothing to do with political correctness. It’s that I’m trying to learn a tiny bit about languages, and Inuit and Yupik languages are somewhat different, though related, so I don’t want to make false generalizations.

Here’s what Wikipedia says about the word ‘Eskimo’:

The primary reason that Eskimo is considered derogatory is the arguable, but widespread perception that in Algonkian languages it means “eaters of raw meat.” One Cree speaker suggested the original word that became corrupted to Eskimo might indeed have been askamiciw (which means “he eats it raw”), and the Inuit are referred to in some Cree texts as askipiw (which means “eats something raw.”) The majority of academic linguists do not agree. Nevertheless, it is commonly felt in Canada and Greenland that the term Eskimo is pejorative.

The focus on ‘politically correct’ names for people often seems to come after they’ve been pushed around in some way, and the Inuit in Canada were certainly pushed around. Just yesterday I learned about disc numbers:

Disc numbers or ujamiit in the Inuit language were used by the Government of Canada in lieu of surnames for the Inuit and were similar to dog-tags. The discs were small, made of leather, had a string attached and were supposed to be worn around the neck.

Prior to the arrival of European customs, the Inuit had no need of family names and children were given names by the elders. However, by the 1940s the record keeping requirements of outside entities such as the missions, traders and the government brought about change. In response to the government’s needs they decided on the disc number system.

The discs were stamped with “Eskimo Identification Canada” around the edge and the crown in the middle. Just below the crown was the number. The number was broken down into several parts, “E” for Inuit living east of Gjoa Haven and “W” for those in the west. This would be followed by a one or two digit number that indicated the area the person was from. The last set of numbers would identify the individual. The discs were used in the Northwest Territories (which, at the time, included present-day Nunavut) from 1941 until 1978.

Thus a young woman who was known to her relatives as “Lutaaq, Pilitaq, Palluq, or Inusiq” and had been baptised as “Annie” was under this system to become Annie E7-121.

I find this pretty obnoxious: invading someone’s land and forcing them to wear fucking collars with numbers on them so you can slot them into your grand economic system! Only in 1968 did the first Inuk elected to the Northwest Territories Council declare that he would not be known by his disc number. And as the article says, the system lasted until 1978! Since the discs said ‘Eskimo’ on them—a word the people never used themselves—obviously they’re going to be touchy about this word.

How is it determined that “nalunaarasuartaatilioqateeraliorfinnialikkersaatiginialikkersaatilillaranatagoorunarsuarooq” is a single word?

That’s a great question. It occurred to me too, but I don’t know the answer yet. It would be really pathetic if the answer was “because the Europeans who introduced writing to the Inuit didn’t write spaces between the morphemes.”

I don’t think the answer is quite that stupid. I suspect it’s something more like this: “in Inuit languages, there’s no sharp distinction between words and sentences.” But I’m not sure exactly how it works.

In English we have a pretty good agreement on when someone has said a whole “sentence”. What about in Inuit languages?

The failure of Zipf’s law in Inuit languages convinces me that what people are calling Inuit “words” aren’t really “words” in the sense we normally mean. Wikipedia talks about the ability to keep on adding more and more suffixes, and says:

This sort of word construction is pervasive in Inuit language and makes it very unlike English. In one large Canadian corpus – the Nunavut Hansard – 92% of all words appear only once, in contrast to a small percentage in most English corpora of similar size. This makes the application of Zipf’s law quite difficult in the Inuit language. Furthermore, the notion of a part of speech can be somewhat complicated in the Inuit language. Fully inflected verbs can be interpreted as nouns. The word ilisaijuq can be interpreted as a fully inflected verb: “he studies”, but can also be interpreted as a noun: “student”. That said, the meaning is probably obvious to a fluent speaker, when put in context.
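
For anyone who wants to try this kind of measurement on a text of their own, here is a minimal Python sketch. It assumes a "word" is just a whitespace-separated token and that the 92% figure refers to distinct word forms that occur exactly once; the quote does not say which, so that is my guess. The function names `hapax_fraction` and `rank_frequency` are invented for this example.

```python
from collections import Counter


def hapax_fraction(tokens):
    """Fraction of distinct word forms that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / len(counts)


def rank_frequency(tokens):
    """List of (rank, word, frequency), most frequent first.

    Zipf's law says frequency falls off roughly like 1/rank; if nearly
    every form occurs once, this list is almost flat and there is
    little to fit.
    """
    counts = Counter(tokens)
    return [
        (rank, word, freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat".split()
    print(hapax_fraction(sample))
    for row in rank_frequency(sample):
        print(row)
```

Running the same two functions over a text tokenized into morphemes instead of "words" would be one way to see whether the repetition comes back at that level, though doing the segmentation properly is a job for a linguist.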

Wikipedia gives the example of a central Nunavut Inuktitut “word”:

tusaatsiarunnanngittualuujunga

meaning “I can’t hear very well”. Now I would tend to call that a sentence, but they say:

This long word is composed of a root word tusaa ‘to hear’ followed by five suffixes:

The answer is that they asked the people who spoke the language whether it was a word or not, and they told them that the affixes were not words.

The linguistic reason they called it a word is because Inuit languages (like Finnish, Tamil, and Japanese) are agglutinative languages. This means that words are built up by adding affixes. Affixes are morphemes, but not words. As an example in English, think of affixes like “co-”. “Co-variant” and “co-dependent” are words, but “co-” on its own isn’t a word, even though it has a meaning (namely, “together”).

In contrast, modern English is mostly analytic — most morphemes are words in their own right. (So it shouldn’t be surprising that the affix I gave is a Latinate root used mostly in technical discourse.)

Languages like German and Spanish stand between these two poles. They are inflected languages, where there are a lot of affixes, but the affixes are subject to fusion rules in ways that block unbounded expansion of words. Agglutinative languages are generally more regular, and tend not to have many affix fusion rules, since otherwise agglutination would not be possible.

The answer is that they asked the people who spoke the language whether it was a word or not, and they told them that the affixes were not words.

Thanks! But I’m still confused. Did the Inuit have a concept of ‘word’, and were the linguists confident that this concept sufficiently matched the Indo-European concept of ‘word’ that they could just stroll into town and ask people “is junga a word?” and get a useful answer?

That would be a rather dangerous approach. What if the Inuit word for ‘word’ actually means something more like ‘word or sentence’? What if the Inuit don’t draw a sharp distinction between words and sentences?

I hope the linguists had a way to tell which utterances count as ‘words’—some way that didn’t rely on their native informants having a formalized concept of ‘word’ that matches ours. But I’m not sure what this way would be.

Obviously some sort of affixes would rarely be uttered all by themselves, just as nobody here is going to walk into the room, say

co-

and walk out. But that doesn’t prove the affixes aren’t words; after all, nobody is going to walk into the room, say

of

and walk out, either!

I’m having trouble thinking of a good criterion that distinguishes between words and sentences in a culture without writing, and which will come to the conclusion that something meaning “once again they tried to build a giant radio station, but it was apparently only on the drawing board” is a word.

Maybe it’s that they say it without any noticeable pauses between the morphemes? Maybe it’s regarded as wrong to include a pause between the morphemes?

I guess I’m trying to say, in a roundabout way, that I don’t really know what a word is. I know what a word is in all the languages I’ve learned, but I don’t know a general definition of ‘word’ that would let me walk into an arbitrary culture and decide what are their words, and confidently proclaim that for the Inuit, nalunaarasuartaatilioqateeraliorfinnialikkersaatiginialikkersaatilillaranatagoorunarsuarooq is a single word. What’s to stop me from saying that all those affixes are words that can only appear after a certain initial ‘head’ word?

(By the way, speakers of Inuit languages would really hate the skinny-column format of this blog!)

My experience is that linguists are pretty clever at coming up with simple diagnostic tests for the claims they make so I’d be surprised if there isn’t a test for wordness. Unfortunately I’ve read a few (elementary) linguistics books and they take wordness for granted from page one. It’s always annoyed me :-)

Linguists define morphemes to be the smallest utterances having meaning. Words are the smallest utterances that it is grammatical to utter in isolation. You can establish grammaticality by asking native speakers whether an utterance is grammatical or not. This definition is independent of language family — none of the example agglutinative languages I gave in my previous post are in the same language family (and none of them are Indo-European).

As an aside, saying “of” or “from” (in Latin, anyway) was something that sufficiently trollish people in the 3rd and 4th centuries CE actually did! Whether Christ was “of God” or “from God” was a burning theological issue, and people could be moved to murder over the issue.

As an aside, saying “of” or “from” (in Latin, anyway) was something that sufficiently trollish people in the 3rd and 4th centuries CE actually did! Whether Christ was “of God” or “from God” was a burning theological issue, and people could be moved to murder over the issue.

In the 21st century, people who are, for example, computing rocket orbits and who say “half” instead of “fifth” when it matters may eventually also murder. I guess one could finally also make quite a burning issue out of that.

To give you an example of an utterance where “of” by itself is grammatical, think of a learner of English who asks a native:

“In English, do you use ‘of’ or ‘from’ in this context?”.

A perfectly reasonable answer is “Of.”

Contrast that with:

“How do you make ‘cat’ plural in English?”

Bogus answer: “ssssss” (sound of s).

But you can say “Cats”, since it is a word.

Why is the bogus answer ‘bogus’? We can’t say it’s because ‘s’ is not a word, because that would be circular reasoning!

It could be bogus because very few people give that answer, or they look guilty when they do. I probably wouldn’t say “ssssss”, myself. I’d probably say “you add an ‘s’.”

How about this one:

“What’s the prefix in the word ‘cooperate’?”

“‘Co-’.”

You could claim that this counts as a leading question, since the question is explicitly about ‘prefixes’, which ‘we all know’ (supposedly!) aren’t words. But again, it gets a bit circular when our criterion for what counts as a valid or invalid question in this game involves the linguist knowing ahead of time that in English, ‘prefixes’ don’t count as ‘words’.

Or, just as you gave this example:

“In English, do you use ‘of’ or ‘from’ in this context?”.

A perfectly reasonable answer is “Of.”

I could give this one:

“In English, do you use ‘in-’ or ‘im-’ to make the antonym of ‘possible’?”.

If this makes you worry that linguistics can get lost down a subjective rabbit-hole, then you are not alone. ;-)

It’s a problem in all science, and especially any science that studies people… I’ve known that for a long time, but I just recently had a moment of vertigo when I learned a bit about Inuit languages, and realized how subtle it can be to determine what counts as a “word”. Somehow I’d taken “words” for granted.

An interesting example of the slipperiness of the word concept comes from people teaching literacy to the Warlpiris of central Australia. When they learn to write, they want to attach one-syllable suffixes directly to noun stems with no whitespace: pirli-ngki ‘rock-LOC’ “on the/a rock”, but two-syllable ones with a space: pirli kirri ‘rock ALLATIVE’ “to the/a rock”. The grammatical behavior of these is more or less identical aside from their meanings, so that if there is an adjective both can show the case marker, or if they are next to each other, the last one can be the only one to have it: pirli-ngka wita-ngka ‘on the small rock’; jangarnka-kurlu-rlu yuulyu-kurlu-rlu ‘beard-having-ERG luxuriant-having-ERG’ “people with long beards”. Here the two-syllable case-marker is the ‘proprietive’, attached to something that the referent of a noun phrase has (so ‘with’ in ‘a man with a telescope is following us’) (Simpson (1991), Warlpiri Morpho-Syntax). The ending glossed ‘ERG’ is the ‘ergative’ case-marker that signals the Agent-like participant of a transitive verb (oversimplification alert). The doubling of the marker is important because such doubling is extremely rare for items that show the typical behaviors for independent words. Also rare-to-unheard-of for independent words is the ‘vowel harmony’ effect whereby the vowel qualities of the case-suffixes are partly determined by the final vowel of the stem, so that ‘kangaroo-having’ would be ‘wawirri-kirli’.

So, wrapping this up, the reason for the difference in space-inserting behavior between one and two-syllable case endings is probably that the two-syllable case-markers pick up a phonological structure called a ‘foot’ which means that they have an independent stress pattern: ‘k*urlu’ where ‘*’ means that the following vowel is stressed (a source on Warlpiri stress is http://ses.library.usyd.edu.au/handle/2123/383). But the single-syllable affixes are ‘strays’, not having their own foot, which perhaps gives them less subjective salience and perceived independence from the material that precedes them.
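
The spacing preference described above can be caricatured in a few lines of Python. This is only a sketch under crude assumptions: syllables are counted as runs of the vowel letters a, e, i, o, u, and any suffix with two or more syllables is taken to carry its own foot. The function names are invented, and the rule is a guess at the learners' behavior as reported here, not a claim about Warlpiri orthography.

```python
import re


def syllable_count(suffix):
    """Crude count: one syllable per run of vowel letters (an assumption)."""
    return len(re.findall(r"[aeiou]+", suffix))


def write_with_suffix(stem, suffix):
    """Mimic the reported preference of new writers.

    One-syllable suffixes are attached directly to the stem;
    suffixes of two or more syllables are written after a space.
    """
    if syllable_count(suffix) >= 2:
        return f"{stem} {suffix}"
    return stem + suffix


if __name__ == "__main__":
    print(write_with_suffix("pirli", "ngki"))   # attached
    print(write_with_suffix("pirli", "kirri"))  # written separately
```

Note that this decision uses only the sound shape of the suffix and nothing about its grammar, which is exactly why phonological autonomy and grammatical combination can come apart.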

So, style of grammatical combination is one possible difference between being a ‘word’ and being ‘part of a word’ (an affix), but phonological autonomy is another, partially independent one.

You can’t always trust native speakers about their own languages any more than you can trust the orthography. For example, it’s clear (to me) that French has a certain word, which means ‘what’ (or rather has some of the meanings of that English word), is pronounced /’kεskə/, and is spelt ‘qu’est-ce que’ (with spaces) … except that when the following word begins with a vowel, the final ‘e’ is replaced with an apostrophe and the space between it and this following word is removed! Orthographically, that looks like something between two and four words, and different French speakers will give different answers, but it functions as a unit (like the English word ‘what’ does).

Since the French all learn in school what counts as a ‘word’, and the schools teach French based on grammatical concepts that go back to Latin and Greek, all the natives of France will have opinions on what counts as a word that are influenced by what the Latin and Greek grammarians thought. But I have no idea what if anything the Inuit thought about ‘words’ before Western linguists came along! They might have had very interesting ideas that the linguists discarded because the linguists thought they knew better.

It’s a very difficult issue, because most languages appear to have an intermediate level of unit between ‘phrases’ and ‘morphemes’, but we linguists have not managed to come up with a really solid theory of what it is. It might, for example, be a combination of factors that tend to coincide but not always. A general indication that words sometimes really exist is that in ancient Mediterranean writing systems, when people wrote on rock (plentiful medium, no copy editors in the vicinity), they often put dividing symbols between things that we would tend to regard as words: you can see some for Latin in the slides here:

(but the Greeks, exceptional as always, were rather inconsistent in their use of word-dividers (Powell 1996, Homer and the Origin of the Greek Alphabet, iirc)). When the medium gets expensive, such as parchment in the Middle Ages, the word-dividers disappear.

Another, presumably related one, is that in many languages, there appears to be a minimum size of unit that it is reasonably easy to get native speakers (including non-literate ones) to repeat and talk about as units, even when these clearly from an analytical point of view contain smaller units. An early discussion of this by one of the greatest field-linguists and general thinkers about language of all time is Sapir (1921) Language, p. 34, Granada pb edition of 1970. A way to test this empirically for Inuit might be to try to get children who haven’t learned to read and write yet to repeat in isolation some of the ‘nominal modifier’ suffixes discussed in http://homes.chass.utoronto.ca/~cla-acl/actes2011/Compton_2011.pdf, and also some of the ‘adjectival verbs’ discussed in the same source, and compare the results.

Another, more abstract one is that putative word structure can be described as building ‘stems’ or ‘words’ by combining stems, words, and affixes to make bigger stems or words. So, in Modern Greek, the word ‘nihtolouloudo’ “night-flower” is formed by combining the stem ‘niht’ (night), the compound linker -o- and the stem ‘louloud’ (flower) to form the stem ‘nihtolouloud’, which then takes the ‘thematic vowel’ -o (different from the compound-builder) to form the word. Interestingly, the word for flower is ‘louloudi’ with a different thematic vowel, which is usually but not always replaced by -o in these compounds (which are productive in the sense that people can make them up pretty freely). Angeli Ralli has a nice discussion of these formations at http://philology-upatras.gr/files/content/OUP_Paper.pdf. Greek has a clear stem-word distinction, while English doesn’t.

Stems/words can also be formed from others by ‘derivational morphology’, wherein an affix is attached to a stem or word to produce another (stable, stabilize, destabilize, stabilization, destabilization, re-destabilization) (this is what goes over-the-top in Inuit-Yupik, according to the standard view), and by inflectional morphology (the things that interact with the syntax, more or less), but the key thing here is that the potentially recursive categories are limited to stems/words of particular parts of speech.

In syntactic combination, on the other hand, words form ‘phrases’ by combining with other phrases of different types, which can usually have their own complex and sometimes recursive structure, so that the simplest English sentence formula that has any significant utility would be S = NP V (NP), where the NPs themselves are made up of further phrase-types, e.g. ‘a very large dog chased a small cat with a silver bell on its collar’. Linguists try, with limited but not absolutely zero success, to describe phrase-structure with the ‘X-bar theory’, which basically says that a word of a given part of speech will be surrounded by pretty much the same collection of satellites in any grammatical position where that word can appear. But the X-bar theory doesn’t apply inside words, where the words or stems don’t take their usual grammatical satellites, but can only get combined into other bigger words or stems by a different collection of principles.

And, to finish this off, the Inuit/Yupik word-formation processes seem to fit the recursive morphology pattern of deriving stems (a.k.a. ‘bases’) from stems by recursive suffixation, topped off by inflection and perhaps some ‘particles’. A recent discussion of this across the family is:

Returning to the light side, it is rarely observed that English speakers have north of 20 words for ‘nothing’, if you include the major regional variants. Alongside of basic ‘nothing’, Karen De Clercq lists nineteen so-called ‘squatatives’, and misses some important ones such as UK ‘sweet FA’ and American English ‘nada’ (which I would count as a now genuine English squatative borrowed from Spanish). So what does having 20+ words for ‘nothing’ say about us …

Thanks for all these references; I will work my way from the less serious to the more so. Actually De Clercq says a ‘squatative’ is ‘a class of taboo words that can be used to express negation’, which would rule out ‘nada’ or ‘zip’. I don’t see why taboo-ness is grammatically important, but it certainly gives her an excuse to write a paper that’s funny to read!

By some strange coincidence, for a limited time you can hear Kate Bush’s song “50 Words For Snow” on National Public Radio. It’s pretty fun: Stephen Fry, under the alias of Prof. Joseph Yupik, recites 50 increasingly fanciful words for snow in various languages including Klingon, while Kate Bush does a count-down and occasionally interrupts to cheer him on:

Is the grammatical structure of these Inuit suffix-stacking “word” constructs more like a tree or more like a… well, stake?

Some (somewhat rhetorical) questions:

(0) Can any “word” always be extended with further suffixes ?

(1) Do all initial segments obtained by cutting off a trailing group of suffixes from a complete “word”, themselves form a complete “word” ?

(2) Are there (eg grammatical) selection rules that limit the choice of the next suffix (to then belong to some sort of “grammatical suffix category”) ?

(3) What are good examples of 2-word sentences ?

Kalaallisut “word” construction as pictured is also reminiscent of RPN, which is a very economical form of programming-language grammar (or more exactly a non-grammar). Ominously, RPN stands for Reverse Polish Notation, but I expect HP handhelds to have primed you to it in the late seventies. The main surviving niche for RPN is the PostScript language for printers…
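
For readers who never owned one of those calculators, here is a minimal sketch of an RPN evaluator in Python, just to show what the analogy is pointing at. It handles only numbers and the four arithmetic operators, and the name `eval_rpn` is my own. Whether suffix stacking really behaves like this is exactly the open question, so read it as an illustration of RPN and nothing more.

```python
import operator

# Binary operators understood by this toy evaluator.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(expression):
    """Evaluate a whitespace-separated RPN expression, e.g. '3 4 + 2 *'."""
    stack = []
    for token in expression.split():
        if token in OPS:
            # An operator consumes the two most recent results.
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression")
    return stack[0]


if __name__ == "__main__":
    print(eval_rpn("3 4 + 2 *"))
```

Each operator comes after its operands and acts on whatever has been built so far, with no brackets needed, a bit like a suffix acting on the stem to its left.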

Your questions are very good, and they’d be more than merely rhetorical if someone here knew the answers. Unfortunately it will probably take a month or two before an expert on linguistics comes by and answers them. I wish experts on every subject would read this blog, so all questions were instantly answered!

It’s pretty fun to read if you like grammar; the strange title is actually a sample of the author’s odd sense of humor. Maybe the new presidents of Italy and Greece should read this article so they can set up trade relations with the Inuit in Canada! (I have never heard the word ‘technocrat’ used as often as in the last few days.)

(3) What are good examples of 2-word sentences?

Actually this leads to the question I’m most confused about: are there ‘sentences’ as well as long ‘words’, or are sentences just special cases of words? After all, if you have a ‘word’ for “I never said I wanted to go to Paris”, which you do in Inuktitut:

parimunngauniralauqsimanngittunga

why do you also need a ‘sentence’ for this? But maybe they have both; I don’t know.

Unfortunately Mallon’s article doesn’t cover this; he’s mostly interested in how pieces of a word are stitched together and pronounced (morphophonology). He does, however, discuss some tantalizing ideas about a ‘duality’ between nouns and verbs in Inuktitut: they’re treated in a more symmetrical way than in Indo-European languages!

Thank you. I’ve read Mallon’s paper with more attention following your advice. The obvious next step (“Inuktitut for non-technocrats”?) should be to adopt an Inuit wife for her beautiful linguistic intuition. I wonder how many fell for it already.

On the (fascinating) matter of “verbs-nouns duality” I’d separate English from other Indo-European languages (although I don’t know that many). Not only does English allow nouns to metaphor for verbs, it exhibits an emerging “s-invariance” that applies to 3rd person phrases like “the wheels turn” and “the wheel turns” which form an awkward but possibly workable sub-language of similar phrases. What brings dreams of versioning English to make “s-invariance” a true and central grammar rule – say as an inquiry on grammar mutations or as a truly experimental field test of Sapir-Whorf.

(0) no, because some suffixes can’t be followed by any others. But, if you strip off the final stuff, what’s left can be further extended. I’ve also heard some lore that speakers don’t like to change the grammatical category more than two or three times (e.g. to start with a noun, derive a verb, and then get a noun back is OK, but then going to verb again is pushing it).

(1) no, because there is some ‘inflectional’ stuff that has to be there. Verbs for example need to have marking for person and number of subject and object, and various other things.

(2) yes, a suffix applies to a stem of a given class to produce one of a possibly different given class.

(3) A basic two-word sentence that cannot be rendered with one word is:

tengmiaq ayuq-tuq
bird.ABS.SG go-IND.3SG
“the bird went away”

The sole argument of a one-argument verb that is definite must be expressed by a noun-phrase in the ‘absolutive’ case.

I’m not sure what a ‘stake’ is supposed to be, but a grammatical template for describing the non-inflectional suffixes is:

[X noise]Y

meaning that ‘noise’ attaches to something of type X to produce something of type Y. The result could I think be described as a sort of ‘totem pole’ as opposed to a tree. The limitation to one open position seems to be significant (in jargon, ‘there is no compounding’ (although there are a few things that Woodbury wants to describe as compounding for reasons I don’t understand ATM)).

Interesting and problematic things happen with certain suffixes that attach to nouns to produce verbs, where the nouns seem to be able to take modifiers that appear outside the noun:

Ciku-meng atauci-meng ene-ngqer-tua
ice-MOD.SG one-MOD.SG house-have-IND.1SGS
“I have one house made of ice”

Here the verb is formed by attaching ‘[N ngqer]V’ to ‘ene’ and then inflecting that for 1st person subject to mean ‘I have a house’, but the house can be modified by two nominals (a word that linguists use for words that act like nouns if they’re not quite sure they really want to call them nouns) in the so-called ‘Modalis’ case. This is example (29) from Woodbury’s Morphological Orthodoxy paper (http://elanguage.net/journals/index.php/bls/article/viewFile/844/732) where there is further discussion. This is another classic problem in figuring out what words are, a discussion started in the mid sixties by Paul Postal w.r.t. ‘object incorporation’ in Mohawk (part of a failed attempt to prove that context free grammars couldn’t be made to work for NL grammars, iirc).
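The template [X noise]Y and the derivation of ‘ene-ngqer’ can be sketched in a few lines of Python. This is only an illustration: it assumes categories are plain labels such as N and V, and it ignores inflection and any sound changes where the pieces join. The names Stem, Suffix and attach are made up for this sketch and do not come from any linguistic toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    form: str      # the string built so far
    category: str  # e.g. "N" or "V"


@dataclass(frozen=True)
class Suffix:
    form: str   # the 'noise'
    takes: str  # category X that the suffix attaches to
    gives: str  # category Y that the result belongs to


def attach(stem: Stem, suffix: Suffix) -> Stem:
    """Apply [X noise]Y: only a stem of category X can take the suffix."""
    if stem.category != suffix.takes:
        raise ValueError(
            f"'{suffix.form}' needs a stem of type {suffix.takes}, "
            f"but '{stem.form}' is of type {stem.category}"
        )
    # One open position only: the suffix always goes on the end ('totem pole').
    return Stem(stem.form + "-" + suffix.form, suffix.gives)


house = Stem("ene", "N")
have = Suffix("ngqer", "N", "V")  # [N ngqer]V from the example above

have_a_house = attach(house, have)
print(have_a_house)  # Stem(form='ene-ngqer', category='V')
```

Because the result is now of type V, trying to attach ‘ngqer’ a second time raises an error, which is the sense in which each suffix maps one class of stem to another.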

Eskimo word structure is well-known for being polysynthetic, meaning that words are constructed of multiple units of meaning, often making them quite long. Languages of this type clearly raise the question, “What is a word?” Although this issue is not trivial even in European languages, Eskimo linguists must use a very different definition of word. There are simple nouns like qimmiq ‘dog,’ but then there are also cases where a simple noun is incorporated into a larger utterance, for instance Qimmiqpauŋitchuq “It is not a dog” in North Slope Iñupiaq. Words can be even longer, for example, Miŋuaqtuġviŋmuŋniaŋiñmiuq “He or she also won’t go to school,” derived in a series of steps from the base miŋuk- ‘to color or mark.’ (Miŋuaqtuġvik is school, muk is ‘go to,’ niaq is the future, ŋit is negative, mi is also, and uq is 3rd person singular intransitive.) These complex words – there are 8 morphemes, or meaningful units, in the second word – translate as entire sentences in European languages. They are words in that they are morphological entities, divisible by students of grammar but not usually by native speakers; they are phonological entities as well, subject to sound processes that distinguish one word in a series from the next.

So, he seems to be claiming Miŋuaqtuġviŋmuŋniaŋiñmiuq can be broken down into its components “by students of grammar but not usually by native speakers”, which seems amazing to me.

Though he says “Eskimo linguists must use a very different definition of word,” he doesn’t say what definition(s) they use.

I’ve heard (perhaps wrongly) a sort of operational definition of what a culture considers a “word”: when you interrupt speakers mid-utterance the places they’ll consistently restart from will be what they view as the beginnings of “words”. More polysynthetic languages would be those that support fewer “restart points”.

My use of the put-an-asterisk-in-front-of-the-misspelt-word-to-correct-it-when-you-can’t-edit-your-comment convention demonstrates how you could tell in the case of written language. I wonder how well that convention works in written languages that don’t already settle the question with inter-word spacing. (Chinese is an example that has a large presence on the Internet, but I know very little about Chinese-language Internet customs. Although proper written Japanese has no inter-word spacing either, Internet Japanese is infested with slang kana, which uses it.)

To find out the facts about such matters the current standard approach is ‘Conversation Analysis’, which is conceptually very different from standard linguistics (it was developed by some sociologists), and not very well developed for ‘exotic languages’, but both of these matters are being addressed. Here is a blog posting with some entry points:

A standard source of problem-children for the question of ‘what is a word’ are so-called ‘clitics’, which are basically items that combine word-like and affix-like properties in confusing ways. An example is the possessive ” ‘s ” in English, which is pronounced as a unit with the preceding word, and has the same system of alternate forms (allomorphs) as uncontroversial affixes such as plural and 3rd Sing Present -s (the dogz bowl, the cats bowl, the hors@z stall (@ for the ‘schwa’ vowel)), but can be tacked onto the end of complex NPs including even relative clauses:

the man who we sold our car to’s swimming pool is nice.

(the traditionally-termed ‘group genitive’ construction)

The requirement that ” ‘s ” constitute a phonological unit with what it follows is the most likely explanation for why the group genitive can’t come after nonrestrictive relative clauses, since these are set off by pauses:

*John, who we sold our car to, *’s swimming pool is nice.

So, the positional properties of this item make it appear to be an independent word in the syntax (technically, a ‘postposition’, comparable to ‘ago’ and ‘notwithstanding’), while its phonological ones make it seem like an affix.
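The affix-like side of this item, the choice between the forms written above as z, s and @z, follows the usual textbook rule based on the last sound of the host word. Here is a rough Python sketch of that rule. The ASCII symbols for sounds (‘sh’, ‘zh’, ‘ch’, ‘j’, ‘th’ for the voiceless one) and the function name are ad hoc assumptions for this illustration, and the caller has to supply the final sound, since English spelling doesn’t reliably show it.

```python
# Final sounds that trigger the extra schwa vowel (written '@' above).
SIBILANTS = {"s", "z", "sh", "zh", "ch", "j"}

# Voiceless sounds that are not sibilants.
VOICELESS = {"p", "t", "k", "f", "th"}


def choose_s_allomorph(final_sound: str) -> str:
    """Pick the form of -s (plural, 3rd Sing Present, or possessive)."""
    if final_sound in SIBILANTS:
        return "@z"  # hors@z
    if final_sound in VOICELESS:
        return "s"   # cats
    return "z"       # dogz; also after vowels and other voiced sounds


for word, last in [("dog", "g"), ("cat", "t"), ("horse", "s")]:
    print(word, "+", choose_s_allomorph(last))
```

In the group genitive the rule looks at whatever word happens to come last in the phrase, so “to” (ending in a vowel) gets the z form in “the man who we sold our car to’s swimming pool”.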

There are ‘enclitics’, which are attached to what they follow, and ‘proclitics’, attached to what they precede, and some evidence that they are inherently confusing wrt word boundaries is that it is precisely with their pro- and en-clitics that the Ancient Greeks were very inconsistent in their use of word-dividers (when scrawling obscene poems on rocks, according to the Powell book I cited earlier). Currently, this book by Steve Anderson would be the best place to start learning about clitics:

Thanks for all the knowledgeable replies! To clarify my very unscientific anecdote: I’d asked a linguist whether those long German compounds, or those crazy Welsh place names, were actually “words”, or just “phrases with weird punctuation”. The reply I recall was along the lines of “if you interrupt the German they will restart at the beginning, but if you interrupt an English speaker saying an equivalent phrase, they will tend to restart at a more recent word boundary”. That is, apparently the natural large scale “granularity” really does vary with the language syntax, given roughly similar semantics. However I wouldn’t think this interruption technique is a fine enough probe to really resolve questions like whether “of” or “‘s” are “words” in the same way that, say, “off” or “is” or “soft” or “hiss” are.
