Hi Mark, we have a paper out in Science today. I've attached a copy plus a link to a website where we give a more accessible account of the paper. I expect this will be rather controversial again but we have been very thorough both with improving the quality of the data and with testing the robustness of our geographic inferences.

That link goes to an excellent and freely-available site, which contains essentially all of the information in the Science paper, presented in a more accessible way with additional background. There are even animated maps!

There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.

[These] results may not sway supporters of the rival theory, who believe the Indo-European languages were spread some 5,000 years later by warlike pastoralists who conquered Europe and India from the Black Sea steppe.

A key piece of their evidence is that proto-Indo-European had a vocabulary for chariots and wagons that included words for “wheel,” “axle,” “harness-pole” and “to go or convey in a vehicle.” These words have numerous descendants in the Indo-European daughter languages. So Indo-European itself cannot have fragmented into those daughter languages, historical linguists argue, before the invention of chariots and wagons, the earliest known examples of which date to 3500 B.C. This would rule out any connection between Indo-European and the spread of agriculture from Anatolia, which occurred much earlier.

“I see the wheeled-vehicle evidence as a trump card over any evolutionary tree,” said David Anthony, an archaeologist at Hartwick College who studies Indo-European origins.

I haven't read the new Science paper carefully. So I'll close for now with a partial list of topically- or methodologically-relevant LLOG posts:

I am quoted in Pringle's article, and it is clear from what I say there that I have substantial reservations about the methods (inference piled upon inference) and the conclusions (Anatolian homeland) that have been drawn in it.

I would also like to suggest that Don Ringe be asked directly whether he agrees with the position attributed to him in the final paragraph of Nicholas Wade's New York Times article.

Last, but not least, if Hans Holm is a reader of Language Log or if someone can get in touch with him, he may have something significant to say about the statistical methods employed in the Bouckaert, Gray, Atkinson, et al. paper.

Benjamin Orsatti said,

Is this study a purely linguistic endeavor? That is to say, does it square with the archaeological and anthropological evidence such as potsherds, burial arrangements, horse bones, presence or absence of birch/beech trees, etc?

Is it methodologically sound to rely on an exclusively "linguistic" hypothesis that purports to definitively establish a PIE Urheimat?

Seems as though if you concentrate on the "physical" evidence, you end up with PIE origin between the Don and Dnieper rivers, e.g. J. P. Mallory's hypothesis.

Nick said,

A key piece of their evidence is that proto-Indo-European had a vocabulary for chariots and wagons that included words for “wheel,” “axle,” “harness-pole” and “to go or convey in a vehicle.” These words have numerous descendants in the Indo-European daughter languages

============

So what were the PIE words for guns? i.e. How can you differentiate between horizontal transmission of words that went with the technology, rather than learned from ancestors?

[(myl) This has been a central problem in historical-comparative linguistics since the middle of the 19th century, and is discussed at length, specifically with respect to some of the words relevant to the early history of IE, in the links cited in the original post, especially:

Bill Walderman said,

"How can you differentiate between horizontal transmission of words that went with the technology, rather than learned from ancestors?"

Isn't the answer that words that have been "learned from ancestors" (i.e., from a common origin) show unique patterns of sound correspondences? Words transmitted from PIE to daughter languages can be identified because they will show the same patterns of sound correspondences among the daughter languages as other words transmitted from PIE. The entire process of reconstruction of PIE is based largely on establishing these sound correspondences among the daughter languages on a scale large enough to convince historical linguists that the daughter languages are derived from a common ancestor. Horizontal borrowings among languages show different patterns of sound correspondences.

David Anthony attempts–convincingly in my non-expert opinion–to show that horse-related and horse-drawn-vehicle-related terms in various Indo-European languages generally exhibit the patterns of sound correspondences that are characteristic of a common PIE origin, not horizontal diffusion. He also attempts to link the linguistic evidence with archeological evidence for the earliest domestication of the horse and the concomitant invention of the wheel in the steppes north of the Black Sea in the fourth millennium BC.
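
As a toy illustration of the sound-correspondence argument above, here is a minimal Python sketch. It is not Anthony's procedure and is far cruder than the comparative method: it only looks at word-initial consonants in a handful of Latin/English pairs. The word list, the function names, and the "majority pattern" rule are all assumptions made for illustration.

```python
from collections import Counter

# Tiny illustrative sample of (Latin, English) word pairs; not a real dataset.
PAIRS = [
    ("pater", "father"),
    ("pes", "foot"),
    ("piscis", "fish"),
    ("plenus", "full"),
    ("pedalis", "pedal"),  # English "pedal" is a later loan, ultimately from Latin
]

def initial_correspondence(pair):
    """Return the (Latin initial, English initial) correspondence for a word pair."""
    latin, english = pair
    return latin[0], english[0]

def flag_irregular(pairs):
    """Return the pairs whose initial correspondence differs from the majority pattern."""
    counts = Counter(initial_correspondence(p) for p in pairs)
    regular, _ = counts.most_common(1)[0]
    return [p for p in pairs if initial_correspondence(p) != regular]

print(flag_irregular(PAIRS))
```

The inherited words share the regular p : f correspondence, while the loanword keeps Latin p and so stands out. Real reconstruction relies on many correspondences across many languages, not one consonant in five words.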

John said,

The Science paper by Bouckaert et al. employs a unique combination of newly generated phylogeographic and previously known linguistic evidence to come to a conclusion that elsewhere has been shown, through a careful sifting of the most salient DNA, archaeological, anthropological, and linguistic evidence, to be spot on. See Sino-Platonic Papers 192, Vol. 1, Ch. 1, "The Interactive Eurasian World, c. 9000-500 BC" (http://www.sino-platonic.org).

On the other hand, it's unfortunate that the NYT article grossly oversimplifies its account of the disciplinary provenance of each of the two best known competing theories of PIE geographic origins. There are countless linguists and archaeologists (as well as historians, anthropologists, geneticists, etc.) arguing in support of either the Pontic-Caspian or Anatolian-Syrian theories of PIE geographic and temporal origins. Indeed, the best argument for the Anatolian-Syrian PIE origins c. 8000-10,000 ybp can be constructed by considering together the work of linguist D'iakanov (1984, 1988, 1997) and anthropologist-archaeologist Renfrew (1988) while also carefully sifting through the many types and deep layers of evidence that can be drawn from other relevant fields. The fresh evidence published by Bouckaert et al. seems to contribute a significant new perspective that supports the Anatolian-Syrian origins of PIE c. 10,000-8000 ybp.

[VHM: The commenter is John Didier, author of SPP 192. He has authorized me to add this note.]

MattF said,

Knowing nothing about this, I guess I can ask the dumb question: Is there some reason they can't both be true? E.g., Anatolian farmers, between 10,000 and 5000 years ago, migrated en masse to the Caspian steppes, and then, acquiring horses, axles and wheels, were equipped to disperse?

Pete said,

@Prof Mair: I don't think there's any suggestion that the Anatolian hypothesis means the Hittites should be equated with the Proto-Indo-Europeans, is there? They lived a few thousand years later, which is consistent with their arrival in (or return to) Anatolia around 2000 BCE.

I get the impression that Hittite is phonologically very conservative, preserving PIE's laryngeals for one thing. But I'm not sure what that means for their relationship with the Proto-Indo-Europeans, other than that they're closer in time than any other historical Indo-European people.

J.W. Brewer said,

One point the interesting (and non-gated) FAQ doesn't seem to answer is why a methodology developed to model/reconstruct the geographical spread and mutation of viruses is useful here in the first place. There would seem to be an implicit minor premise that the "evolution" and geographical spread of languages is pretty much the same sort of phenomenon as the "evolution" and geographical spread of viruses (are the human carriers equally hapless/unwitting in both instances?), which doesn't seem to me self-evidently wrong but also sort of seems like it needs to be argued for more explicitly as actually correct and compelling. The late novelist William S. Burroughs (who was not trained as a historical linguist afaik) famously proclaimed "language is a virus from outer space," and Laurie Anderson turned that claim into a song that some of us may remember having heard during the 1980's, perhaps in a state of other-than-sobriety – but that's not what I would consider a full demonstration of why the analogy should be considered persuasive.

Harold said,

Greg Morrow said,

Anthony's wheel-horse-language book is a really convincing synthesis of the linguistic and archaeological evidence for the Pontic-Caspian steppe. (As Anthony points out in the Times article, the wheel-based words pretty well have to limit PIE to not much earlier than their invention.)

And while I'm convinced that character-based phylogenetic trees are the correct tool for inferring biological relationships, and an adequate tool for estimating divergences, I am wholly unconvinced they will work at all for human culture, which is not in the least limited to lineal descent.

Dimitri said,

The question of IE origins fascinated me as an undergraduate, and to a lesser extent when I was in graduate school, but as a professional academic (I work on the Greek Bronze Age) I find this research question quite boring. It seems to me an extension of 19th-century questions that revolved around issues of race and population movements, both of which were seen as engines of history. That's why Renfrew's thesis has such appeal — he links the PIE question to the spread of domestication and agriculture and he consequently answers the why and how of the spread of IE, questions that are actually interesting to 20th and 21st century archaeology.

emacsomancer said,

Are they trying to make their website look like it's from 8000-9000 years ago too? I mean, who uses embedded Quicktime these days? (And for what it displayed, which I could only see after jumping through 45 minutes of hoops to get Quicktime to play, animated gif would have worked as well…)

I've used similar methods in evolutionary biology where afaik they were invented. (They're used for all organisms, not only viruses.) I'm not a linguist, but it seems to me that an essential assumption does not carry over into linguistics. Language is not passed on via a fairly immutable template like DNA.

I worked on some parasitic plants which, for various reasons, can have an extraordinarily high rate of mutation. If that's not taken into account, it plays absolute havoc with the resulting phylogenies.

So the linguistic analysis would have to include some kind of estimates of mutability and to account for it in the analysis. I can't imagine how you could make that estimate without being a (linguistically inclined) fly on the wall during the thousands of years in question. Do linguists have an approach for that issue?

[(myl) These (Bayesian phylogeny/phylogeography) methods do not depend on fixed rates of change, as old-fashioned glottochronology did, but they do incorporate assumptions about distributions of rates of change, and perhaps assumptions about distributions of rates of change of rates of change (i.e. constraints on how rates of change may change over time along a particular phylogenetic pathway). The probability distribution over phylogenetic trees obviously depends on these assumptions, as do the distributions over time depths for various changes. I'm not familiar with the algorithms used to introduce spatial hypotheses into the calculations, but I assume that the same remarks apply.

I expect that the authors tried a number of ranges of values for these meta-parameters, and have something to say about how robust their results are to such manipulations.

If I were trying to test your worry, I'd use synthesis of fake data as a method. That is, I'd start with a generative model of trait evolution and diffusion that is reasonably plausible but very far away from the assumptions built into the algorithms these authors used, run that model to generate some synthetic data, and see whether their algorithm nevertheless correctly recovers the (known) history.]
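
A minimal sketch of the synthetic-data test suggested in the note above, under loudly stated assumptions. The four-language tree ((A,B),(C,D)), the per-branch flip probability, and the borrowing model (B copies characters from C) are all hypothetical. The inference step is a simple four-point distance method, swapped in for brevity; it is not the Bayesian phylogeographic algorithm used in the paper.

```python
import random

def evolve(state, flip_prob, rng):
    """Flip each binary cognate character independently with probability flip_prob."""
    return [1 - s if rng.random() < flip_prob else s for s in state]

def simulate(n_chars, flip_prob, borrow_prob, seed):
    """Generate leaf data on the known tree ((A,B),(C,D)); B may borrow from C."""
    rng = random.Random(seed)
    root = [rng.randint(0, 1) for _ in range(n_chars)]
    left = evolve(root, flip_prob, rng)    # ancestor of A and B
    right = evolve(root, flip_prob, rng)   # ancestor of C and D
    leaves = {
        "A": evolve(left, flip_prob, rng),
        "B": evolve(left, flip_prob, rng),
        "C": evolve(right, flip_prob, rng),
        "D": evolve(right, flip_prob, rng),
    }
    # Horizontal transfer (hypothetical model): B replaces a character with
    # C's value with probability borrow_prob, violating pure tree-like descent.
    leaves["B"] = [c if rng.random() < borrow_prob else b
                   for b, c in zip(leaves["B"], leaves["C"])]
    return leaves

def hamming(x, y):
    """Count the positions at which two character lists differ."""
    return sum(a != b for a, b in zip(x, y))

def infer_quartet(leaves):
    """Pick the quartet split with the smallest summed within-pair distance."""
    def d(p, q):
        return hamming(leaves[p], leaves[q])
    scores = {
        "AB|CD": d("A", "B") + d("C", "D"),
        "AC|BD": d("A", "C") + d("B", "D"),
        "AD|BC": d("A", "D") + d("B", "C"),
    }
    return min(scores, key=scores.get)

clean = infer_quartet(simulate(5000, 0.1, 0.0, seed=1))
contaminated = infer_quartet(simulate(5000, 0.1, 0.8, seed=1))
print("no borrowing:", clean)
print("heavy borrowing:", contaminated)
```

The point of the design is that the true history is known in advance, so any mismatch between the inferred and true split is attributable to the violated assumption. Here the violation is borrowing, which the distance method has no way to model.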

J.W. Brewer said,

One of the tricky things is that the ability to speak a particular language is not in the strict sense transmitted genetically but people tend for obvious reasons more often than not to speak the same native language as their parents and grandparents. Except for when they don't (because of emigration/conquest/children being given up for adoption/the unpredictable consequences of mixed marriages etc/etc/etc). The tendency of language to be partially but often significantly correlated with DNA w/o actual causation at the genetic level (so e.g. Basque-speakers still supposedly have some genetic distinctiveness at a statistical level compared to nearby groups of Romance-speakers) might mean that methodologies useful for working on DNA-caused traits might seem to work on language. Except for when they don't.

Harold said,

The comparative method in philology was supplemented by physical evidence from archeological (inscriptions) and written records. The comparative method in evolution also needs to be confirmed by archeological and other physical evidence. Perhaps the comparative (historical) method is no longer interesting to researchers precisely because it covered so much ground and yielded so much.

Mary Kuhner said,

It's worth noting that inference of biological phylogenies also grapples both with "borrowing" (transfer of genetic material among unrelated species, in the biological case) and with hybridization. Borrowing is particularly prevalent among bacteria; it is estimated that E. coli replaces 1% of its genome with alien material every million years.

In both bacterial genetics and linguistics, as I understand it, there are efforts to establish lists of (genes/words) that are not normally transferred, and either weight those most highly or restrict the analysis to them. Clearly you would not get a good tree by considering commonly borrowed words or antibiotic resistance genes. (What is an example of a commonly borrowed class of words? As an English speaker I have little perspective, since English steals everything in sight. Animal names? Foods?)

Scientific/technical terms are frequently borrowed. In English, most of ours are Graeco–Latinate or Germanic, with the balance between those classes varying as you move between physical sciences, mathematics, and philosophy. The Japanese case was touched on recently. Conversely, Swadesh lists, which attempt to capture lexemes resistant to change, have been around for a while. However, there are well-known problems with the Swadesh lists; and even if the lists themselves were unproblematic, the approach itself is problematic, as creolists and contact linguists can readily explain.

One significant issue of linguistic transmission is the fact that the components (lexicon, phonology, syntax, morphology) are, in principle(?), transmitted distinctly from one another. The components mutate at different rates, are differentially susceptible to borrowing, can be hybridized, etc. Language is not a unitary thing, it's very much a gestalt organism. I'm not aware of anything in genetics that has a truly similar transmission structure. Perhaps something looking at the co-evolution of entire ecologies might work (rather than looking at the evolution of individual species/familytrees)— but I've no idea what sorts of methods ecologists use, or if they even ask these sorts of questions.

We tend to think of creolization as a modern phenomenon, what with the aftereffects of colonialism et al. But given that the dispute is between PIE originating from the spread of agriculture vs the spread of militaristic conquest, I don't think we should be so hasty to rule out this sort of thing.

Chris C. said,

The popular press had this story yesterday. While I didn't (of course) trust it to be either accurate or exhaustive in how it described the methodology, it seemed to me that as described there would be an inherent bias toward Anatolia simply because that was the home of the first literate IE speakers. Wouldn't that tend to push the result toward that region whether or not the Hittites were, as was pointed out above, intrusive to the region, and even if they arrived from some place far removed?

DCBob said,

I find these attempts to apply phylogenetic and phylogeographic techniques to linguistic data very exciting, but what story is consistent both with the tree shown in Figure S1 in the supplemental material and with the archaeological evidence? As I understand it, farming reached or developed in Iran and India well before 7000 BC. Anatolian farmers reached Greece before 7000 BC, the Balkans by 6000 BC (early Starcevo), Spain (along the Mediterranean coast) and Germany (Linear Pottery) by 5500 BC, and Ireland by 5000 BC at the latest. (An independent farming culture developed along the Dniester around 5800 BC, influenced by farmers of ultimately Anatolian origin, but Bug-Dniester culture is not descended from the Anatolian culture.)

In contrast, the tree has Anatolian breaking away from the root around 6500 BC. It has Tocharian and Armenian breaking away after 5000 BC – 2000 years too late, if the diffusion model is correct, right? It has Celtic/Italic/Germanic/Balto-Slavic breaking away from the rest around 4500 BC – 1000 years too late if farmers brought the language into northern Europe. It has Greek breaking away from Indo-Iranian around 4000 BC – doesn't that have to be several thousand years too late if you accept the hypothesis that the Indo-Iranians began to diffuse from Anatolia at around the same time the Greeks diffused in the other direction? It has Indic and Iranian breaking apart around 3000 BC – that seems to be 1000 years too early, and seems inconsistent with the evidence that Aryans moved into India from the north, bringing cattle with them. It has Irish and Welsh breaking up around 1000 BC, when, if I understand the archeology correctly, we think the Celts were still in Austria and Bavaria. So I just don’t see a story that consistently relates this tree to what I know of European and Middle Eastern archaeology.

It seems to me that nearly all those problems more or less disappear if you posit that the tree roots around 4000 BC rather than 6000 BC, and that the big dispersion occurs on and around the steppe zone around 3000 BC rather than around 4800 BC. (That also seems consistent with the earliest date in the posterior probability distribution of the root, if I'm reading Figure S1 correctly.) Now you have a plausible story in which Anatolian breaks away as the first semi-nomads move into the Thracian plain, displacing the farming cultures there. Tocharian and Armenian break away around 3000 BC; so Tocharian can plausibly be represented by the Afanasevo culture in the central Asian steppe. Celtic/Italic/Germanic/Balto-Slavic breaks away from the others around 2800 BC, plausibly represented by movements up the Danube and Bug river valleys into northern and central Europe at about the time cultures like the Corded Ware culture appear. Greek breaks away from Indo-Iranian around 2500 BC, plausibly involving the Catacomb culture. Indic and Iranian break apart around 1800 BC, plausibly dispersing out of the Poltavka-Sintashta-Andronovo cultural sequence. The western European groups can plausibly develop out of central European bronze age cultures influenced by the Yamna movement into the Danube. Irish and Welsh can break up a few centuries before the Christian era. I suppose none of these stories are strictly provable, but they all seem to link the linguistic and archaeological evidence in a much more plausible way than anything I can imagine for this new tree. Does that make sense?

Actually, I think that the Mycenean Greeks show up in mainland Greece at least a couple of hundred years earlier.

"It has Tocharian and Armenian breaking away after 5000 BC"

Of course, the Tocharians don't actually arrive in the Tarim Basin until about 2000 BCE. So, you have to have a scenario that fits the intervening 3000 years of isolation to make it work. DC Bob is very much on track in noting how many other historically known calibration points are missed.

If you are going to be a Bayesian statistician and use prior probability distributions, then, by God, be a Bayesian statistician and use as many firm calibration points as possible from past research by other methods when you do your modeling. You don't have to be a genius historical linguistics scholar to come up with a lot of data points that are pretty solid from material culture archaeology, historical sources and known linguistic family relationships. (Razib Khan notes the absurdity of the dates assigned to the Roma divergence in the model.) If your model can't fit the data points we already have accurately, then you know that you are wrong and need to go back to the drawing board before trying to publish work that reaches rubbish conclusions. The deepest flaw in the model, in addition to its failure to use enough prior probabilities when they exist, is its failure to adequately account for differential rates of language evolution attributable to language contact, substrate influences and language differentiation, each of which drives above-average rates of language evolution relative to periods when a language is isolated and has little contact with other highly distinct languages.

A good Bayesian model would start with a framework of known archaeological cultures that could have different linguistic affiliations from each other and a lot of historical data points on dates and places where we know languages were spoken or not spoken, and would then assign probabilities to the IE or non-IE (and IE subfamily) linguistic affiliations of various archaeological cultures. For example, we can be almost certain about which archaeological cultures were associated with Celtic languages, but Urnfield and Bell Beaker are much closer calls and quantitative efforts to estimate their linguistic affiliation probabilities would be helpful.

Bill Walderman said,

@John Burgess: If that is the case it would undermine David Anthony's arguments. But I question the reliability of a Saudi newspaper's account of an archeological find suggesting earlier domestication of the horse in, of all places, Saudi Arabia. Even the New York Times can't be trusted to get these things right. And the body of the Saudi article mentions a statue of a horse, not a domesticated horse–it's not even clear that it has been convincingly established that the statue is a horse and not some other equid–perhaps hunted for meat rather than domesticated.

YM said,

@DCBob: Very nice. I don't know the archaeological chronology, but if you are right, then Atkinson et al.'s model appears at least self-consistent. All it needs is a different average rate of divergence.

Shivraj Singh said,

@DCBob: "It has Indic and Iranian breaking apart around 3000 BC – that seems to be 1000 years too early, and seems inconsistent with the evidence that Aryans moved into India from the north, bringing cattle with them."

Couple of points:
1. Bos indicus was domesticated in the Indian subcontinent.
2. Bos indicus was traded in large numbers to the African continent during the time of the Indus civilization.

What evidence exists for "Aryans moved into India from the north, bringing cattle with them"?

If they were coming into India, shouldn't they have arrived with taurine cattle? We don't find any taurine cattle in India.

Howard Oakley said,

I am familiar with these techniques as used in biological taxonomy. The interesting difference between their application is that in taxonomy people are most interested in the relationships between current species, e.g. whether a given species is a member of the same genus as another. As the data are densest for modern species, and the amount of projection least, I think the techniques give the greatest confidence there.
In this case, the purpose is not to study the relationships between modern languages (and the Frisian and other issues apparent would be interesting!), but to look where the data are most sparse, and the projection/prediction greatest. In other words, the matter of most interest is where the uncertainty is greatest. It is like walking out on a very long plank – the further you get, the greater the need for its rigidity and stability!
That said, it is fascinating. Whichever hypothesis is favoured, they both have a big problem in the region of the Caucasus – so close to either postulated origin of IE, but with so many non-IE languages still being spoken today. But that is another story altogether.
Howard.

Pflaumbaum said,

Postulating the geographical origin of the Indo-European family is one thing, but this always seems to slip into talking about reconstructed PIE as a language that was spoken at some point. But as my old prof James Clackson put it:

"Reconstructed PIE is a construct which does not have an existence at a particular time and place (other than in books such as this one), and is unlike a real language in that it contains data which may belong to different stages of its linguistic history. The most helpful metaphor to explain this is the ‘constellation’ analogy. Constellations of stars in the night sky, such as The Plough or Orion, make sense to the observer as points on a sphere of a ﬁxed radius around the earth. We see the constellations as two-dimensional, dot-to-dot pictures, on a curved plane. But in fact, the stars are not all equidistant from the earth: some lie much further away than others. Constellations are an illusion and have no existence in reality. In the same way, the asterisk-heavy ‘star-spangled grammar’ of reconstructed PIE may unite reconstructions which go back to different stages of the language. Some reconstructed forms may be much older than others, and the reconstruction of a datable lexical item for PIE does not mean that the spoken IE parent language must be as old (or as young) as the lexical form."

A fascinating discussion. I see how they might model some of the rate problems, but it also sounds like the fluidity of language goes far beyond what can happen in genetics. (Luckily. I doubt it would be healthy.) I don't know if it's enough to violate the assumptions of the model they're using, but it does sound like it's at least pushing the boundaries.

Very interesting, too, that there are points which disagree with known archaeological data. I assume the authors must know that? I'll have to go read the article to see how (if) they explain that.

wren ng thornton: "Perhaps something looking at the co-evolution of entire ecologies might work (rather than looking at the evolution of individual species/familytrees)— but I've no idea what sorts of methods ecologists use, or if they even ask these sorts of questions." I'm not an ecologist, but as far as I know, no, they don't make trees of the coevolution of ecologies. The complexity of multiple relationships, multiple variables, unavoidable interpolations and assumptions would scare the pants off them.

J.W. Brewer said,

Further to Peter C's "parochial" comment about the questionability of where Frisian is located in the tree, I see that the same tree (if we're both looking at figure S1) has Polish stuck in with what are conventionally called the East Slavic languages (Byelorussian/Ukrainian/Russian) rather than in its standard place with the West Slavic branch (Czech, Slovak, Lusatian). I'm no Slavonicist, so does this odd-seeming location reflect a respectable minority-revisionist view among specialists, or is it simply an error ("sheer ignorance, madam," as Dr. Johnson supposedly said in explaining having bungled a particular definition in his dictionary) that increases the odds that those of us who know more about language than the sophisticated mathematical modeling techniques being used will write the whole thing off as garbage in, garbage out? It has been noted in threads on other sites that the date the tree shows for the divergence of Romany from the rest of Indic may be on the order of 2,000 years too early (the variability of their rate-of-change metric apparently having not been flexible enough to account for the particular known historical experience of Romany-speakers) and the date for the divergence of proto-Romanian from the rest of proto-Romance takes one side in a highly contentious debate that is mixed up with an unscholarly overlay of Balkan nationalism(s).

I wonder if any of these four points were raised in whatever peer-review process Science used before deciding to publish, and, if not, then what minimal level of knowledge of historical linguistics Science thought was necessary for the referees it used? (I suppose I should note the possibility that the text of what's hidden behind the paywall is more nuanced in its handling of these four errors or disputable points than the free-to-everyone tree diagram that seems on its face to include them.)

DCBob said,

Shivraj Singh – I'm no expert but my understanding is that we trace the Indic languages to steppe cultures north of India in the late 3rd – early 2nd millennium BC: see for instance http://en.wikipedia.org/wiki/Indo-Iranians.

Nathan said,

What are the Anatolian hypothesis proponents' explanations for the following:
If PIE arose in Anatolia then how do they explain the Hattic substratum in Hittite?
Wouldn't the Mitanni empire be IE speaking (as opposed to Hurrian speaking) since it covered some of Anatolia and had an Indo-Aryan elite?
IE is supposed to have arisen in Anatolia and it spread as far as Southern Asia and Atlantic Europe yet it did not dominate the Anatolian region in antiquity even while we see that Hattic and Hurrian (non IE) were dominated by IE speaking cultures in antiquity.

naddy said,

I'm looking at the 2011MCCtree_widthCognateRate figure where the "rate of evolution along each branch is represented by the thickness of the branch, where a thicker branch implies faster evolution".

The results are distinctly odd.

My starting point was French, which you'd expect to sit at the end of a thick branch from Latin, but instead the lines are relatively thin. If you look at that subtree, though, you see that the Romance languages form a sister clade to Latin and it is Latin that is marked as evolving quickly. And this is not an isolated occurrence. All over the tree, if there is a conservative language with innovative offshoots/close siblings, it is the conservative one that gets the thick branch: Old Irish, Gothic, Old Persian, Avestan, Ancient Greek, etc.

Peter said,

@goofy: the note below the map, beginning “NB: – This figure needs to be interpreted with the caveat that we can only represent the geographic extent corresponding to language divergence events,” answers your question, I think. Roughly, their analysis provides minimum ages, which can be underestimates for various reasons. The blue in Iceland means that their analysis shows IE has been in Iceland at least 500 years. We know for other reasons that it’s actually been there longer, but the map shows just the results of their model.

Craig said,

In the FAQ, the authors claimed "the cognate data we use excludes known cases of borrowing". However, it seems like most of the obvious errors could be accounted for by failure to remove loans.

I'm wondering if known borrowings were excluded when the authors had an English-language dictionary with etymologies available, or if they used somewhat odd criteria for determining what counted as a 'known case' of borrowing.

Most of these cases seem fairly trivial, even though they contradict known historical dates and relationships. But the date of divergence assigned to Tocharian and Armenian seems significant.

Cy said,

@Pflaumbaum
Very true – and something that all popular (newspapers and magazines) treatments of the concept always completely ignore.

It always seemed the only way to keep from devolving into pointless discussions about proto-world in HL classes was to emphasize the whole set-theory aspect – the classifications of arbitrarily similar sound change groups, like Grimm's law etc. My profs always tried to make abundantly clear that the asterisks meant that what followed was NOT the actual sounds of the language, but just a handy representation using characters easy to find on a keyboard, arranged categorically and, like you mentioned, representing various theoretical depths. Most students didn't absorb these repeated lessons. Popular science writers, especially. It's one of those concepts that the human mind resists violently.

princenuadha said,

"I've used similar methods in evolutionary biology where afaik they were invented. (They're used for all organisms, not only viruses.) I'm not a linguist, but it seems to me that an essential assumption does not carry over into linguistics. Language is not passed on via a fairly immutable template like DNA.

I worked on some parasitic plants which, for various reasons, can have an extraordinarily high rate of mutation. If that's not taken into account, it plays absolute havoc with the resulting phylogenies."

Thanks for that insight. So how well do the methods work in evolutionary biology for finding the source population's location when most of the early descendants are unknown and the later descendants only descend from a fraction of the earlier ones?

Basically, I'm wondering how well the method used in the linguistic model would work even if we didn't have to worry about lateral transmission, ie sharing.

Nelson said,

"Postulating the geographical origin of the Indo-European family is one thing, but this always seems to slip into talking about reconstructed PIE as a language that was spoken at some point."

@Pflaumbaum, I'm actually not a fan of that quote by Clackson, although it is a useful comment on current practices in IE studies. There actually ought to be a coherent PIE language, which is properly the last common ancestor of all the later IE languages. Any features arrived at by strict comparative reconstruction of all the IE branches (or, strictly speaking, by comparison of the two earliest sub-branches of PIE – not that we know for sure what the internal structure of IE was) belonged to the parent language.

The confusion comes from two places. One is reconstruction based on just a few of the branches of IE, where it's unclear whether the feature goes back to PIE, or to dialectal developments. The unclear status of Anatolian makes this especially confusing, since there's now doubt about whether the, say, verbal systems of Greek and Indo-Iranian reflect a dialectal ('core IE' or whatever) development or what. Some reconstructions based on part of the family may be younger than PIE proper.

The other is internal reconstruction, which can potentially get at features that predate PIE.

Without Anatolian, there would be no doubt that PIE itself had a tense-aspect system basically like that of Greek or Indo-Iranian (though there would still be good reasons to suppose that pre-PIE had had a simpler tense system). A fairly well-acknowledged example is the practice of writing *eʜ₂, *eʜ₃ for what are strictly speaking *aʜ₂, *oʜ₃: the colouring of *e by laryngeals is generally agreed to have already occurred in PIE. Somewhat more complicatedly, the accent-ablaut classes are generally given in fairly idealized forms, sometimes explicitly recognized as 'pre-PIE', but other times not so clearly distinguished. A lot of the attempts popular over the 20th century to, say, reduce the PIE case system drastically were especially striking examples of internal reconstruction applied to pre-PIE, though it was usually claimed that it was 'PIE' under discussion (again, under the view that PIE is simply the conglomerate of features which make the linguistic patterns least messy).

But these problems aside, there's one clear fact: comparative reconstruction can't arrive at a linguistic state older than PIE proper (the last common ancestor). You can't triangulate to a star older than PIE itself (though you might use internal reconstruction to guess at the older stages, as is done with the idea that PIE was once ergative).

Pflaumbaum said,

Interesting. I've been out of the game too long so will have to think it through.

Suppose in a hundred years, BrE and AmE are in the process of diverging. One feature of this is that the phoneme /t/ has been lost, becoming, say, /ɾ/ in AmE and /ʔ/ in BrE. Subsequently, some new phonological changes – /u/ > /y/ – take place across both.

Centuries later, linguists reconstructing ancient “English” from later extant “American” and “North European” successfully retrace /ɾ/ and /ʔ/ to /*t/. They also reconstruct original /*y/ from what are now American /i/ and North European /e/.

Both these reconstructions are correct. And yet /t/ and /y/ did not coincide in English.

Isn't this also the implication of Garrett's work on Mycenaean and "Proto-Greek"? That dialectal convergence plays as big a role as splitting and the singular mother-language as node between branches is an illusion? Or am I hopelessly confused?

J.W. Brewer said,

goofy/peter: similarly, one of their animations of the spread shows IE arriving in Iceland before it gets to "mainland" Scandinavia. Which is backwards by, what, probably at least a thousand years? My guess is that this is because Old Norse is older than e.g. Swedish and they semi-arbitrarily decided to treat ON as located in Iceland rather than, say, Sweden because they had to associate it with some geographical point more localized than its entire historical range. But it's just bad PR to have a model that is sufficiently abstracted from reality to be observably inaccurate for the last two millennia where the history is known quite well and then ask the reader/viewer to accept that it's the best possible model for describing what happened at much greater time depths where we don't really know what happened and are trying to choose among competing conjectures. Even if the surface anomalies in more recent times are mere side-effects of individually defensible simplifying assumptions of the sort any workable model needs to make, how are we to assess whether similar unintended side-effects in the more remote past might materially skew the bottom-line results?

Nelson said,

@Pflaumbaum, you mean Old English *y? You won't (I'm pretty sure – it'd be interesting to see a hypothetical data set proving me wrong!) arrive at that by direct comparison of British and American English (now or in a century). You might get that the new /e/ and /i/ go back to *i (as in evil), which would be entirely correct – but you won't be able to separate out earlier *i and *y by comparing the later dialects (e.g. separating out the *i in evil and weasel into OE yfel and weosule).

Maybe a very clever bit of internal reconstruction might uncover i-umlaut, and link that to some instances of *i, but that's exactly the kind of temporal uncertainty you always get with internal reconstruction (as opposed to comparative reconstruction). Dialect convergence and the like mean that direct comparative reconstruction might not necessarily be reaching back to the age of the proto-language (e.g. the common example of comparing modern house and German haus if we lacked older records or other Germanic languages) – but I don't think that method can reach back earlier than the proto-language proper.

Army1987 said,

(Looking at the tree…) What? There's no way Sardinian split off that late. Phonologically, it's pretty much the most conservative Romance language (e.g., the only one I know of where Latin /k/ stayed /k/ before front vowels).

Pflaumbaum said,

Yes, what Army 1987 said, the GOOSE vowel, in my made-up example. I put it needlessly complicatedly though, the 'modern' reflex could still be /y/ in both languages, with nothing to tell you that this had developed from /u/ well after the disintegration of /t/. The problem being that there was no unitary 'English' that diverged in a single event, there was splitting and convergence.

In Garrett's study, as I'm sure you know, Mycenaean certainly looks 'Greek' in terms of its lexicon, yet hardly seems to have partaken at all in the morphological and even phonological changes from PIE that are characteristic of the Greek dialects. So the idea of a singular 'Proto-Greek' that splits into Mycenaean and the later dialects seems untenable. Doesn't that also put into doubt a spoken PIE containing all the features reconstructed to it?

Nelson said,

With Greek, there was already good reason to suppose that a lot of innovations (like the elimination of labio-velars, some of the palatalizations, etc.) were dialectal, implying that Proto-Greek was remarkably archaic with the dialects developing distinctly but along similar lines. Mycenaean confirms and expands this, but doesn't necessarily destroy the idea of Proto-Greek altogether.

I remember Mark Hale saying in a lecture once, commenting on Garrett's paper, that while the list of innovations leading to Proto-Greek might be 'dangerously small', it only takes 'just one' innovation to set Greek as a unitary branch of IE. Generally I've found Hale's views to be very clear regarding subgrouping. The point is that reconstructing Proto-Greek based on comparison of the dialects yields something different from PIE, set off by at least one innovation. At the very least, there's certainly room for debate about whether a large amount of convergence really invalidates a tree model (the presence of shared innovations after divergence is obvious – we call it borrowing if the languages are distinct enough – and doesn't necessarily mean we can't use trees). Henry Hoenigswald has those nice little tree diagrams with the 'cross hatched' bits near the top of every branching – I sort of feel like that cross hatching is the part that causes the most methodological and theoretical problems.

Convergence is definitely a real methodological problem. I still take issue with Clackson's star analogy, though. For one thing, convergence still doesn't affect the terminus post quem of comparative reconstruction: features cannot predate the proto language. It does lead to problems on the other end, as forms which might seem common to PIE might be younger, dialectal developments, and so not actually part of PIE proper at all.

At least in some contexts, careful attention to relative chronology can sort out convergence from common development. Anglo-Frisian is a nice example. Patrick Stiles has a very nice paper arguing against an Anglo-Frisian unity, which shows quite clearly a lot of features, even including the fronting of Germanic *a to *æ, almost certainly postdate distinguishing innovations in the two languages (such as the distinct monophthongizations of *ai to OE *ā, OFris. *ē, which must have been among the very earliest distinct developments in both languages). Even without careful attention to chronology, the fact that the details of many 'similar' developments in the two languages were different was a good sign that they weren't developments in a proto-language.

Even with all that, there are still two innovations that are plausibly common to 'Anglo-Frisian': the rounding of nasal *ã(:) and the subsequent fronting of *æ: (Anglo-Frisian mōna 'moon', Old Saxon māno; OE dǽd / OFris. dēd, Old Saxon dād). Since these changes have to predate the OE monophthongization of *ai to *ā, the first distinctive innovation of that branch, we would seem to have two features separating out Proto-Anglo-Frisian from Ingvaeonic. It's maybe a little uncomfortable to only have two features setting off this branch, but if we really believe these two features (and that they predate all distinguishing features, as does seem to be the case), then the existence of Proto-Anglo-Frisian seems probable.

Well, I can attest to the stir this article is making–I've already been contacted by my son (a physicist/software engineer) and an undergraduate at Idaho (who, last I knew, was some sort of Ag major). I have not had time to really think deeply about the issue in light of the article–and so far I've only seen the conclusions. If the data they used is somewhere, I haven't found it. Nor have I found their "protocol" for using the data, i.e., how are the data chosen/sifted before they're put into the mathematical formulae? So I list below some (not quite random) issues where I would want further information/justification. (And, no doubt, I'll think of more.)

1. What counts as lexical preservation and what counts as replacement? For example, do German Haupt and English hound count as preservations or do German Kopf and English dog count as (divergent) innovations? From what little they say, it would seem that they think of words as if they were biological genes, either always and everywhere present in the original form or always and everywhere present in a mutated form. But most words are polysemous (and polysemous in different ways in different languages) and in one meaning may suffer replacement, but in another not, and thus a simple yes/no (1/0) is not robust enough to fully explain the situation.

2. I love the ani-maps! How can I do them myself? But this one seems a little weird (or unexpected). On the basis of historical data (not inferred prehistorical data), we know Cyprus was Greek-speaking (from at least the 8th c. BC?) and certainly I would expect the area covered by Slavic to extend much further east by, say, 500 AD. These may be, probably are, just quibbles, but it underlines for me the lack of any understanding about how their data "drew" the map. They admit that their process misses Celtiberian and the like–presumably because Celtiberian, while indubitably IE, simply provides no usable data to put into their schema.

3. This may be a purely mathematical issue that has a mathematical solution, but how does their protocol handle data sets (i.e., languages) for which there are not attestations of certain words in their list? (Incidentally, is it a Swadesh list of some sort?) Certainly Hittite and Tocharian had some missing data points. Others out of the two hundred must have had problems of the same sort. A similar problem is like Armenian (and to a lesser extent Albanian and English). In these cases we know all the words to fill in the list, but we also know that these languages have undergone massive (in the case of Armenian, near total) lexical replacement. Does the fact that English-speakers talk about animals and colors, both borrowed from French (rather than deer and hues) have any bearing on English's ultimate historical placement in the great IE scheme of things? Again, maybe their system somehow deals with this issue, but, if so, the way of dealing is not explained.

4. In many ways the family tree they draw up is pretty standard (the positions that Albanian and Armenian take in it are quite unexpected though [and that may be an artefact of the lack of data in the form of inherited words in those languages]). But it's hard to "map" that tree against the presumed spread of agriculture out of Anatolia. If the Proto-Indo-Europeans spread into Greece and the southern Balkans, and then in the central and northern Balkans, and then into Central Europe, etc., the deepest cleavages in the tree should be between the Anatolian languages on the one hand, and Greek and all other IE languages on the other, and then next between Greek on the one hand, and Albanian and all the rest on the other, and then next between Albanian on the one hand, and Italic, say, and all remaining others, etc. Also difficult is the deep separation in their tree of Greek and Indo-Iranian. Certainly on all other data except the lexicon Greek and Indo-Iranian belong together, but their theory makes them separate at the earliest time-depth.

5. Absolute dating is an issue too. If PIE spread with the introduction of agriculture, it had to have happened far earlier than, say, the invention of wheels. But the wheel vocabulary is widely spread in Indo-European, in a way that suggests to me that PIE was already dialectally divided in significant ways as wheels were developed. Thus the terminology for wheels and wheeled vehicles is not pan-IE, in just the same way that British and American terminology for cars and trucks (= lorries) is not always "pan-English" (windshield/windscreen, hood/bonnet, boot/trunk, etc.).

6. I've always found Renfrew, et al's suggestion that PIE spreads with farming a very enticing one. It would explain so simply and easily why IE languages were so "successful." But the timing of the putative spread and the known internal relationships within IE both speak strongly against it. I don't reject their math (but I don't know anything about it either), but I do have to wonder whether languages (and their individual speakers) are sufficiently like viruses that the same math is equally valid a descriptor/predictor for both phenomena.
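Point 1 above turns on how a word list becomes a set of yes/no characters. As a minimal sketch, assuming a common setup in lexical phylogenetics (not necessarily the paper's actual protocol, which the comment notes is not spelled out), each (meaning, cognate class) pair becomes one column, and a language scores 1 if its basic word for that meaning belongs to that class. The toy data, the class labels "A"/"B", the language "Unattested-X", and the function name `binary_cognate_matrix` are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Toy data: meaning -> {language: cognate class of the basic word}.
# "A" = the inherited Germanic word (hound/Hund/hond, head/Haupt/hoofd),
# "B" = a replacement (English dog, German Kopf). Labels are arbitrary.
TOY_DATA: Dict[str, Dict[str, str]] = {
    "dog": {"English": "B", "German": "A", "Dutch": "A"},
    "head": {"English": "A", "German": "B", "Dutch": "A", "Unattested-X": "A"},
}

Column = Tuple[str, str]  # (meaning, cognate class)


def binary_cognate_matrix(
    data: Dict[str, Dict[str, str]],
) -> Tuple[List[Column], Dict[str, List[Optional[int]]]]:
    """Code each (meaning, class) pair as a 1/0 column; None marks missing data."""
    languages = sorted({lang for forms in data.values() for lang in forms})
    columns = sorted(
        {(meaning, cls) for meaning, forms in data.items() for cls in forms.values()}
    )
    matrix: Dict[str, List[Optional[int]]] = {}
    for lang in languages:
        row: List[Optional[int]] = []
        for meaning, cls in columns:
            if lang not in data[meaning]:
                row.append(None)  # word for this meaning is not attested
            else:
                row.append(1 if data[meaning][lang] == cls else 0)
        matrix[lang] = row
    return columns, matrix


if __name__ == "__main__":
    cols, mat = binary_cognate_matrix(TOY_DATA)
    print(cols)
    for lang, row in mat.items():
        print(lang, row)
```

Note what the coding throws away: English still has hound and German still has Haupt, but because neither is the basic word for the meaning, each scores 0 in the "A" column. That is the polysemy worry in point 1, and the `None` entries are the missing-data worry in point 3.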

So there's my early thinking on the topic. We'll see if it changes in any way.

Thanks very much for your comments, which capture many of my misgivings also.

On your point 4, isn't the position of Celtic and Italic as a very late split also anomalous? How do you get a kentum language splitting off after the satem shift without inheriting it? Or am I missing something?

Also, on your point 6 about the applicability of a virus-based model of geographic spread, I find that in particular a problematic aspect of the Gray-Atkinson approach, since viruses don't move with intentionality, and human migrations tend to be very patterned and intention-based. Human spreads are like targeted streams, not like waves, because humans choose geographic targets and skip large areas to reach them–unlike viruses. Later migrants then follow first migrants to those specific places, rather than moving randomly. The Relaxed Random Walk model they used assumes movements at the front of an expanding wave that are random in regards to direction. I don't know to what degree their algorithm permits variable-distance movements, but I doubt that it permits long-distance leaps like the movement that brought Athabascan languages into the Southwest; or the movement that seems to have brought steppe Yamnaya people to the Altai. It permits over-sea movements, which must be long-distance, but then it doesn't permit additional long-distance leaps to specific targets over land, as far as I know. The direction of geographic movement is constrained in their algorithm by the positions of some daughter languages in some ancient records or in modern times, but they do not rank these possible locations according to archaeological data for movements in that direction from the presumed earlier edge of the wave–for archaeological plausibility, in other words.
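To make the "random in direction, variable in distance" idea concrete, here is a minimal sketch of a relaxed random walk in two dimensions. It is only an illustration under stated assumptions, not the model in the paper: each step is an ordinary Gaussian move whose variance is scaled by a rate drawn from a lognormal distribution, so occasional long jumps are possible but no step aims at a target. The function name `relaxed_random_walk` and the parameters `sigma` and `rate_spread` are hypothetical.

```python
import math
import random
from typing import List, Tuple


def relaxed_random_walk(
    steps: int, sigma: float = 1.0, rate_spread: float = 1.0, seed: int = 0
) -> List[Tuple[float, float]]:
    """Simulate a 2D walk whose per-step variance is scaled by a random rate.

    rate_spread == 0 gives a plain (strict) random walk; larger values make
    the step-length distribution heavier-tailed. Direction is always random.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(steps):
        # Per-step rate multiplier (assumption: lognormal, median 1).
        rate = rng.lognormvariate(0.0, rate_spread) if rate_spread > 0 else 1.0
        step_sd = sigma * math.sqrt(rate)
        # Independent Gaussian moves in x and y: no preferred direction.
        x += rng.gauss(0.0, step_sd)
        y += rng.gauss(0.0, step_sd)
        path.append((x, y))
    return path


if __name__ == "__main__":
    path = relaxed_random_walk(steps=200, rate_spread=1.5, seed=42)
    jumps = [math.dist(a, b) for a, b in zip(path, path[1:])]
    print(f"median jump: {sorted(jumps)[len(jumps) // 2]:.2f}")
    print(f"largest jump: {max(jumps):.2f}")
```

Even with a wide `rate_spread`, a long jump here is just an unusually large random step. Nothing in the sketch models a population choosing a destination and later migrants following, which is the objection raised above.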

Furthermore, I don't understand why every iteration of the model kept returning a geographic root in Anatolia. They said it wasn't just because Anatolia is in the center of the IE distribution so would make the most efficient starting point for an expansion that operated through random movements, but I don't understand what other aspect of the Anatolian starting point makes it the geographic root. It seems to me that it might be a combination of the efficient-center-random-walk process, strengthened by the constraint that Anatolian is the earliest branch, and additionally strengthened by the decision to map the Anatolian languages in Anatolia for the purposes of the geographic-spread model. But then the Anatolian root would be an artifact of assumptions inherent to the model, particularly the prior mapping of Anatolian in Anatolia. But don't most Anatolian linguists consider the Anatolian languages as probably intrusive in a non-IE Hattic-speaking Anatolian environment? To the extent that the geographic rooting is determined by the prior decision to place Anatolian in Anatolia, isn't that decision responsible for the Anatolian root?

Finally, it is obvious that archaeologists and linguists already talk past each other in a way that is only occasionally enlightening; and adding disease-diffusion biologists to the mix just makes everyone more confused. Perhaps this is how integration between separate fields begins, awkwardly, but we have not yet arrived at anything like a synthesis.

Nathan said,

Re. Doug Adams's statement below:
"6. I've always found Renfrew, et al's suggestion that PIE spreads with farming a very enticing one. It would explain so simply and easily why IE languages were so "successful." But the timing of the putative spread and the known internal relationships within IE both speak strongly against it."

IE is quite dominant in South Asia but widespread agriculture was already well established in the Indus Valley prior to Indo-Aryan intrusions (1st half of 2nd millennium BC).

If IE speakers were primarily farmers then isn't it odd that the Rig Veda describes a pastoral rural society, with no mention of large cities or irrigation works?

Trond Engen said,

Adams' point that the Anatolian origin is indeed alluring but not possible to square with current evidence is worth repeating. "Yes, it's the first thing that strikes us all. Yes, I'd love it to be true. No, it doesn't fit." And it's annoying that the "mounted warriors" keep coming back whenever someone argues against the mainstream view. There were probably widespread skirmishes when pastoralists entered the lower Donau Valley and put a definite end to Old Europe, and a lot of violence when Indo-Iranians shut down the BMAC, but there's not that much evidence for violent upheaval in Western Europe (Indeed, much of the argument for the Renfrew history, and for that even deeper fantasy, Paleolithic Continuity, is the lack of evidence for violence), so the prime mechanism is rather access to the prestige and riches from long distance trade. Not all peaceful, but not destructive either.

–

I'm as positive to the steppe hypothesis as the next guy, but the fairly obvious fact that Indo-Iranians came from the north as steppe pastoralists doesn't necessarily imply that the (oldest stage of) Proto-Indoeuropean was spoken by steppe pastoralists. It's the most parsimonious hypothesis given the current evidence, but it's not the only one worth considering. A Gamkrelidze-Ivanov-ish origin with the Maikop culture in or south of the Caucasus or an origin with the Tripolye-Cucuteni complex in the eastern Balkans are conceivable within (or around the edges of) the general steppe framework described by Mallory & Adams and David Anthony. Or at least that's something that struck me while reading their books. Not that it would change much more than when and how Anatolian split off.

–

I'm not sure if the random walk model is that bad — at least not as a first approximation in lack of other evidence of migration paths. That might depend on the type and amount of variation built into it. But as everybody says up there, there are many possible calibrating points along the way, of more or less certainty. One obvious point is the early — and repeated — contact with Uralic. I'm not sure how close to the root of the tree this has to be placed — but I think not long after Tocharian branched off. It would be interesting to see how different the pattern would turn out just with that contact added.

My immediate objection to the model is that language communities aren't points on a map, they're regions. There are certain limits on their size in different environments. Their movements are less like a walk and more like a shifting pattern of fields or maybe of bacteria in a limited space. Or maybe of fungi with mycelial cords growing in all directions. They may split and eat each other, but they don't rise ex nihilo. Such a model should provide a certain resistance when a language pushes on another and a plausible replacement when a language moves out. But since I know nothing of the actual model and have no competence in any relevant area, I'll finally shut up.

Pita Kelekna said,

Is not the Elamo-Dravidian language family distribution clear evidence for an ancient non-Indo-European spread of agriculture from the Near East to the subcontinent? Plus RgVeda documentation of later (c 2000 BC) pastoralist-with-domesticated-horse invasion by IE speakers from the north. This demarcation between north and south/Indo-Aryan and Dravidian speakers plus accompanying starkly contrasting kinship systems is overwhelmingly present in the subcontinent today.

Otto Kerner said,

In the scenario you describe Am.E /ɾ/-> *t /y/ <- Br.E /e/ is an illusion created by the coincidence of the same development occurring separately in two different dialects. No reconstruction is perfect. There is still a hypothetical protolanguage although our knowledge of it might be imperfect.

But there is still some complexity in the idea of a proto-language as far as dialects separating from each other. Suppose that the late PIE community broke up into two dialects which were largely isolated but were still sporadically in contact. 100 years later, the speakers of one dialect find it necessary to move further away, so now they are not in contact at all. However, after another 100 years, both groups have expanded to the point where they are now in contact again. Over the following 100 years, they interact routinely and their languages experience dramatic contact effects. However, after that hundred years a group of young people from one of the clans successfully colonises a distant area, causing a permanent split in the speech community. Now, we could say that PIE per se is the original version before any separation occurred, but shared development continued for hundreds of years after that.

In the rough-and-tumble of the archaic human environment, there were probably a lot of situations like the above happening, except more complicated.

Otto Kerner said,

Sorry, that got a bit mangled by the punctuation I put in. It should say that the reconstruction of *t in proto-English is valid in your scenario, but the reconstruction *y is an error, albeit one that our hypothetical researchers without access to attested proto-English would never be able to detect.

J.W. Brewer said,

I wonder if there's an easy way to split up their results into smaller pieces by branches of the tree and knock out the non-living languages. How definitively does the model situate proto-Romance in Italy? In Latium? Similar questions for proto-Germanic, proto-Slavic, etc. (and those you can test at various points in time since Germanic and Slavic have both gained and lost some ground within Europe over the centuries where we have a decent historical record). More generally, is there some more limited dataset where we already know what happened historically well enough to see whether the whole virus-spread model is a good analogy w/o having to tweak parameters implausibly hard to reconstruct the known urheimat answer from the current-geographical-distribution starting point?

Once you get post-1492, I suppose it becomes hopeless. How plausible is it that someone could come up with a model for reconstructing language diffusion that worked backwards from the present worldwide distributions of English, Spanish, and/or French outside Europe to indicate which of them started where (with a greater degree of precision than "somewhere in Europe with an Atlantic coastline") on a better than chance basis? Could you distinguish between the relative plausibility of an Andalusian urheimat theory and a Liverpudlian urheimat theory for worldwide English at a time depth of barely four centuries? Now obviously long-distance ocean travel introduces new degrees of complexity in any model, but this brings us back to the question of how confident we can be that we really know enough about the ways in which human populations moved/interacted and languages moved/interacted (which are correlated but not identical phenomena) six or ten millennia back to be able to model them at all with whatever degree of precision is needed for this sort of output to be meaningful.

Shivraj Singh said,

"DCBob said, August 25, 2012 @ 10:32 am
Shivraj Singh – I'm no expert but my understanding is that we trace the Indic languages to steppe cultures north of India in the late 3rd – early 2nd millennium BC: see for instance http://en.wikipedia.org/wiki/Indo-Iranians."

Bob I am not sure if the data supports tracing of Indic languages to steppe cultures.

Reasons are:
A) It is being assumed that steppe dwellers were pastoralists and they moved into India with their cows.
B) Since steppes had Taurine cattle and Indian cattle does not show any admixture of genes from Taurine it rules out the arrival/presence of Taurine in India.

What I do not know the answer to is:

If pastoralists did undertake a journey from steppes to India how did they survive on the way without the cattle?

Trond Engen said,

Shivraj Singh: If pastoralists did undertake a journey from steppes to India how did they survive on the way without the cattle?

Who says they came without cattle? It's quite conceivable that their steppe cattle were replaced by those native to India shortly after they arrived. Productivity under local conditions and resistance to disease are strong incentives for a cattlebreeder.

Could this imply that they actually did not come at all?

Nah. But it could imply that their cattle didn't come. But then we're back to invaders or skilled mercenaries installing themselves as a ruling class, similar to what one thinks happened in Mitanni. But the large-scale language shift would seem to take a larger — and probably also female — population.

Shivraj Singh said,

"Trond Engen said, August 28, 2012 @ 3:18 am
Who says they came without cattle? It's quite conceivable that their steppe cattle were replaced by those native to India shortly after they arrived. Productivity under local conditions and resistance to disease are strong incentives for a cattlebreeder."

What you are suggesting is that all the taurine that steppe dwelling aryans came with:
i) got wiped out,
ii) left no genetic trace in Indian cattle

It is possible that there is some non-zero probability for this thesis but it does not seem probable.

"Shivraj Singh: Could this imply that they actually did not come at all?"

"Trond Engen said, August 28, 2012 @ 3:18 am
Nah. But it could imply that their cattle didn't come. But then we're back to invaders or skilled mercenaries installing themselves as a ruling class, similar to what one thinks happened in Mitanni. But the large-scale language shift would seem to take a larger — and probably also female — population."

In the case of Mitanni we do know that there was a language shift. Do we have similar evidence for the steppe dwellers displacing an existing language (group) in India?

We don't know from any direct evidence that the there was a language shift among the Hurrian elite prior to the rise of the Mitanni empire — but it's a very reasonable inference.

The positing of a language shift (not necessarily a population replacement) in northern South Asia and South-west Asia originating from an influx of peoples from west-central Asia is likewise an inference, supported by many strands of evidence, to the extent that it's become, more or less, the scholarly consensus.

A brief outline of three of the most important of those strands can be found here:

This is not true (probably ancient part). With the success of the Green Revolution in India in the 1960s, NDRI was given the charter to increase milk production. They imported bulls from the west and east and created semen banks. Even today pretty much all of the semen administered by Govt doctors in villages is hybrid semen.

Kumar et al allude to this in their paper:
"It is worth noting that there may have been attempts to improve some modern Indian cattle by very recent importation of B. taurus individuals (Felius, 1995)."

So any gene related to the male side is not going to give a correct result.

"Taurine cattle from Europe and Africa and zebu cattle from India and Africa were compared at the molecular level. Indian zebu cattle have been found to have profoundly different mtDNA control- region sequences when compared with both European and African taurines and African zebu. The sequence divergence is consistent with a shared common ancestor of the order hundreds of thousands of years ago."

Thus there was no taurine ingress in India with the steppe Aryans or later. It only happened in the last 40 years or so under Govt of India edicts.

Shivraj Singh said,

"We don't know from any direct evidence that there was a language shift among the Hurrian elite prior to the rise of the Mitanni empire, but it's a very reasonable inference."

Aren't Hurrian names very different from Mitanni names? I think it is almost certain that Mitanni ruled over a people who spoke a different language.

"Matt_M: The positing of a language shift (not necessarily a population replacement) in northern South Asia and South-west Asia originating from an influx of peoples from west-central Asia is likewise an inference, supported by many strands of evidence, to the extent that it's become, more or less, the scholarly consensus.

A brief outline of three of the most important of those strands can be found here:"

Trond Engen said,

The Mitanni elite was likely descended from military specialists or mercenaries, quickly absorbed in all ways but onomastics and some specialist vocabulary. A destiny similar to the Norse settlement in Normandy. There was a rapid spread of chariot warfare across the Middle East and as far as Nubia at just the right time. This sort of dynastic takeover isn't sufficient to explain Indic dominance in Northern India. Or else China would have been Mongolian or Manchu.

Since Indic managed to gain a foothold on the Indian subcontinent, the situation there must have been different, with another form of intrusion, possibly including women and families or something similar to the trade/prestige explanation suggested for Western Europe. I don't know if there's evidence either way.

Pita Kelekna said,

There appears to be roughly contemporaneous evidence for cattle domestication across the Middle East: Bos taurus in the Fertile Crescent c 8000 BP; Bos zebu in Mehrgarh c 7000 BP; and possibly even earlier Bos domestication in Egypt c 9000 BP. Of course, wheat and barley agriculture was anciently practiced across this wide area. So no one is suggesting the Indo-Aryans were the first to introduce cattle to the subcontinent. But like their linguistic relatives the Iranians, Indo-Aryans c 4000 BP were mobile agro-pastoralists, who the RgVeda attests regularly sacrificed bulls, rams, goats, and horses to their gods. The belt of closely related Iranian and Indo-Aryan languages across the Middle East into the subcontinent suggests an expansion of these IE agro-pastoralists southward, likely resulting in fragmentation of indigenous Dravidian: in the north, Brahui in Iran, Afghanistan, and Pakistan; Kuruk in Central India; Malto in East India; and reduction of the Dravidian bloc to the southeastern portion of the subcontinent. The failure of western linguists to decipher the Harappan script might also be due to this language being indigenously Asian, rather than Indo-European.

"additionally strengthened by the decision to map the Anatolian languages in Anatolia for the purposes of the geographic-spread model. But then the Anatolian root would be an artifact of assumptions inherent to the model, particularly the prior mapping of Anatolian in Anatolia. But don't most Anatolian linguists consider the Anatolian languages as probably intrusive in a non-IE Hattic-speaking Anatolian environment? To the extent that the geographic rooting is determined by the prior decision to place Anatolian in Anatolia, isn't that decision responsible for the Anatolian root?"

Excellent point. In support of a non-Anatolian root for Anatolian dialects, one of the contemporaneously identified culturally distinctive elements of Hittite culture which has archaeological support as well, is that the Hittites were well advanced relative to their neighbors in metallurgy and that they were jealous of their secrets. (Similarly, the Cemetery H culture in NW India associated with early Indo-Aryan and with early Rig Vedic texts that historically document the inhumation to cremation transition in burial practices there, was also the first to have some advanced metallurgical relics in South Asia). These developments appear in both Anatolia and India almost simultaneously. The metal working practices in both places are derivative of techniques that first appear in the Caucasus mountains, and only secondarily in Anatolia.

J.W. Brewer said,

@ohwilleke: in fairness to the authors, they say in their FAQ that they reran their model with all the dead languages excluded (thus excluding all of the Anatolian ones as well as Old Norse, Avestan, etc.) and got similar results. But it's also useful to recall that this is a statistical argument. Even if their model were a good one, it's not one that will yield a unique reconstructed point of origin. Their claim is that if you turn the crank a hundred or a thousand times, the model will spit out a hundred or thousand different potential points of origin, no one of which is ex ante any more probable than any other one, but many many more of which are in Anatolia than on the steppes (although the model certainly did generate a few possible points of origin on the steppes, as well as a second-only-to-Anatolia cluster in the southern Balkans). But human history is not the result of the ex-ante-more-probable thing happening at every crucial branching point.
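To make the "turn the crank" picture concrete, here is a minimal sketch of how support for competing homelands could be read off a set of sampled root locations. The region labels, the counts, and the equal prior are all invented for illustration; they are not the paper's numbers, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical region labels for root locations sampled from a posterior.
# These counts are invented for illustration and are NOT the paper's results.
sampled_roots = ["Anatolia"] * 70 + ["Balkans"] * 20 + ["Steppe"] * 5 + ["Other"] * 5


def region_support(samples):
    """Return the fraction of sampled roots that fall in each region."""
    counts = Counter(samples)
    total = len(samples)
    return {region: n / total for region, n in counts.items()}


def bayes_factor(post_a, post_b, prior_a, prior_b):
    """Posterior odds of region A over region B, divided by the prior odds."""
    return (post_a / post_b) / (prior_a / prior_b)


support = region_support(sampled_roots)
# Assumed equal prior mass on the two regions, purely for this sketch.
bf = bayes_factor(support["Anatolia"], support["Steppe"], 0.5, 0.5)
print(support)
print(f"Bayes factor, Anatolia vs. steppe: {bf:.1f}")
```

The point of the sketch is only that the output is a distribution over candidate origins: a region can collect most of the samples without any single sample being "the" answer.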

Readers of this blog might be interested in what ancient DNA seems to suggest. Only half a year has passed since ancient autosomal DNA (all of it pre-5ka) has started to be published, but there are certain patterns that appear to emerge. I cover some of these in this blog post, including links to various original studies.

We now have DNA from a couple of Neolithic "farmers" (Gok4, a TRB individual from Sweden, and Oetzi, the Tyrolean Iceman), as well as several hunter-gatherers (Ajv52, Ajv70, Ire8 from Gotland and Bra1/Bra2 from Mesolithic Iberia).

The European hunter-gatherers appear to be outside the modern range of variation. The Swedish hunter-gatherers seem to have contributed a larger part of DNA to living inhabitants of the Baltic region. Oetzi is most like modern Sardinians and Gok4 is most like modern southern Europeans. The data such as it is suggests a clear signal of population movements across Europe, rather than a model of "acculturation".

The more interesting pattern, however, at least in my opinion, emerges when one tries to analyze the ancient DNA data in reference to the modern populations. When admixture analysis is performed in modern West Eurasians, three very distinct West Eurasian components usually emerge:

"Southern": prevalent in southern parts of the Near East, North Africa and Southern Europe
"Atlantic_Baltic": prevalent in Europe from the Atlantic to Russia and beyond
"West_Asian": prevalent in northern parts of the Near East, the Caucasus, Anatolia, Iran, and the major West Eurasian element in the populations of South Asia.

Now, when I examined the ancient individuals, the hunter-gatherers invariably showed ~100% ancestry in the Atlantic_Baltic component. The farmers showed mixed Southern/Atlantic_Baltic admixtures, and their ratio of Southern/Atlantic_Baltic is very similar to modern Sardinians.

However, the third component, the West_Asian (which occurs at about ~10% even in places like Ireland, Norway, the Orkneys, etc.) seems to be missing in the pre-5ka samples. Not only that, but it also contrasts Basques (who have almost none of it) with their Iberian and French neighbors, and in the Baltic it reaches minima in Finns, with higher values in both Scandinavians and Balto-Slavs. I think this pattern may be consistent with groups of PIE speakers arriving from West Asia relatively late in prehistory (although when exactly they came to the Balkans remains to be seen, due to the lack of ancient DNA samples from that area).

On the basis of this data, it seems to me that "something" happened post-5ka in Europe that brought this "West Asian" component into Europe. An issue that remains to be determined is when this component made its first appearance in Europe.

Various lines of evidence suggest that there have been significant post-5ka events in European prehistory. For example, three of the more numerous present-day Y chromosome lineages in living Europeans (J2, R1a, and R1b) are so far lacking in pre-5ka samples. The Neolithic population of Ukraine, on the basis of a couple of mtDNA sample sets from Mariupol-type cemeteries, seems to have been a mix of West and East Eurasian lineages, and this pattern seems to extend through Siberia all the way to the Tarim basin. Many samples from that region are often labelled "Caucasoid" on the basis of pigmentation and morphological traits, but the evidence suggests a mixed East/West Eurasian population all the way to the doorsteps of Europe.

My own opinion (for which I argued in the linked post and various others) is a variant of the Gamkrelidze and Ivanov model: PIE originated in the Neolithic (perhaps the Halaf culture as originally proposed), some of it may have spread early, but the major pulse of expansion corresponded to the spread of metallurgical innovators out of the highlands of West Asia during the Copper and Bronze Ages.

I still indeed see – beside some progress in the easier parts of the subgrouping – several basic mistakes.

1. The first step: The phylogeny.

1.1. Gray said that they tested the robustness of their program. In fact they did. However, only by inserting random errors, which naturally yield only small variations.

1.2. He will further counter that they tested three other roots. In fact they did; however, not the deciding one between the western (Cel-Ita, Gmc, Bal-Sla) and the eastern branches (Balkan-Grp, Ind-Ira, and, after my computations, Hit-Toc).

1.3. And they in fact openly admit (in the main article!) that the (prejudiced) root was intentionally inserted in the middle between Hittite and its split-off point, and then tell their audience that this was the outcome of their computations. This alone is sufficient to reject the whole approach!

1.4. The main error lies in the belief in, and employment of, roughly the same decay rates for all languages. They may assert that they applied gamma-smoothing to handle different rates between some languages, but in fact their assertion rests upon the basic assumption of all glottochronological approaches: decay rates with acceptable variations. Note that Bergsland & Vogt (1962) have not been convincingly disproved up to now (I just submitted a special paper on this, checked with the best experts of Scandinavian etymology).

Because, additionally, Hittite and Tocharian have gaps in the word lists, and (with Albanian) many more question marks in the etymologies, we must see the consequences (scholars such as Sheila Embleton or Sergej Starostin strove for years to find a solution to this problem; the Auckland team leaves it to the program): For the algorithms, these languages must appear as having decayed much faster than the other ones, or, by denying a faster decay, they end up with longer branches, i.e. a split that is much too old. It is not possible to correct this anomaly, because we only have control spaces for decay in some other languages, not so for Hittite and Tocharian. On the contrary, the unnaturally old terminations of these languages even prolong the dates of splits, because the calculation of decay starts, e.g. for Hittite, 3,500 years ago!
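The worry about gaps can be illustrated with the classical constant-rate formula of glottochronology, t = ln(c) / (2 ln(r)), where c is the shared cognate fraction and r the retention rate per millennium. This is a sketch of the old Swadesh-style calculation, not of the Bayesian model used in the paper; the retention rate of 0.86 is the value commonly quoted for the 100-word list, and the word counts are invented.

```python
import math


def split_age_millennia(shared_fraction, retention_per_millennium=0.86):
    """Classical glottochronological estimate t = ln(c) / (2 * ln(r))."""
    return math.log(shared_fraction) / (2 * math.log(retention_per_millennium))


# Hypothetical word list: 100 meanings, 60 true cognates shared by two languages.
true_shared = 60 / 100

# Suppose 15 of those 60 are unattested or uncertain in one language and are
# (wrongly) counted as non-cognate instead of being treated as missing data.
apparent_shared = 45 / 100

age_true = split_age_millennia(true_shared)
age_apparent = split_age_millennia(apparent_shared)
print(f"age from true cognates:     {age_true:.2f} millennia")
print(f"age from apparent cognates: {age_apparent:.2f} millennia")
```

Under these assumptions, every gap that is scored as a loss lowers the apparent shared fraction and so pushes the estimated split further back. Whether and how far that carries over to a model that codes gaps as missing data is a separate question.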

2. The results don't accord with geographical and historical reality.
This software, given the presumably too old dates for the locations of Hittite and Tocharian, can only and automatically result in Anatolia as the Urheimat. The algorithms – in accordance with the customary diffusional model of virus epidemics – would never be able to detect a homeland as, e.g., postulated for Indo-Iranian by Carpelan et al. in the upper Volga-Ural area! Impossible! Upon my inquiry, one of the authors responsible for this study answered that the software yields the most plausible results in accordance with the data employed. Nobody doubts that!

3. Dienekes holds that, "For example, three of the more numerous present-day Y chromosome lineages in living Europeans (J2, R1a, and R1b) are so far lacking in pre-5ka samples."

To MattF:
Knowing nothing about this, I guess I can ask the dumb question: Is there some reason they can't both be true? E.g., Anatolian farmers, between 10,000 and 5000 years ago, migrated en masse to the Caspian steppes, and then, acquiring horses, axles and wheels, were equipped to disperse?
_______________________________
Not a dumb question at all. Cavalli-Sforza, who's known for crossing between gene- and language-studies, proposed something exactly along those lines!

Ben said,

It is a vain idea that the Indo-Europeans spread agriculture with them: as we well know, the first known Indo-Europeans (Greeks, Hittites) appeared among non-Indo-European farmers (Hattians, Hurrians, Pelasgians) over 4000 bc, much later than the Neolithic agriculture there at 7000 bc.

Florian Blaschke said,

Pflaumbaum: In fact, the most recent common ancestor of American and Standard British English can be located very precisely in time and space: It was the dialect that Shakespeare used in his writings (not his native dialect, but the most prestigious dialect at the time), and that the King James Version was written in (a somewhat more archaic register of the language, however, perhaps comparable to poetic registers of ancient IE languages), i.e., the dialect spoken in London ca. 1600 AD. Other dialects of English spoken at the time, even closely related ones, did not directly contribute to either American or Standard British English, except via lateral influences (borrowing, and perhaps substratal influences), so American or Standard British English cannot be construed as directly descended from any of them, but they certainly existed and some of them have direct modern descendants (such as the dialects of Southwestern and Northern England), which are also subsumed under "British English", though severely marginalised nowadays and strongly influenced by Standard British English. (The most recent common ancestor of Australian and Standard British English was apparently spoken in London in the 19th century.) This is similar to the relationship between Latin – originally the dialect of a single city, Rome – and closely related other languages of ancient Italy – Faliscan, Sabellic and other Italic languages –, which were all eventually absorbed by the spread of Latin-derived Romance. If we had no knowledge of them, any conjecture of "Para-Romance" languages would be purely hypothetical, but the existence of such para-varieties would be virtually certain. Analogously, we have to suspect that Proto-Germanic, Proto-Italic, Proto-Indo-European and whatever proto-languages can be reconstructed were spoken by a locally quite confined speech community, even if we cannot recover the precise place and time, and surrounded by or at least lying adjacent to closely related idioms.
In fact, that means that Proto-Indo-European, possibly the dialect of a single settlement which spread as a lingua franca due to its prestige, was most likely part of a "Macro-Proto-Indo-European" dialect continuum (which may have been spread throughout an area as large as the Ukraine), with more or less closely related languages more or less nearby (Uralic may have been part of the related but not very closely related languages), which would have made its initial spread much easier. This also allows us to postulate that the homogeneity of reconstructed Proto-Indo-European and other reconstructed proto-languages is not just an artifact of the comparative method used to reconstruct proto-languages, but reflects a plausible prehistoric reality consistent with what we know about modern and historical languages (such as Ancient Greek) and can infer for the period PIE was spoken in, without violating the Uniformitarian Principle by postulating such absurdities as completely uniform proto-languages which were spread over huge areas for centuries, or alternatively problematic scenarios such as proto-languages which had (internal!) geographic variations which we cannot recover.

Serious limitations to our knowledge and to the accuracy of our reconstructions no doubt exist, in principle, but our reconstructs can be thought of as approximations to a reality which did formerly exist, just as our reconstructions of more or less fragmentarily attested languages such as Gothic or Gaulish. Nobody would say that written (Old, Classical, Post-Classical, Late or whatever) Latin is merely a formula for Romance sound-correspondences only because we cannot recover the phonetic reality underlying the symbols completely or with absolute certainty. Just because a language is not known in all of its phonetic details does not mean that it is only a "formula" and did not have a real, historical existence. (May I note that even the pronunciation of the variety of English that Shakespeare wrote in is not known in all of its details with certainty, and competing proposals exist!)

(Also, of course, American English and Standard British English in their non-artificial, everyday spoken contemporary forms offer no clues as to the former existence of the pronoun "thou" in their most recent common ancestor, but "para-varieties", such as the aforementioned rural dialects of Southwestern and Northern England, would clue us in that such a pronoun formerly did exist in English. Similarly, whether a mechanically reconstructible verbal root did exist in PIE is never obvious; it can be taken as certain that many verbal roots are completely lost or only remain present in traces – such as isolated nominal formations – that cannot be interpreted clearly anymore, other verbal roots may be attested only in a single branch and further roots are neo-roots formed or analogously abstracted after the breakup of PIE, but the state of the art as represented by the LIV is subtle enough to reckon with and account for such complexities as much as possible, marking roots with limited attestation or other problems – circumstances that give reason for doubt as to their existence in PIE – as uncertain and noting when there is reason to suspect secondary character of a formation or root. In general, however, when a root appears sound and structurally conforms to what PIE is believed to have been like, limited attestation does not mean it cannot be reconstructed as plausible for PIE anyway, just as nouns that lack any irregularities or other markers of possible secondary or foreign origin may be reconstructed all the way back to PIE.)

When the effects of laryngeal colouring are omitted in PIE reconstructions, that can be taken as the reconstructions being merely notated on a morphophonological level, by the way – this is nothing specific to PIE or proto-languages but can be done, and is done, even with modern, directly attested and thoroughly analysed languages. Especially, but by no means exclusively, in orthographical representations: just think of how German spelling deliberately ignores the effects of final devoicing.

Florian Blaschke said,

By the way, I can see the appeal of Renfrew's original hypothesis, at least in principle, as it straightforwardly linked the spread of IE to the spread of agriculture (as clearly observable in the archaeological record), at least in Europe, but in the meantime he has responded to criticism of his hypothesis (for all its neatness, it just doesn't work out, and the way it requires near-stasis for millennia until the onset of history, by which everything changes completely all of a sudden, strikes one as incredibly contrived and as requiring special pleading) by modifying it (and thus complicating it considerably) to incorporate the Kurgan/steppe model into a confusing and even less convincing hybrid. This hybrid seems to be mostly ignored, though, or at least has not seen nearly as much reception and popularity.

I would like to point out that even rough, impressionistic centre-of-gravity estimates fail to fit the assumption of Anatolia as a centre of expansion for IE, as most IE branches are located within Europe (or close by), a situation which is only exacerbated when extinct branches only attested in antiquity are considered in addition. In view of this geographic distribution, if Anatolia were the original centre of expansion, the IE tree would have to be rooted in Anatolia (with PIE and Proto-Anatolian being the same language, and other subgroups branching away from inside Anatolian, which means that Germanic or Celtic would be equally closely related or more to Hittite or Luwian than to Greek or Indo-Iranian, and Tocharian could not have branched off that early, either), or languages should be at least clearly more closely related to Anatolian the closer they are geographically. So, even the centre-of-gravity criterion does not fit Anatolia at all, and fits the steppes much better.

Also, the separation of Irish and Welsh can hardly be dated as early as 1000 BC, unless Proto-Celtic is that old (that's at least not implausible, but the date precedes the Hallstatt culture) and both languages separated that early (and then in Central Europe). However, this has to rely on the validity of the P/Q-Celtic model, which has seen increasing opposition lately; Celtiberian and Lepontic are often thought to have branched off earliest now. Peter Schrijver, for one, has pointed out that (save for the P/Q difference) no difference between British and Irish can be demonstrated to be older than the first century AD, and Stefan Schumacher has argued that the P/Q difference, on its own, is fairly trivial (and worse still, Irish and Celtiberian are only linked by a retention here, not an innovation). Let's say that this (consistent) feature of Atkinson's trees does not exactly strike one as confidence-inspiring.

(Besides: That many contemporary archaeologists somehow do not like relatively quick long-range migrations even though they are abundantly attested in the historical record should not be cause to automatically discount explanations that postulate them. In fact, I find it specious to declare the case essentially solved and therefore uninteresting by reference to a scenario which contradicts historical experience, and by discounting a scenario which does not, even if it may sound less attractive to modern sensibilities. History is about what was, not what should have been if people only had been sensible.)

David Marjanović said,

3. the Hittites — based upon archeological and cultural evidence — were intrusive in Anatolia:

Yes, but can't they have come from as close as Azerbaijan?

And while I'm convinced that character-based phylogenetic trees are the correct tool for inferring biological relationships, and an adequate tool for estimating divergences, I am wholly unconvinced they will work at all for human culture, which is not in the least limited to lineal descent.

Biologists have to grapple with convergence (and, to a usually limited extent, lateral = horizontal gene transfer) just the same way that historical linguists have to grapple with borrowing (and, to a much more limited extent, convergence).

Borrowing is particularly prevalent among bacteria; it is estimated that E. coli replaces 1% of its genome with alien material every million years.

@Mary

I'm guessing that's far slower and more consistent, thus less problematic, than the sharing that goes on in languages.

That is far slower; I don't know what you mean by "consistent". However, it's just as problematic – because where linguists deal in thousands of years, biologists have to juggle tens to thousands of millions of years. It evens out that way.

A fairly well-acknowledged example is the practice of writing *eʜ₂, *eʜ₃ for what are strictly speaking *aʜ₂, *oʜ₃: the colouring of *e by laryngeals is generally agreed to have already occurred in PIE.

As pointed out above, that's just phonemic vs phonetic reconstruction: /eH2/ was [aH2], and /eH3/ was [oH3] – hidden by the fact that historical linguists, purely for historical reasons, don't use the IPA or any of the conventions around it.

Suppose in a hundred years, BrE and AmE are in the process of diverging. One feature of this is that the phoneme /t/ has been lost, becoming, say, /ɾ/ in AmE and /ʔ/ in BrE. Subsequently, some new phonological changes – /u/ > /y/ – take place across both.

Centuries later, linguists reconstructing ancient “English” from later extant “American” and “North European” successfully retrace /ɾ/ and /ʔ/ to /*t/. They also reconstruct original /*y/ from what are now American /i/ and North European /e/.

Both these reconstructions are correct. And yet /t/ and /y/ did not coincide in English.

Again phonemic vs. phonetic reconstruction: /t/ and /y/ are coinciding. /t/ and [y] are (mostly) not.

1. What counts as lexical preservation and what counts as replacement? For example, do German Haupt and English hound count as preservations or do German Kopf and English dog count as (divergent) innovations? From what little they say, it would seem that they think of words as if they were biological genes, either always and everywhere present in the original form or always and everywhere present in a mutated form. But most words are polysemous (and polysemous in different ways in different languages) and in one meaning may suffer replacement, but in another not, and thus a simple yes/no (1/0) is not robust enough to fully explain the situation.

It's no problem to code one language as having two or more states of the same character; gene duplication, too, is common and well known. I don't know how Bouckaert et al. coded such cases, though.

how does their protocol handle data sets (i.e., languages) for which there are not attestations of certain words in their list?

The same ways missing data are always handled in phylogenetics.

(Which, for Bayesian phylogenetics, may actually be a bit of a problem; but I forgot the reference for that.)
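For readers who have not seen such a matrix, here is a small sketch of one common way to code lexical data: one binary column per (meaning, cognate class), with a language allowed to score 1 in several classes for the same meaning and "?" used where nothing is attested. This is an assumption about the general technique, not a description of how Bouckaert et al. actually coded their data; the class labels and "LanguageX" are made up.

```python
# Hypothetical cognate judgements: meaning -> language -> set of cognate classes.
# None marks a meaning with no attested word (missing data).
judgements = {
    "head": {
        "German": {"head_A", "head_B"},   # Haupt and Kopf
        "English": {"head_A"},            # head
        "LanguageX": None,                # unattested
    },
    "dog": {
        "German": {"dog_A"},              # Hund
        "English": {"dog_A", "dog_B"},    # hound and dog
        "LanguageX": {"dog_A"},
    },
}


def binary_matrix(judgements, languages):
    """One column per (meaning, cognate class); cells are '1', '0' or '?'."""
    columns = sorted({
        (meaning, cls)
        for meaning, by_lang in judgements.items()
        for classes in by_lang.values()
        if classes is not None
        for cls in classes
    })
    rows = {}
    for lang in languages:
        cells = []
        for meaning, cls in columns:
            classes = judgements[meaning].get(lang)
            if classes is None:
                cells.append("?")  # missing data, not a loss
            else:
                cells.append("1" if cls in classes else "0")
        rows[lang] = "".join(cells)
    return columns, rows


columns, rows = binary_matrix(judgements, ["German", "English", "LanguageX"])
for lang, row in rows.items():
    print(f"{lang:10s} {row}")
```

Coded this way, German keeps both Haupt and Kopf under "head" and English keeps both hound and dog under "dog", so polysemy is not forced into a single yes/no, and the unattested meaning stays a question mark instead of counting as a replacement.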

If IE speakers were primarily farmers then isn't it odd that the Rig Veda describes a pastoral rural society, with no mention of large cities or irrigation works?

Renfrew's hypothesis only concerns the spread into Europe. It accepts that Proto-Indo-Iranian was spoken by riders on the steppe – it just doesn't extend this all the way back to PIE.

3. Dienekes holds that, "For example, three of the more numerous present-day Y chromosome lineages in living Europeans (J2, R1a, and R1b) are so far lacking in pre-5ka samples."

No, the situation doesn't appear there at all. What's shown there is when the haplogroups diverged, not where that happened or when their carriers lived in which places. It's a tree, not a map.

just think of how German spelling deliberately ignores the effects of final devoicing.

3) Only in cases where the "voiced" (lenis) form doesn't appear in any surviving recognizably related words in Standard German. In mit and the 3rd person singular verb ending -t, you'd expect d for historical reasons/comparisons with other Germanic languages, but t is written instead. My dialect actually has /d/ there, though that may not mean anything, see below…
2) It's syllable-final fortition. The lenes aren't merely devoiced (assuming they ever were voiced, see below), they go all the way to fortis, and it happens at the end of every syllable, not just the ends of words; I've heard Sydney turned into [ˈzɪtniː].
1) It's a northern feature that never spread south. I don't know, but it could be a Low German substrate feature. In much of central Germany as well as in eastern Austria, the fortis-lenis distinction has instead been abandoned altogether, leaving voiceless lenes everywhere, and in between those regions (where I come from) the fortes have turned into lenes between vowels and at the ends of words. – As it happens, the theater pronunciation of Standard German was codified in Prussia, so it basically has a High German sound system with Low German sounds… lots of sounds that the High German consonant shift had eliminated, in fact.

David Marjanović said,

Uh… I oversimplified about word-final fortition. Middle High German, a standardized poetic register based on dialects from what is now southwestern Germany, evidently had it, because it spelled it out (b, d, g became p, t, c).

Florian Blaschke said,

David: Ad 3, neither of your two examples is an effect of final devoicing. Even Old High German (which clearly distinguishes the voiced and voiceless alveolar stops, lacking systematic devoicing in general) has mit and 3sg present -t. Old English mid (still present in Modern English in traces, such as midwife) clearly points to Proto-Germanic *d (> OHG t), although there appear to be cognates in some Old Germanic languages pointing to *þ (> OHG d) as well, while other continuants are ambiguous. As for the 3sg present ending, German continues Proto-Germanic *d, while English continues Proto-Germanic *þ, which is not a problem because the same alternation is found in the 3pl and 2sg as well and it is plausible that Proto-Germanic had both variants, due to Verner's law. Some verb classes were originally stressed on the root, leading to the Verner variants, while others were stressed on the suffix, leading to the non-Verner variants.

Other complications are, for example, Proto-Germanic 3pl *sendi instead of *senþi, which would be the expected continuation of PIE *h1s-énti, but the traditional explanation is that the copula was simply unstressed, essentially clitic. This one is actually one case where Modern German spelling is unetymological! It should be sint, as in OHG, not sind; nothing devoiced here. The same is true for 2pl seid. I'm not even sure why d was chosen here.

Final devoicing in German is a complicated issue. (The merger of voiced and unvoiced stops in German dialects, partly due to the effects of the High German sound shift in Upper German, where it was particularly thorough, and partly due to later developments, especially the so-called binnendeutsche Konsonantenschwächung or intra-German consonant lenition, is essentially a separate issue, but adds to the complexity.) See especially Mihm, Arend (2004), Zur Geschichte der Auslautverhärtung und ihrer Erforschung in Sprachwissenschaft 29, pp. 133–206.

David Marjanović said,

Sorry I'm seeing this so late! I've learned in the meantime that you're right about mit; I thought it was cognate to with (with the same /wi/-to-/mi/ change that wir undergoes in many German dialects), but it's not, it's cognate to mid, and an etymological spelling would actually be *mitt, explaining why this word escaped the vowel lengthening of monosyllabic words.

it is plausible that Proto-Germanic had both variants, due to Verner's law.

Ah, good old Verner confusion. *sigh*

In this case, though, Proto-Germanic, Proto-Northwest Germanic and Proto-West Germanic would all have had to retain both variants, with analogical leveling happening only later. Isn't that unlikely?

I'm not even sure why d was chosen here.

Concerning sind, seemingly unshifted nd (and sometimes ld) is common in German: Wind, winden, wenden, senden, Land, finden, und, Feld… for finden I happen to know that both d and t are attested in OHG, and that these are thought to be Verner variants; for the others I only know that OHG appears to have the expected t across the board (hiltibrant enti hadubrant…) and that dialect mixture is always a possibility. Another factor may be analogy from the "excrescent" word-boundary-marking -d in niemand (of Early New High German origin, just the right time to maximally mess with the modern orthography).

Concerning seid, it's apparently an artificial distinction from seit "since". I do pronounce those differently, but that's a spelling-pronunciation – my dialect has /sats/ and /sɛid/, respectively…