Problems in the taxonomy of the Uralic languages in the light of modern comparative studies

0. Despite the recent attempts to undermine the status of the Uralic language family as a well-defined genetic unit, there should be no doubt that its
foundations are as solid as those of other ancient and extensive language families, most notably the Indo-European family. It is actually surprising that it is
possible to acquire such detailed knowledge of the early history of Uralic, given that historical documents appear 3,000 years later than in the case of
Indo-European. Challenges to the concept of Uralic unity seem to derive from poor understanding of the history of Uralic studies in general, or the
characteristics of some Uralic branches in particular. The branch whose status within Uralic is least well understood appears to be Samoyed.

Another contested feature of the Uralic language family, or, occasionally, language families in general, is the applicability of the so-called tree model to
their classification. It appears that while the underlying idea of the tree model continues to be fully valid (Esa Itkonen 1999), its practical applications to
Uralic have often left a lot to be desired. Especially after the analyses by Häkkinen (1983, 1984) it has become increasingly clear that the traditional view of
the interrelationships of Uralic languages depicted by a binary or nearly binary tree is based on much less solid evidence than has been tacitly assumed.
Consequently, it can be suggested that a non-binary family tree (occasionally but unnecessarily referred to as a “family bush” or
“comb”), involving uncontroversial branches only, reflects the structure of Uralic more accurately, especially when supplemented with information on
the areal contacts between the branches (cf. Janhunen 2000: 59–60). The basic idea encapsulated in a tree can obviously be represented in other graphic
forms (e.g. Salminen 1999: 20) or expressed in words (e.g. Décsy 1965: 7–8).

Importantly, the crucial principle in classifying languages is that the features supposed to be characteristic of a particular branch of a language family
must qualify as shared innovations (Campbell 1998: 170; Hoenigswald 1966: 7; cf. Anttila 1989: 303–304). A shared innovation is, to give a short
definition, a change which can be assumed to have taken place once in the relevant proto-language but not in its sister languages, and whose direction can be
clearly stated. Since innovations versus retentions are most readily recognizable in phonological material, historical phonology continues to play the leading
role in the subgrouping of languages. Lexical differences between languages are often open to various interpretations, and in comparative Uralic studies, in
particular, attention should be paid to potential evidence for lexical loss and replacement rather than simply compiling lists of shared vocabulary which are
bound to include large numbers of retentions. In historical morphology, the difficulty in identifying true innovations needs to be taken seriously, while
syntactic changes, even when their status as innovations is undisputed, do not often qualify as shared ones because of their iconic nature or the possibility of
parallel external influences.

To give a maximally clear example of an impeccable application of the tree model within Uralic, there should be no doubt that Forest Nenets and Tundra Nenets
derive from a common source that is properly regarded as a single language of a relatively late prehistoric period and should be labelled Proto-Nenets, that
this language is one of the successors of another language that existed in more remote prehistoric times, known as Proto-Samoyed, and that this language was
preceded by yet another language dating back to several millennia before the present, namely Proto-Uralic. In each case, there are, firstly, internal pieces of
evidence for reconstructing an earlier stage that resembles the other related languages more closely than the later stages of development, and, secondly, a
sufficient number of established innovations that distinguish every proto-language from the preceding, contemporary, and successive ones.

Incidentally, one of the stages traditionally assumed in Samoyedological literature is not present in the above list of predecessors of the Nenets languages,
namely Proto-Northern Samoyed. It has turned out that features common to northern Samoyed languages are few, and that these features are open to interpretations
pointing to secondary language contacts rather than common inheritance. Proto-Northern Samoyed should not therefore be treated as a proto-language distinct from
Proto-Samoyed (cf. Janhunen 1998: 458–459).

The exclusion of Proto-Northern Samoyed from the taxonomy of the Samoyed languages is obviously in accordance with the basic tenet of this presentation
which, in Terho Itkonen’s terms, entails that a proto-language must clearly differ from both its sister languages and its parent language (Terho Itkonen
1997: 236). Northern Samoyed may well qualify as an ‘areal genetic unit’, a concept referring to a group of languages whose similarities derive from
a common predecessor as well as secondary contacts between them (Helimski 1982).

In the standard binary classification, the number of proto-languages between Proto-Uralic and a Saami language, for example Inari Saami, is higher than the
number of proto-languages leading to a Samoyed language. As the first steps, Inari Saami derives via Proto-Eastern Saami from Proto-Saami, while the earlier
stages are known as Proto-Finno-Saami (commonly but confusingly called “early Proto-Finnic”), Proto-Finno-Volgaic, Proto-Finno-Permian,
Proto-Finno-Ugrian, and, finally, Proto-Uralic. Even if Proto-Northern Samoyed is included in the proto-languages between Forest Nenets and Proto-Uralic, their
number is three, while the number of proto-languages between Inari Saami and Proto-Uralic is six. There is no a priori reason to regard such a difference in the
rate of change of languages as impossible, but it must be carefully examined whether all of the traditionally assumed proto-languages qualify as distinct
genetic units, or whether they either rest on very few diagnostic features that do not make them notably different from their parent languages, or are defined by
features that are actually better explained by areal influences.
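
The count given above can be checked mechanically. The following Python sketch encodes only the two lineages named in the text as a child-to-parent map and counts the proto-languages lying strictly between a modern language and Proto-Uralic. The dictionary layout and the function name are illustrative assumptions, not part of any cited classification.

```python
# Parent links for the two lineages named in the text (standard binary
# classification, with Proto-Northern Samoyed included for the comparison).
PARENT = {
    "Inari Saami": "Proto-Eastern Saami",
    "Proto-Eastern Saami": "Proto-Saami",
    "Proto-Saami": "Proto-Finno-Saami",
    "Proto-Finno-Saami": "Proto-Finno-Volgaic",
    "Proto-Finno-Volgaic": "Proto-Finno-Permian",
    "Proto-Finno-Permian": "Proto-Finno-Ugrian",
    "Proto-Finno-Ugrian": "Proto-Uralic",
    "Forest Nenets": "Proto-Nenets",
    "Proto-Nenets": "Proto-Northern Samoyed",
    "Proto-Northern Samoyed": "Proto-Samoyed",
    "Proto-Samoyed": "Proto-Uralic",
}


def intermediate_stages(language, root="Proto-Uralic"):
    """Return the proto-languages strictly between `language` and `root`."""
    stages = []
    node = PARENT[language]
    while node != root:
        stages.append(node)
        node = PARENT[node]
    return stages


for lang in ("Inari Saami", "Forest Nenets"):
    print(lang, len(intermediate_stages(lang)))  # 6 and 3 respectively
```

Removing a node from the map, for instance Proto-Northern Samoyed, shortens the corresponding chain by one, which is all that excluding an unsupported proto-language amounts to in formal terms.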

In the most traditional version of the standard classification, presented by, for instance, Décsy (1965: 7–8), Hajdú (1993: 8), and Campbell (1998:
169), the Uralic language family is described as having a fully binary structure, that is, each of the proto-languages is divided into exactly two branches. The
only exceptions may appear on the lowest levels of classification, for example within Finnic or Samoyed branches. The tree diagram in Hajdú (1962: 59, 1973: 14)
and Hajdú & Domokos (1978: 68) shows vagueness in the placement of Saami, which is, as Décsy (1965: 7–8) correctly points out, mainly due to
non-linguistic considerations. Décsy’s own verbal presentation and Campbell’s tree suffer from certain terminological shortcomings but they need not
concern us here.

It turns out that the binary, or, for that matter, any tree of the Uralic language family includes a number of nodes representing branches that are so
transparent, closely-knit and well-established that they can be immediately and beyond doubt recognized as historical linguistic entities, each deriving from a
highly distinct proto-language, namely Saami, Finnic, Mordvin, Mari, Permian, Hungarian, Mansi, Khanty, and Samoyed. In the binary tree, these basic branches
are further combined into a number of larger units: Finno-Saami, Volgaic (Mordvin and Mari), Finno-Volgaic, Finno-Permian, Ob-Ugrian (Mansi and Khanty), Ugrian
(Hungarian and Ob-Ugrian), and Finno-Ugrian, which, consequently, includes all basic branches except Samoyed.

Most general sources are content with an identical or a similar, nearly binary classification, the most common difference being the disintegration of the
Volgaic branch, which yields either a tripartite division of Finno-Volgaic into Finno-Saami, Mordvin, and Mari (e.g. Finnugor kalauz 1998: 254), or another
binary structure by adding Finno-Mordvin (e.g. Korhonen 1981: 27; cf. also the appendices in Napolskikh 1997). Janhunen (2000: 62) suggests a slightly
idiosyncratic version of the nearly binary tree, with Finno-Mordvin divided directly into Finnic, Saami, and Mordvin, and Hungarian-Mansi grouped together
instead of Ob-Ugrian.

On the other hand, it must be noted that many specialists have taken critical and even radically alternative standpoints. Hajdú (1979: 62) presents a
taxonomy of the Uralic languages which treats all basic branches except Ob-Ugrian separately (cf. also Hajdú & Domokos 1978: 137), while Lytkin in Osnovy (1974:
18) only recognizes Ob-Ugrian and, implicitly, Finno-Ugrian. Abondolo (1998a: 3) and Viitso (1997) propose binary classifications that contradict the
traditional one, but continue assuming Finno-Saami and Finno-Ugrian, Abondolo also, with caution, Ugrian (cf. Abondolo 1998b: 358; cf. also Viitso 2000:
156–159).

From the point of view of the history of scholarship, the units under scrutiny here can be divided into two groups. Firstly there are those that have been
discussed extensively in the literature, namely Finno-Saami, Volgaic, Ob-Ugrian, and Ugrian. By contrast, the higher branches of the binary tree, Finno-Volgaic,
Finno-Permian, and Finno-Ugrian, have been described only superficially, at least until recent studies by Janhunen (1981) and Sammallahti (1988, 1998). In what
follows, we shall briefly take up each of these units.

1. Finno-Saami. Sammallahti (1999: 70) presents a list of eleven features which according to him may represent innovations confined to Saami and
Finnic, and which can therefore derive from Proto-Finno-Saami (cf. Sammallahti 1998: 122; cf. also Terho Itkonen 1997). Remarkably, Sammallahti (1999:
73–74) himself questions the Finno-Saami background of the six morphological markers in the list, so their indicative value cannot be regarded as
high.

Of the remaining five features, two are concerned with the lexicon. The first one correctly emphasizes the extent of common vocabulary shared by Saami and
Finnic. Here, as generally in the study of lexicon, the problem is how to distinguish between retentions and innovations, because it is possible that any word
has had a more extensive distribution in the past, and only internal reconstruction can occasionally shed light on the replacement of an original word with a
neologism. Clear cases of substituting a common Uralic word with a Finno-Saami one do not seem to exist though. Furthermore, a number of allegedly inherited
Finno-Saami words can belong to the layer of Finnic loan-words in Saami, or vice versa. Such words, lacking clear signs of either inherited or borrowed lexicon,
have usually been added to the common Finno-Saami layer, which is not methodologically sound and distorts the statistical picture to some extent (cf. Lehtiranta
1989: 8).

The second lexical feature involves shared loan-words. Since equally ancient loan-words appear in only one of the two branches, it remains possible that many
of the words in question have been borrowed in parallel into Saami and Finnic.

Turning to the last three, phonological features, Sammallahti (1999: 71) is the first to express doubts about the shared origin of consonant gradation in
Saami and Finnic, except on a general level of common preconditions. Notably, there are at least three competing hypotheses with regard to the emergence of
gradation, so it cannot really serve as a taxonomic criterion.

We are therefore left with two sound changes, the development of labial vowels in non-initial syllables and the loss of initial labial glide in front of a
labial vowel. Without delving into the arguments and counterarguments by Terho Itkonen (1997: 237–239) and Sammallahti (1999: 72–73), it can be
maintained that these changes are not only marginal but they may have occurred in Saami and Finnic either independently or through secondary contacts.

Sammallahti (1998: 122) includes a pair of sound changes concerning the allegedly Proto-Finno-Saami merger of Proto-Uralic *x (in my view simply a voiced
velar fricative) with *k. He recognizes that no trace of *k is found in Finnic, but, curiously, instead of disregarding this change as evidence for Finno-Saami,
only adds that the change “may be later” than Proto-Finno-Saami (cf. Sammallahti 1998: 190).

It seems safe to conclude that the evidence for Finno-Saami as a branch deriving from a proto-language distinct from Proto-Uralic is far from convincing.
Nevertheless, Sammallahti (1999: 70) asserts that several structural and lexical features common to Saami and Finnic support the assumption of Proto-Finno-Saami
and that no valid structural counterarguments have been proposed. It is not immediately obvious what kind of counterarguments could in principle exist, but
hopefully, it is self-evident that the burden of proof lies on those who assume a historical entity rather than on those who do not. One way of testing
hypotheses such as Finno-Saami is to contrast them with potential subgroups not sanctioned by the standard binary classification, in this case notably a unit
consisting of Finnic and Mordvin but not Saami.

2. Volgaic. Mordvin is traditionally though nowadays less commonly classified with Mari in a branch called Volgaic. Bereczki (1974, 1988:
314–315) has presented detailed criticism of this hypothesis, so it can be left out of this discussion. What is noteworthy in our context is that thanks
to the criticism, the Volgaic branch is now largely regarded as obsolete, and yet its basis is not much weaker than that of other units discussed here.

3. Ugrian and Ob-Ugrian. Subsuming Hungarian, Mansi, and Khanty under the Ugrian branch, and Mansi and Khanty as its Ob-Ugrian subbranch, is a
hypothesis that has been discussed more extensively in the literature than any other detail of the standard binary classification. Honti has published several
impressive and at first sight conclusive lists of well over 20 features shared by Ugrian languages (Honti 1979: 7–19, 1998a: 353–355, 1998b:
179–181). Nevertheless, the most central arguments presented in favour of Proto-Ugrian as a language distinct from Proto-Uralic need to be contested. In
what follows, the phonological features presented by Honti are briefly discussed.

The system of sibilant consonants has changed in all Ugrian branches along similar lines, and it can be assumed that we are dealing with an ancient
development. It is, however, often ignored that the changes in the sibilant system are not restricted to Ugrian but also occur in Samoyed, to the extent that
Mansi and Samoyed show identical reflexes of Proto-Uralic sibilants. There is therefore a good reason to regard the specific development of the sibilants as an
areal feature characteristic of all eastern Uralic branches, that is, Hungarian, Mansi, Khanty, and Samoyed, rather than a specifically Ugrian innovation.

The velarization and fricativization of *k before back vowels is not shared by all Ugrian varieties, so Honti (1998a: 353) attributes it to Proto-Ugrian
allophony. It must be said categorically that there cannot be evidence for phonetic innovations in a proto-language if they are not somehow reflected in all of
its descendants. Notably, allophony of this sort is likely to be either universal or, if language-specific, irreversible.

The reflexes of intervocalic *ng as *ngk are not equally distributed among the Ugrian branches, and a sporadic development whose background remains
unexplained does not serve well the purpose of postulating a proto-language.

The idiosyncratic reflexes of the words for ‘eye’ and ‘heart’ are arguably among the best pieces of evidence for a close connection
between the Ugrian branches, but here again little is known about the processes that created the apparently irregular forms. The reflexes of reconstructed
clusters of a lateral or a fricative plus m are not uniform either in Ugrian or more widely in Uralic, which points to the need of further detailed
studies in Uralic historical phonology.

In the word for ‘three’ we do not really know whether forms with l or r are older. Janhunen (2000: 61) suggests that r might
be original and forms with l based on the analogy of numeral ‘four’. In any case, the Ugrian branches show different reflexes, so from the
point of view of Proto-Ugrian it is proof to the contrary.

The assumption of a velar, illabial, non-low vowel common to all of Ugrian does not prove anything if it cannot be shown that it emerged as an innovation,
and thanks to recent comparative studies we know that it is more likely to be a retention from Proto-Uralic (Sammallahti 1979: 57–59; Janhunen 1981:
227–228; cf. Abondolo 1996), or, if this is not the case, it is an innovation shared by Samoyed.

The position of a palatal labial vowel *ü in early Uralic remains unsettled, but whichever stand we take on this question, there are no signs of a Ugrian
innovation.

In other words, the first proposed innovation, the restructuring of the sibilant system, is not exclusive, because it involves not only Ugrian but also
Samoyed, while the second one, the velarization and fricativization of *k before back vowels, is not inclusive, because it does not cover all of Ugrian. The
latter argument is also valid for the change of Proto-Uralic *w to a voiced velar fricative, which Honti (1998a: 353, 1998b: 179) has recently included in the
list of Ugrian features on the basis of Khanty reflexes. The attempt to explain the reflexes elsewhere as being subsequently reverted may prove circular. The
other phonological features refer to sporadic developments or individual words which, while potentially indicative of the closeness of Ugrian branches, can be
seen as marginal arguments at best for a Ugrian proto-language.
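
The two conditions used in this summary, inclusiveness and exclusiveness, can be stated as simple set relations. The Python sketch below is only an illustration: the function name and data layout are assumptions, and the distributions are reduced to the level of basic branches as summarized in the text (the change of *w is entered as attested in Khanty alone, since it is posited on the basis of Khanty reflexes).

```python
UGRIAN = {"Hungarian", "Mansi", "Khanty"}


def evaluate(attested_in, group):
    """Test a proposed innovation against a candidate branch.

    inclusive: the change is reflected in every member of the group
    exclusive: the change is reflected nowhere outside the group
    """
    attested_in = set(attested_in)
    return {
        "inclusive": group <= attested_in,
        "exclusive": attested_in <= group,
    }


# Restructuring of the sibilant system: all of Ugrian, but also Samoyed.
sibilants = evaluate({"Hungarian", "Mansi", "Khanty", "Samoyed"}, UGRIAN)

# *w > voiced velar fricative: posited on the basis of Khanty reflexes only.
w_fricative = evaluate({"Khanty"}, UGRIAN)

print(sibilants)    # {'inclusive': True, 'exclusive': False}
print(w_fricative)  # {'inclusive': False, 'exclusive': True}
```

A change counts as a candidate shared innovation of the group only when both values are true, and even then its direction and its status as a single event in the proto-language still have to be argued on linguistic grounds.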

There are as many as 14 morphological features discussed by Honti (1979: 9–12; cf. Honti 1998a: 354, 1998b: 180–181). It is not possible to go
into the details here, but it can be maintained that while these features do lend support to the idea of Ugrian unity in a broad sense, they remain indecisive
either because the age and background of the morphological markers in question are poorly known, or because they cannot be shown to be innovations, or because
there are parallel developments in other branches. For instance, forms of a noun meaning ‘side’ develop into postpositions and further into case
suffixes also in Samoyed. Also the opposition of subjective and objective conjugations is a prototypical feature of Samoyed, and the Ob-Ugrian, especially
Khanty, systems resemble the Samoyed system more closely than they resemble the Hungarian one. One of the characteristic features of Ob-Ugrian is the passive,
but the passive suffixes are different in Mansi and Khanty, a situation strongly suggesting an areal rather than a genetic connection, while the Khanty suffix
seems identical with the Samoyed reflexive suffix, whose functions are reasonably close to the passive.

In favour of the Ob-Ugrian hypothesis, Honti (1998a: 352–353; cf. Honti 1998b: 183–184) has presented no phonological but only five morphological
arguments. They require reservations similar to those presented above.

Honti (1979: 12–19) presents six further arguments for Proto-Ugrian concerning vocabulary (cf. Honti 1998b: 178–179). The problem here again is
that the mere presence of a word in a language tells us little of possible lexical innovations. Conclusive evidence based on lexical material can only be
obtained by studying the loss and replacement of words. Within the long lists of Ugrian words, there may in fact be several cases suggesting a true innovation,
for example the word for ‘fire’, but mostly it is impossible to say if they represent formerly more wide-spread words or perhaps parallel borrowings
from a common source.

It would be utterly foolish to insist that there were no features suggesting close links between Hungarian, Mansi, and Khanty. On the contrary, especially
within lexicon the Ugrian languages differ from the rest of the language family to the extent that Gulya (1994) has described Ugrian as a ‘hole’ in
the family tree. While it is true that many Uralic words have been replaced by neologisms in Hungarian, Mansi, and Khanty, this has mostly occurred separately
in each of the three branches, and, of course, many old words have been lost in other Uralic branches as well. The lexical differences between Mansi and Khanty
are also notable, as Kálmán (1988: 400) has pointed out.

Since there is little evidence for Proto-Ugrian as a genuine genetic unit, this is a good place to resort to the concept of ‘areal genetic unit’,
which was launched by Helimski (1982) with reference to Finno-Volgaic and Finno-Permian, but which in my view suits equally well Ugrian and Ob-Ugrian, as well
as Finno-Saami and Volgaic. Areal genetic units differ crucially from strictly genetic units, that is, branches deriving from a distinct proto-language, in that
they can overlap with each other. We are therefore free to recognize areal genetic units that contradict the standard classification, for instance,
Finno-Mordvin, Mari-Permian, Permian-Hungarian, Hungarian-Mansi, and Khanty-Samoyed, as well.
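
The formal difference between the two kinds of unit can be made explicit: groupings that belong to a single family tree must be pairwise disjoint or nested, whereas areal genetic units are free to overlap partially. The following Python sketch tests this for some of the units mentioned above. It assumes, purely for illustration, that each unit consists of exactly the two basic branches named in its label; the names `compatible` and `AREAL_UNITS` are likewise hypothetical.

```python
from itertools import combinations


def compatible(a, b):
    """Two groupings fit into one tree only if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


# Assumed membership: each unit is taken to be just its two named branches.
AREAL_UNITS = {
    "Mari-Permian": {"Mari", "Permian"},
    "Permian-Hungarian": {"Permian", "Hungarian"},
    "Hungarian-Mansi": {"Hungarian", "Mansi"},
    "Ob-Ugrian": {"Mansi", "Khanty"},
    "Khanty-Samoyed": {"Khanty", "Samoyed"},
}

# Pairs that overlap partially and so cannot both be nodes of one tree.
conflicts = [
    (x, y)
    for x, y in combinations(AREAL_UNITS, 2)
    if not compatible(AREAL_UNITS[x], AREAL_UNITS[y])
]

for x, y in conflicts:
    print(f"{x} and {y} overlap partially")
```

Under these assumptions every unit in the chain overlaps with its neighbour, which is acceptable for areal genetic units but would be contradictory if each were claimed to derive from a distinct proto-language.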

4. Finno-Permian and Finno-Volgaic. The highest-level branches in the binary classification, Finno-Volgaic, Finno-Permian, and Finno-Ugrian, have not
been subject to much debate. In Osnovy (1974) we find tentative lists of alleged Finno-Permian and Finno-Volgaic innovations in a section written by Rédei
(Osnovy 1974: 50–52). He suggests that fricativization or voicing of stops in internal positions can be seen as a Finno-Permian innovation, but since this
process is only found in Permian, Mari, and Mordvin, and has occurred differently in these branches, it cannot be considered as such. The history of the palatal
labial vowel *ü remains an open question so it cannot be successfully utilized in this context. By contrast, the emergence of Finnic long vowels is one of the
questions that have been solved in Uralic historical phonology since the publication of Osnovy (1974), although this is not recognized in many reference works
such as the Uralic etymological dictionary (Rédei 1986–1991). The long vowels in question derive from Proto-Uralic sequences of a vowel and *x (Janhunen
1981: 239–243), and this development has probably nothing to do with the Finno-Permian hypothesis.

Rédei (Osnovy 1974: 52) asserts that there are no known phonological innovations in Finno-Volgaic, and the few morphological features assigned to
Finno-Permian and Finno-Volgaic are not necessarily innovations either.

More recently, Sammallahti (1998: 120–122) has identified two Finno-Permian and one Finno-Volgaic sound changes. The remarks presented below in
connection with Finno-Ugrian hold good for them as well. It is, hopefully, needless to emphasize that one or two innovations, even if they were based on firm
evidence, do not imply a division of the proto-language, in other words, an isogloss does not necessarily correspond to a language boundary (Salminen 2001).

5. Finno-Ugrian. The cornerstone of the traditional binary classification of the Uralic family, the Finno-Ugrian branch, relies heavily on lexical
evidence. The point is that while there are a lot of words common to Uralic branches other than Samoyed, there is no way of knowing that such words have not
been common Uralic words which have simply been lost and replaced with other words in Samoyed unless there is external evidence (cf. Janhunen 2000: 60). Since
it would be difficult to reckon with a wave of lexical changes but few if any phonological or grammatical innovations in Proto-Finno-Ugrian, it can be assumed
that Proto-Samoyed was, indeed, lexically very innovative. According to our present knowledge (Koivulehto 1991; Rédei 1986, 1988), the distribution of
Indo-European loan-words does not lend support to the idea of a Finno-Ugrian proto-language either.

In morphology and syntax, there seem to be no observed differences between Uralic and Finno-Ugrian (Salminen 1997). The occasional discrepancies between
Finno-Ugrian and Uralic reconstructions of the same etyma are largely due to different approaches to Uralic historical phonology.

Janhunen (1981) and Sammallahti (1988: 486, 490; 1998: 119) have, however, suggested that there might be a small number of phonological innovations on the
Finno-Ugrian level, but the evidence is narrow, unsystematic, and open to various interpretations (Salminen 2001). These authors notably take the binary scheme
for granted, and disregard the possibility of finding equally well attested innovations incompatible with the scheme. Finno-Ugrian can therefore be best
regarded as another areal genetic unit, possibly coexisting with other, overlapping units such as, for instance, Ugro-Samoyed.

6. To sum up the phonological and other evidence for the alleged proto-languages between Proto-Uralic and the level of the basic branches, it can be
stated that there is very little of it. Indeed, by comparing material from any two of the nine basic branches, including pairs such as Saami and Finnic, or even
just Mansi and Khanty, we reach a level of reconstruction that is very close if not essentially identical to Proto-Uralic.

It is true that binary classification has acquired the status of received wisdom in Uralic studies, which makes many specialists reluctant to criticize, let
alone abandon it, but insofar as the sole merit of tradition is its traditionality, there appears little need and no justification to keep to it. To be on the
safe side, it must be repeated that such a change in the common practice of classifying Uralic languages does not imply that the validity of the Uralic language
family as a genetic unit, or the application of the tree model to its taxonomic description, should also be questioned.

More importantly, recognizing the weaknesses in the standard classification opens new horizons for specialists working on comparative Uralic linguistics.
This is also how I understand Häkkinen’s suggestion for starting the research from zero (Häkkinen 1983: 82), a statement directed at the problematic
issues in Uralic historical phonology and morphology that may have received biased treatment in the past, but interpreted, in my view mistakenly, by Esa Itkonen
(1999: 89) as an attempt to cast away the achievements of the previous generations of linguists.

It should go without saying that the scholars who have defended the hypotheses concerning branches such as Finno-Saami and Ugrian, notably Sammallahti and
Honti, have gathered and analysed an impressive amount of data, and contributed to the study of the taxonomy of the Uralic languages in a manner that will prove
vital for any future undertaking in the field. Both their work and its critique should highlight rather than undermine the importance of these complex and
intriguing questions.

Nevertheless, there should be no doubt that as for the extent and quality of the factual evidence, the units discussed above contrast sharply with the basic
branches, so sharply that their inclusion in the standard classification distorts the picture of the interrelationships of Uralic languages. For example, in
most versions of the binary tree the relationship between Khanty and Mansi is given as equal to that between Komi and Udmurt, while it should be obvious that
the Ob-Ugrian languages are not nearly as close as the Permian languages. More drastically, the separation of Samoyed from the rest of Uralic, so often taken
for granted, has led Finno-Ugrian and Samoyed to be treated on a par with language families such as Turkic, Mongolic, and Tungusic (cf. Sinor 1988:
738–739), not to speak of more recent misunderstandings of the nature of Uralic affinities.

References

Abondolo, Daniel 1996. Vowel rotation in Uralic: Obug[r]ocentric evidence. SSEES Occasional Papers 31; London: School of Slavonic and East European
Studies, University of London.

Anttila, Raimo 1989. Historical and comparative linguistics. Second revised edition. Amsterdam studies in the theory and history of linguistic
science, Series 4: Current issues in linguistic theory 6; Amsterdam & Philadelphia: John Benjamins.