Evolution of Human Languages

As of 2014, there were about 6,000 languages on our planet, some of them
spoken by millions and some by only a few dozen people. A primary goal of
researchers in the Evolution of Human Languages (EHL) project is to provide a
detailed classification of these languages, organizing them into a
genealogical tree similar to the accepted classification of biological
species. Since all representatives of the species Homo sapiens presumably
share a common origin, it would be natural to suppose (although this has yet
to be demonstrated) that all human languages also go back to some common
source. Most existing classifications, however, do not go beyond some 300-400
language families that are relatively easy to discern. There are natural
reasons for this restriction: languages must have been spoken and constantly
evolving for at least 40,000 years (and quite probably more), while any two
languages separated from a common source inevitably lose almost all
superficially common features after some 6,000-7,000 years.

Nevertheless, despite widespread skepticism and reluctance to tackle the
problem, a number of scholars believe that these obstacles are not
insurmountable. Research conducted over the past several decades appears to
indicate that larger genetic groupings are not only possible but quite
plausible. It can be shown that most of the world's language families can be
classified into roughly a dozen large groupings, or macro families. Two sorts
of evidence can be used for this purpose:

1) Even a superficial analysis of the vocabulary of a large number of
linguistic families reveals numerous lexical similarities extending far
beyond the borders of the smaller genetic units. These similarities are
frequently restricted to individual macro families (such as Eurasiatic,
Afroasiatic, etc.), but a significant number of matches have already been
found between the macro families themselves, pointing to a probable common
origin.

2) Classical historical linguistics has developed a very powerful tool, the
comparative method, which allows the reconstruction of unattested language
stages, so-called proto-languages. It turns out that whereas modern languages
may differ significantly, proto-languages are often much more similar to one
another. This is the case, e.g., with Indo-European, Uralic, and Altaic:
modern English, Finnish, and Turkish may have almost nothing in common, but
their respective ancestors (Proto-Indo-European, Proto-Uralic, and
Proto-Altaic) appear to share many more traits and much more common
vocabulary. This makes it possible to extend the time perspective and
reconstruct even earlier stages of human language, and much of this research
has already been conducted.

The amount of information that has to be processed in order to achieve a
deep linguistic taxonomy is enormous, since thousands of languages and
hundreds of linguistic families must be taken into account. Modern computer
technology, however, provides some solutions to these problems. The first
step is to compile computer databases of established matches between related
languages, i.e. etymologies. The primary goal of EHL research is therefore to
collect and compile such databases and to make them easily available, which
in today's world means making them available on the Web. A large set of
computer databases has already been compiled, and many of them are online.
The databases provided by the EHL participants and freely browsable on the
Web include Altaic, Dravidian, (North) Caucasian, Yenisseian, Sino-Tibetan,
Indo-European, Austroasiatic, Chukchi-Kamchatkan, and Semitic. Databases for
many other language families are in preparation.
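An etymology, in the sense used here, links a reconstructed proto-form to its
attested reflexes in daughter languages. The Python sketch below shows one
possible record layout for such a database, plus a lookup by attested form.
The class and method names (`Reflex`, `Etymology`, `EtymologicalDatabase`,
`lookup`) are hypothetical illustrations and are not the actual schema of the
EHL databases; the sample entry is the textbook Indo-European "father" set.

```python
from dataclasses import dataclass, field


@dataclass
class Reflex:
    """An attested form descending from a reconstructed root (hypothetical schema)."""
    language: str
    form: str
    gloss: str


@dataclass
class Etymology:
    """A cognate set: one reconstructed proto-form plus its attested reflexes."""
    proto_language: str
    proto_form: str
    meaning: str
    reflexes: list[Reflex] = field(default_factory=list)


class EtymologicalDatabase:
    """In-memory stand-in for one family-level etymological database."""

    def __init__(self, family: str) -> None:
        self.family = family
        self.entries: list[Etymology] = []

    def add(self, entry: Etymology) -> None:
        self.entries.append(entry)

    def lookup(self, language: str, form: str) -> list[Etymology]:
        """Return every cognate set listing `form` as a reflex in `language`."""
        return [
            entry for entry in self.entries
            if any(r.language == language and r.form == form for r in entry.reflexes)
        ]


if __name__ == "__main__":
    ie = EtymologicalDatabase("Indo-European")
    ie.add(Etymology("Proto-Indo-European", "*ph₂tḗr", "father", [
        Reflex("English", "father", "father"),
        Reflex("Latin", "pater", "father"),
        Reflex("Sanskrit", "pitár-", "father"),
    ]))
    # Prints the proto-language, proto-form and meaning of each matching set.
    for entry in ie.lookup("Latin", "pater"):
        print(entry.proto_language, entry.proto_form, entry.meaning)
```

A real system would store these records in a persistent database and index
reflexes by language and form rather than scanning every entry.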

Etymological databases for several macro families are also being compiled,
and several of them, namely Australian, Eurasiatic (Nostratic), and
Afroasiatic, are already near completion. Once an etymological database
becomes available, it can be used to significantly simplify the search for
lexical cognates and the construction of higher-level databases. Etymological
databases can also be used (and are being used) for the statistical
evaluation of taxonomic correlations. The number of etymological matches
between languages is a good measure of the distance between them, and it can
also be used to estimate the time depth of any linguistic family. In fact,
so-called lexicostatistics is the only available tool for absolute linguistic
dating, and developing its theoretical rationale and practical application is
one of the central tasks of the EHL project.
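To illustrate how counts of etymological matches can be turned into a
distance and a date, the following Python sketch computes the fraction of
shared cognates between two wordlists and applies the classic Swadesh-Lees
glottochronological formula t = ln(c) / (2 ln r). This is an assumption made
for illustration: the formula is the textbook version, the EHL project's own
lexicostatistical method may differ, and the default retention rate
r = 0.86 per millennium (a value often cited for the 100-item Swadesh list)
is a parameter, not a settled constant. The wordlists and cognate-class
labels in the example are hypothetical.

```python
import math


def shared_cognate_fraction(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of meanings present in both wordlists whose cognate labels match.

    Each wordlist maps a meaning (e.g. "water") to a cognate-class label,
    as an etymological database would assign it.
    """
    common = a.keys() & b.keys()
    if not common:
        raise ValueError("wordlists share no meanings")
    matches = sum(1 for meaning in common if a[meaning] == b[meaning])
    return matches / len(common)


def lexicostatistic_distance(a: dict[str, str], b: dict[str, str]) -> float:
    """Distance as the share of compared meanings that are NOT cognate."""
    return 1.0 - shared_cognate_fraction(a, b)


def glottochronological_age(c: float, r: float = 0.86) -> float:
    """Classic Swadesh-Lees separation time in millennia: t = ln(c) / (2 ln r).

    c is the shared-cognate fraction; r is the assumed per-millennium
    retention rate (an assumption, not a value taken from the EHL project).
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    if not 0.0 < r < 1.0:
        raise ValueError("r must be in (0, 1)")
    return math.log(c) / (2.0 * math.log(r))


if __name__ == "__main__":
    # Hypothetical toy wordlists: meaning -> cognate-class label.
    lang_x = {"water": "W1", "fire": "F1", "eye": "E1", "two": "T1", "dog": "D1"}
    lang_y = {"water": "W1", "fire": "F2", "eye": "E1", "two": "T1", "dog": "D2"}
    c = shared_cognate_fraction(lang_x, lang_y)
    print(f"shared: {c:.2f}, distance: {lexicostatistic_distance(lang_x, lang_y):.2f}")
    print(f"estimated separation: {glottochronological_age(c):.2f} millennia")
```

With three matches out of five meanings, c = 0.6 and the formula gives about
1.69 millennia; with real data the wordlist would be a standard list of 100
or 200 basic meanings, and the result is only as reliable as the cognate
judgments and the retention-rate assumption behind it.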

While the project concentrates on building up a hierarchical system of
etymological databases that reflects the hierarchical taxonomy of the
linguistic genealogical tree, it is also concerned with collecting and
putting online primary language wordlists as well as existing etymological
sources. The ideal etymological database system should be able to provide an
etymology for any word in any modern or ancient language, tracing its origin
as far back as possible. The participants of the project have provided source
wordlists for poorly explored language families such as Indo-Pacific and
Australian, where most of the comparative work is yet to be done. They have
also scanned, processed with optical character recognition, and converted to
database format some of the major existing etymological dictionaries, such
as Pokorny's Indo-European etymological dictionary.
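Tracing a word upward through a hierarchy of etymological databases can be
sketched as a chain of links from each form to the reconstructed form it
descends from, with each link standing for one entry in a family-level or
higher-level database. The Python names below (`Node`, `EtymologyHierarchy`,
`link`, `trace`) are hypothetical and do not describe the EHL software; the
example links use standard textbook reconstructions for English "father",
and no deeper macro-family link is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A word form in a language or proto-language (hypothetical schema)."""
    language: str
    form: str


class EtymologyHierarchy:
    """Links each form to the reconstructed form it is held to descend from.

    Each link stands in for one entry of a family-level or higher-level
    etymological database; following the links traces a word upward.
    """

    def __init__(self) -> None:
        self._ancestor: dict[Node, Node] = {}

    def link(self, child: Node, ancestor: Node) -> None:
        """Record that `child` descends from `ancestor` (one ancestor per form)."""
        if child in self._ancestor:
            raise ValueError(f"{child} already has an ancestor")
        self._ancestor[child] = ancestor

    def trace(self, language: str, form: str) -> list[Node]:
        """Return the chain from the given word up to its deepest known ancestor."""
        chain = [Node(language, form)]
        seen = {chain[0]}
        while chain[-1] in self._ancestor:
            parent = self._ancestor[chain[-1]]
            if parent in seen:
                raise ValueError("cycle detected in etymology links")
            seen.add(parent)
            chain.append(parent)
        return chain


if __name__ == "__main__":
    h = EtymologyHierarchy()
    h.link(Node("English", "father"), Node("Proto-Germanic", "*fadēr"))
    h.link(Node("Proto-Germanic", "*fadēr"), Node("Proto-Indo-European", "*ph₂tḗr"))
    # Prints English, Proto-Germanic and Proto-Indo-European forms in order.
    for node in h.trace("English", "father"):
        print(node.language, node.form)
```

A word with no recorded ancestor simply traces to itself, which mirrors the
situation of families whose higher-level databases are not yet compiled.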

The ultimate goal of the system of databases described above is to reach a
stage at which the absolute majority of the world's languages can be reduced
to a minimal number of huge macro families, which in turn can be traced back
to a Proto-Sapiens stage, should the databases provide sufficient evidence to
support the hypothesis of monogenesis. With the database system completed
and the basics of the Proto-Sapiens structure established, we can hope to
gain a vital tool for understanding the origin of language itself.