They don't stand up to scrutiny at all. His work is well outside of mainstream acceptance, and his methodology is considered dubious at best.

Virtually all serious work in historical linguistics takes it as axiomatic that there's a hard horizon beyond which we won't be able to reconstruct proto-languages reliably. How far back that goes depends on a number of factors, of course, but it's hard to imagine getting much farther back than 6-9k years ago even in the best of circumstances. Ruhlen thinks he's constructing forms nearly an order of magnitude older than that.

As for his work in historical linguistics, I looked around a bit, and I think it was a critique by Lyle Campbell (no clue about his merits) that gave a concise and perceptive statement: judging by the level of change between the hypothetical PIE and its descendants over 5,000-10,000 years, beyond that depth you statistically cannot separate any possible valid relations from the noise. That makes comparative linguistics impossible past a certain point: the event horizon, if you want to be fancy.

Some of that stuff is pretty silly, though it is interesting to hypothetically explore.

As for Africa, his classification is more or less accepted broadly. That doesn't mean there are no exceptions, but at least in general the major four subgroupings (plus Austronesian in Madagascar) are the standard starting point. This is not true, for example, of his classification in the Americas, which appears to be wrong, though I still think it's an interesting idea to line up the linguistic distribution with migrations.

Overall there's a question of what it would mean to be "right" or "wrong" in these cases-- I doubt Greenberg or Ruhlen actually thinks they're strictly speaking correct about any of the etymologies, but rather that they're narrowing down the possibilities and making good guesses. Taking it in that context, the work is less crazy, but there still are some fairly obvious limitations that they seem to be ignoring.

So is there any broader classification of African languages at all? Is the stuff I read on Wikipedia reliable? Or are African languages still a vast unexplored wilderness?

The bulk of African languages are reasonably well classified into three or four families. The weakest evidence is probably what groups Nilo-Saharan, but even that's leagues ahead of these Proto-Sapiens reconstructions.

Like you suggest, some "isolates" are only isolates because we haven't yet done the reconstruction work (i.e., a lot of the Amazon), but then there are a great many that will probably never be reliably linked because the time depth leaves us drowning in noise.

I'm bothered by the idea of a strict limit. Instead, I think it gets exponentially more difficult the earlier in time you go. So maybe 12,000 years works sometimes. If really pushed, maybe 15,000. But 100,000 is almost certainly ridiculous. [Edit: I also think methods for looking earlier should not be eyeballing dictionaries. They should involve systematic reconstruction and use of the earliest records, then comparing those to each other. It will never go anywhere to compare what languages look like today to what they might have looked like a long time ago without serious, rigorous backtracking. It is still likely to fail at a certain time depth, but if anything that's the right method.]

Also, I meant to add above that the Khoisan group has been questioned recently and I have a few references to that effect if you're interested:

A remote relationship of Indo-European to the Uralic languages is possible. Geographically, the earliest reconstructable locations of the two families are contiguous.

On the whole, however, the lexical resemblances between Indo-European and Uralic are very sparse; the two families, if they are related at all, must have separated thousands of years before the breakup of Proto-Indo-European.

If Indo-European is related to other language-families—e.g., to Afro-Asiatic (which includes the Semitic languages) or to Kartvelian (which includes Georgian)—it must have diverged from them much earlier than it diverged from Uralic, because the number of cogent resemblances is still smaller.

It's a bit of an oversimplification, but what we are looking at is essentially t^n ... the difficulty isn't exactly linear, and trying to reconstruct farther back using proto-languages makes it even tougher, if one can even consider it valid practice to begin with.

(EDIT)

As djr mentioned, exponential — but at some point you still reach a situation where the SNR !>1, and you can now stare yourself blind at the event horizon without making any further progress.

It's all statistical. As the time depth increases, so does the margin of error. So while the available data may suggest a closer relationship with Uralic than other groups, the margin of error also increases with that time depth, so in statistical terms there may be no reason at all to assume any relevance there. Likewise, there is no true "event horizon" because we can always make some kind of guess-- but with ever increasing margins of error, we end up at a point where there is almost no statistically relevant information. In other words, we can continue to make hypotheses but with no way to test them. In that sense, I guess there is an "event horizon" in that the tests don't even hint one way or the other in a statistical sense.

Remember even with Indo-European the margin of error is far greater than 0. So there is no point before which we can know things for certain and point after which we can't, but rather just a reasonable time depth where our guesses don't seem too crazy.

That's what I meant by "event horizon" and "SNR not greater than 1" ... any valid relation is drowned out in noise, so even if you could make hypotheses and guesstimations, it's not falsifiable; even if SNR was far less than 1 there could still be valid data in there, there's just no way of getting it out.

This may or may not correspond to any particular data set or time scale, but the statistical event horizon remains.
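To put rough numbers on the horizon idea, here is a minimal sketch. It is not anyone's actual method from this thread: it assumes a constant per-millennium retention rate for a basic word list (0.86 is used purely as an illustrative figure) and a fixed chance-resemblance rate of 5%, both of which are assumptions, and the function names are made up. Under those assumptions the shared fraction between two sister languages decays exponentially, and the "horizon" is simply the depth where it sinks to the chance level.

```python
import math

def shared_fraction(t_millennia, retention=0.86):
    """Expected fraction of a word list still cognate in BOTH of two
    sister languages after t millennia of independent change.
    Assumes a constant retention rate (an illustrative assumption)."""
    return retention ** (2 * t_millennia)

def horizon(chance_rate, retention=0.86):
    """Time depth in millennia at which the expected shared fraction
    drops to the chance-resemblance rate, i.e. where SNR reaches 1."""
    return math.log(chance_rate) / (2 * math.log(retention))

if __name__ == "__main__":
    for t in (2, 6, 10, 15, 100):
        print(f"{t:>3} millennia: shared fraction {shared_fraction(t):.6f}")
    print(f"horizon at 5% chance rate: {horizon(0.05):.1f} millennia")
```

Change the retention rate or the chance rate and the horizon moves, which is the point made elsewhere in the thread: the curve is exponential, but where it crosses the noise floor depends on numbers we can only estimate.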

One problem is that the SNR is not known. So the probable SNR decreases with time depth, but we have no way to know exactly how. Overall, yes, what you said. But we don't ever know when it's too far.

Not true. Definitionally, you never know whether a particular piece of data is signal or noise. That's why it's noise. But, any serious examination of language relatedness (though, natch, not Ruhlen's) will quantify the degree of entropy in the system, and thus we absolutely can know when we've gone back too far.

Icelandic hasn't changed much in the past 1000 years. English has changed immensely. The amount of noise entirely depends on the specific language(s) in question.

A normal problem involving noise, as you said, does not allow us to separate the real data from the noise, so we consider the SNR. But in this case, the SNR itself is unknown because the amount of noise is dependent on a number of unknown (unknowable?) factors.

Quote: "But, any serious examination of language relatedness (though, natch, not Ruhlen's) will quantify the degree of entropy in the system, and thus we absolutely can know when we've gone back too far."

Can you explain this a bit? In historical linguistics everything is a guess (some better than others, of course). So how can we absolutely know anything?

I'm not really sure what you're understanding as "noise" here. In a standard cross-linguistic corpus analysis, the challenge is simply to separate similarities due to relatedness from similarities due to chance, as conditioned by phonological patterns. There are many ways to go about this, but most techniques involve predicting a baseline of similarity expected between unrelated languages and then measuring forward against that standard. With this information, can you know the relatedness of two particular lexical items? Usually not very well. Can you know the relatedness of two large lexical sets? Yes, to a high degree of probability.
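One common way to get that kind of baseline is a permutation test, sketched below under loud assumptions: the "similarity" measure is just a shared first letter in the spelling (a crude, hypothetical stand-in for real phonological matching), the ten-word list is a toy sample, and the function names are invented for illustration. Shuffling one list breaks the meaning-to-meaning pairing, so the shuffled match counts estimate what chance alone would produce.

```python
import random

def similar(a, b):
    """Crude resemblance: same first letter (illustrative assumption only)."""
    return a[0] == b[0]

def match_count(list_a, list_b):
    """Number of meaning slots whose two words count as similar."""
    return sum(similar(a, b) for a, b in zip(list_a, list_b))

def permutation_p_value(list_a, list_b, trials=10000, seed=0):
    """Estimate how often shuffled (meaning-scrambled) pairings match
    at least as well as the real pairing."""
    rng = random.Random(seed)
    observed = match_count(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if match_count(list_a, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)

# Toy data: ten basic meanings in English and German, lowercased.
english = ["hand", "foot", "water", "father", "mother",
           "night", "star", "sun", "new", "two"]
german = ["hand", "fuss", "wasser", "vater", "mutter",
          "nacht", "stern", "sonne", "neu", "zwei"]

if __name__ == "__main__":
    observed, p = permutation_p_value(english, german)
    print(f"observed matches: {observed}, estimated p-value: {p:.5f}")
```

Note that the test says nothing reliable about any single pair (father/vater fails the crude check despite being cognate), only about the set as a whole.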

If your standard is absolute knowledge (whatever that means!), you'll have to find a new field. In science, everything we know is subject to revision in the face of newer and better evidence. That caveat comes pre-baked into how scientific epistemologies use the word "know", and there's nothing particularly unusual about historical linguistics in this regard.