The answer to this question is almost certainly yes. We don’t even have to assert that this happened deliberately; it could just be that, by stumbling upon these difficult-to-pronounce sounds, ancient tribal groups came to recognize that they could profitably use them to identify “insiders” and “outsiders.” One of the best examples of this is the “click” sound of the San Bushmen’s and the Hadza’s languages of Africa. If you don’t learn to make this sound in your youth, you can never properly make it, and so this sound, among its other uses, can reliably identify outsiders.

The ability to identify outsiders has been important throughout our history because our tribal groups have been of enormous value to us in promoting our survival and prosperity. They have also been in an almost continual state of conflict with other groups competing over the same lands and resources. As a result, tribal groups developed a variety of ways of promoting tribal cohesion and identity, and language is perhaps the most salient of these. The significance of difficult-to-pronounce sounds is that they genuinely mark out someone who can make them as someone with whom you share a long cultural history. So, in this sense, they cannot be “faked,” and this is why they become trustworthy or reliable “badges” of identity.

Mark Pagel is a professor of evolutionary biology at the University of Reading.

While I’m skeptical that a language would retain difficult sounds to make things harder for outsiders, I do believe that certain sounds may be judged more difficult than others.

For example, I remember reading somewhere that it’s not until around the age of six that Swedish children learn to distinguish between the tje, /ɕ/, and the sje, /ɧ/ or /ʂ/, sounds. The latter is also considered to be hard to pronounce, and whole dissertations have been written about it.

Someone with even the most basic rudiments of linguistic understanding would realize that this idea is silly: The number of phonemes in languages has tended to decrease, not increase, over historical time as humans have settled new parts of the world.
If the author had bothered to take Linguistics 101 and look at some actual evidence, he probably would have come to the exact opposite conclusion.

This is misleading (and, frankly, wrong) on multiple levels. First, the case you mention (click sounds): any reader here can make click sounds (go “tsk tsk”), and they are quite easily borrowed from language to language. The Nguni languages (e.g. Zulu, Xhosa) of southern Africa, in particular, are known to have numerous borrowed words from the Khoe-San languages, picked up when speakers of Nguni languages migrated into territory already populated by Khoe-San speakers. There was no tight social integration of the two groups (just commerce of the sort you’d expect from herders living next to farmers), but the sounds clearly jumped from language to language.

Secondly, sounds that can be used to distinguish in-group members from out-group members do exist, but these are NOT chosen by some metric of uncommonness in the world’s languages as a whole; they’re chosen based on what plainly marks one as an in-group member, given the languages involved. Linguists will sometimes call these “shibboleths.” The case that lent us the word “shibboleth” is a good example of what I’m trying to get at here. Supposedly, one Biblical group could use the Hebrew word “shibboleth” to tell if someone was a member of another group based on whether they said “shibboleth” or “sibboleth.” The second group’s language lacked a “sh” sound, and so saying “sibboleth” was a dead giveaway. This judgment is rendered NO less effective by the fact that “s” and “sh” sounds are both very common in the world’s languages; in fact, they’d be no more or less useful than clicks if both were available for use as shibboleths.

Finally, sounds very clearly do not become harder to pronounce over time through some sort of optimizing function of social distinctiveness. Given the author’s background, this function is probably supposed to be evolutionary: sounds that work better as social distinguishers end up being used more. If that is the argument, it is absolutely not the case: sound change and the development of new sounds in spoken languages are largely dictated by constraints inherent to the spoken medium, such as the shape of the vocal tract and the limits of auditory perception. “Difficult” sounds have been shown to arise by these physically grounded mechanisms alone; positing that more socially distinctive sounds end up working better (and thus are used more) for languages is almost always unnecessary.

I think you’re falling prey to a basic fallacy when you attack the validity of what this article claims on the basis of an overall statistical trend (which may or may not be a robust one; I have not interested myself in such matters).

The fact remains that an overall average trend may say very little about the more complex range of data that it represents in a grossly simplified manner.

There are endless examples from the past two millennia of languages developing greater phonological and phonetic complexity as the result of phonetic change and interaction with morphology and even syntax, and the concomitant reshaping of phonological systems. I can think of a whole set of such examples off the top of my head, from the development of complex tonal systems in East and Southeast Asia, to contrasting series of palatal fricatives and affricates in Slavic languages, to the development of complex articulations in African languages, and the evolution of individual highly marked phonemes in various languages. The existence of an apparent overall trend of simplification is only a blunt reflection of the overall variability in the data. I wonder if it might in fact be more an artifact of which languages have spread, i.e. a particular geolinguistic factor favouring an overall appearance of simplification, than an indicator of any robust trend at a more basic level.