Scholars have tried to model this as concerted evolution (Hruschka et al. 2015). But the analogy with biology is not very convincing, as the change concerns the production of speech rather than its product. By this, I mean that sound change concerns the abstract system by which speakers produce the words of their language. Think of speakers in comic books who lose a tooth in some fight. Often, in order to show how their speech suffers from this loss, writers replace certain "s" sounds in the speech of the victims with a "th" (in German, it would be an "f"), to illustrate that with a lost tooth, it is "very difficult to thpeak". In the same way, writers imitate the speech of people suffering from speech impediments like sigmatism (lisp). The loss of a tooth changes all "s"es in a person's language. Sound change, or at least one type of sound change, works in the same way.

In a recent talk I gave with Nathan Hill at a conference in Poznań, we found a way to demonstrate this on actual language data. In this talk, we used data from eight Burmish languages (a language family spoken mainly in the South-West of China and in Myanmar), which we coded for partial cognates (as these languages contain many compounds). We aligned these cognate sets automatically, and then searched for recurring patterns in the alignments. One needs to keep in mind that words in linguistics are extremely short: we have no more than five sounds per alignment in our data, which corresponds to five sites in an alignment in biology.

While biology knows certain contextual patterns like hydrophilic stretches in alignments (as already demonstrated in the famous ClustalW software, compare Thompson et al. 1994), the context in which a sound occurs in language evolution is even more important. We can say, for example, that the beginning of a word or morpheme is usually the most stable part, where sounds change much more slowly than in the other parts (at the end of a word or of a syllable). We thus concentrated only on the first sound of each word and looked at the patterns of sounds we could find there.

Those patterns in our data usually look like this:

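| Cognate set | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 |
|---|---|---|---|---|---|---|---|---|
| word 1 | p | p | p | Ø | f | f | Ø | p |
| word 2 | p | Ø | p | p | Ø | f | p | p |
| word 3 | k | Ø | tɕ | k | s | k | Ø | k |
| word 4 | Ø | k | tɕ | Ø | s | Ø | s | k |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |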

Note that the symbol "Ø" in this context denotes missing data, as we did not find a cognate in the given language. As always, most of our data is patchy, and we have to deal with that. You can see that when looking only at the first sound in each alignment, we find quite a degree of variation; and if you look at all the data, you can see some things that seem to show structure, but the amount of complexity is still immense. You may see this from the following plot, showing only some 100 of the more than 300 patterns we created (coloured cells do not necessarily represent the same sound, but one of ten different sound classes to which the more than 50 different sounds in our data belong):

Sound patterns (initial consonant) in the aligned cognates sets of the Burmish languages

Interestingly, however, most of the variation can be reduced quite efficiently with the help of network techniques. Since we are dealing with systemic evolution, it is straightforward to group our more than 300 alignments into groups that evolve in an identical manner. At least this is what our linguistic theory predicts, and what linguists have been studying for the last 200 years. When looking at the patterns I gave above, you can see that we can easily group the four alignments into two groups:

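| Cognate set | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 |
|---|---|---|---|---|---|---|---|---|
| word 1 | p | p | p | Ø | f | f | Ø | p |
| word 2 | p | Ø | p | p | Ø | f | p | p |
| - | - | - | - | - | - | - | - | - |
| word 3 | k | Ø | tɕ | k | s | k | Ø | k |
| word 4 | Ø | k | tɕ | Ø | s | Ø | s | k |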

Essentially, the two groups reflect only two patterns, if we disregard the gaps and merge them into one row each:

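| Cognate set | L1 | L2 | L3 | L4 | L5 | L6 | L7 | L8 |
|---|---|---|---|---|---|---|---|---|
| word 1 / word 2 | p | p | p | p | f | f | p | p |
| - | - | - | - | - | - | - | - | - |
| word 3 / word 4 | k | k | tɕ | k | s | k | s | k |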
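The merging step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code we used in the study: the function name `merge_patterns` is hypothetical, and the four rows are the toy patterns from the tables above. Within each language (column), gaps are filled from whichever row has a sound, and a conflict is reported if two rows disagree.

```python
MISSING = "Ø"  # symbol used in the tables above for missing data


def merge_patterns(patterns):
    """Merge several patterns into one row, filling gaps where possible.

    Raises ValueError if two patterns show different sounds for the
    same language, since such patterns must not be merged.
    """
    merged = []
    for column in zip(*patterns):
        sounds = {s for s in column if s != MISSING}
        if len(sounds) > 1:
            raise ValueError(f"conflicting sounds: {sorted(sounds)}")
        merged.append(sounds.pop() if sounds else MISSING)
    return merged


# Toy patterns from the tables above (languages L1 to L8).
word1 = ["p", "p", "p", "Ø", "f", "f", "Ø", "p"]
word2 = ["p", "Ø", "p", "p", "Ø", "f", "p", "p"]
word3 = ["k", "Ø", "tɕ", "k", "s", "k", "Ø", "k"]
word4 = ["Ø", "k", "tɕ", "Ø", "s", "Ø", "s", "k"]

print(merge_patterns([word1, word2]))
print(merge_patterns([word3, word4]))
```

Trying to merge word 1 with word 3 raises an error, because "p" and "k" conflict in the first language.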

What is important when grouping two alignments into one pattern is to make sure that they do not contain any conflicting positions. This can be checked in a rather straightforward manner by constructing a network from the data. In this network, the nodes are the alignment sites (word 1, word 2, etc. in our examples), and links are drawn between nodes if two sites are not in conflict with each other. If we use this criterion of compatibility on our data, we obtain the following network:

Compatibility network of the sites in our aligned cognate sets

In the network, I further coloured the nodes according to the overall
similarity of sounds present in them. The legend gives capital letters
for major sound classes, in order to facilitate seeing the structure.

This network itself, however, does not tell us how to group the data into
classes that correspond to one identical process of systemic evolution, as we
can still see many conflicts. In order to solve this, we need to carry out a
specific partitioning analysis that cuts the network into an ideally minimal
number of cliques. Why cliques? Because a clique will represent patterns in our
data that do not show any conflicts in their sounds, and this is exactly
what we want to see: those patterns that behave identically, without
exceptions.

The problem of finding the minimal clique partition of a network is, unfortunately, a hard one (see Bhasker and Samad 1991), so we needed to use some approximate shortcuts. Nevertheless, with a very simple procedure of clique partitioning, we succeeded in reducing the 317 cognate sets that we selected for our study down to 35 groups that covered 74% of the data (234 cognate sets), with a minimal size of 2 alignments per group. The "manual" inspection by the Burmish expert in our team (that is, Nathan Hill) showed that many of these patterns correspond to what experts assume was one single sound in the ancestral Proto-Burmish language.
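One way to approximate a clique partition is a greedy heuristic, sketched below in Python. This is an assumption for illustration and not necessarily the procedure we used: the function name `greedy_clique_partition` and the small toy network are hypothetical. Starting from the best-connected unassigned node, the heuristic keeps adding nodes that are linked to every member of the growing clique, and it gives no guarantee that the number of cliques is minimal.

```python
def greedy_clique_partition(graph):
    """Partition the nodes of an adjacency dict into cliques (greedy)."""
    unassigned = set(graph)
    # Visit well-connected nodes first; break ties by name for determinism.
    order = sorted(graph, key=lambda n: (-len(graph[n]), n))
    cliques = []
    for seed in order:
        if seed not in unassigned:
            continue
        clique = [seed]
        unassigned.remove(seed)
        for candidate in order:
            # A node joins only if it is linked to all current members.
            if candidate in unassigned and all(
                candidate in graph[member] for member in clique
            ):
                clique.append(candidate)
                unassigned.remove(candidate)
        cliques.append(clique)
    return cliques


# Hypothetical toy network: a, b, c are mutually compatible, d is
# compatible only with c, and e is compatible with nothing.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
    "e": set(),
}

cliques = greedy_clique_partition(toy_graph)
groups = [c for c in cliques if len(c) >= 2]  # minimal size of 2 per group
covered = sum(len(c) for c in groups)
print(cliques)
print(f"{len(groups)} group(s) covering {covered} of {len(toy_graph)} nodes")
```

Nodes that end up alone in a clique, like "d" and "e" here, correspond to the unique patterns that are not covered by any group.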

To illustrate more closely what I mean by reducing patterns to unique groups, look at the following pattern, which shows different nasal sounds in the data:

Nasal sounds in the Burmish data

And then at another pattern, showing s-sounds:

S-sounds in the Burmish data

I think (at least I hope) that the amount of regularity we find here is enough to demonstrate what is meant by the regularity of sound change
in linguistics: sound change is in some sense just like losing a
tooth, but for a complete population of speakers, not just one speaker,
as the population starts to change all sounds occurring in a certain environment to some other sound.

Our results are not perfect: the
26% of unique patterns, for example, are something we will need to look into in
more detail in the near future. A quick check showed that they may result from
errors in the cognate annotation, but also from peculiarities in the data, and
even simply from sounds that are rare in the languages under investigation.

We are currently looking into these issues, trying to refine our approach. I
realized, for example, that the minimal clique coverage problem has been
studied before by other researchers, and I found a rather large amount of
Russian literature on the topic (see, for example, Bratceva and Čerenin
1994 and Ryzhkov 1975), but those
approaches do not seem to have been thoroughly studied in the Western
literature. We also know that at some point we need to relax our approach,
allowing for some exceptions — we know that systemic sound change processes
are easily overridden by language-specific factors, be it lateral transfer, or
pragmatics in a larger sense (think of Bob Dylan, talking of "the words I never
KNOWED" in order to make sure the word rhymes with "ROAD", or the form "wanna"
as a shortcut for "want to").

Not all cases in which speakers changed the pronunciation of sounds have systemic reasons, and we are still far from actually understanding the
systemic reasons that lead to the regular aspects of sound change. What we can
show, however, is that sound change is really something peculiar in language
evolution, with no real counterpart in biology. At least, I do not know of
any case where a set of 300 alignments could be reduced to some 35 largely
identical patterns. This shows, on the other hand, that the classical
biological approaches that try to model each site of an alignment
independently are definitely not what we need in order to model sound change
realistically. The assumption of independence of sites in an alignment is
already problematic in biology. In linguistics, at least in the cases
illustrated above, it seems to be just as useless as tossing a coin to predict
the weather in a desert: it is too much of an effort with very poor results to be expected.

Tuesday, October 18, 2016

In an earlier blog post, I noted that The Music Genome Project is no such thing. The use of the word "genome" in this context is an analogy, in which the musical characteristics are seen as producing a sort of genetic fingerprint. However, this is a false analogy, because the data used for the Music Genome Project are actually phenotypic, not genotypic. Indeed, music has no analog of a genotype.

In a similar vein, the data used for The Genome Cellar are phenotypic, not genotypic, and so this is also a false analogy.

The Genome Cellar is the database used by the Next Glass app. This app was released in November 2014, and a concurrent press release explained the concept:

Next Glass is the breakthrough app that uses science and machine learning software to provide accurate, personalized recommendations to consumers. Next Glass has analyzed tens of thousands of bottles of wine and beer with a mass spectrometer and stores the "DNA" of each product in its Genome Cellar™, which combines with users' Taste Profiles™ to provide product-specific recommendations.

So, the beer / wine data in the Genome Cellar are peaks in the output of a mass spectrometer. This is made clear in another press release:

Next Glass has developed the world’s first Genome Cellar, an extensive database that contains the chemical makeup – or "DNA" – of tens of thousands of wines and beers. By looking at each bottle on a molecular level, Next Glass defines a unique taste profile for every bottle by analyzing thousands of chemical elements.

This procedure will, indeed, provide a unique fingerprint for each alcoholic product, but it will be a phenotypic one, not a genotypic one. Genetics is often chemistry, but not all chemistry is genetics.

The idea of the Next Glass app is the same as that for the Music Genome Project — to use the fingerprint of currently liked products (music or wines / beers) to make recommendations for other products that might appeal to the customer. This approach can be expected to work for alcoholic beverages, because the subjective preferences will be based to some extent on the sensory components of the chemical makeup. If you document enough of the chemistry then you are bound to include a large proportion of the sensory part.

Finally, you might like to compare this approach with that of WineFriend, which tries to assess your taste in wine with multiple-choice questions, instead of complex chemistry. WineFriend:

uses a simple eight question taste survey that gives insights into a customer's thresholds for sweet, sour, bitterness and intensity of flavour. It then creates a profile which enables it to select wines that are tailored to the individual customer's tastes.

Tuesday, October 11, 2016

It has long been known that ideas about female attractiveness, and concern with body weight among young women, are closely related to exposure to mass media images (see the review by Spettigue & Henderson 2004). The print media are particularly involved in this issue, not least the so-called "men's magazines", such as Playboy. It therefore created a great deal of media interest when it was announced in October 2015 that Playboy would no longer feature nude centerfolds (known as Playmates).

Indeed, Playboy has often been claimed as a purveyor of the US society's image of the "ideal woman", although this is surely media exaggeration. Playboy, whether we love it or hate it, has simply portrayed females that the editors thought would sell magazines at the time. Nevertheless, the magazine's choice of models has been used in the professional medical and psychological literature as representative of a prevalent cultural idealization of an ultra-slender female body shape (eg. Garner et al. 1980; Wiseman et al. 1992; Szabo 1996; Spitzer et al. 1999; Katzmarzyk & Davis 2001; Pettijohn & Jungeberg 2004).

It therefore comes as no surprise that the magazine's database of model statistics was subjected to scrutiny in the online media after the 2015 announcement, particularly with regard to how things had changed during the magazine's 62 years (for an earlier analysis, see The girls next door: Life in the centerfold). Sadly, some of this recent analysis was quite poor (eg. Playboy's image of the ideal woman sure has changed). Here, I try to correct this by presenting a more thorough study of the available data.

The data I have used covers all of the Playmates of the Month that have appeared in the US edition of the magazine since its inception. This is contained in a searchable version of the pmstats.txt file that has been maintained by Jim Dean, Johnny Corvin and Doug Ewell, as currently available on Peggy Wilkins' website. This file is an updated compilation of the so-called "vital statistics" of the Playmates from December 1953 to February 2016, inclusive, as reported in Playboy, sometimes supplemented from other available sources.

Note, especially, that the data are basically self-reported by the Playmates. Some of the information has been questioned at various times, notably where it seems to contradict the associated photographic evidence. As a reputable scientist, I should probably have personally checked all of this evidence, but I have not done so (you can do so yourself, based on whatever photos you can find on the internet, or the book edited by Gretchen Edgren 2006). I have simply assumed that, at a minimum, the information presents whatever the Playmates thought was a desirable public image at the time of publication.

There are 753 records in the dataset, separately including twins and triplets appearing in the same magazine issue, as well as multiple appearances by the same woman in different issues. The data include: magazine issue month; Playmate name, birth date and birth location; height in inches and weight in pounds; breast, waist and hip dimensions in inches; and photographer name. From this information, for each Playmate I calculated their age at the time of publication, along with standard measurements for determining whether a body is healthy or not: Body Mass Index (BMI), for body size (ie. underweight, normal weight, overweight, obese), and Waist to Hip Ratio (WHR), for body curvaceousness.
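For readers who want to reproduce the derived measurements, here is a minimal Python sketch. The function names and the example record are hypothetical, and counting age in completed years is an assumption. BMI is weight in kilograms divided by the square of height in metres, so the inches and pounds are converted first; WHR is simply waist divided by hip.

```python
from datetime import date

KG_PER_POUND = 0.45359237   # exact definition of the pound
METRES_PER_INCH = 0.0254    # exact definition of the inch


def body_mass_index(weight_lb, height_in):
    """BMI in kg/m^2 from weight in pounds and height in inches."""
    weight_kg = weight_lb * KG_PER_POUND
    height_m = height_in * METRES_PER_INCH
    return weight_kg / height_m ** 2


def waist_to_hip_ratio(waist_in, hip_in):
    """WHR; the units cancel, so inches can be used directly."""
    return waist_in / hip_in


def age_at_publication(birth, issue):
    """Age in completed years on the issue date (an assumed convention)."""
    before_birthday = (issue.month, issue.day) < (birth.month, birth.day)
    return issue.year - birth.year - before_birthday


# Hypothetical record, not taken from the dataset.
bmi = body_mass_index(weight_lb=115, height_in=66)
whr = waist_to_hip_ratio(waist_in=24, hip_in=36)
age = age_at_publication(date(1990, 6, 15), date(2012, 3, 1))
print(round(bmi, 1), round(whr, 2), age)
```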

Analysis

As is usual in this blog, the data can be summarized using a phylogenetic network as a form of exploratory data analysis (see How to interpret splits graphs).

I first range-standardized the data (so that all of the measurements are compared on the same scale), and log-transformed the BMI and WHR measurements (because otherwise these ratios will have non-linear relationships to the other variables). I then used the Manhattan distance to calculate the dissimilarity of the different publication years and birth locations, based on the Playmates' body dimensions. This was followed by a neighbor-net analysis to display the between-year and the between-location similarities as two phylogenetic networks.
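The preprocessing and distance steps can be sketched as follows in Python. Several things here are assumptions made for illustration: the function names, the made-up rows, the idea that each year is summarized by one row of values, and the order of operations (the logarithm is taken before range-standardizing, since a range-standardized minimum would be zero). The neighbor-net step itself is not shown.

```python
import math


def range_standardize(column):
    """Rescale a list of values to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def manhattan(a, b):
    """Sum of absolute differences between two equally long rows."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(rows, log_columns=()):
    """Pairwise Manhattan distances after log and range scaling."""
    columns = [list(col) for col in zip(*rows)]
    for j in log_columns:
        columns[j] = [math.log(x) for x in columns[j]]
    scaled = [range_standardize(col) for col in columns]
    scaled_rows = list(zip(*scaled))
    return [[manhattan(r, s) for s in scaled_rows] for r in scaled_rows]


# Hypothetical rows: height (in), bust (in), BMI, WHR for three years.
rows = [
    [65.0, 36.0, 19.5, 0.66],
    [66.0, 35.0, 18.5, 0.68],
    [67.5, 34.0, 18.0, 0.70],
]
distances = distance_matrix(rows, log_columns=(2, 3))
for line in distances:
    print([round(d, 3) for d in line])
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form of input that a neighbor-net analysis expects.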

The network of relationships among the years is shown first. Years that are closely connected in the network are similar to each other based on the body dimensions of their Playmates, and those that are further apart are progressively more different from each other.


The network shows that there has been a strong and consistent change in Playmate age, size and shape through time. In the graph there is a simple gradient through time from top-right to bottom-left: the 1950s and 1960s are intermingled at the top, with the 1970s below them, the 1980s and 1990s below that, and the 2000s and 2010s intermingled at the bottom.

So, it will be worth looking at time graphs of the individual measurements. Let's start with age.

This does not show a particularly consistent trend, but the average age of the models does increase from 21 to 24 years from beginning to end of the time period.

The next graph shows that the reported height of the Playmates also increases across the 62 years, by 2.5" on average. There is almost no change in average weight across the decades (and so the graph is not shown).

However, far more notable is the relationship between height and weight, as expressed by the BMI, which is shown in the next graph. This does not show a linear trend at all, but a distinctly curved one. That is, the size of Playmates definitely changed through time, becoming thinner for the first 40 years, but then thickening up again for the next 20 years.

This trend has not been discussed in the professional literature, as far as I can determine, perhaps because previous assessments have been based only on a relatively short period of time, not the full 6 decades. Note that the bottom point of the curve occurs in c. 1997, and that by 2016 the BMI measurements had returned to the 1975 level (40 years earlier). I wonder whether they would return to the 1950s level in another 20 years?

More importantly, given that Playmates are to one degree or another reflecting a contemporary societal image of a desirable woman, we can note that 48% of these models are classified as being underweight. The lower limit of a healthy BMI is 18.5, as shown in the next graph, which also shows the boundaries between Mild thinness (17-18.5), Moderate thinness (16-17) and Severe thinness (<16).
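The classification can be written down as a small Python function. The thinness boundaries are the ones given above; the upper boundaries of 25 (overweight) and 30 (obese) are the standard WHO cut-offs, added here so that the sketch covers the whole scale. The function names and the example values are hypothetical.

```python
def bmi_category(bmi):
    """Return the weight category for a BMI value in kg/m^2."""
    if bmi < 16:
        return "severe thinness"
    if bmi < 17:
        return "moderate thinness"
    if bmi < 18.5:
        return "mild thinness"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    return "obese"


def share_underweight(bmi_values):
    """Fraction of BMI values below the healthy lower limit of 18.5."""
    below = sum(1 for bmi in bmi_values if bmi < 18.5)
    return below / len(bmi_values)


# Hypothetical BMI values, not taken from the dataset.
example = [15.8, 16.5, 17.9, 18.5, 19.2, 21.0]
print([bmi_category(b) for b in example])
print(share_underweight(example))
```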

Clearly, during the period 1975-1995 the vast majority of the models reported being underweight, while in the 1950s and 1960s very few of them did. This situation has improved recently, with roughly a half being underweight during the past 20 years. Also, several of the reported body sizes are very unhealthy. However, perhaps the BMI values below 16 are unreliable, in the sense that such a person is not likely to be very photogenic.

We can now move on to the circumferences of the models. The next graph shows the time trend for the reported circumference at breast level. This shows the biggest and most consistent change of all, with a dramatic reduction in bustiness.

Indeed, chest sizes of >36" have hardly been reported since the start of 1990, and yet in the early years a buxom 36-24-36 figure was the most common claim by the Playmates. Interestingly, very few of the models have claimed a chest size of 33" (as opposed to 32" or 34"); is this some sort of superstition?

The other large and consistent change in circumference is for waist size, as shown in the next graph. This shows the opposite trend, with an increase in average reported size of 2" across the 60 years.

There was a slight but not consistent reduction in hip circumference through time (and so the graph is not shown). This means that the WHR, the measure of curvaceousness, changed greatly through time, as shown in the next graph. So, with the waists reportedly becoming larger, there was apparently a very large reduction in the curvaceousness of the models through time.

Note that the reduction in BMI was apparently achieved in spite of an increase in waist size — the BMI reduction seems to be related to the increase in average reported height without an increase in weight, and partly to the decrease in chest size.

When combined with the reduction in breast circumference, this means that the Playmates of the 21st century have been a very different shape from those of the mid 20th century. They were taller, with smaller breasts and larger waists, and thus had fewer curves.

We can end this discussion by considering where these Playmates were born. Most of them reported being born in the USA (83%). This means that we can consider how the various states compare in producing nude models. Obviously, more models are likely to come from the most populous states, and so we need to standardize the data by dividing by the population size of each state (as estimated for 2015 in Wikipedia), to yield the number of Playmates per million people in each state.
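The standardization is simple arithmetic, sketched here in Python. The function name `per_million` and the counts and population sizes are made up for illustration; they are not the values used for the graph.

```python
def per_million(counts, populations):
    """Number of models per million inhabitants for each region."""
    return {
        region: counts[region] * 1_000_000 / populations[region]
        for region in counts
    }


# Hypothetical regions with made-up counts and population sizes.
counts = {"Region A": 2, "Region B": 40, "Region C": 0}
populations = {"Region A": 500_000, "Region B": 20_000_000, "Region C": 900_000}

rates = per_million(counts, populations)
for region, rate in sorted(rates.items(), key=lambda item: -item[1]):
    print(f"{region}: {rate:.1f} per million")
```

The made-up Region A shows how only two models from a small population can produce the highest rate, which is the effect noted for the smallest populations.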

Apparently, Hawaii and California are more likely than the other states to produce models who are prepared to take their clothes off in public, while Delaware and Vermont have not yet done so, at least as far as Playboy is concerned. The apparently large value for Washington DC represents only 2 models from a relatively small population.

We can also consider whether the dimensions of the models vary in any consistent way between the states. This can be done with a phylogenetic network, as discussed above. In the following network, states that are closely connected are similar to each other based on the body dimensions of their Playmates, and those that are further apart are progressively more different from each other.

There appear to be no consistent patterns here.

So, we can finish by considering the countries from which the remaining 17% of the models originated. Once again, the data are standardized, to yield the number of Playmates per million people in each country (or province, for Canada). The apparently large value for Malta represents one set of twins from a relatively small population.

There have been a relatively large number of models from Scandinavia (Norway, Denmark and Sweden). This presumably represents the number of females whose body shape matches the image required by the Playboy editors, as much as the willingness of Scandinavians to disrobe publicly. However, it is notable that the rate of models from Norway is double those for Denmark and Sweden.

Tuesday, October 4, 2016

Network techniques are becoming more widespread in biology and anthropology. However, the data in both of these disciplines can form very complicated patterns, indeed; and there must be practical limits to what one can do with a network analysis. This post discusses an example that covers both disciplines, and which may well exceed those limits.

Siberia is an extensive geographical region of North Asia stretching from the Ural Mountains in the west to the Pacific Ocean in the east, and from the Arctic Ocean in the north to the Kazakh and Mongolian steppes in the south. This vast territory is inhabited by a relatively small number of indigenous peoples, with most populations numbering only in the hundreds or few thousands. These indigenous peoples speak a variety of languages belonging to the Turkic, Tungusic, Mongolic, Uralic, Yeniseic, Chukotko-Kamchatkan, and Aleut-Yupik-Inuit families, as well as a few isolates. There is also variation in traditional subsistence patterns ... This linguistic and cultural diversity suggests potentially different origins and historical trajectories of the Siberian peoples.

Previous studies of the genetic history of Siberian populations were hampered by the extensive admixture that appears to have taken place among these populations, because commonly used methods assume a tree-like population history and at most single admixture events.

This suggests the use of network techniques, instead of tree-based ones. However, under the circumstances described here it may be unwise to try to produce a phylogenetic network. The situation, as described, does not resemble a "tree with reticulations" but more of an "anastomosing plexus". The latter may be more confusing than helpful, when visualized as a network.

So, the authors do not mention the word "network" nor even "reticulation". Instead:

Here we analyze geogenetic maps and use other approaches to distinguish the effects of shared ancestry from prehistoric migrations and contact, and develop a new method based on the covariance of ancestry components, to investigate the potentially complex admixture history. We furthermore adapt a previously devised method of admixture dating for use with multiple events of gene flow, and apply these methods to whole-genome genotype data [genome-wide SNPs] from over 500 individuals belonging to 20 different Siberian ethnolinguistic groups [plus 9 reference populations].

The results of these analyses indicate that there have been multiple layers of admixture detectable in most of the Siberian populations, with considerable differences in the admixture histories of individual populations.

The admixture (or introgression) patterns among the populations are illustrated using a map. Each bar represents a population, with the colors denoting the different ethnolinguistic groups. Note that every population shows admixture.

The reconstructed migration relationships among the populations are also illustrated using a map. This time, the colors of the arrows represent the different ethnolinguistic groups.

I would not like to have to represent these patterns using a network, and make that network comprehensible. So, this dataset may exceed the practical limits of networks.