Monday, February 3, 2014

Single-malt scotch whiskies — a network

In a previous post I presented a Network analysis of scotch whiskies. That post analyzed 109 single-malt Scotch whiskies based on the tasting notes of a single author, measuring 68 characteristics (nose, color, body, palate, finish). The analysis demonstrated that the conventional classification of Scotch malt whiskies by region does not relate to their taste.

An alternative approach to classifying the whiskies is, then, to actually try to group them by taste. This topic was tackled in the book by David Wishart (2002) Whisky Classified: Choosing Single Malts by Flavour, Pavilion Books, London. His objective was: "if you like a particular malt whisky, then we can tell you what other brands taste similar." This book was revised in 2006, and there was a 10th anniversary edition in 2012. There is also an associated web page (Whisky Classified).

A vocabulary of 500 aromatic and taste descriptors was compiled from the tasting notes in the 10 books. These were grouped into 12 broad aromatic features: Body (Light-Heavy), Sweetness (Dry-Sweet), Smoky (Peaty), Medicinal (Salty), Feinty (Sulphury), Honey (Vanilla), Spicy (Woody), Winey (Sherry), Nutty (Oaky-Creamy), Malty (Cerealy), Fruity (Estery) and Floral (Herbal). The 12 flavour categories are scored on a scale of 0-4 according to the intensity with which each feature is present in a whisky.

The 86 single malts were classified using ClustanGraphics. The cluster analysis groups malts into the same cluster when they have broadly the same taste characteristics across all 12 sensory variables. Technically, the method minimizes the variance within clusters and maximizes the variance between clusters. The result was ten clusters of single malt whiskies.
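To make the variance criterion concrete, here is a minimal sketch of how the total variation in a set of scores splits into a within-cluster part and a between-cluster part. The two-feature scores and the group names are invented for illustration; they are not Wishart's data, and this is not a reconstruction of what ClustanGraphics does internally.

```python
def centroid(points):
    """Mean of each feature across a list of equal-length score vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def variance_split(clusters):
    """Return (within, between, total) sums of squares for a grouping."""
    all_points = [p for members in clusters.values() for p in members]
    grand = centroid(all_points)
    within = 0.0
    between = 0.0
    for members in clusters.values():
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
        between += len(members) * sq_dist(c, grand)
    total = sum(sq_dist(p, grand) for p in all_points)
    return within, between, total

# Hypothetical (Smoky, Winey) scores on the 0-4 scale.
good = {"sherried": [[0, 4], [1, 3]], "peated": [[4, 0], [3, 1]]}
bad = {"group 1": [[0, 4], [4, 0]], "group 2": [[1, 3], [3, 1]]}

good_within, good_between, good_total = variance_split(good)
bad_within, bad_between, bad_total = variance_split(bad)

print("good grouping:", good_within, good_between, good_total)
print("bad grouping: ", bad_within, bad_between, bad_total)
```

Because the total is fixed by the data, any grouping that lowers the within-cluster part raises the between-cluster part by the same amount, which is why the two goals in the description above amount to a single criterion.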

The order of the 10 clusters A-J maximizes the row-wise rank correlation of the underlying proximity matrix. Readers who are familiar with malt whiskies may recognise the two extremes of strongly sherried malts (cluster A) and the heavily peated, mainly Islay malts (cluster J). Adjacent to these polar benchmarks are the lightly sherried (clusters B and C) and lightly peated (clusters H and I) malts, with the light-bodied, floral and malty clusters, including four largely unpeated groups (clusters D-G) falling in the middle.

The classification is shown at the bottom of this post.

I have re-analyzed these data using the Manhattan distance and a neighbor-net network. A copy of my data spreadsheet is available online. (Note that other online copies of the data [e.g. here] contain errors.)
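For readers unfamiliar with it, the Manhattan distance between two whiskies is simply the sum of the absolute differences in their 12 scores. The sketch below shows the calculation; the malt names and scores are placeholders I have made up, not values from the spreadsheet, and the neighbor-net step itself is not shown.

```python
from itertools import combinations

# The 12 flavour features, each scored 0-4.
FEATURES = ["Body", "Sweetness", "Smoky", "Medicinal", "Feinty", "Honey",
            "Spicy", "Winey", "Nutty", "Malty", "Fruity", "Floral"]

# Hypothetical profiles (illustrative values only).
profiles = {
    "Malt X": [2, 2, 2, 0, 0, 2, 1, 2, 2, 2, 2, 2],
    "Malt Y": [3, 3, 1, 0, 0, 4, 3, 2, 2, 3, 3, 2],
    "Malt Z": [4, 1, 4, 4, 0, 0, 2, 0, 1, 2, 1, 0],
}

def manhattan(a, b):
    """Sum of absolute score differences across all features."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of features")
    return sum(abs(x - y) for x, y in zip(a, b))

def distance_matrix(profiles):
    """Pairwise Manhattan distances, keyed by (name, name) in sorted order."""
    names = sorted(profiles)
    return {(p, q): manhattan(profiles[p], profiles[q])
            for p, q in combinations(names, 2)}

distances = distance_matrix(profiles)
for pair, d in distances.items():
    print(pair, d)
```

With 12 features scored 0-4, the distance between two whiskies can range from 0 (identical profiles) to 48. A pairwise table of this kind is what a distance-based network method takes as its input.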

Whiskies that are closely connected in the network are similar to each other based on the 12 characteristics, and those that are further apart are progressively more different from each other. I have added colours to the network representing the ten alleged groups:

Cluster A: blue
Cluster B: light blue
Cluster C: light green
Cluster D: green
Cluster E: black
Cluster F: brown
Cluster G: orange
Cluster H: pink
Cluster I: crimson
Cluster J: red

This shows that the book's order of the groups proceeds roughly from the middle right of the network (blue) clockwise around to the top right (red).

This analysis provides only a vague justification for the book's classification. The red and blue groups, which are the two extremes of the classification order, do each form a distinct cluster in the network. However, some of the middle groups, e.g. brown and light green, do not form network clusters at all. The other groups more or less form neighbourhoods in the network, but they do not form clusters.

Therefore, trying to use this classification as the author intended, to identify whiskies that taste similar to each other, will be difficult. For example, the network shows that Ardmore is similar to Old Fettercairn, and Glen Deveron is similar to Tullibardine, but these pairs are nothing like each other — and yet all four of them are classified together (in Cluster F).

So, single-malt Scotch whiskies do not really form groups, except for the peaty-flavoured ones (mostly from Islay), and to some extent the sherry-flavoured ones. The rest form a continuous gradient between these two extremes. They all taste different, to one extent or another.

Finally, Wishart is not the only person to have tried clustering these data — Luba Gloukhov also tried, using k-means clustering, to no greater effect. Clustering techniques only work if there are groups in the data, and in this case the data show continuous variation between the two extremes.
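The point about clustering can be illustrated with a toy example. The sketch below is a bare-bones one-dimensional k-means run on evenly spaced made-up values, standing in for a continuous gradient; it is not a reproduction of Gloukhov's analysis, and the data and starting centres are assumptions chosen for illustration.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Plain k-means on a list of numbers, starting from the given centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center.
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group.
        new_centers = [sum(g) / len(g) if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

# Hypothetical gradient: 21 evenly spaced values with no gaps between them.
gradient = list(range(21))
centers, groups = kmeans_1d(gradient, centers=[0, 10, 20])

for c, g in zip(centers, groups):
    print(c, g)
```

The algorithm returns three groups because it was asked for three, but the step from the last member of one group to the first member of the next is no bigger than the steps inside each group. The boundaries are products of the method, not features of the data.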