Sunday, May 6, 2012

Network analysis of scotch whiskies


When using non-scientific data to illustrate mathematical methods, I am told that the most common sources are baseball averages, movie grosses and political polls. I do not wish to resort to such clichéd examples, at least not yet. So, I am going to continue my series (which started with the Eurovision Song Contest) by analyzing a well-known dataset from: François-Joseph Lapointe and Pierre Legendre (1994) A classification of pure malt scotch whiskies. Applied Statistics 43: 237-257. The same authors later re-used these data: Pierre Legendre and François-Joseph Lapointe (2004) Assessing congruence among distance matrices: single-malt scotch whiskies revisited. Australian and New Zealand Journal of Statistics 46: 615-629.

The data consist of measurements of 68 characteristics (nose, color, body, palate, finish) for 109 single-malt scotch whiskies. The original authors analyzed these data using a similarity matrix and a tree. From this they produced a classification of the whiskies. They concluded that there is, indeed, a weak but detectable relationship between their classification and the geographical location of the various distilleries.

I have re-analyzed these data using a weighted Bray-Curtis similarity and a Neighbor-Net network. The Bray-Curtis similarity ignores "negative matches", as discussed in the previous post, so that only shared characteristics generate similarity (the shared lack of a characteristic does not). The weights were used to give the five types of characteristic equal influence, because the 68 characteristics are not equally distributed among the five types.
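For readers who want to see the arithmetic, here is a minimal sketch of a weighted Bray-Curtis similarity. It rests on assumptions that are not stated above: that each characteristic is scored as present (1) or absent (0), and that "equal influence" means each characteristic is weighted by one over the number of characteristics in its type. The function names and the toy data are hypothetical, and this is not necessarily the exact calculation used for the network.

```python
from collections import Counter


def type_weights(types):
    """Weight each characteristic by 1 / (size of its type).

    Assumed scheme: every type (nose, color, ...) then contributes
    a total weight of 1, however many characteristics it contains.
    """
    counts = Counter(types)
    return [1.0 / counts[t] for t in types]


def weighted_bray_curtis_similarity(x, y, weights):
    """Weighted Bray-Curtis similarity between two score vectors.

    Only characteristics present in both items add to the numerator,
    and a characteristic absent from both adds nothing anywhere,
    so "negative matches" are ignored.
    """
    shared = sum(w * min(a, b) for a, b, w in zip(x, y, weights))
    total = sum(w * (a + b) for a, b, w in zip(x, y, weights))
    # Two items with no characteristics at all are treated as dissimilar.
    return 2.0 * shared / total if total > 0 else 0.0


# Toy example (made-up data): three "nose" characteristics,
# one "body" and one "finish".
types = ["nose", "nose", "nose", "body", "finish"]
weights = type_weights(types)

whisky_a = [1, 1, 0, 1, 0]
whisky_b = [1, 0, 0, 1, 1]

sim = weighted_bray_curtis_similarity(whisky_a, whisky_b, weights)
print(f"{sim:.3f}")  # prints 0.667
```

In the toy example the third "nose" characteristic is absent from both whiskies, so it adds nothing to either the numerator or the denominator. A simple matching coefficient would instead count it as agreement.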

Whiskies that are closely connected in the network are similar to each other based on the 68 characteristics, and those that are further apart are progressively more different from each other. I have added colors to the network along geographical lines within Scotland: light purple = lowland, red = east, yellow = midland, light green = north, dark green = west, light blue = islands, dark purple = Islay, black = Speyside. These groups are the same as those used by Lapointe and Legendre.

There is very little to say about this diagram, except that it is not very tree-like, thus calling into question any classification scheme, and there is very little evidence of geographical patterns. There are 21 whiskies at the left of the diagram that share the biggest split, none of which come from the lowlands, midlands or Islay, but that is it.

I have been reliably told, by people with extensive experience of the matter, that each and every Scotch single malt is unique, and that therefore personal preference for one over another is entirely justified. I now have the feeling that this may actually be true.

Note: There is a follow-up post (Single-malt scotch whiskies — a network).
