Tuesday, August 2, 2016
A century of French wine vintages
It has been quite some time since I have produced a network-based exploratory data analysis (EDA) of some multivariate dataset, so it could be time to do so again.
In the wine industry, it is common to provide quality scores for the different vintages from particular wine-producing regions. These so-called vintage charts are intended to tell us how the harvest quality has varied from vintage to vintage. They are often disparaged, because they simplify the complexities of each harvest (where there can be considerable spatial variation) down into a single number. They also make little sense if a single number is applied to a very large area, which often occurs in practice.
Nevertheless, they can be an interesting and informative guide to the general features of each vintage, especially if they cover a long period of time.
My interest in this concept comes from the fact that I have recently started a blog about wine: The Wine Gourd. In the interests of doing something different to every other wine blogger, this blog delves into the world of wine data, instead of the usual reviews of recently released wines. The intention is to ferret out some of the interesting stuff, and to bring it out into the light, for everyone to see. Hopefully, this will be both interesting and informative.
French wine vintages
The Cavus Vinifera web site has produced vintage charts for several of the wine-producing regions of France, from the year 1900 to the present. This is very unusual, as most vintage charts cover a much shorter period of time. This circumstance thus provides the opportunity to compare these French regions over the past century, to investigate to what extent vintage variation is correlated among these areas.
Each vintage from 1900 to 2014 has been rated on a scale of 0 to 20. The regions and wines covered for the entire time span include:
Région de Bordeaux (rouge)
Région de Bordeaux (blanc)
Région de Bordeaux (liquoreux)
Région de la Bourgogne (rouge)
Région de la Bourgogne (blanc)
Région du Rhône (Nord)
Région du Rhône (Sud)
Région du Loire (rouge)
Région de la Champagne
Région du Beaujolais
As usual, we can use a phylogenetic network to visualize these data, with the network being used as a form of exploratory data analysis. I first used the Manhattan distance to measure how similar the different years and regions are, based on the quality scores. This was followed by a neighbor-net analysis to display the between-region and the between-year similarities as two phylogenetic networks.
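For readers who want to see what the distance step involves, here is a minimal sketch in Python. It is not the code used for this analysis, and the scores, region names and year labels in it are invented placeholders, not values from the vintage charts. The Manhattan distance between two regions is simply the sum of the absolute differences in their scores, year by year; swapping rows and columns gives the distances between years instead. The neighbor-net step is not reproduced here.

```python
from itertools import combinations

# Hypothetical placeholder scores (0-20), NOT the real vintage-chart data.
# Each region has one score per year, in the same year order.
years = [1901, 1902, 1903, 1904]  # hypothetical year labels
scores = {
    "Region A": [12, 4, 18, 15],
    "Region B": [11, 5, 17, 16],
    "Region C": [6, 10, 14, 9],
}


def manhattan(a, b):
    """Sum of absolute differences between two equal-length score lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(rows):
    """Pairwise Manhattan distances between all rows, as a nested dict."""
    dist = {name: {name: 0} for name in rows}
    for p, q in combinations(rows, 2):
        d = manhattan(rows[p], rows[q])
        dist[p][q] = d
        dist[q][p] = d
    return dist


# Distances between regions (each region is a row of yearly scores).
region_dist = distance_matrix(scores)

# Distances between years: transpose so that each year is a row of regional scores.
by_year = {year: [scores[r][i] for r in scores] for i, year in enumerate(years)}
year_dist = distance_matrix(by_year)

for region, row in region_dist.items():
    print(region, row)
```

Either distance matrix could then be exported to a program that implements neighbor-net, in order to draw the network.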
The network for the ten regions is shown in the first graph. Regions that are closely connected in the network are similar to each other based on the variation in their vintage quality scores through time, and those that are further apart are progressively more different from each other.
Not unexpectedly, the different wines from the same regions form neighborhoods: the three wine types from Bordeaux (in south-western France); the three wines from Burgundy and Beaujolais (along the Saône River in eastern France); and the two wines from the Rhône River (in the south-east). More surprisingly, the Loire wine, from western France, is associated with the Rhône wines, while the Champagne region, in northern France, is somewhat isolated.
The network for the 115 years is shown in the second graph. In this case, years that are closely connected in the network are similar to each other based on the vintage quality scores averaged across all of the regions, and those that are further apart are progressively more different from each other.
Here, the years form a gradient from the poorest-quality vintages, at the top, to the best-quality vintages, at the bottom. Only four of the vintages are labeled, but the vintages at the top of the network include 1902, 1910, 1913, 1930, 1931 and 1968. The vintages at the bottom of the graph include 1929, 1945 and 1947, followed by 1928, 1949, 1989 and 1990, and then 1906, 1953, 1959, 1961 and 2005.
Note that the 1930s were generally not a good time for wine-making in France, nor were the 1910s (although 1906 was an early-century exception). The 1940s and 1950s, on the other hand, were generally good times for wine production.
The 1910 vintage stands out as particularly poor, with none of the regions scoring more than 10 out of 20 for their grape harvest, and both Burgundy wines scoring 0. This contrasts with the best years, where no region scored less than 16 out of 20.
Needless to say, the years stacked in the middle of the graph were variable, with some regions having a good time in a particular year and some having a bad time in that same year. This is the normal state of affairs.
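The three groups of years described above (uniformly poor, uniformly good, and variable) can be picked out with a simple check on the lowest and highest regional score for each year. The sketch below is only an illustration: the thresholds of 10 and 16 come from the text above, but the year labels and scores are invented placeholders, not the real data.

```python
# Hypothetical placeholder scores (0-20), one value per region for each year.
# These are NOT the real vintage-chart values.
scores_by_year = {
    "year W": [0, 0, 8, 10, 6],
    "year X": [16, 18, 20, 17, 19],
    "year Y": [4, 17, 12, 9, 15],
}


def classify(scores, poor_max=10, good_min=16):
    """Label a year by the spread of its regional scores."""
    if max(scores) <= poor_max:
        return "uniformly poor"   # no region scored more than poor_max
    if min(scores) >= good_min:
        return "uniformly good"   # no region scored less than good_min
    return "variable"             # some regions did well, others badly


labels = {year: classify(s) for year, s in scores_by_year.items()}
print(labels)
```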