Tuesday, June 14, 2016

Grape genealogies are networks, not trees

I have noted before that the genealogies for all domesticated organisms are networks not trees, and specifically they are hybridization networks. That is, in sexually reproducing species, every offspring is the hybrid of two parents. If we include both parents in the pedigree, plus all of their relatives, then this will form a complex network every time inbreeding occurs.
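The point about inbreeding can be made concrete with a minimal sketch. It assumes a made-up pedigree (the labels A to F are hypothetical, not real cultivars) stored as a child-to-parents mapping, and counts how many lines of descent connect each ancestor to a focal individual. Any ancestor reached by more than one line is where the ancestry stops being a tree.

```python
from collections import Counter

# Hypothetical pedigree: child -> (parent, parent). All labels are made up.
PEDIGREE = {
    "E": ("C", "D"),
    "C": ("A", "B"),
    "D": ("A", "F"),  # C and D share parent A, so E is inbred
}

def ancestor_paths(pedigree, focal):
    """Count the lines of descent leading from each ancestor to `focal`."""
    counts = Counter()
    stack = [focal]
    while stack:
        node = stack.pop()
        # Individuals without recorded parents are founders (or unsampled).
        for parent in pedigree.get(node, ()):
            counts[parent] += 1
            stack.append(parent)
    return counts

def reticulate_ancestors(pedigree, focal):
    """Ancestors reached by more than one line of descent."""
    counts = ancestor_paths(pedigree, focal)
    return sorted(name for name, n in counts.items() if n > 1)

print(reticulate_ancestors(PEDIGREE, "E"))
```

If D's parents are changed so that they no longer overlap with C's, the list comes back empty and the ancestry of E is an ordinary binary tree.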

I have previously illustrated this phenomenon using genealogies of grape cultivars:
     Are phylogenetic trees useful for domesticated organisms?
     First-degree relationships and partly directed networks

Reconstructing grape genealogies is often a tricky business. This was originally done using phenotypic characters and historical records, of course, but these days we use DNA from whatever cultivars are available for sampling. Perhaps the biggest problem is that many of the cultivars are no longer known (there have been at least 10,000 of them recorded at some time in history), so that the genealogies are full of question marks representing unknown (unsampled) parents.

The practical consequence of this is that the time direction of the genealogy will be ambiguous whenever there is a missing parent. Estimates of identity-by-descent (IBD) are calculated based on linkage analysis for all pairwise comparisons of samples, and complex crossing schemes can generate IBD values that are indistinguishable from sibling relationships. So, in these cases we cannot distinguish parent-offspring relationships from sibling relationships.

A simple example is shown in the most detailed current book on grape cultivars:
Jancis Robinson, Julia Harding, José Vouillamoz (2012) Wine Grapes: a Complete Guide to 1,368 Vine Varieties, including their Origins and Flavours. Allen Lane / Ecco.
This example involves the grand-parentage of the Shiraz grape, usually called Syrah in the effete monarchies of the Old World. The authors present three possible scenarios, as shown here.

There are five sampled cultivars and two inferred unknowns, arranged in an unrooted network. Because the unknowns are inferred to be parents, the network can be rooted in any of three different places, as shown by the three Options illustrated.

The authors (or, more specifically, the third author, who is the one responsible for the genealogies) are in favour of Option A. This means that Mondeuse Noire and Viognier are Syrah's half-siblings rather than either of them being its grandparent.
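One way to see why there are exactly three options is the following sketch. It rests on my own simplifying assumptions, not on the book's method: a hypothetical cultivar H has undirected parent-offspring links to two other sampled cultivars, X and Y, and each link comes with its own unsampled co-parent. Since a cultivar has only two parents, H can be the offspring in at most one of those links.

```python
from itertools import product

# Hypothetical labels: H has parent-offspring links to X and Y,
# and each link is assumed to involve a different unsampled co-parent.
LINKS = [("H", "X"), ("H", "Y")]

def orientations(links):
    """Yield each parent -> child assignment of the links in which
    no cultivar is the offspring in more than one link."""
    for flips in product([False, True], repeat=len(links)):
        directed = [(b, a) if flip else (a, b)
                    for (a, b), flip in zip(links, flips)]
        children = [child for _, child in directed]
        # Two links into the same child would imply four parents.
        if len(children) == len(set(children)):
            yield directed

options = list(orientations(LINKS))
for option in options:
    print(option)
```

Of the four ways to point two links, only the one that makes H the offspring of both X and Y is ruled out, which leaves three rootings. The DNA data alone do not choose among them.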

This small genealogy is a tree, but when we move to larger genealogies the network nature of the cultivars should become obvious.

However, the authors resort to a standard subterfuge to hide this fact. This strategy is to show cultivars multiple times in the genealogies, to avoid drawing reticulate relationships. I have illustrated this approach a couple of times before in this blog:

     Reducing networks to trees
     Thoroughbred horses and reticulate pedigrees

In the following genealogy of the Pinot cultivar, the authors note: "For the sake of clarity, Trebbiano Toscano and Folle Blanche appear twice in the diagram."
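The duplication trick amounts to unfolding the network into a tree. As a rough sketch, again with made-up labels rather than the real Pinot genealogy, the ancestry of a cultivar can be written out recursively, repeating any ancestor that is reached more than once:

```python
from collections import Counter

# Hypothetical pedigree (made-up labels): child -> (parent, parent).
PEDIGREE = {
    "Z": ("X", "Y"),
    "X": ("P", "Q"),
    "Y": ("P", "R"),  # P is a parent of both X and Y
}

def unfold(pedigree, node, depth=0):
    """Return the ancestry of `node` as indented lines, one per tree node.
    An ancestor reached by several lines of descent is simply repeated."""
    lines = ["  " * depth + node]
    for parent in pedigree.get(node, ()):
        lines.extend(unfold(pedigree, parent, depth + 1))
    return lines

tree_lines = unfold(PEDIGREE, "Z")
label_counts = Counter(line.strip() for line in tree_lines)
repeated = sorted(name for name, n in label_counts.items() if n > 1)

print("\n".join(tree_lines))
print("drawn more than once:", repeated)
```

The result is easy to draw as a tree, but it has more nodes than there are cultivars, and the reader has to notice that the repeated labels are the same plant.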

Trees reign supreme as simplifications of networks!
