Wednesday, April 3, 2013

Representing evolutionary scenarios using splits graphs

Splits graphs are basically data-display networks, since their intended purpose is to graphically display the patterns of variation in a dataset. These patterns may relate to evolutionary history, or they may not.

A couple of weeks ago I discussed a paper by Myles et al. concerning the genetics of grape cultivars, and this paper provides an interesting example where the patterns of genetic variation seem to be strongly phylogenetic in nature (Myles S, Boyko AR, Owens CL, Brown PJ, Grassi F, Aradhya MK, Prins B, Reynolds A, Chia JM, Ware D, Bustamante CD, Buckler ES. 2011. Genetic structure and domestication history of the grape. Proceedings of the National Academy of Sciences of the USA 108: 3530-3535).

Myles et al. note that: "Archaeological evidence suggests that grape domestication took place in the South Caucasus between the Caspian and Black Seas and that cultivated vinifera then spread south to the western side of the Fertile Crescent, the Jordan Valley, and Egypt by 5,000 y ago." They provide an explicit historical scenario of the evolutionary history of cultivated grapes (Vitis vinifera):
  1. There are two species involved (V.sylvestris, V.vinifera), both distributed along the eastern and northern part of the Mediterranean basin;
  2. V.vinifera was domesticated from V.sylvestris in the eastern part of the distribution;
  3. V.vinifera then spread geographically from east to west;
  4. This spread was followed by introgression of V.sylvestris into V.vinifera in the western part of their joint distribution.

Myles et al. generated genotype data from a custom microarray, which assayed 5,387 SNPs in 570 V.vinifera samples and 59 V.sylvestris accessions from the US Department of Agriculture (USDA) germplasm collection. Average pairwise Fst estimates between populations, defined by species and geographical region, were then calculated from all 5,387 SNPs, weighted by allele frequency.
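
To make the idea of a population-pairwise Fst matrix concrete, here is a minimal Python sketch. It is not the estimator used by Myles et al., whose exact method is not described here; it assumes a simple Nei-style Fst computed as a ratio of sums across SNPs (so that more variable SNPs carry more weight) and it ignores sample-size corrections. The population names and allele frequencies are hypothetical toy values, not data from the paper.

```python
from itertools import combinations


def pairwise_fst(freqs_a, freqs_b):
    """Nei-style Fst between two populations from per-SNP allele frequencies.

    Assumption: ratio-of-sums across SNPs, equal population weights,
    no correction for sample size.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2
        # Expected heterozygosity of the pooled population
        h_total = 2 * p_bar * (1 - p_bar)
        # Mean expected heterozygosity within the two populations
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical allele frequencies at three SNPs for three populations
toy_freqs = {
    "pop_A": [0.10, 0.50, 0.90],
    "pop_B": [0.10, 0.50, 0.90],
    "pop_C": [0.90, 0.50, 0.10],
}

# Pairwise Fst for every pair of populations (the upper triangle of the matrix)
fst_matrix = {
    (a, b): pairwise_fst(toy_freqs[a], toy_freqs[b])
    for a, b in combinations(toy_freqs, 2)
}

for pair, value in fst_matrix.items():
    print(pair, round(value, 3))
```

A matrix of this kind, treated as a set of pairwise distances between populations, is the sort of input that a distance-based method such as NeighborNet works from.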

I constructed a NeighborNet splits graph from these Fst data, as shown in the figure. According to Myles et al., the geographic regions are defined as follows: "east" includes locations east of Istanbul, Turkey; "west" includes locations west of Slovenia, including Austria; and "central" refers to locations between them.

Each of the splits (bipartitions) in the graph represents one of the four steps in the hypothesized scenario, as labelled in the figure. Thus, there is apparently phylogenetic signal remaining from all of these proposed historical events that can be detected in the genetic distances. As the authors note: "Our analyses of relatedness between vinifera and sylvestris populations are consistent with the archaeological data".

Note, however, that one cannot infer the scenario from the splits graph, because the data analysis is not intended for direct evolutionary inference. The graph is undirected, and there are therefore several possible scenarios that could be derived from the graph. For example, the graph shown is also compatible with the domestication of V.vinifera from V.sylvestris in the western part of the distribution.
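
The point about direction can be illustrated with a small Python sketch. It uses a hypothetical toy tree rather than the NeighborNet graph from the figure, and the node names ("x", "y" and the population labels) are invented for illustration. The same undirected edges are oriented away from two different starting points, giving two different directed scenarios that share exactly the same undirected structure.

```python
from collections import deque

# Hypothetical unrooted toy tree; "x" and "y" are unlabelled internal nodes.
UNDIRECTED_EDGES = [
    ("sylvestris_east", "x"),
    ("vinifera_east", "x"),
    ("x", "y"),
    ("sylvestris_west", "y"),
    ("vinifera_west", "y"),
]


def orient(edges, root):
    """Direct every edge away from `root` using a breadth-first traversal."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    directed = set()
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in seen:
                seen.add(nxt)
                directed.add((node, nxt))  # edge points from ancestor to descendant
                queue.append(nxt)
    return directed


def undirected(directed_edges):
    """Forget the direction of each edge."""
    return {frozenset(edge) for edge in directed_edges}


east_origin = orient(UNDIRECTED_EDGES, "x")  # scenario starting in the east
west_origin = orient(UNDIRECTED_EDGES, "y")  # scenario starting in the west

print(east_origin != west_origin)                          # the histories differ
print(undirected(east_origin) == undirected(west_origin))  # the graph is the same
```

Choosing between the two orientations requires information from outside the graph, such as the archaeological evidence cited by Myles et al.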

Thus, a splits graph can be used to suggest scenarios (i.e. hypothesis generation) and it can be used to test scenarios (i.e. hypothesis testing), but the latter is a weak test because there will always be several phylogenetic scenarios with which the graph is compatible.

