This is a compilation of links to empirical datasets that may prove useful for validating algorithms for phylogenetic networks that are intended to represent evolutionary history. For each dataset an aligned data file is provided, along with annotation notes.
For each dataset, the following information is provided:
- Name: a unique name for the dataset
- Source: the publication used as the source for the data
- Zip file: contains the nexus file, the annotation notes, and a PDF copy of the source paper
- Nexus file: a text version of the nexus-formatted file for quick viewing
- Notes: a brief explanation of what the data are about, and what phylogenetic history they represent
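For quick inspection of one of the nexus files, the aligned matrix can be pulled out with a few lines of Python. The following is only a minimal sketch: it assumes a simple DATA or CHARACTERS block with unquoted taxon names that contain no spaces, and the function name and the example data are invented for illustration rather than taken from any of the files listed here. A full nexus parser (for example the one in Biopython) is a safer choice for anything beyond a quick look.

```python
import re


def read_nexus_matrix(text):
    """Return {taxon: sequence} from the MATRIX block of a simple nexus file."""
    # Drop nexus comments, which are enclosed in square brackets.
    text = re.sub(r"\[[^\]]*\]", "", text)
    # The matrix runs from the MATRIX keyword to the next semicolon.
    match = re.search(r"\bmatrix\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX block found")
    sequences = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank line or stray token
        taxon, chunks = parts[0], parts[1:]
        # Appending lets interleaved blocks accumulate under the same taxon.
        sequences[taxon] = sequences.get(taxon, "") + "".join(chunks)
    return sequences


# Invented example data, not one of the datasets in this compilation.
example = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=8;
  FORMAT DATATYPE=DNA MISSING=? GAP=-;
  MATRIX
    taxon_A  ACGTACGT
    taxon_B  ACGTAC-T  [a comment]
    taxon_C  ACGAACGT
  ;
END;
"""

seqs = read_nexus_matrix(example)
for name, seq in seqs.items():
    print(name, len(seq), seq)
```

To use it on a downloaded file, read the file into a string first, for instance with `open("SansonLeaves.nex").read()`, and pass that string to the function.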
Datasets where the history is a tree
These serve as negative controls for network algorithms.
Datasets where the history is known from experimentation
Source: Sanson GF, Kawashita SY, Brunstein A, Briones MR (2002) Experimental phylogeny of neutrally evolving DNA sequences generated by a bifurcate series of nested polymerase chain reactions. Molecular Biology and Evolution 19: 170-178.
Zip file: Sanson.zip
Nexus file: SansonLeaves.nex
Notes: complete small-subunit rDNA gene sequences from Trypanosoma cruzi; an easy tree — it is recovered by all analyses and all models
Source: Hillis DM, Bull JJ, White ME, Badgett MR, Molineux IJ (1992) Experimental phylogenetics: generation of a known phylogeny. Science 255: 589-592.
Zip file: Hillis.zip
Nexus file: Hillis.nex
Notes: three blocks with partial gene sequences from bacteriophage T7; no model gets the tree quite right
Source: Cunningham CW, Zhu H, Hillis DM (1998) Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 52: 978-987.
Zip file: Cunningham.zip
Nexus file: Cunningham.nex
Notes: three complete and two partial gene sequences from bacteriophage T7; almost a star tree, and no model gets the tree right
Source: Cunningham CW, Jeng K, Husti J, Badgett M, Molineux IJ, Hillis DM, Bull JJ (1997) Parallel molecular evolution of deletions and nonsense mutations in bacteriophage T7. Molecular Biology and Evolution 14: 113-116.
Zip file: Cunningham2.zip
Nexus file: Cunningham2.nex
Notes: two partial gene sequences from bacteriophage T7; almost a star tree
Source: Sousa A, Zé-Zé L, Silva P, Tenreiro R (2008) Exploring tree-building methods and distinct molecular data to recover a known asymmetric phage phylogeny. Molecular Phylogenetics and Evolution 48: 563-573.
Zip file: Sousa.zip
Nexus file: Sousa.nex
Notes: nine blocks with partial gene sequences from bacteriophage T7
Source: Spencer M, Davidson EA, Barbrook AC, Howe CJ (2004) Phylogenetics of artificial manuscripts. Journal of Theoretical Biology 227: 503-511.
Zip file: Parzival.zip
Nexus file: Parzival.nex
Notes: one block of text from the medieval German poem "Parzival", manually copied several times
Datasets where the history is known from retrospective observation
Source: Leitner T, Escanilla D, Franzén C, Uhlén M, Albert J (1996) Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proceedings of the National Academy of Sciences of the USA 93: 10864-10869.
Zip file: Leitner.zip
Nexus file: Leitner.nex
Notes: two partial gene sequences from HIV-1 virus; no model gets the tree quite right
Source: Lemey P, Derdelinckx I, Rambaut A, Van Laethem K, Dumont S, Vermeulen S, Van Wijngaerden E, Vandamme A-M (2005) Molecular footprint of drug-selective pressure in a Human Immunodeficiency Virus transmission chain. Journal of Virology 79: 11981-11989.
Zip file: Lemey.zip
Nexus file: Lemey.nex
Notes: two partial gene sequences from HIV-1 virus; most models get the tree almost right
Datasets where the history is reticulated
These are datasets where the evidence of reticulation is independent of the dataset.
Datasets where the history is known from experimentation
Source: Fuertes Aguilar J, Rosselló JA, Nieto Feliner G (1999) Nuclear ribosomal DNA (nrDNA) concerted evolution in natural and artificial hybrids of Armeria (Plumbaginaceae). Molecular Ecology 8: 1341-1346.
Zip file: Feliner.zip
Nexus file: Feliner.nex
Notes: one gene sequence from Armeria plants; there are three artificial hybrids, which differ only by having additive polymorphic nucleotides in some of the six positions at which the parents differ
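As an illustration of what "additive polymorphic nucleotides" means here, the sketch below checks, at each site where two parental sequences differ, whether the hybrid carries the IUPAC ambiguity code that combines both parental bases (for example Y where one parent has C and the other has T). The sequences and the function name are invented for illustration and are not taken from the Armeria data; the code also assumes that only the single-base and two-base IUPAC codes occur.

```python
# IUPAC nucleotide codes mapped to the set of bases each one represents
# (only the single-base and two-base codes are covered in this sketch).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def additive_sites(parent1, parent2, hybrid):
    """For each site where the parents differ, report whether the hybrid
    shows exactly the union of the two parental states."""
    result = {}
    for i, (p1, p2, h) in enumerate(zip(parent1, parent2, hybrid)):
        if IUPAC[p1] == IUPAC[p2]:
            continue  # parents agree, so the site is uninformative
        result[i] = IUPAC[h] == (IUPAC[p1] | IUPAC[p2])
    return result


# Invented sequences: the parents differ at positions 2 and 4 (0-based).
parent1 = "ACGTCA"
parent2 = "ACATTA"
hybrid = "ACRTCA"  # R = A or G at position 2; plain C at position 4

print(additive_sites(parent1, parent2, hybrid))  # {2: True, 4: False}
```

In this toy case the hybrid is additive at one of the two variable positions and matches a single parent at the other, which is the kind of pattern the note describes.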
Source: McDade LA (1997) Hybrids and phylogenetic systematics. III. Comparison with distance methods. Systematic Botany 22: 669-683.
Zip file: McDade.zip
Nexus file: McDade.nex
Notes: morphology from Aphelandra plants; there are 17 artificial hybrids, originally intended to be analyzed with each F1 hybrid added individually to the set of F0 species
Source: Roos T, Heikkilä T (2009) Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets. Literary and Linguistic Computing 24: 417-433.
See also the Computer-Assisted Stemmatology Challenge web page.
Zip file: Heinrichi.zip
Nexus file: Heinrichi.nex
Notes: one block of text from the late medieval Finnish folktale "Piispa Henrikin Surmavirsi", manually copied several times, with contamination among copies and deliberately deleted text
Source: Baret PV, Macé C, Robinson P (2006) Testing methods on an artificially created textual tradition. In Macé C, Baret P, Bozzi A, Cignoni L (eds) The Evolution of Texts: Confronting Stemmatological and Genetical Methods, pp 255-281. Istituti Editoriali e Poligrafici Internazionali, Pisa.
Zip file: Besoin.zip
Nexus file: Besoin.nex
Notes: one block of text from the modern French text "Notre besoin de consolation est impossible à rassasier", manually copied several times, with contamination in one copy and deliberately deleted text
Datasets where the reticulation is inferred
Source: Donoghue MJ, Baldwin BG, Li J, Winkworth RC (2004) Viburnum phylogeny based on chloroplast trnK intron and nuclear ribosomal ITS DNA sequences. Systematic Botany 29: 188-198.
Zip file: Donoghue.zip
Nexus file: DonoghueSubset.nex
Notes: two partial gene sequences from Viburnum plants; Viburnum prunifolium is a hybrid
Source: Rieseberg LH (1991) Homoploid reticulate evolution in Helianthus (Asteraceae): evidence from ribosomal genes. American Journal of Botany 78: 1218-1237.
Zip file: Rieseberg.zip
Nexus file: Rieseberg.nex
Notes: two restriction-site sets from Helianthus plants; Helianthus anomalus, Helianthus deserticola and Helianthus paradoxus are hybrids
Source: Atchley WR, Fitch WM (1991) Gene trees and the origins of inbred strains of mice. Science 254: 554-558.
Zip file: Atchley.zip
Nexus file: Atchley.nex
Notes: percentage allelic differences for 144 gene loci from laboratory mice; SEA, CBA and C3H are hybrids, but only the first one appears to be detectable
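A percentage allelic difference between two strains can be computed as the share of loci at which their alleles differ. The sketch below shows one way to do this; the allele labels are invented, and skipping loci that are unscored in either strain is an assumption of this sketch, not necessarily what the source paper did.

```python
def percent_allelic_difference(strain_a, strain_b):
    """Percentage of loci, scored in both strains, at which the alleles differ.

    Each strain is a sequence of allele labels, one per locus, with None
    marking an unscored locus (skipping those is an assumption here).
    """
    shared = [(x, y) for x, y in zip(strain_a, strain_b)
              if x is not None and y is not None]
    if not shared:
        raise ValueError("no locus is scored in both strains")
    differing = sum(x != y for x, y in shared)
    return 100.0 * differing / len(shared)


# Invented allele calls at four loci for three hypothetical strains.
s1 = ["a", "b", "a", "c"]
s2 = ["a", "a", "a", "b"]
s3 = ["b", "b", None, "c"]

print(percent_allelic_difference(s1, s2))  # 2 of 4 loci differ: 50.0
print(percent_allelic_difference(s1, s3))  # 1 of 3 shared loci differs
```

Applying the function to every pair of strains gives a distance matrix of the kind that distance-based tree and network methods take as input.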
Source: Beardsley PM, Schoenig SE, Whittall JB, Olmstead RG (2004) Patterns of evolution in western North American Mimulus (Phrymaceae). American Journal of Botany 91: 474-489.
Zip file: Beardsley.zip
Nexus file: BeardsleyAll.nex
Notes: three partial gene sequences from Mimulus plants; Mimulus evanescens is a hybrid
Source: Hoggard GD, Kores PJ, Molvray M, Hoggard RK (2004) The phylogeny of Gaura (Onagraceae) based on ITS, ETS, and trnL-F sequence data. American Journal of Botany 91: 139-148.
Zip file: Hoggard.zip
Nexus file: Hoggard.nex
Notes: three partial gene sequences from Gaura plants; Gaura drummondii is a hybrid
Source: Alice LA, Eriksson T, Eriksen B, Campbell CS (2001) Hybridization and gene flow between distantly related species of Rubus (Rosaceae): evidence from nuclear ribosomal DNA internal transcribed spacer region sequences. Systematic Botany 26: 769-778.
Zip file: Alice.zip
Nexus file: Alice.nex
Notes: one partial sequence from Rubus plants; five hybrids, but three are similar to the parents
Source: Howarth DG, Baum DA (2005) Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands. Evolution 59: 948-961.
Zip file: Howarth.zip
Nexus file: Howarth.nex
Notes: four partial gene sequences from Scaevola plants; three samples of the hybrid Scaevola procera
Source: O’Donnell K, Kistler HC, Tacke BK, Casper HH (2000) Gene genealogies reveal global phylogeographic structure and reproductive isolation among lineages of Fusarium graminearum, the fungus causing wheat scab. Proceedings of the National Academy of Sciences of the USA 97: 7905-7910.
Zip file: ODonnell.zip
Nexus file: ODonnellAll.nex
Notes: six partial gene sequences from Fusarium fungi; NRRL_28338 and NRRL_28721 are recombinants
Source: Bollyky PL, Rambaut A, Harvey PH, Holmes EC (1996) Recombination between sequences of Hepatitis B Virus from different genotypes. Journal of Molecular Evolution 42: 97-102.
Zip file: Bollyky.zip
Nexus file: Bollyky.nex
Notes: complete genome sequences from Hepatitis B viruses; HBVDNA and HPBADW1 are recombinants
Source: Starr JR, Gravel G, Bruneau A, Muasya AM (2007) Phylogenetic implications of a unique 5.8S nrDNA insertion in Cyperaceae. Aliso 23: 84-98.
Zip file: Starr.zip
Nexus file: Starr.nex
Notes: one partial gene sequence from sedge and rush plants; Oxychloe andina is a chimeric sequence
Source: Cooper MA, Adam RD, Worobey M, Sterling CR (2007) Population genetics provides evidence for recombination in Giardia. Current Biology 17: 1984-1988.
Zip file: Cooper.zip
Nexus file: Cooper.nex
Notes: three partial chromosome sequences from Giardia protozoa; Giardia intestinalis isolate 335 is a reassortment
Source: Aoyama J, Nishida M, Tsukamoto K (2001) Molecular phylogeny and evolution of the freshwater eel, genus Anguilla. Molecular Phylogenetics and Evolution 20: 450-459.
Zip file: Aoyama.zip
Nexus file: Aoyama.nex
Notes: one partial gene sequence from Anguilla eels; Anguilla bicolor bicolor is a recombinant
Source: Sessa EB, Zimmer EA, Givnish TJ (2012) Unraveling reticulate evolution in North American Dryopteris (Dryopteridaceae). BMC Evolutionary Biology 12: 104.
Zip file: Sessa.zip
Nexus file: Sessa.nex
Notes: eight partial gene sequences from Dryopteris ferns; Dryopteris celsa EBS27 is a recombinant
(iii) Lateral Gene Transfer
To be added