This is a compilation of links to empirical datasets that might prove useful for validating algorithms for phylogenetic networks that are intended to represent evolutionary history. In each case an aligned data file is provided, along with annotation notes.
For each dataset, the following information is provided:
- Name: a unique name for the dataset
- Source: the publication used as the source for the data
- Zip file: contains the nexus file, the annotation notes, and a PDF copy of the source paper
- Nexus file: a text version of the nexus-formatted file for quick viewing
- Notes: a brief explanation of what the data are about, and what phylogenetic history they represent
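As a quick way to check what a downloaded Zip file contains, the following is a minimal sketch in Python. It assumes only what is described above: that the archive holds at least one nexus-formatted text file. The function names (nexus_dimensions, summarize_zip) are hypothetical, and the parsing is deliberately crude: it looks for the first NTAX= and NCHAR= settings anywhere in the file, so it is not a full nexus parser and could be misled by comments.

```python
import io
import re
import zipfile


def nexus_dimensions(nexus_text):
    """Return (ntax, nchar) from the first NTAX= / NCHAR= settings found.

    Either value is None if the setting is absent. This is a rough scan,
    not a real nexus parser (bracketed comments are not skipped).
    """
    ntax = re.search(r"\bntax\s*=\s*(\d+)", nexus_text, flags=re.IGNORECASE)
    nchar = re.search(r"\bnchar\s*=\s*(\d+)", nexus_text, flags=re.IGNORECASE)
    return (
        int(ntax.group(1)) if ntax else None,
        int(nchar.group(1)) if nchar else None,
    )


def summarize_zip(zip_source):
    """Map each nexus member of a Zip archive to its (ntax, nchar) pair.

    zip_source may be a file path or a binary file-like object.
    Members are recognised by file extension only (an assumption).
    """
    summary = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".nex", ".nexus")):
                text = archive.read(name).decode("utf-8", errors="replace")
                summary[name] = nexus_dimensions(text)
    return summary
```

For example, summarize_zip("Sanson.zip") would return a dictionary keyed by the nexus file names found in that archive, assuming the file has been downloaded to the working directory.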
Part 1
Datasets where the history is a tree
These serve as negative controls for network algorithms.
Datasets where the history is known from experimentation
(1)
Name: Sanson
Source: Sanson GF, Kawashita SY, Brunstein A, Briones MR (2002) Experimental phylogeny of neutrally evolving DNA sequences generated by a bifurcate series of nested polymerase chain reactions. Molecular Biology and Evolution 19: 170-178.
Zip file: Sanson.zip
Nexus file: SansonLeaves.nex
Notes: complete small-subunit rDNA gene sequences from Trypanosoma cruzi; an easy tree — it is recovered by all analyses and all models
(2)
Name: Hillis
Source: Hillis DM, Bull JJ, White ME, Badgett MR, Molineux IJ (1992) Experimental phylogenetics: generation of a known phylogeny. Science 255: 589-592.
Zip file: Hillis.zip
Nexus file: Hillis.nex
Notes: three blocks with partial gene sequences from bacteriophage T7; no model gets the tree quite right
(3)
Name: Cunningham
Source: Cunningham CW, Zhu H, Hillis DM (1998) Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 52: 978-987.
Zip file: Cunningham.zip
Nexus file: Cunningham.nex
Notes: three complete and two partial gene sequences from bacteriophage T7; almost a star tree, and no model gets the tree right
(4)
Name: Cunningham2
Source: Cunningham CW, Jeng K, Husti J, Badgett M, Molineux IJ, Hillis DM, Bull JJ (1997) Parallel molecular evolution of deletions and nonsense mutations in bacteriophage T7. Molecular Biology and Evolution 14: 113-116.
Zip file: Cunningham2.zip
Nexus file: Cunningham2.nex
Notes: two partial gene sequences from bacteriophage T7; almost a star tree
(5)
Name: Sousa
Source: Sousa A, Zé-Zé L, Silva P, Tenreiro R (2008) Exploring tree-building methods and distinct molecular data to recover a known asymmetric phage phylogeny. Molecular Phylogenetics and Evolution 48: 563-573.
Zip file: Sousa.zip
Nexus file: Sousa.nex
Notes: nine blocks with partial gene sequences from bacteriophage T7
(6)
Name: Parzival
Source: Spencer M, Davidson EA, Barbrook AC, Howe CJ (2004) Phylogenetics of artificial manuscripts. Journal of Theoretical Biology 227: 503-511.
Zip file: Parzival.zip
Nexus file: Parzival.nex
Notes: one block of text from the medieval German poem "Parzival", manually copied several times
Datasets where the history is known from retrospective observation
(1)
Name: Leitner
Source: Leitner T, Escanilla D, Franzén C, Uhlén M, Albert J (1996) Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proceedings of the National Academy of Sciences of the USA 93: 10864-10869.
Zip file: Leitner.zip
Nexus file: Leitner.nex
Notes: two partial gene sequences from HIV-1 virus; no model gets the tree quite right
(2)
Name: Lemey
Source: Lemey P, Derdelinckx I, Rambaut A, Van Laethem K, Dumont S, Vermeulen S, Van Wijngaerden E, Vandamme A-M (2005) Molecular footprint of drug-selective pressure in a Human Immunodeficiency Virus transmission chain. Journal of Virology 79: 11981-11989.
Zip file: Lemey.zip
Nexus file: Lemey.nex
Notes: two partial gene sequences from HIV-1 virus; most models get the tree almost right
Datasets where the history is known from simulation
(1)
Name: Camin
Source: Sokal RR (1983) A phylogenetic analysis of the Caminalcules. I. The data base. Systematic Zoology 32: 159-184.
Zip file: Camin.zip
Nexus file: 2 separate files (see the Zip file)
Notes: morphological features of artificial organisms; there are two data files, one containing only the 29 extant organisms (and for which the tree is provided) and one with both the 29 extant organisms and the 48 fossil organisms
Caveat emptor: These are simulated data, and therefore do not necessarily match real data in all ways
Part 2
Datasets where the history is reticulated
Datasets where the evidence of reticulation is independent of the dataset
Datasets where the history is known from experimentation
(i) Hybridization and Introgression
(1)
Name: Feliner
Source: Fuertes Aguilar J, Rosselló JA, Nieto Feliner G (1999) Nuclear ribosomal DNA (nrDNA) concerted evolution in natural and artificial hybrids of Armeria (Plumbaginaceae). Molecular Ecology 8: 1341-1346.
Zip file: Feliner.zip
Nexus file: Feliner.nex
Notes: one gene sequence from Armeria plants; there are three artificial hybrids, which differ only by having additive polymorphic nucleotides in some of the six positions at which the parents differ
(2)
Name: McDade
Source: McDade LA (1997) Hybrids and phylogenetic systematics. III. Comparison with distance methods. Systematic Botany 22: 669-683.
Zip file: McDade.zip
Nexus file: McDade.nex
Notes: morphology from Aphelandra plants; there are 17 artificial hybrids, originally intended to be analyzed with each F1 hybrid added individually to the set of F0 species
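The Feliner notes above describe hybrids that show additive polymorphic nucleotides at positions where the parents differ. The following minimal sketch illustrates what "additive" means in that sense: at each aligned position, the hybrid carries both parental bases, written here as the standard IUPAC two-base ambiguity code. The function name and the toy sequences are hypothetical, and whether the Nexus file itself encodes the polymorphisms with IUPAC codes is an assumption, not something stated in the notes.

```python
# Standard IUPAC ambiguity codes for each pair of distinct nucleotides.
IUPAC_PAIRS = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def additive_hybrid(parent_a, parent_b):
    """Return the sequence expected for a fully additive hybrid.

    Positions where the two aligned parents agree keep that base;
    positions where they differ get the IUPAC code for both bases.
    Anything outside A/C/G/T at a differing position (for example a
    gap) falls back to "N", which is a simplification.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parent sequences must be aligned to equal length")
    hybrid = []
    for base_a, base_b in zip(parent_a.upper(), parent_b.upper()):
        if base_a == base_b:
            hybrid.append(base_a)
        else:
            hybrid.append(IUPAC_PAIRS.get(frozenset((base_a, base_b)), "N"))
    return "".join(hybrid)


# Toy sequences (invented for illustration, not taken from Armeria).
print(additive_hybrid("ACGTAC", "ACATGC"))  # prints ACRTRC
```

A real hybrid may be additive at only some of the differing positions, as the notes indicate, so this sketch gives the fully additive extreme rather than a prediction for any particular sample.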
(ii) Text Contamination
(1)
Name: Heinrichi
Source: Roos T, Heikkilä T (2009) Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets. Literary and Linguistic Computing 24: 417-433.
See also the Computer-Assisted Stemmatology Challenge web page.
Zip file: Heinrichi.zip
Nexus file: Heinrichi.nex
Notes: one block of text from the late medieval Finnish folktale "Piispa Henrikin Surmavirsi", manually copied several times, with contamination among copies and deliberately deleted text
(2)
Name: Besoin
Source: Baret PV, Macé C, Robinson P (2006) Testing methods on an artificially created textual tradition. In Macé C, Baret P, Bozzi A, Cignoni L (eds) The Evolution of Texts: Confronting Stemmatological and Genetical Methods, pp 255-281. Istituti Editoriali e Poligrafici Internazionali, Pisa.
See also the Computer-Assisted Stemmatology Challenge web page.
Zip file: Besoin.zip
Nexus file: Besoin.nex
Notes: one block of text from the modern French text "Notre besoin de consolation est impossible à rassasier", manually copied several times, with contamination in one copy and deliberately deleted text
(iii) Pedigree
(1)
Name: Eclipse
Source: Bower MA, Campana MG, Nisbet RER, Weller R, Whitten M, Edwards CJ, Stock F, Barrett E, O'Connell TC, Hill EW, Wilson AM, Howe CJ, Barker G, Binns M (2012a) Truth in the bones: resolving the identity of the founding elite thoroughbred racehorses. Archaeometry 54: 916-925.
Zip file: Eclipse.zip
Nexus file: Eclipse.nex
Notes: mitochondrial control region from historical thoroughbred stallions; there are two reticulations from male ancestors
Datasets where the reticulation is inferred
(i) Hybridization and Introgression
(1)
Name: Donoghue
Source: Donoghue MJ, Baldwin BG, Li J, Winkworth RC (2004) Viburnum phylogeny based on chloroplast trnK intron and nuclear ribosomal ITS DNA sequences. Systematic Botany 29: 188-198.
Zip file: Donoghue.zip
Nexus file: DonoghueSubset.nex
Notes: two partial gene sequences from Viburnum plants; Viburnum prunifolium is a hybrid
(2)
Name: Rieseberg
Source: Rieseberg LH (1991) Homoploid reticulate evolution in Helianthus (Asteraceae): evidence from ribosomal genes. American Journal of Botany 78: 1218-1237.
Zip file: Rieseberg.zip
Nexus file: Rieseberg.nex
Notes: two restriction-site sets from Helianthus plants; Helianthus anomalus, Helianthus deserticola and Helianthus paradoxus are hybrids
(3)
Name: Atchley
Source: Atchley WR, Fitch WM (1991) Gene trees and the origins of inbred strains of mice. Science 254: 554-558.
Zip file: Atchley.zip
Nexus file: Atchley.nex
Notes: percentage allelic differences for 144 gene loci from laboratory mice; SEA, CBA and C3H are hybrids, but only the first one appears to be detectable in the dataset
(4)
Name: Beardsley
Source: Beardsley PM, Schoenig SE, Whittall JB, Olmstead RG (2004) Patterns of evolution in western North American Mimulus (Phrymaceae). American Journal of Botany 91: 474-489.
Zip file: Beardsley.zip
Nexus file: BeardsleyAll.nex
Notes: three partial gene sequences from Mimulus plants; Mimulus evanescens is a hybrid
(5)
Name: Hoggard
Source: Hoggard GD, Kores PJ, Molvray M, Hoggard RK (2004) The phylogeny of Gaura (Onagraceae) based on ITS, ETS, and trnL-F sequence data. American Journal of Botany 91: 139-148.
Zip file: Hoggard.zip
Nexus file: Hoggard.nex
Notes: three partial gene sequences from Gaura plants; Gaura drummondii is a hybrid
(6)
Name: Alice
Source: Alice LA, Eriksson T, Eriksen B, Campbell CS (2001) Hybridization and gene flow between distantly related species of Rubus (Rosaceae): evidence from nuclear ribosomal DNA internal transcribed spacer region sequences. Systematic Botany 26: 769-778.
Zip file: Alice.zip
Nexus file: Alice.nex
Notes: one partial sequence from Rubus plants; five hybrids, but three are similar to the parents
(7)
Name: Howarth
Source: Howarth DG, Baum DA (2005) Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands. Evolution 59: 948-961.
Zip file: Howarth.zip
Nexus file: Howarth.nex
Notes: four partial gene sequences from Scaevola plants; there are three samples of the hybrid Scaevola procera
(8)
Name: Moody
Source: Moody ML, Rieseberg LH (2012) Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus). Molecular Phylogenetics and Evolution 64: 145-155.
Zip file: Moody.zip
Nexus files: 11 separate files (see the Zip file)
Notes: eleven partial gene sequences from Helianthus plants, with multiple accessions for many of the species, and multiple alleles for many of the accessions; Helianthus anomalus, Helianthus deserticola and Helianthus paradoxus are hybrids; some recombinants have also been detected
Caveat emptor: There are discrepancies between Table 1 and Figure 1 in the paper, and between both of these and the dataset; these are detailed in the Excel spreadsheet in the Zip file
(ii) Recombination
(1)
Name: ODonnell
Source: O’Donnell K, Kistler HC, Tacke BK, Casper HH (2000) Gene genealogies reveal global phylogeographic structure and reproductive isolation among lineages of Fusarium graminearum, the fungus causing wheat scab. Proceedings of the National Academy of Sciences of the USA 97: 7905-7910.
Zip file: ODonnell.zip
Nexus file: ODonnellAll.nex
Notes: six partial gene sequences from Fusarium fungi; NRRL_28338 and NRRL_28721 are recombinants
(2)
Name: Bollyky
Source: Bollyky PL, Rambaut A, Harvey PH, Holmes EC (1996) Recombination between sequences of Hepatitis B Virus from different genotypes. Journal of Molecular Evolution 42: 97-102.
Zip file: Bollyky.zip
Nexus file: Bollyky.nex
Notes: complete genome sequences from Hepatitis B viruses; HBVDNA and HPBADWl are recombinants
(3)
Name: Starr
Source: Starr JR, Gravel G, Bruneau A, Muasya AM (2007) Phylogenetic implications of a unique 5.8S nrDNA insertion in Cyperaceae. Aliso 23: 84-98.
Zip file: Starr.zip
Nexus file: Starr.nex
Notes: one partial gene sequence from sedge and rush plants; Oxychloe andina is a chimeric sequence
(4)
Name: Cooper
Source: Cooper MA, Adam RD, Worobey M, Sterling CR (2007) Population genetics provides evidence for recombination in Giardia. Current Biology 17: 1984-1988.
Zip file: Cooper.zip
Nexus file: Cooper.nex
Notes: three partial chromosome sequences from Giardia protozoa; Giardia intestinalis isolate 335 is a recombinant
(5)
Name: Aoyama
Source: Aoyama J, Nishida M, Tsukamoto K (2001) Molecular phylogeny and evolution of the freshwater eel, genus Anguilla. Molecular Phylogenetics and Evolution 20: 450-459.
Zip file: Aoyama.zip
Nexus file: Aoyama.nex
Notes: one partial gene sequence from Anguilla eels; Anguilla bicolor bicolor is a recombinant
(6)
Name: Sessa
Source: Sessa EB, Zimmer EA, Givnish TJ (2012) Unraveling reticulate evolution in North American Dryopteris (Dryopteridaceae). BMC Evolutionary Biology 12: 104.
Zip file: Sessa.zip
Nexus file: Sessa.nex
Notes: eight partial gene sequences from Dryopteris ferns; Dryopteris celsa EBS27 is a recombinant
(iii) Lateral Gene Transfer
To be added
(iv) Word Borrowing
(1)
Name: List
Source: List J-M, Nelson-Sathi S, Geisler H, Martin W (2013) Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays 36: 141-150.
Zip file: List.zip
Nexus file: 2 separate files (see the Zip file)
Notes: presence/absence of sets of cognate words; there are two data files, one with known borrowings (loan words) included and one without; extensive word borrowings are known in several languages