This is a compilation of links to empirical datasets that may prove useful for validating mathematical algorithms for phylogenetic networks that are intended to represent evolutionary history. For each dataset, an aligned data file is provided, along with annotation notes.

For each dataset, the following information is provided:
  • Name: a unique name for the dataset
  • Source: the publication used as the source for the data
  • Zip file: contains the nexus file, the annotation notes, and a PDF copy of the source paper
  • Nexus file: a text version of the nexus-formatted file for quick viewing
  • Notes: a brief explanation of what the data are about, and what phylogenetic history they represent
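
As a rough illustration of how one of the nexus files might be read for quick inspection, here is a minimal sketch in Python. It assumes a simple, non-interleaved MATRIX block with one unquoted taxon label and one character string per line; the function name `read_nexus_matrix` and the toy example text are hypothetical, and the actual files may need a full nexus parser.

```python
import re

def read_nexus_matrix(text):
    """Return {taxon: characters} from the first MATRIX block in a NEXUS string.

    Assumes a non-interleaved matrix with unquoted, whitespace-free labels.
    """
    # Remove NEXUS comments, which are written in square brackets.
    text = re.sub(r"\[[^\]]*\]", "", text)
    # Capture everything between the MATRIX keyword and the terminating ';'.
    match = re.search(r"\bmatrix\b(.*?);", text, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no MATRIX block found")
    matrix = {}
    for line in match.group(1).splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or label-only lines
        # First token is the label; remaining tokens are joined as characters.
        matrix[parts[0]] = "".join(parts[1:])
    return matrix

# Hypothetical toy file, not taken from any dataset listed here.
example = """#NEXUS
BEGIN DATA;
  DIMENSIONS NTAX=3 NCHAR=8;
  FORMAT DATATYPE=DNA MISSING=? GAP=-;
  MATRIX
    taxon_A ACGTACGT
    taxon_B ACGTAC-T  [a comment]
    taxon_C ACGAACGT
  ;
END;
"""

matrix = read_nexus_matrix(example)
for name, seq in matrix.items():
    print(name, len(seq), seq)
```

A check that all rows have the same length is a cheap way to confirm that the alignment was read as intended.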

Part 1
Datasets where the history is a tree

These serve as negative controls for network algorithms.

Datasets where the history is known from experimentation

Name: Sanson
Source: Sanson GF, Kawashita SY, Brunstein A, Briones MR (2002) Experimental phylogeny of neutrally evolving DNA sequences generated by a bifurcate series of nested polymerase chain reactions. Molecular Biology and Evolution 19: 170-178.
Nexus file: SansonLeaves.nex
Notes: complete small-subunit rDNA gene sequences from Trypanosoma cruzi; an easy tree — it is recovered by all analyses and all models

Name: Hillis
Source: Hillis DM, Bull JJ, White ME, Badgett MR, Molineux IJ (1992) Experimental phylogenetics: generation of a known phylogeny. Science 255: 589-592.
Nexus file: Hillis.nex
Notes: three blocks with partial gene sequences from bacteriophage T7; no model gets the tree quite right

Name: Cunningham
Source: Cunningham CW, Zhu H, Hillis DM (1998) Best-fit maximum-likelihood models for phylogenetic inference: empirical tests with known phylogenies. Evolution 52: 978-987.
Nexus file: Cunningham.nex
Notes: three complete gene sequences plus two partial gene sequences from bacteriophage T7; almost a star tree, and no model gets the tree right

Name: Cunningham2
Source: Cunningham CW, Jeng K, Husti J, Badgett M, Molineux IJ, Hillis DM, Bull JJ (1997) Parallel molecular evolution of deletions and nonsense mutations in bacteriophage T7. Molecular Biology and Evolution 14: 113-116.
Nexus file: Cunningham2.nex
Notes: two partial gene sequences from bacteriophage T7; almost a star tree

Name: Sousa
Source: Sousa A, Zé-Zé L, Silva P, Tenreiro R (2008) Exploring tree-building methods and distinct molecular data to recover a known asymmetric phage phylogeny. Molecular Phylogenetics and Evolution 48: 563-573.
Nexus file: Sousa.nex
Notes: nine blocks with partial gene sequences from bacteriophage T7

Name: Parzival
Source: Spencer M, Davidson EA, Barbrook AC, Howe CJ (2004) Phylogenetics of artificial manuscripts. Journal of Theoretical Biology 227: 503-511.
Nexus file: Parzival.nex
Notes: one block of text from the medieval German poem "Parzival", manually copied several times

Datasets where the history is known from retrospective observation

Name: Leitner
Source: Leitner T, Escanilla D, Franzén C, Uhlén M, Albert J (1996) Accurate reconstruction of a known HIV-1 transmission history by phylogenetic tree analysis. Proceedings of the National Academy of Sciences of the USA 93: 10864-10869.
Nexus file: Leitner.nex
Notes: two partial gene sequences from HIV-1 virus; no model gets the tree quite right

Name: Lemey
Source: Lemey P, Derdelinckx I, Rambaut A, Van Laethem K, Dumont S, Vermeulen S, Van Wijngaerden E, Vandamme A-M (2005) Molecular footprint of drug-selective pressure in a Human Immunodeficiency Virus transmission chain. Journal of Virology 79: 11981-11989.
Nexus file: Lemey.nex
Notes: two partial gene sequences from HIV-1 virus; most models get the tree almost right

Part 2
Datasets where the history is reticulated

Datasets where the evidence of reticulation is independent of the dataset.

Datasets where the history is known from experimentation

(i) Hybridization

Name: Feliner
Source: Fuertes Aguilar J, Rosselló JA, Nieto Feliner G (1999) Nuclear ribosomal DNA (nrDNA) concerted evolution in natural and artificial hybrids of Armeria (Plumbaginaceae). Molecular Ecology 8: 1341-1346.
Nexus file: Feliner.nex
Notes: one gene sequence from Armeria plants; there are three artificial hybrids, which differ only by having additive polymorphic nucleotides in some of the six positions at which the parents differ
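
The additive pattern described in these notes can be illustrated with a small sketch. It assumes that an additive polymorphic site is written with the IUPAC ambiguity code for the two parental bases; the sequences and the function name `additive_hybrid` are hypothetical and are not taken from the Feliner data. The sketch makes every differing site additive, whereas the artificial hybrids above are additive at only some of the differing positions.

```python
# IUPAC codes for two-base ambiguities, keyed by the unordered pair of bases.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
}

def additive_hybrid(parent1, parent2):
    """Build a sequence that is additive at every site where the parents differ."""
    if len(parent1) != len(parent2):
        raise ValueError("parent sequences must be aligned to the same length")
    sites = []
    for a, b in zip(parent1, parent2):
        # Identical sites are kept; differing sites get the ambiguity code.
        sites.append(a if a == b else IUPAC_PAIRS[frozenset((a, b))])
    return "".join(sites)

# Hypothetical aligned parental sequences differing at two positions.
p1 = "ACGTTGCA"
p2 = "ACATTGTA"
hybrid = additive_hybrid(p1, p2)
print(hybrid)  # ACRTTGYA
```

A network method applied to such data would be expected to place the hybrid between its two parents rather than on a branch of its own.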

Name: McDade
Source: McDade LA (1997) Hybrids and phylogenetic systematics. III. Comparison with distance methods. Systematic Botany 22: 669-683.
Nexus file: McDade.nex
Notes: morphology from Aphelandra plants; there are 17 artificial hybrids, originally intended to be analyzed with each F1 hybrid added individually to the set of F0 species

(ii) Contamination

Name: Heinrichi
Source: Roos T, Heikkilä T (2009) Evaluating methods for computer-assisted stemmatology using artificial benchmark data sets. Literary and Linguistic Computing 24: 417-433.
See also the Computer-Assisted Stemmatology Challenge web page.
Nexus file: Heinrichi.nex
Notes: one block of text from the late medieval Finnish folktale "Piispa Henrikin Surmavirsi", manually copied several times, with contamination among copies and deliberately deleted text

Name: Besoin
Source: Baret PV, Macé C, Robinson P (2006) Testing methods on an artificially created textual tradition. In Macé C, Baret P, Bozzi A, Cignoni L (eds) The Evolution of Texts: Confronting Stemmatological and Genetical Methods, pp 255-281. Istituti Editoriali e Poligrafici Internazionali, Pisa.
Nexus file: Besoin.nex
Notes: one block of text from the modern French text "Notre besoin de consolation est impossible à rassasier", manually copied several times, with contamination in one copy and deliberately deleted text

Datasets where the reticulation is inferred

(i) Hybridization

Name: Donoghue
Source: Donoghue MJ, Baldwin BG, Li J, Winkworth RC (2004) Viburnum phylogeny based on chloroplast trnK intron and nuclear ribosomal ITS DNA sequences. Systematic Botany 29: 188-198.
Nexus file: DonoghueSubset.nex
Notes: two partial gene sequences from Viburnum plants; Viburnum prunifolium is a hybrid

Name: Rieseberg
Source: Rieseberg LH (1991) Homoploid reticulate evolution in Helianthus (Asteraceae): evidence from ribosomal genes. American Journal of Botany 78: 1218-1237.
Nexus file: Rieseberg.nex
Notes: two restriction-site sets from Helianthus plants; Helianthus anomalus, Helianthus deserticola and Helianthus paradoxus are hybrids

Name: Atchley
Source: Atchley WR, Fitch WM (1991) Gene trees and the origins of inbred strains of mice. Science 254: 554-558.
Nexus file: Atchley.nex
Notes: percentage allelic differences for 144 gene loci from laboratory mice; SEA, CBA and C3H are hybrids, but only the first one appears to be detectable

Name: Beardsley
Source: Beardsley PM, Schoenig SE, Whittall JB, Olmstead RG (2004) Patterns of evolution in western North American Mimulus (Phrymaceae). American Journal of Botany 91: 474-489.
Nexus file: BeardsleyAll.nex
Notes: three partial gene sequences from Mimulus plants; Mimulus evanescens is a hybrid

Name: Hoggard
Source: Hoggard GD, Kores PJ, Molvray M, Hoggard RK (2004) The phylogeny of Gaura (Onagraceae) based on ITS, ETS, and trnL-F sequence data. American Journal of Botany 91: 139-148.
Nexus file: Hoggard.nex
Notes: three partial gene sequences from Gaura plants; Gaura drummondii is a hybrid

Name: Alice
Source: Alice LA, Eriksson T, Eriksen B, Campbell CS (2001) Hybridization and gene flow between distantly related species of Rubus (Rosaceae): evidence from nuclear ribosomal DNA internal transcribed spacer region sequences. Systematic Botany 26: 769-778.
Nexus file: Alice.nex
Notes: one partial sequence from Rubus plants; five hybrids, but three are similar to the parents

Name: Howarth
Source: Howarth DG, Baum DA (2005) Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands. Evolution 59: 948-961.
Nexus file: Howarth.nex
Notes: four partial gene sequences from Scaevola plants; three samples of the hybrid Scaevola procera

(ii) Recombination

Name: ODonnell
Source: O’Donnell K, Kistler HC, Tacke BK, Casper HH (2000) Gene genealogies reveal global phylogeographic structure and reproductive isolation among lineages of Fusarium graminearum, the fungus causing wheat scab. Proceedings of the National Academy of Sciences of the USA 97: 7905-7910.
Nexus file: ODonnellAll.nex
Notes: six partial gene sequences from Fusarium fungi; NRRL_28338 and NRRL_28721 are recombinants

Name: Bollyky
Source: Bollyky PL, Rambaut A, Harvey PH, Holmes EC (1996) Recombination between sequences of Hepatitis B Virus from different genotypes. Journal of Molecular Evolution 42: 97-102.
Nexus file: Bollyky.nex
Notes: complete genome sequences from Hepatitis B viruses; HBVDNA and HPBADWl are recombinants

Name: Starr
Source: Starr JR, Gravel G, Bruneau A, Muasya AM (1996) Phylogenetic implications of a unique 5.8S nrDNA insertion in Cyperaceae. Aliso 23: 84-98.
Nexus file: Starr.nex
Notes: one partial gene sequence from sedge and rush plants; Oxychloe andina is a chimeric sequence

Name: Cooper
Source: Cooper MA, Adam RD, Worobey M, Sterling CR (2007) Population genetics provides evidence for recombination in Giardia. Current Biology 17: 1984-1988.
Nexus file: Cooper.nex
Notes: three partial chromosome sequences from Giardia protozoa; Giardia intestinalis isolate 335 is a reassortment

Name: Aoyama
Source: Aoyama J, Nishida M, Tsukamoto K (2001) Molecular phylogeny and evolution of the freshwater eel, genus Anguilla. Molecular Phylogenetics and Evolution 20: 450-459.
Nexus file: Aoyama.nex
Notes: one partial gene sequence from Anguilla eels; Anguilla bicolor bicolor is a recombinant

Name: Sessa
Source: Sessa EB, Zimmer EA, Givnish TJ (2012) Unraveling reticulate evolution in North American Dryopteris (Dryopteridaceae). BMC Evolutionary Biology 12: 104.
Nexus file: Sessa.nex
Notes: eight partial gene sequences from Dryopteris ferns; Dryopteris celsa EBS27 is a recombinant

(iii) Lateral Gene Transfer

To be added
