During one of the discussion sessions at the recent Phylogenetic Network Workshop, in Singapore, the need was re-iterated for "gold standard" empirical datasets, in order to aid the development and validation of algorithms for phylogenetic networks.
The current collection of such datasets is located on this blog, at:
http://phylonetworks.blogspot.se/p/datasets.html
However, it is still quite a small database, as so far it has been based solely on my own ability to locate suitable datasets that are freely available (see the comments in Public availability of phylogenetic data).
I would therefore like to remind everyone that if you have, or know of, suitable empirical datasets then please contact me.
The database is currently hierarchically arranged as follows:
Datasets where the history is a tree
    Datasets where the history is known from experimentation
    Datasets where the history is known from retrospective observation
Datasets where the history is reticulated
    Datasets where the history is known from experimentation
        Hybridization
        Contamination
    Datasets where the reticulation is inferred
        Hybridization
        Recombination
        Lateral Gene Transfer
The basic requirement for a "gold standard" dataset that contains one or more reticulations (i.e. there is gene flow) is that the evidence for the reticulation(s) is independent of the particular dataset. That is, there should be either experimental data, or at least another independent dataset, confirming the gene flow. This is quite a tough criterion, particularly for lateral gene transfer, but it is a necessary quality criterion.
Finally, the database requires the processed data (e.g. a multiple sequence alignment), rather than the original raw data (see the comments in Releasing phylogenetic data).
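For anyone screening their own datasets before getting in touch, the two requirements above can be written down as a simple check. This is only a rough sketch in Python: the class, field names and evidence labels are all made up for illustration and do not correspond to anything in the database itself, and it does not try to model how the history of a tree-like dataset is established.

```python
from dataclasses import dataclass, field

# Hypothetical evidence labels, invented for this sketch.
EXPERIMENT = "experiment"
INDEPENDENT_DATASET = "independent dataset"
SAME_DATASET = "same dataset"


@dataclass
class Candidate:
    """A dataset being considered for submission (illustrative only)."""
    name: str
    reticulate: bool                  # True if the history involves gene flow
    evidence: set = field(default_factory=set)  # sources supporting the reticulation(s)
    has_processed_data: bool = False  # e.g. a multiple sequence alignment


def meets_criteria(c: Candidate) -> bool:
    """Apply the two requirements from the text to one candidate."""
    # Processed data are required, not just the raw data.
    if not c.has_processed_data:
        return False
    # The independence criterion only concerns reticulations; how a
    # tree-like history is known is not modelled in this sketch.
    if not c.reticulate:
        return True
    # Evidence taken from the dataset itself does not count.
    return bool(c.evidence & {EXPERIMENT, INDEPENDENT_DATASET})
```

For example, a hybridization dataset backed by crossing experiments and supplied as an alignment would pass, whereas one whose only evidence for gene flow is the dataset itself would not.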