Wednesday, September 2, 2015

Is this a "gold standard" dataset?

I have just added another dataset to our database. This one is of considerable interest, because it is a complex one. As the authors note, it is likely to contain ancient hybrid speciation, recent introgression and deep coalescence. Thus, identifying recent hybrids will be problematic.
Michael L. Moody and Loren H. Rieseberg (2012) Sorting through the chaff, nDNA gene trees for phylogenetic inference and hybrid identification of annual sunflowers (Helianthus sect. Helianthus). Molecular Phylogenetics and Evolution 64: 145–155.
There are 29 accessions from 13 species, with data for 11 loci in 5 linkage groups (a total of 8,077 aligned nucleotides). The accessions have sequences for either 1 or 2 of the alleles, and sometimes 3 (the latter are likely to be the result of PCR artifacts). The authors have also tried to identify recombinant sequences. Three of the species are previously identified hybrid taxa.

Unfortunately, adding this dataset to the database has also been problematic, because there are internal inconsistencies. For complete consistency, Figure 1 of the paper should agree with its own Table 1, and the GenBank data should agree with both of them. However, this three-way consistency exists for only 2 of the 11 loci. For the rest, in 7 instances the GenBank data are the odd one out, in 4 instances it is the table, and in 4 instances it is the figure. For the data discrepancies, in 2 cases a sequence is missing, in 1 case there is an extra sequence, and for the remaining 2 pairs it is likely that the sequences have been mis-labelled.
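The three-way comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming that each source (figure, table, GenBank) can be reduced to a set of sequence labels for a given locus; the function name and the labels in the usage lines are hypothetical, and are not taken from the paper or from GenBank. It also simplifies matters to one verdict per locus, whereas a single locus can contain more than one discrepancy.

```python
def odd_one_out(figure, table, genbank):
    """Compare three sets of sequence labels for one locus.

    Returns None if all three agree, the name of the source that
    disagrees with the other two, or "all differ" if no two agree.
    """
    sources = {
        "figure": frozenset(figure),
        "table": frozenset(table),
        "genbank": frozenset(genbank),
    }
    if len(set(sources.values())) == 1:
        return None  # three-way consistent
    for name in sources:
        others = [labels for other, labels in sources.items() if other != name]
        if others[0] == others[1]:
            return name  # the other two agree, so this one is the odd one out
    return "all differ"


# Hypothetical labels, for illustration only.
consistent = odd_one_out({"acc1_a", "acc1_b"}, {"acc1_a", "acc1_b"}, {"acc1_a", "acc1_b"})
missing = odd_one_out({"acc1_a", "acc1_b"}, {"acc1_a", "acc1_b"}, {"acc1_a"})
```

In the second call the hypothetical GenBank set lacks one sequence, so it is reported as the odd one out.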

It is therefore not immediately obvious to what extent this counts as a "gold standard" dataset. I have included it because of its intrinsic interest, but obviously with a caveat emptor warning. Sadly, this sort of situation has been all too common in my search for suitable datasets.

Monday, August 31, 2015

The solution to the spinach fallacy?

Last week I blogged about Spinach and the iron fallacy. I analysed an early set of data by Thomas Richardson (1848), who calculated the amount of iron in combusted ash for various vegetables and fruits, and showed that spinach is not at all unusual in its constituents. The idea that spinach is rich in iron is untrue, and the story about a mis-placed decimal point seems to be nothing more than an urban myth.

In the meantime, Joachim Dagg, at the Natural History Apostilles blog, has reanalysed Richardson's data and revealed that the first source for the spinach-iron myth is likely to have been a somewhat inappropriate attempt to combine his data for the percent iron values in relation to the ash with the percent values of the ashes in relation to the fresh matter.

So, I have recalculated the phylogenetic network using these "adjusted" values. I used the percent values of the chemical constituents in relation to the pure ash (raw ash minus carbonic acid, charcoal and sand), and combined them with the percent values of the ashes. The issue here is that radish roots and leaves have the largest ash values, followed by cherry stems and spinach. This leads to an over-statement of the chemical contents. In particular, the iron content moves spinach from being ranked sixth to second (behind radish foliage, which is not usually eaten).
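To see why this combination can re-order the samples, here is a minimal Python sketch. It assumes that "combining" the two percentages means multiplying them (percent of a constituent in the ash, times percent of ash in the fresh matter, divided by 100); that is my reading of the adjustment, not a documented formula. The sample names and numbers are hypothetical, chosen only to show the effect, and are not Richardson's values.

```python
def percent_of_fresh_matter(pct_in_ash, ash_pct_of_fresh):
    """Convert a constituent's percent of the ash into percent of the fresh matter.

    Assumes the adjustment is a simple product of the two percentages.
    """
    return pct_in_ash * ash_pct_of_fresh / 100.0


def rank_descending(values):
    """Return the sample names ordered from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical samples: (percent iron in the ash, percent ash in the fresh matter).
samples = {
    "plant A": (2.0, 1.0),
    "plant B": (1.5, 3.0),
    "plant C": (1.0, 1.5),
}

iron_in_ash = {name: iron for name, (iron, ash) in samples.items()}
iron_adjusted = {
    name: percent_of_fresh_matter(iron, ash) for name, (iron, ash) in samples.items()
}

rank_by_ash = rank_descending(iron_in_ash)
rank_adjusted = rank_descending(iron_adjusted)
```

In this made-up example, plant B has less iron in its ash than plant A, but its large ash value lifts it to first place after the adjustment. This is the same kind of shift that moves spinach up the ranking.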