Monday, March 30, 2015
Inconsequential splits in NeighborNet graphs
NeighborNet produces splits graphs based on distances between the taxa, rather than using the original character data. This approach can produce what we might call inconsequential splits in the graph — that is, splits that are not explicitly supported by the character data. Here, I present a simple example to illustrate the extent to which this can occur.
The data are taken from: Nanette Thomas, Jeremy J. Bruhl, Andrew Ford, Peter H. Weston (2014) Molecular dating of Winteraceae reveals a complex biogeographical history involving both ancient Gondwanan vicariance and long-distance dispersal. Journal of Biogeography 41: 894-904.
This dataset consists of a set of eight morphological features of the pollen from 31 extant plant taxa plus two fossil samples, as shown in this data matrix:
12345678
T_lanceolata 00111011
T_stipitata 00111011
T_purpurescens 00111011
T_xerophila_x 00111011
T_xerophila_r 00111011
T_vickeriana 00111011
T_glaucifolia 00111011
T_membranea 00111011
T_insipida 00111011
--------
T_perrieri 00111010
D_winteri 00111010
D_grenadensis 00111010
--------
B_comptonii 00011010
B_howeana 00011010
B_semicarpoides 00011010
B_whiteana 00011010
B_queenslandiana_q 00011010
B_queenslandiana_1 00011010
--------
P_axillaris 00011011
P_colorata 00011011
Pseudowinterapollis 00011011
--------
B_pancheri 01001011
--------
Harrisipollenites 01001100
--------
Z_acsmithii 01001101
E_stipitatum 01001101
Z_bicolor 01001101
--------
Z_balansae 11001101
--------
C_dinisii 1-111101
C_madagascariensis 1-111101
W_salutaris 1-111101
P_macranthum 1-111101
C_ekmanii 1-111101
C_winterana 1-111101
Note that there are only nine groups of taxa (separated by the dashed lines), and within each group the data are identical. Each character has two states: present / absent. The "-" scored for character 2 in the last group is presumably a missing or inapplicable value.
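The grouping can be checked mechanically. The following is a minimal Python sketch, not part of the original analysis, that collects taxa sharing an identical character string. The matrix is transcribed from the listing above, and the names `MATRIX` and `group_identical` are illustrative only.

```python
from collections import defaultdict

# Character matrix transcribed from the post; "-" is kept as a literal symbol.
MATRIX = """
T_lanceolata 00111011
T_stipitata 00111011
T_purpurescens 00111011
T_xerophila_x 00111011
T_xerophila_r 00111011
T_vickeriana 00111011
T_glaucifolia 00111011
T_membranea 00111011
T_insipida 00111011
T_perrieri 00111010
D_winteri 00111010
D_grenadensis 00111010
B_comptonii 00011010
B_howeana 00011010
B_semicarpoides 00011010
B_whiteana 00011010
B_queenslandiana_q 00011010
B_queenslandiana_1 00011010
P_axillaris 00011011
P_colorata 00011011
Pseudowinterapollis 00011011
B_pancheri 01001011
Harrisipollenites 01001100
Z_acsmithii 01001101
E_stipitatum 01001101
Z_bicolor 01001101
Z_balansae 11001101
C_dinisii 1-111101
C_madagascariensis 1-111101
W_salutaris 1-111101
P_macranthum 1-111101
C_ekmanii 1-111101
C_winterana 1-111101
"""


def group_identical(matrix_text):
    """Map each distinct character string to the list of taxa that carry it."""
    groups = defaultdict(list)
    for line in matrix_text.strip().splitlines():
        taxon, states = line.split()
        groups[states].append(taxon)
    return dict(groups)


groups = group_identical(MATRIX)
for states, taxa in groups.items():
    print(states, len(taxa), ", ".join(taxa))
print("taxa:", sum(len(t) for t in groups.values()))
print("distinct patterns:", len(groups))
```

Grouping on the raw string treats "-" as just another symbol, which is enough here because every taxon with a "-" has the same pattern.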
The resulting NeighborNet, produced with the default settings of the SplitsTree4 program, is shown in the first graph.
As expected, the taxa form nine groups. There are a number of apparently well-supported splits (i.e. with long edges) separating these groups. There are also a number of smaller splits, and a whole series of very tiny splits. Neither of these latter two classes of split is explicitly present in the dataset: the only splits supported by the characters are those plotted onto the graph using the character numbers. (Note that character 5 is uninformative, because it has the same state in every taxon.)
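The splits that the characters themselves support can be read directly off the matrix. The sketch below is an illustration under stated assumptions rather than anything SplitsTree4 does internally: it uses one representative string per group, the labels G1 to G9 are hypothetical names for the nine groups in the order listed, and "-" is treated as missing and left out of both sides of a split.

```python
# One representative character string per group, in the order of the matrix.
# The labels G1..G9 are hypothetical; they do not appear in the original post.
PATTERNS = {
    "G1": "00111011",  # T_lanceolata and 8 others
    "G2": "00111010",  # T_perrieri, D_winteri, D_grenadensis
    "G3": "00011010",  # B_comptonii and 5 others
    "G4": "00011011",  # P_axillaris, P_colorata, Pseudowinterapollis
    "G5": "01001011",  # B_pancheri
    "G6": "01001100",  # Harrisipollenites
    "G7": "01001101",  # Z_acsmithii, E_stipitatum, Z_bicolor
    "G8": "11001101",  # Z_balansae
    "G9": "1-111101",  # C_dinisii and 5 others
}


def character_splits(patterns):
    """For each character (1-based), return (groups scored 1, groups scored 0).

    Groups scored "-" are assumed to be missing data and appear on neither side.
    """
    n_chars = len(next(iter(patterns.values())))
    splits = {}
    for i in range(n_chars):
        ones = frozenset(g for g, s in patterns.items() if s[i] == "1")
        zeros = frozenset(g for g, s in patterns.items() if s[i] == "0")
        splits[i + 1] = (ones, zeros)
    return splits


splits = character_splits(PATTERNS)

# A character with an empty side is constant and so supports no split.
constant = [c for c, (ones, zeros) in splits.items() if not ones or not zeros]

for c, (ones, zeros) in splits.items():
    label = "constant" if c in constant else "split"
    print(c, label, sorted(ones), "|", sorted(zeros))
print("constant characters:", constant)
```

Any edge in the network whose split does not appear in this output has no single character behind it.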
The very tiny splits are present throughout the graph as extremely short edges. For example, a detailed view of the bottom left-hand corner of the graph is shown in the next figure.
Note that these six taxa have identical character data, and therefore their separation into four groups is entirely an artifact of the NeighborNet algorithm.
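Because NeighborNet works from distances, it is worth confirming that the input distances among identical taxa really are zero. The sketch below makes two assumptions: it uses a simple proportion-of-differences distance that skips positions scored "-" (the post does not say which distance transform was applied), and it uses the six taxa scored 1-111101 as a stand-in, since the post does not name the six taxa shown in the figure.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing characters, skipping positions where either is '-'."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)


# Six taxa with identical data, used here as a stand-in for the figure's taxa.
SIX = {
    name: "1-111101"
    for name in [
        "C_dinisii",
        "C_madagascariensis",
        "W_salutaris",
        "P_macranthum",
        "C_ekmanii",
        "C_winterana",
    ]
}
Z_BALANSAE = "11001101"  # a taxon from a different group, for comparison

# All 15 pairwise distances within the group of identical taxa.
within = [p_distance(SIX[a], SIX[b]) for a, b in combinations(SIX, 2)]
between = p_distance(SIX["C_dinisii"], Z_BALANSAE)

print("largest within-group distance:", max(within))
print("distance to Z_balansae:", between)
```

With every within-group distance equal to zero, the distances contain no signal that could separate these taxa, which is consistent with reading the tiny edges as an artifact.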
So, one needs to be careful when interpreting small splits in such a graph: they may or may not have biological support.