Tuesday, March 13, 2012

Network measures and phylogenetic networks

Recently, I considered the relationships between phylogenetic networks and other types of biological network. I concluded that they may be quite different. This further suggests that much of the theoretical work directed towards the study of those other networks ("network science"; e.g. Newman 2010) may not turn out to be particularly relevant for phylogenetic networks, at least from the biological perspective. However, that does not mean that we should not look further into the idea.

One major part of the study of other biological networks has been the development of descriptive summaries of the network characteristics. These characteristics are usually summarized by one or more mathematical measurements. This does not necessarily mean that biologists have seen any close relationship between these mathematical measures and biologically relevant quantities, but they are working on it.

So, it is worth considering whether any of these network measures have yet played a role in phylogenetic networks.

Network Measures

Properties of individual nodes

Node degree — number of incident edges to a node

  • for the internal nodes of a dichotomous tree this is pre-defined (indegree 1, outdegree 2), and many network models have similar restrictions (e.g. indegree 2, outdegree 1 for reticulation nodes)
  • however, applying the coalescent to a population network suggests that the node with the largest degree is the most probable common ancestor, so it is potentially of interest here

Degree distribution — frequency distribution of the degree for all nodes

  • not used so far, presumably because it would be uninteresting in light of the previous comment
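
As a rough illustration of the two degree-based measures above, here is a minimal sketch that tabulates indegree and outdegree for a small rooted network. The edge list and node names are hypothetical, made up purely for the example; "h" plays the role of a reticulation node.

```python
from collections import Counter

# Hypothetical rooted network, written as directed (parent, child) edges.
# "h" is a reticulation node with two parents ("x" and "y").
edges = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]

def degrees(edges):
    """Return {node: (indegree, outdegree)} for a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    for parent, child in edges:
        outdeg[parent] += 1
        indeg[child] += 1
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n]) for n in nodes}

def degree_distribution(edges):
    """Return a Counter mapping (indegree, outdegree) to the number of nodes."""
    return Counter(degrees(edges).values())

deg = degrees(edges)
dist = degree_distribution(edges)
print(deg["h"])   # degree pair of the reticulation node
print(dist)       # how many nodes share each degree pair
```

In this toy case the distribution has only four distinct degree pairs (root, tree nodes, reticulation node, leaves), which is consistent with the comment that it is unlikely to be very informative.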

Properties affected by local subgraphs of the network

Clustering coefficient — the degree to which nodes cluster together, measured as the density of triangles in the network (can also be a global measure)

  • not used so far
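
One possible reason for the lack of use is that a tree contains no triangles at all, so the coefficient is zero by construction. The sketch below computes the usual local clustering coefficient on the undirected version of a graph and averages it across nodes. The two edge lists are hypothetical examples, and ignoring edge direction is my own simplifying assumption.

```python
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering(edges):
    """Mean of the local clustering coefficients over all nodes."""
    adj = adjacency(edges)
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical four-leaf tree: no triangles are possible.
tree = [("r", "x"), ("r", "y"), ("x", "A"), ("x", "B"), ("y", "C"), ("y", "D")]

# Hypothetical network in which the reticulation node "h" has parents "r" and "x",
# and "x" is itself a child of "r", so r-x-h forms a triangle.
network = [("r", "x"), ("r", "h"), ("x", "h"), ("x", "A"), ("h", "B")]

tree_cc = average_clustering(tree)
network_cc = average_clustering(network)
print(tree_cc, network_cc)
```

Only a reticulation whose two parents are directly connected creates a triangle, so even in a network this measure would pick up a rather special kind of reticulation.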

Distribution of network motifs — motifs are connectivity-patterns that occur more often than expected, usually expressed as a frequency distribution

  • not used so far

Properties affected by the whole network

Closeness — inverse of the summed shortest pathlengths to all other nodes, often averaged across all nodes

  • not used so far
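
For concreteness, closeness as defined above can be sketched with a breadth-first search. This assumes a connected graph in which every edge counts as one step (no edgelengths); the quartet tree and its node names are hypothetical.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, node):
    """Inverse of the summed shortest path lengths from node to all other nodes."""
    total = sum(bfs_distances(adj, node).values())
    return 1.0 / total if total > 0 else 0.0

# Hypothetical unrooted quartet tree: leaves A, B attach to u; leaves C, D attach to v.
adj = adjacency([("A", "u"), ("B", "u"), ("u", "v"), ("v", "C"), ("v", "D")])
close_u = closeness(adj, "u")
close_A = closeness(adj, "A")
print(close_u, close_A)
```

As expected, the internal node scores higher than the leaf, which is about all the measure can say for a tree this small.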

Betweenness — number of inter-node shortest paths on which a node lies, often averaged across all nodes

  • not used so far
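
Betweenness can be sketched in the same spirit. The code below follows Brandes' accumulation scheme for unweighted graphs, which splits the credit fractionally when two nodes are joined by several equally short paths (as can happen in a network but not in a tree). The quartet tree is a hypothetical example, and the values are unnormalized counts of node pairs.

```python
from collections import deque

def adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def betweenness(adj):
    """Unnormalized shortest-path betweenness for an unweighted, undirected graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in order of non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies, starting from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both of its endpoints.
    return {v: b / 2.0 for v, b in bc.items()}

# Hypothetical unrooted quartet tree: leaves A, B attach to u; leaves C, D attach to v.
adj = adjacency([("A", "u"), ("B", "u"), ("u", "v"), ("v", "C"), ("v", "D")])
bc = betweenness(adj)
print(bc)
```

In a tree every leaf necessarily scores zero, so any interesting variation would be confined to the internal nodes.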

Node density — number of nodes per unit pathlength

  • not used formally, as far as I know, but phylogeneticists have consistently (and perhaps inappropriately) distinguished highly branched (speciose) parts of a tree from unbranched parts

Centrality — can be measured with respect to degree, closeness or betweenness

  • not used so far

Network diameter — either the average minimum distance between pairs of nodes, or the longest pathlength between any pair of nodes (relative to the number of nodes)

  • has sometimes made its appearance as a statistic in the phylogenetic literature
  • has been used as an optimality criterion for distance-based tree-building
  • if nothing else, the maximum diameter is used for mid-point rooting of a tree
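
The link between diameter and mid-point rooting can be sketched as follows. The code finds the longest path in a tree with edgelengths using two sweeps (farthest node from an arbitrary start, then farthest node from that one), and then locates the point halfway along it. The two-sweep shortcut is valid for trees with non-negative edgelengths but not for networks with cycles; the tree and its edgelengths are hypothetical.

```python
def adjacency(edges):
    """Build {node: {neighbour: edgelength}} from (u, v, length) triples."""
    adj = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length
    return adj

def farthest(adj, source):
    """Return (node, distance, parent map) for the node farthest from source."""
    dist = {source: 0.0}
    parent = {source: None}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, length in adj[u].items():
            if v not in dist:
                dist[v] = dist[u] + length
                parent[v] = u
                stack.append(v)
    end = max(dist, key=dist.get)
    return end, dist[end], parent

def diameter_path(adj):
    """Return (path, length) for the longest path in a tree."""
    start = next(iter(adj))
    a, _, _ = farthest(adj, start)
    b, length, parent = farthest(adj, a)
    path = [b]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1], length            # path runs from a to b

def midpoint(adj, path, length):
    """Return (u, v, offset): the mid-point lies on edge u-v, `offset` away from u."""
    half = length / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        step = adj[u][v]
        if walked + step >= half:
            return u, v, half - walked
        walked += step

# Hypothetical unrooted tree with edgelengths.
adj = adjacency([("A", "u", 1.0), ("B", "u", 2.0), ("u", "v", 1.0),
                 ("v", "C", 4.0), ("v", "D", 1.0)])
path, length = diameter_path(adj)
root_edge = midpoint(adj, path, length)
print(path, length, root_edge)
```

Here the longest path joins B and C, and the root would be placed on the long edge leading to C.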

Nestedness — quantifies whether the structure of small assemblages is a proper subset of the structure of large assemblages

  • a dichotomous tree is fully nested, and so nestedness has had a leading role in phylogenetics
  • nestedness could be used to measure the tree-likeness of a network
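
To make the last suggestion slightly more concrete, here is one possible sketch; it is my own illustration, not an established measure. It treats each cluster as a set of taxa and scores the fraction of cluster pairs that are either disjoint or nested, which is the condition for a collection of clusters to fit on a single rooted tree. The function names and example clusters are hypothetical.

```python
from itertools import combinations

def compatible(a, b):
    """Two clusters can sit on one rooted tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a

def nested_fraction(clusters):
    """Fraction of cluster pairs that are disjoint or nested (1.0 means tree-like)."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 1.0
    return sum(compatible(a, b) for a, b in pairs) / len(pairs)

# Hypothetical clusters taken from a tree ((A,B),(C,D)).
tree_clusters = [frozenset("AB"), frozenset("CD"), frozenset("ABCD")]

# Hypothetical clusters from a network in which B groups with both A and C.
network_clusters = [frozenset("AB"), frozenset("BC"), frozenset("ABC")]

tree_score = nested_fraction(tree_clusters)
network_score = nested_fraction(network_clusters)
print(tree_score, network_score)
```

A score of 1 corresponds to the fully nested case of a tree, and overlapping clusters pull the score down.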

Fractal structure — quantifies the similarity of network structure at different scales

  • not used so far, although tree-imbalance (inversely related to fractal structure) has been an important measurement for trees

Network resolution — amount of information contained in the network (i.e. how much of the variation in node and edge behaviour is retained in the network representation) e.g. unrooted < rooted < rooted with variable edgelengths

  • of interest but usually not quantified
  • an unrooted tree/network cannot represent evolutionary history
  • use of variable edgelengths is common for rooted trees but not so far for rooted networks
  • variable edgelengths are used in unrooted networks


So, most of these measures have not yet played a significant part in the development of phylogenetics. Instead, phylogeneticists have concentrated on quantifying the fit of their data to the trees, using measures such as the consistency index, retention index or permutation tests (for parsimony), likelihood scores (for maximum likelihood) and posterior probabilities (for Bayesian analysis). Alternatively, they have considered "support" for individual edges, via procedures such as the bootstrap, various parametric statistical measurements, and the posterior probability of clades.
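
To show how different in kind these fit measures are from the descriptive measures listed above, here is a minimal sketch of the consistency index for a single character: the minimum conceivable number of changes (number of observed states minus one) divided by the number of changes required on the tree, counted here with Fitch's algorithm. The nested-tuple tree format, the function names, and the convention of returning 1.0 for an invariant character are my own assumptions for the example.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted binary tree.

    tree   : nested 2-tuples with leaf names (strings) at the tips
    states : {leaf name: character state}
    """
    steps = 0

    def visit(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1                      # the two subtrees disagree: one change needed
        return left | right

    visit(tree)
    return steps

def consistency_index(tree, states):
    """Single-character CI: minimum conceivable steps / steps on this tree."""
    minimum = len(set(states.values())) - 1
    observed = fitch_steps(tree, states)
    return minimum / observed if observed else 1.0

# Hypothetical four-taxon tree and two binary characters.
tree = (("A", "B"), ("C", "D"))
congruent = {"A": 0, "B": 0, "C": 1, "D": 1}   # fits the tree with one change
homoplastic = {"A": 0, "B": 1, "C": 0, "D": 1}  # needs two changes on this tree

ci_congruent = consistency_index(tree, congruent)
ci_homoplastic = consistency_index(tree, homoplastic)
print(ci_congruent, ci_homoplastic)
```

Note that the calculation needs the character data as well as the tree, whereas the network measures above need only the graph itself.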

This distinction between phylogenetics and the study of other biological networks seems, once again, to come from the different ways in which the networks are constructed. The other networks are usually constructed directly from observed objects and interactions, so that interest focuses on a description of the resulting network. Phylogenetic networks, on the other hand, are inferred by optimizing the fit between the data and a model, so that interest focuses on the quality of the inference rather than on a description of the network.

It seems likely, therefore, that this situation will continue, as most of these measures are specifically designed for describing empirically observed networks. However, the somewhat more nebulous concept of "network robustness" (the degree to which a network structure is affected by removal or alteration of nodes) has been seen as an important characteristic in the study of all biological networks.

As noted by Proulx et al. 2005: "The hope is that network approaches will ... reveal the global patterns behind large-scale ecological and evolutionary processes. The fear is that all of the fine structure will still matter in the end, leaving us tangled in detail."


Newman M.E.J. (2010) Networks: An Introduction. Oxford University Press, Oxford.

Proulx S.R., Promislow D.E.L., Phillips P.C. (2005) Network thinking in ecology and evolution. Trends in Ecology & Evolution 20: 345-353.
