The Genealogical World of Phylogenetic Networks: Biological versus phylogenetic networks

Networks have recently begun to receive serious attention in nearly all areas of biology. There has been a new focus on complex networks embedded within biological systems; and the mathematical properties of those networks are now being actively studied. In this sense, the interest in phylogenetic networks is simply part of a much larger movement.

An important point, however, is whether the characteristics of the different biological networks have anything in common. The nodes, for example, can represent units at all levels of the biological hierarchy, from elements, through organic and inorganic compounds, to tissues, organs, individuals, populations, species, communities and ecosystems. The edges (or arcs) represent all sorts of interactions between the nodes, including transcriptional control and other biochemical processes, energy and nutrient flow, behavioral interactions, and genetic or genealogical relationships.

Does this complexity mean that we have networks of fundamentally different types, or do the networks differ only in a few mathematical details? Importantly for our purposes, are phylogenetic networks essentially different from other biological networks? If so, then developments elsewhere do not necessarily flow on to us. Indeed, phylogenetic networks seem to be unknown to many network biologists. For example, phylogenetics is not even mentioned in this review paper, which implies some sort of disconnection: Proulx, Promislow, Phillips (2005) Network thinking in ecology and evolution. Trends in Ecology & Evolution 20: 345-353.

I will argue here that, indeed, phylogenetic networks do not match any other type of biological network.

Network Characteristics

First, we can list some of the important characteristics of phylogenetic networks if they are to represent evolutionary history, and then consider them individually:

fully connected (no node or group of nodes is isolated from the rest)
directed
single root
each edge (arc) has a single direction
no directed cycles
in species networks the internal nodes are usually unlabelled, although in population networks some or many of them may be labelled.
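As a minimal sketch, the first five characteristics in this list can be checked mechanically for any network written as a set of directed arcs. The function name, the (parent, child) arc representation and the node names below are assumptions made for illustration, not part of any phylogenetics package; node labelling is not checked.

```python
from collections import deque


def check_phylogenetic_properties(arcs):
    """Check a collection of (parent, child) arcs against the listed properties."""
    arc_set = set(arcs)
    nodes = {n for arc in arc_set for n in arc}
    if not nodes:
        raise ValueError("the network has no arcs")

    children = {n: set() for n in nodes}
    neighbours = {n: set() for n in nodes}
    in_degree = {n: 0 for n in nodes}
    for parent, child in arc_set:
        children[parent].add(child)
        neighbours[parent].add(child)
        neighbours[child].add(parent)
        in_degree[child] += 1

    # Connected: ignoring direction, every node can be reached from any start node.
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    connected = seen == nodes

    # Single root: exactly one node has no incoming arc.
    roots = [n for n in nodes if in_degree[n] == 0]

    # Single direction per edge: no pair of nodes is linked in both directions.
    one_way = all((c, p) not in arc_set for p, c in arc_set)

    # No directed cycles: Kahn's algorithm removes every node only if the graph is acyclic.
    remaining = dict(in_degree)
    ready = deque(roots)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    acyclic = removed == len(nodes)

    return {
        "connected": connected,
        "single_root": len(roots) == 1,
        "one_way_arcs": one_way,
        "acyclic": acyclic,
    }


# Hypothetical species network with one reticulation (node "h" has two parents).
species_network = [("root", "x"), ("root", "y"), ("x", "A"),
                   ("x", "h"), ("y", "h"), ("y", "C"), ("h", "B")]

# Hypothetical three-step feedback loop, as in a biochemical pathway.
feedback_loop = [("a", "b"), ("b", "c"), ("c", "a")]

print(check_phylogenetic_properties(species_network))
print(check_phylogenetic_properties(feedback_loop))
```

The species network passes all four checks, while the feedback loop fails the single-root and no-directed-cycles checks.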

Most other biological networks can be disconnected, at least potentially, because the definition of the nodes to be included in the network is often independent of the network itself, so that there is no necessary connection between nodes. For example, the species within a local community may not all be connected to each other with respect to the characteristic being studied (e.g. genetic relatedness). Indeed, finding this out may be a primary goal of any particular study. Similarly, molecular compounds usually form at least semi-independent sets of pathways, so that the study of any one organ can produce disconnected networks. With evolutionary history, on the other hand, all conceivable nodes are connected to each other by definition (unless there are multiple origins and subsequent history of life in the Universe).

[Figure: Protein interaction network]

In order to represent history, which has a single time direction, a phylogenetic network must have directed edges (arcs) to represent the time course. Many other biological networks have no explicit direction, even if there is an implied one. For example, in protein-protein interaction networks the edges represent the presence of physical interactions between proteins (with no implied direction), and in genetic-relationship networks the edges simply represent the degree of genetic relatedness of individuals (e.g. the link between siblings has no explicit direction, although there is an implied directional link to their parents).

In a phylogeny there is usually a single root, because phylogeneticists try to work on monophyletic groups (clades); and if they really do want to study the Tree of Life then there is assumed to be a single origin of life in the Universe. Once again, for other networks the definition of the included nodes is often independent of the network or its shape, so that a single root is not necessary. For example, networks of regulatory interactions among genes are often represented with the nodes around the perimeter of a circle with the edges being chords. Furthermore, in food webs the arcs represent who eats whom, and these networks are called "webs" for a good reason: there is usually no obvious root position. Indeed, the usual representation of a food pyramid starts with multiple sources (at the bottom) and a single sink (at the top), with the arc directions indicating "is eaten by".

[Figure: Gene regulatory network]

Also, many biological networks have directed cycles. For example, the feedback loops in biochemical pathways are usually important (as sometimes are feedforward loops). Indeed, the discovery of feedback has been considered to be a major contribution to our understanding of why biological systems are different from non-biological ones. The recycling of nutrients in ecosystem nutrient pathways is another prominent example, although no feedback is involved in this case. Once again, the recognition that the Earth is effectively a closed system with finite resources that must be reused is considered to be a major contribution by biology.

Moving on, many networks have bidirectional arcs, indicating direct interactions between nodes. Indeed, many behavioral systems show this feature, including intra- and interspecific competition networks in ecology as well as sexual-contact networks (which, incidentally, have two distinct types of nodes). Immunological networks often have this characteristic as well, with the arcs pointing in one direction or the other at different time points during a cell's immunological reaction to a stimulus. (These networks also can have nodes with arcs that point directly back to themselves, indicating that a molecule regulates itself.) Host-parasite systems can also be considered to have bidirectional arcs, although in this case the paired arcs represent different processes (the effect of the parasite on the host and the host on the parasite operate via different mechanisms). In this case, two separate arcs are usually used, rather than a single bidirectional one, thus representing a directed cycle.

Predator-prey systems may, on occasion, match phylogenetic networks. If we isolate the predator-prey relationships from all of the others in a food web then a single tree-like structure sometimes emerges, with a single "key" predator at the root and a series of non-predators at the leaves. However, more often there are several "root" predators within any one community predator-prey network. Similarly, disease-transmission networks can be tree-like if there is a single identifiable origin to an epidemic, for example, but not otherwise. Note that the internal nodes are all labelled in both of these types of network, so that they will match a population network rather than a species network.
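To make the "single root versus several roots" distinction concrete, here is a small sketch that finds the root predators in a set of predator-prey arcs and tests whether the result is a single rooted tree. The function name and the species in the two toy communities are hypothetical, chosen only for illustration.

```python
def roots_and_tree_likeness(arcs):
    """Return (sorted roots, is_tree) for a collection of (predator, prey) arcs."""
    nodes = {n for arc in arcs for n in arc}
    parents = {n: set() for n in nodes}
    children = {n: set() for n in nodes}
    for predator, prey in arcs:
        parents[prey].add(predator)
        children[predator].add(prey)

    # Roots are the nodes that nothing else in the network preys upon.
    roots = sorted(n for n in nodes if not parents[n])

    # A rooted tree needs exactly one root and at most one predator per node.
    if len(roots) != 1 or any(len(parents[n]) > 1 for n in nodes):
        return roots, False

    # Every node must also be reachable by walking down from the root.
    seen = set()
    stack = [roots[0]]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return roots, seen == nodes


# Hypothetical community with one "key" predator.
single_predator = [("owl", "snake"), ("owl", "mouse"), ("snake", "frog")]

# Hypothetical community with two top predators sharing one prey species.
two_predators = [("owl", "mouse"), ("fox", "mouse"), ("fox", "rabbit")]

print(roots_and_tree_likeness(single_predator))
print(roots_and_tree_likeness(two_predators))
```

Only the first community reduces to a tree. In the second, the shared prey has two parents and there are two candidate roots, which is the more common situation described above.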

[Figure: HIV partner network]

Conclusion

Almost all types of biological networks are built by starting with a labelled set of nodes and then directly linking those nodes with edges. Phylogenetic networks seem to be the only major class of biological networks in which some or many extra nodes are inferred by the network-building process. That is, almost all other networks are built empirically, by using a collection of observed nodes and connecting them via observed edges ("observed" indicating that there are experimental data). Phylogenetic networks, on the other hand, attempt to reconstruct unobserved (and unobservable) historical relationships using data, a model and a mathematical optimization procedure.

So, I have been unable to think of any other biological networks that do match all of the important characteristics of a species network. Perhaps some of you may be able to come up with a good example?

Update: This later post considers the summaries used for biological networks and whether they apply to phylogenetic networks.