Monday, March 24, 2014

Trees, treemaps and networks

Hierarchically arranged information has traditionally been represented as a tree. However, this is not the only way that this information can be pictured. As noted by Manuel Lima (Visualization Metaphors: Old & New):
"As one of the most hailed methods of modern information visualization, the treemap has truly become an epitome of the recent growth of the field and one of the most widespread methods for visualizing hierarchies."
Isabel Meirelles (Design for Information: An Introduction to the Histories, Theories, and Best Practices Behind Effective Information Visualizations. Rockport Publishers, 2013) provides this illustration as an example of the different ways to represent hierarchies:

Treemaps display the tree information as a set of nested rectangles: each branch of the tree is given a rectangle, which is then tiled with smaller rectangles representing sub-branches. The main advantage of using a map as a representation is that the size and colour of the rectangles can be used to represent other information about each tree leaf. (Note: This treemap concept should not be confused with Mike Charleston's program TreeMap, which maps the relationships between two phylogenetic trees, nor with MLTreemap, which maps an unidentified DNA sequence onto a phylogenetic tree.)
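As a rough illustration of the nesting idea, here is a minimal Python sketch of the simplest tiling rule (often called "slice-and-dice"), in which each rectangle is divided among the sub-branches in proportion to their total weight, alternating between vertical and horizontal cuts at each level. The `Node` class, the `layout` function and the example weights are all hypothetical names made up for this sketch; real treemap software typically uses more elaborate tiling rules.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node; `size` is only meaningful for leaves (hypothetical structure)."""
    name: str
    size: float = 0.0
    children: list = field(default_factory=list)


def total(node):
    """Weight of a node: its own size for a leaf, else the sum over its children."""
    if not node.children:
        return node.size
    return sum(total(child) for child in node.children)


def layout(node, x, y, w, h, depth=0, out=None):
    """Assign a rectangle (x, y, width, height) to every node in the tree.

    The rectangle of a branch is tiled by the rectangles of its sub-branches,
    cutting along the width at even depths and along the height at odd depths.
    """
    if out is None:
        out = {}
    out[node.name] = (x, y, w, h)
    weight = total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in node.children:
        share = total(child) / weight if weight else 0.0
        if depth % 2 == 0:
            layout(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:
            layout(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out


# Example: two groups, one of which has two equally weighted leaves.
tree = Node("root", children=[
    Node("groupA", children=[Node("leaf1", 1), Node("leaf2", 1)]),
    Node("groupB", 2),
])
rectangles = layout(tree, 0, 0, 4, 2)
for name, rect in rectangles.items():
    print(name, rect)
```

In this sketch the area of each leaf rectangle is proportional to the leaf's weight, which is what allows size to carry extra information; colour would be assigned separately when drawing.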

Modern treemaps were developed in 1991 by Ben Shneiderman, who has conveniently provided a description of the history and initial development of the idea (Treemaps for space-constrained visualization of hierarchies). Not unexpectedly, this idea has been adopted in biology. For example, taxonomic hierarchies are sometimes represented using a treemap, such as in BioNames (which displays the taxonomic groups recognised by the Index to Organism Names database), and the Natural Science Museum of Barcelona (which allows interactive access to the database records via a taxonomic hierarchy). It has also been used to display the gene ontology associated with gene expression data from microarray studies (Visualization and analysis of microarray and gene ontology data with treemaps).

In addition, it has been suggested that treemaps could be used to represent phylogenetic trees (Using treemaps to visualize phylogenetic trees. 6th International Symposium on Biological and Medical Data Analysis, 2005. Lecture Notes in Computer Science 3745: 283-293); and there is an associated computer program. An example is shown below, in which the rectangles are coloured by their taxonomy. The circles highlight two sequences that are misplaced in the tree (i.e. their tree location does not match their taxonomy).

This approach to displaying phylogenies has not really caught on (i.e. phylogeneticists have stuck to the "node-link" layout). The treemap approach works best with a fixed-level hierarchy, such as the taxonomic hierarchy or the gene ontology hierarchy. In phylogenetics, on the other hand, branch lengths are variable, so that there is no fixed-level hierarchy. Treemaps work well for displaying information about groups that might be recognized in the tree, but not for the tree itself.

Nevertheless, similar methods were suggested long before the invention of computers (two early examples are noted by Manuel Lima, in the blog post linked above). Indeed, we end up with a treemap if we simply cut slices out of the tree, as shown by the next picture (taken from Isabel Meirelles' book), which shows Maximilian Fürbringer's tree of bird relationships from 1888 (published in Untersuchungen zur Morphologie und Systematik der Vögel). On the left is the side view of the tree, and on the right are three slices through the tree branches (as viewed from above). This produces a circular treemap rather than a rectangular one, which is admittedly a less efficient use of the visualization space.

Finally, we can consider the relationship of these ideas to phylogenetic networks. A network is not a nested hierarchy, but instead involves a collection of overlapping sets. This can be represented as a Venn diagram, for example, but not as a treemap. This form of visualization has also been a long-standing suggestion in phylogenetics. The final picture shows Georg August Goldfuss' "system of animals" from 1817 (published in Ueber die Entwicklungsstufen). It is a set of nested egg-shaped sets, expressing his ideas about affinity relationships, with one set overlapping several of the others, representing a non-nested series of relationships. There is nothing new under the sun!
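The distinction between a nested hierarchy and a collection of overlapping sets can be made concrete with a small Python sketch. It assumes that each group is simply given as a set of taxon labels, and it checks whether every pair of groups is either disjoint or has one group contained in the other; only then could the groups be tiled as a treemap. The function name `is_nested` and the example labels are hypothetical, chosen just for this illustration.

```python
from itertools import combinations


def is_nested(groups):
    """Return True if every pair of groups is disjoint or one contains the other."""
    sets = [frozenset(group) for group in groups]
    for a, b in combinations(sets, 2):
        overlap = a & b
        # A partial overlap (shared members, but neither set inside the other)
        # cannot be drawn as nested rectangles.
        if overlap and not (a <= b or b <= a):
            return False
    return True


# Groups taken from a tree: each is inside, or separate from, every other.
tree_groups = [{"a", "b"}, {"c", "d"}, {"a", "b", "c", "d"}]

# Groups taken from a network: "b" belongs to two groups that only partly overlap.
network_groups = [{"a", "b"}, {"b", "c"}]

print(is_nested(tree_groups))
print(is_nested(network_groups))
```

The first call reports that the tree groups are nested, while the second reports that the network groups are not, which is why the latter need something like a Venn diagram instead.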
