Wednesday, May 15, 2013

Resistance to network thinking


Phylogeneticists are used to the idea of tree thinking, in which evolutionary history is seen as a branching tree-like pattern. Clearly, for many phylogeneticists this has not yet been extended to network thinking, in which evolutionary history can also be seen as a reticulating network. Indeed, I have recently come across several people who have actively insisted that "trees are still central" to phylogenetics (to quote one of my correspondents). As Mindell (2013) has claimed, the Tree of Life is still a useful metaphor, model and heuristic device.

So, there is not just indifference to networks but there seems also to be some resistance to them. This is somewhat unexpected, as a network simplifies to a tree if there are no incompatible phylogenetic signals, and so there is no intrinsic reason to restrict phylogenies to being tree-like.
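The claim that a network reduces to a tree can be made concrete with splits (bipartitions of the taxa). What follows is only a minimal sketch, assuming that each phylogenetic signal is represented as one side of a split. The taxon names, the example splits and the function names `compatible` and `is_tree_like` are hypothetical illustrations, not data or methods from any study cited here. Two splits are compatible when at least one of the four intersections of their sides is empty, and a set of pairwise compatible splits can be drawn as a tree.

```python
from itertools import combinations


def compatible(split_a, split_b, taxa):
    """Return True if two splits can coexist on a single tree.

    Each split is given as one side of a bipartition of `taxa`.
    The splits are compatible if at least one of the four
    intersections between their sides is empty.
    """
    a1 = frozenset(split_a)
    a2 = frozenset(taxa) - a1
    b1 = frozenset(split_b)
    b2 = frozenset(taxa) - b1
    return any(not (x & y) for x in (a1, a2) for y in (b1, b2))


def is_tree_like(splits, taxa):
    """Return True if every pair of splits is compatible."""
    return all(compatible(a, b, taxa) for a, b in combinations(splits, 2))


# Hypothetical taxa and signals, for illustration only.
taxa = {"human", "chimp", "gorilla", "orangutan", "gibbon"}

# Nested groupings: these fit on one tree.
nested = [{"human", "chimp"}, {"human", "chimp", "gorilla"}]

# Two genes supporting different sister pairs: these cannot fit on one tree.
conflicting = [{"human", "chimp"}, {"human", "gorilla"}]

print(is_tree_like(nested, taxa))
print(is_tree_like(conflicting, taxa))
```

In this sketch the nested splits pass the test and the conflicting ones fail it. A tree-only analysis has to discard one of the conflicting splits, whereas a network can display both.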

As a typical example from the literature, Losos et al. (2012) have recently commented:
"Although molecular data have rarely changed our understanding of the major multicellular groups of the evolutionary tree of life, they have suggested changes in the relationships within many groups, such as the evolutionary position of whales in the clade of even-toed ungulates. Further investigation has usually resolved conflicts, often by revealing inadequacies in previous morphological studies. This has led to a presumption by many in favor of molecular data."
Needless to say, this is a biased point of view, because conflicts can also be resolved by revealing inadequacies in molecular studies. For example, molecular analyses involve many subjective decisions about substitution models and rates of molecular change, and any one of the underlying assumptions may be violated. There is no theoretical justification for favouring one source of data over another.

Similarly, there is no theoretical justification for trying to resolve conflicts by preferring one hypothesis over another. Phylogenetic conflicts can also be "resolved" by recognizing that evolutionary history is not necessarily tree-like. Losos et al. do not even consider this possibility:
"When two phylogenies are fundamentally discordant, at least one data set must be misleading."
In fact, the only misleading thing here is the word "must", because both datasets may be perfectly correct but are simply the product of two different evolutionary histories.

This point is perhaps most obvious when comparing molecular datasets. The evolutionary history revealed by between-gene evolutionary processes (e.g. recombination, hybridization, horizontal gene transfer) often conflicts with that from within-gene processes (e.g. nucleotide substitutions and insertions / deletions), and this leads to a reticulating evolutionary history.

Indeed, the more we learn about genomes the less tree-like does the evolutionary history of species seem to be. There are long-standing controversies regarding the evolutionary history of many taxonomic groups, and it has been hoped that genome-scale data would resolve these controversies. However, to date none of these controversies has been satisfactorily resolved into an unambiguous tree-like genealogical history using genome data. They all apparently involve reticulate evolutionary processes.

For example, the estimated relationships among humans, chimpanzees and gorillas did not change as a result of genome sampling (Galtier and Daubin 2008), nor did those of malaria species (Kuo et al. 2008) nor those of placental superorders (Hallström and Janke 2010). In all three cases the estimated relationships were just as complex after the genome sequencing as before. The resolution of controversial branches in our trees has not occurred as a result of increased access to character data or improved data analyses, but our recognition of reticulating relationships certainly has occurred.

There are many other examples where increased character sampling is yet to resolve long-standing controversies about branching patterns, and where reticulation may also be the true explanation. Birds seem to provide many of these examples (e.g. Smith et al. 2013), but insects are a rich source as well (e.g. Thomas et al. 2013), and sometimes even plants (e.g. Goremykin et al. 2013).

Clearly, when two or more phylogenies are fundamentally discordant, none of the datasets needs to be misleading, because a reticulating history may be involved. Network thinking should thus be a standard tool in the arsenal of every phylogeneticist. Tree thinking excludes networks but network thinking does not exclude trees, and so the more general model will always be the more useful one.

References

Galtier N, Daubin V (2008) Dealing with incongruence in phylogenomic analyses. Philosophical Transactions of the Royal Society of London, Series B, Biological Sciences 363: 4023-4029.

Goremykin VV, Nikiforova SV, Biggs PJ, Zhong B, Delange P, Martin W, Woetzel S, Atherton RA, McLenachan PA, Lockhart PJ (2013) The evolutionary root of flowering plants. Systematic Biology 62: 50-61.

Hallström BM, Janke A (2010) Mammalian evolution may not be strictly bifurcating. Molecular Biology and Evolution 27: 2804-2816.

Kuo C-H, Wares JP, Kissinger JC (2008) The Apicomplexan whole-genome phylogeny: an analysis of incongruence among gene trees. Molecular Biology and Evolution 25: 2689-2698.

Losos JB, Hillis DM, Greene HW (2012) Who speaks with a forked tongue? Science 338: 1428-1429.

Mindell DP (2013) The Tree of Life: metaphor, model, and heuristic device. Systematic Biology 62: 479-489.

Smith JV, Braun EL, Kimball RT (2013) Ratite nonmonophyly: independent evidence from 40 novel loci. Systematic Biology 62: 35-49.

Thomas JA, Trueman JW, Rambaut A, Welch JJ (2013) Relaxed phylogenetics and the Palaeoptera problem: resolving deep ancestral splits in the insect phylogeny. Systematic Biology 62: 285-297.

Monday, May 13, 2013

Non-randomness in Forbes' Celebrity 100 ranking


Some time ago I blogged about The mysterious rankings in Forbes' Celebrity 100. I noted at the time that "There are some other things that we can learn from an analysis of the Celebrity 100 list, but they have nothing to do with networks, so I will not cover them here." I will, however, cover them now.

Each year since 1999 Forbes magazine has produced a list called the Celebrity 100, which purports "to list the 100 most powerful celebrities of the year" within the USA. The list is based on entertainment-related earnings plus media visibility (exposure in print, television, radio, and online). The 2012 list generated plenty of negative comments around the web, and my network analysis of the data showed that there is little apparent mathematical logic to some of the rankings.

However, the data do also reveal interesting patterns about the perception of celebrity in the media, provided that we accept the quality of Forbes' data (even if we find fault with what Forbes did with those data). In the graphs below I have simply used the information provided by Forbes in order to take a look at some of the features that Forbes did not comment upon.

The first graph plots the celebrity ranking by sex and "profession" (each dot represents one celebrity). You will note that the data are not randomly distributed among the groups.


The graph shows that one third of the celebrities are female, and they dominate the top 10 and the bottom 30. So, in order to get a high ranking it is best to be female, but after that it becomes a handicap.

The other groupings are based on the Forbes description of each celebrity's principal claim to fame. Clearly, in terms of celebrity status: being a musician is better than being an athlete, which is better than being an actor, which is better than being an actress. Being a TV or radio personality is not bad, either. Note that this explains the bi-modal distribution of females: the music females are in the top 10 while the acting females are in the bottom 30.

For the rest, if you are male, then being a producer/director is marginally better than being an author, which is marginally better than being a comedian. If you are female, then being a model is much worse than being a singer or an actress. Being an entrepreneur works only if you are Donald Trump.

The second graph compares each celebrity's money ranking (based on an estimate of their earnings) with their overall ranking. This is an attempt to see who is financially benefitting from their celebrity status (or vice versa). The two lines on the graph show that for most celebrities (those between the lines) their financial status closely follows their celebrity status.


However, for those at the top-left of the graph their celebrity standing is greater than their earnings would suggest. (They are ranked in the top 30 on overall celebrity status but are not in the top 25 money earners.) This means that their manager is "not getting them what they are worth". These people are, from top to bottom on the graph:
Jennifer Aniston, actress
Kim Kardashian, television personality
Angelina Jolie, actress
Brad Pitt, actor
Adele Adkins, singer
Beyoncé Knowles, singer
Katy Perry, singer
Jennifer Lopez, actress
Stefani Germanotta (Lady Gaga), singer
Rihanna Fenty, singer
Justin Bieber, singer
You will note that there are nine females but only two males in this list. Note, also, the number of singers in the list, indicating that being a singer will get you more celebrity than money.

For those at the bottom-right of the graph their celebrity standing is less than their monetary worth. (They are in the top 25 money earners but are not ranked in the top 25 on overall celebrity status.) This means that their publicity agent is not doing their job (or not being asked to!). These people are, from right to left on the graph:
Mark Burnett, television producer
Kenny Chesney, country music singer
Toby Keith, country music singer
Jerry Bruckheimer, film and television producer
James Patterson, author
George Lucas, film director and producer
Michael Bay, film director and producer
Howard Stern, radio personality
These people are all male, so it is the males who have more money than celebrity. Most of these men do not work directly in the public spotlight, or they perform country music rather than pop music.
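The two groupings above can be written as a simple rule on the two ranks. This is only a sketch, assuming that the thresholds given in parentheses above (top 30 overall, top 25 on money) are a fair stand-in for the lines drawn on the graph. The function name `classify` and the three example celebrities with their ranks are hypothetical, not values from the Forbes list.

```python
def classify(overall_rank, money_rank):
    """Compare a celebrity's overall rank with their money rank.

    Rank 1 is the best. The thresholds follow the text: top 30 overall
    but outside the top 25 earners, or top 25 earners but outside the
    top 25 overall.
    """
    if overall_rank <= 30 and money_rank > 25:
        return "more celebrity than money"
    if money_rank <= 25 and overall_rank > 25:
        return "more money than celebrity"
    return "consistent"


# Hypothetical (overall rank, money rank) pairs, for illustration only.
examples = {
    "Celebrity A": (5, 40),
    "Celebrity B": (60, 10),
    "Celebrity C": (12, 15),
}

for name, (overall, money) in examples.items():
    print(name, "->", classify(overall, money))
```

With these made-up ranks, Celebrity A falls in the first group, Celebrity B in the second, and Celebrity C between the lines. The same rule could be applied to the TV/Radio and Press ranks by swapping in those two columns.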

One can perform a similar analysis to compare the celebrities' TV/Radio rank with their Press rank. This produces a very similar graph. It turns out that the people whose TV/Radio rank is poor compared to their Press rank are mostly athletes (David Beckham, Roger Federer, Lionel Messi, Li Na, Cristiano Ronaldo, Maria Sharapova), along with one model (Kate Moss) and one producer/director (Steven Spielberg). The thirteen people whose Press rank is poor compared to their TV/Radio rank are almost all TV/Radio "personalities", as expected.

I am sure that there is more to be found in this dataset, if anyone cares to look.