Monday, March 2, 2020

The phylogenetics of the Last Universal Common Ancestor is hard

If we define phylogenetics as the study of sister-group historical relationships, then it stands to reason that the hardest thing to do in biology would be to study the Last Universal Common Ancestor (LUCA), which is the common ancestor of all known organisms. This is because, by definition, it has no knowable sister group.

Study of the LUCA has therefore mostly been treated as a study of ancestor-descendant relationships: an attempt to trace the ancestry of living things all the way back until there is nothing more to detect.

This approach seems to lead to a lot of arguments. There are arguments about what type of character data to use (it seems doubtful that nucleotide sequences are informative that far back in evolution). There are arguments about how many monophyletic groups of akaryotes there might be, and about whether eukaryotes should be considered monophyletic, given that they have organelles. For a brief introduction to the use of protein domains for phylogenies, as well as to the dispute over three domains versus two domains, see this Twitter presentation.
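As an illustration of the worry about nucleotide sequences, here is the textbook Jukes–Cantor model (used purely as an example; no particular model is implied above). If two sequences are separated by $d$ expected substitutions per site, the expected proportion $p$ of differing sites, its slope, and the usual distance estimate are:

$$p(d) = \frac{3}{4}\left(1 - e^{-4d/3}\right), \qquad p'(d) = e^{-4d/3}, \qquad \hat{d} = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\hat{p}\right)$$

As $d$ grows, $p$ approaches $3/4$ (the value expected for unrelated sequences with equal base frequencies) and its slope approaches zero. For example, $d = 3$ gives $p \approx 0.736$ and $d = 6$ gives $p \approx 0.750$, so doubling the divergence changes the observed difference by only about 0.013. At such depths the estimate $\hat{d}$ rests almost entirely on the assumed model of character change rather than on the data.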

On the other hand, trying to study the LUCA phylogenetically raises some interesting questions, because we are trying to produce a phylogeny with a root but without an outgroup. I recently gave a talk on this subject, and I have included a PDF copy of the slides from that talk here.

The talk starts with some personal history, which just happens to lead into a discussion of what I see as the essential points of phylogenetic analysis. I discuss characters versus taxa, emphasizing the role of both character models and taxon models. The key issue for the LUCA is the need to determine character polarity, as this gives us the time direction and allows us to find the earliest time.
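The link between character polarity and the root can be seen in a toy example (our own, not taken from the talk). Fitch parsimony treats binary states as unordered, so every rooting of one unrooted tree gets the same length and the data cannot locate the root. If we instead assume the characters are polarized (ancestral state 0, with only 0 → 1 changes, i.e. Camin–Sokal parsimony), the length depends on where the root sits. The taxa, the data matrix, the irreversibility assumption, and all names (`DATA`, `ROOTINGS`, `fitch`, `camin_sokal`, `fitch_length`, `polarized_length`) are hypothetical; this is a minimal sketch, not a method for real LUCA data.

```python
# Toy illustration (hypothetical data, not from the talk): Fitch parsimony
# ignores character polarity, so every rooting of one unrooted tree gets the
# same length; polarized (Camin-Sokal) parsimony, which assumes ancestral
# state 0 and irreversible 0 -> 1 changes, lets the data favor a root.
import math

# Hypothetical binary characters for four hypothetical taxa A-D.
DATA = {
    "A": (1, 0, 1),
    "B": (1, 0, 0),
    "C": (0, 1, 0),
    "D": (0, 1, 0),
}
N_CHARS = 3

# The five rooted versions of the single unrooted tree AB|CD,
# one for each edge of the unrooted tree.
ROOTINGS = {
    "edge to A": ("A", ("B", ("C", "D"))),
    "edge to B": ("B", ("A", ("C", "D"))),
    "internal edge": (("A", "B"), ("C", "D")),
    "edge to C": ("C", ("D", ("A", "B"))),
    "edge to D": ("D", ("C", ("A", "B"))),
}


def fitch(node, i):
    """Return (state set, steps) for character i, treating states as unordered."""
    if isinstance(node, str):
        return {DATA[node][i]}, 0
    left, lsteps = fitch(node[0], i)
    right, rsteps = fitch(node[1], i)
    common = left & right
    if common:
        return common, lsteps + rsteps
    return left | right, lsteps + rsteps + 1


def camin_sokal(node, i):
    """Return [cost if node has state 0, cost if node has state 1] for
    character i, allowing only 0 -> 1 changes (one step each)."""
    if isinstance(node, str):
        observed = DATA[node][i]
        return [0 if s == observed else math.inf for s in (0, 1)]
    costs = [0, 0]
    for child in node:
        c0, c1 = camin_sokal(child, i)
        costs[0] += min(c0, c1 + 1)  # parent 0: child stays 0 or gains state 1
        costs[1] += c1               # parent 1: reversal to 0 is forbidden
    return costs


def fitch_length(tree):
    return sum(fitch(tree, i)[1] for i in range(N_CHARS))


def polarized_length(tree):
    # The root is assumed to carry the ancestral state 0 for every character.
    return sum(camin_sokal(tree, i)[0] for i in range(N_CHARS))


for name, tree in ROOTINGS.items():
    print(f"{name:<14} Fitch={fitch_length(tree)}  polarized={polarized_length(tree)}")
```

Every rooting gets a Fitch length of 3, whereas the polarized length is 3 for the root on the internal edge and 4 for each of the other four rootings, so here the assumed polarity, not an outgroup, picks the root. With real data the irreversibility assumption is strong and ties are common; the sketch only shows the principle.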

Conclusion 1: The characters used to study the LUCA probably need to be molecular, but the form of the character analysis needs to be fundamentally different from what molecular biologists commonly employ — we need to analyze character polarity.

Conclusion 2: We need to think about which characters will contain relevant phylogenetic information at the time depth we are looking at.

Conclusion 3: We need to think about the taxon-change model, as well as the character-change model — the history may be very complex at the root.

Conclusion 4: We study contemporary taxa, and it is inappropriate to try putting ancestors into any modern group, unless you have good evidence that the ancestor is the MRCA of that group (i.e., the group is monophyletic).

For the study of the phylogeny of the LUCA:
  • The root cannot be added to an unrooted line graph, but instead the root must be a direct product of the data analysis
  • Sequence data are unlikely to be informative, because the required character-change models matter too much at that time depth
  • The evolutionary history may be much more complex than can be represented by a tree, and may be impractical to represent even with any current form of network analysis
  • The LUCA is not part of any extant phylogenetic group.
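A standard count makes the first point concrete (added here as an illustration). For $n \ge 3$ taxa, an unrooted binary tree has $2n-3$ edges, and a root could be placed on any one of them, so the numbers of unrooted trees $U(n)$ and rooted trees $R(n)$ are:

$$U(n) = (2n-5)!!, \qquad R(n) = (2n-3)\,U(n) = (2n-3)!!$$

For four taxa, each of the 3 unrooted trees corresponds to 5 rooted trees (15 in total). Nothing in an unrooted tree says which of its $2n-3$ root positions is correct; that choice has to come out of the data analysis itself, for example from polarized characters.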
