Wednesday, August 22, 2018
Distinguishability in Phylogenetic Networks, report
We have now completed the workshop, as you can tell from the previous post with some photos. Here is a brief report on what seem to me to be some of the more useful points covered.
We had 10 formal presentations, but we also focused on group discussions for several hours each day. It may be the latter that were the most productive. However, I will briefly summarize the talks first.
I spent my time in the opening talk emphasizing the different viewpoints of network computation, which focuses on the patterns that can be detected in the data, and of network users, who are generally more interested in the processes that create those patterns (or are, indeed, absent from the patterns but present in the phylogenetic history, anyway). This highlights the two essential points of the workshop title, that both the patterns and the processes are much harder to untangle for networks than for trees.
Céline Scornavacca then bravely tried to tackle the combined problem, anyway, by trying to produce networks from analyzing the patterns in terms of their processes. The issues immediately become obvious, but she seems to be determined to proceed, regardless. Later in the week, Luay Nakhleh reduced the issue simply to vertical processes (including incomplete lineage sorting but not gene duplication-loss) versus horizontal processes. This creates a tractable problem for parsimony and likelihood, but the current challenge remains the limited number of taxa.
Vincent Moulton, Cécile Ané and Charles Semple dodged the issue by focusing on computations. Charles took on the challenge of trying to create a network version of Neighbor-Joining, which would address the issues of computational speed and taxon sampling, and Vince tackled super-networks, and the conditions required for building networks from a collection of smaller (i.e. incomplete) trees. Both topics remain open questions. Cécile, on the other hand, discussed network models for trait evolution, which is important for the use of phylogenetic comparative methods when using networks.
On the user side, the presentations focused on examples, and the issues encountered when dealing with them. James Whitfield and Axel Janke talked about biology (mostly phylogenomics), while Johann-Mattis List talked about linguistics, and Tiago Tresoldi talked about stemmatology. In some ways, historical linguistics seems to be the odd one out, since many of the issues dealt with are somewhat removed from those in the other fields. However, in biology there are actually two options for producing networks: directly from the data or via "gene trees" (trees derived from non-recombining blocks of sequences). For the humanities, much of the current discussion is about the nature of the data, and how to code it for quantitative analysis.
This brings us to the discussions. While some time was spent on trying to establish whether biologists think that there is a difference between lateral gene transfer and horizontal gene transfer, or between incomplete lineage sorting, ancestral polymorphism and deep coalescence, some productive interchanges also occurred. Here is a summary of four of the most important ones.
There was general agreement that there are several barriers to widespread adoption of network analyses in phylogenetics. These include the development of suitable methods (in the face of indistinguishability), but also an understanding of what methods are currently available, what data are required to apply those methods, what taxon sampling is required to benefit from the methods, and how to use the programs that implement those methods.
One popular suggestion was therefore to produce some sort of "cookbook", to address the complexity of producing networks, given that there are many methods and programs. From the users' point of view this would illustrate what network analyses can do, in terms of finding reticulation patterns in the data; and from the computational point of view it would outline what needs to be done to get the programs to work. The consensus idea was to choose two suitable datasets (yet to be determined), and then have each program author provide analyses of them (including any scripts that are needed).
Following on from this latter point, it was agreed that the programs need easy user interfaces, if they are to become more widely used. Here, the word "widely" includes casual users from outside of phylogenetics, who use phylogenies as only one of many tools in their work. So, users will range from those who need nothing more than a "point and click" control panel (which may be >90% of potential users) to those who would benefit from scripting control of the analyses. The interface needs both a front end, to specify the particular analysis, and a back end, to allow exploration of the output.
Another long-discussed issue was how to popularize networks, which is clearly a major topic. A phylogenetic tree is nothing more than one of the possible networks for any given dataset, and yet the focus is often on trees rather than networks.
To this end, it was noted that the current Wikipedia entry is inadequate, especially compared to the corresponding entry for phylogenetic trees. Not only is this entry out of date, it is in a number of ways misleading. In particular, there needs to be a discussion of the fact that, if a network is a "tree with reticulations", then ignoring the reticulations can result in the wrong tree, and the branch lengths may be severely under-estimated. There are challenges to getting Wikipedia entries changed, especially the wholesale re-writing of an entry, but this will be necessary.
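As a small illustration of the "tree with reticulations" point, here is a minimal sketch, not taken from any of the programs discussed at the workshop. It assumes a rooted network stored as a plain list of (parent, child) edges; the toy network and the function names are hypothetical. A reticulation is simply a node with more than one parent, and a tree is a network with none. Ignoring the reticulation by keeping only one parent edge gives two different trees here, which place taxon B with A in one case and with C in the other.

```python
from itertools import product


def reticulation_nodes(edges):
    """Map each node with more than one parent to its list of parents."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, []).append(parent)
    return {node: ps for node, ps in parents.items() if len(ps) > 1}


def displayed_trees(edges):
    """Yield one edge list per way of keeping a single parent edge
    for every reticulation node (degree-2 nodes are not suppressed)."""
    retic = reticulation_nodes(edges)
    nodes = sorted(retic)
    for choice in product(*(retic[n] for n in nodes)):
        kept = dict(zip(nodes, choice))
        yield [(p, c) for p, c in edges if c not in kept or kept[c] == p]


# Hypothetical toy network: node h (the parent of taxon B) has two parents.
network = [
    ("root", "x"), ("root", "y"),
    ("x", "A"), ("x", "h"),
    ("y", "h"), ("y", "C"),
    ("h", "B"),
]

trees = list(displayed_trees(network))
for tree in trees:
    # Each result has no reticulations left, so it is a tree.
    print(tree, reticulation_nodes(tree))
```

Which of these trees a tree-building method would recover from real data is a separate question; the sketch only shows that the network contains more than one candidate.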
Finally, it was noted that Philippe Gambette's Who is Who in Phylogenetic Networks website is extremely useful but is still poorly known, even within the phylogenetic networks community. We had a long discussion about how to enhance this site, to make it a more general-purpose repository of information about phylogenetic networks. This included a more inclusive database, more comprehensive tagging of keywords, enhanced descriptions of those keywords, and ways to keep the database up to date.
Steven Kelk has the notes from the final session, which was a review of what we achieved during the workshop, and which contains the To Do list. Both he and Philippe have the notes about modifications for the Who is Who in Phylogenetic Networks website, which is likely to be the first outcome-project tackled.
Thank you to everybody who participated in the workshop. It seemed to be very productive, with a number of concrete outcomes that will be interesting to review at the next workshop.