The Genealogical World of Phylogenetic Networks: A new SARS-CoV-2 variant?

In previous blog posts, Guido has examined the phylogenetic patterns in the current SARS-CoV-2 outbreak, responsible for the socially disruptive Covid-19 pandemic.

These patterns are traceable because the viral genome has a high mutation rate, and many genomes have been sequenced. Even on the Diamond Princess cruise ship, it is clear that a number of genetic variants arose during its few weeks of quarantine.

Guido analyzed in detail some of these known variants, and their associated genome mutations. He carefully tried to distinguish possible sequencing artifacts from genuine mutations, and to identify which of the latter seem to be the result of genomic recombination among different strains. Naturally, he did this in the context of using phylogenetic networks as the preferred tool of analysis.

Needless to say, Guido is not the only person to have tried this sort of analysis, although people do not really seem to have grasped that recombination as a molecular process requires the concept of a phylogenetic network. There is an intellectual fixation with phylogenetic trees rather than networks. The tree approach is to detect incompatibilities among the trees, and to deduce recombination as the cause. However, why demonstrate that your preferred analysis method fails, and reach a conclusion from this, when you could simply analyze the data appropriately in the first place?
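To make the idea of "incompatibility" concrete, here is a minimal sketch of the classic four-gamete test, written in Python. It is not the method used in the pre-print or in Guido's posts; it is only an illustration. The assumption is that each site has two alleles and that no mutation occurs twice at the same site. Under that assumption, two sites that show all four allele combinations cannot both fit a single tree, so the conflict must come from recombination (or from recurrent mutation, if the assumption fails). The function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def four_gamete_incompatible(col_a, col_b):
    """Return True if two biallelic alignment columns show all four allele pairs."""
    both_biallelic = len(set(col_a)) == 2 and len(set(col_b)) == 2
    observed_pairs = set(zip(col_a, col_b))
    return both_biallelic and len(observed_pairs) == 4


def incompatible_site_pairs(sequences):
    """List the (i, j) column pairs of an alignment that fail the four-gamete test."""
    columns = list(zip(*sequences))  # transpose: one tuple per alignment column
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if four_gamete_incompatible(columns[i], columns[j])
    ]


# Toy alignment (made up for illustration): four sequences, three sites.
toy_alignment = ["AAC", "AGC", "GAT", "GGT"]
conflicts = incompatible_site_pairs(toy_alignment)
print(conflicts)
```

In the toy alignment, sites 0 and 2 tell the same story, while site 1 conflicts with both of them. A tree-based analysis has to explain that conflict after the fact, whereas a network can display it directly.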

One recent pre-print that has attracted a lot of attention, based on looking for genetic mutations in a single gene, and then using a tree-based analysis, is:
Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2

The attention-getting part of the paper is that a particular mutation variant of the virus seems to be getting more common among hosts, and in some places has become the dominant strain. The authors conclude that the mutation has been positively selected due to greater infectivity. This is potentially important because the gene being studied encodes the Spike (or S) protein, which creates the distinctive crown-like appearance of the virus itself. This crown mediates infection of host cells, and is thus the target of most vaccine strategies and antibody-based therapies. Clearly, then, this variant might be of great practical interest.

However, while the press coverage has been enthusiastic, most of the professional commentary so far has been unimpressed with the authors' conclusions. Basically, the reaction to the authors has been "not so fast, guys". The evidence is suggestive at best, and not yet verified (see "We don’t know yet whether a mutation has made SARS-CoV-2 more infectious").

Comments

My points in this blog post are about the analyses. There are two parts to the analyses: the identification of mutations and selection, and the study of recombination.

First, only one mutation has been identified, which appears to increase in prevalence through time. So, the conclusion that the new variant is more infectious seems to be based on the idea that it becomes the dominant strain in any population. If this is so, then we still have only one main variant to deal with, in terms of medical response. Indeed, if this variant has been around since February, as the report claims, then most infected people must have it. The only people who wouldn't have this one would be the very earliest cases.

Moreover, if a mutation is positively selected, then it must be difficult to distinguish reticulation from convergence. If variants that gain a mutation via reticulation become dominant, then with every generation we increase the probability that the same mutation will be independently obtained by another virus lineage. Being positively selected, these independent mutations will quickly be dispersed. Given that the virus has been around now for nearly 5 months, with a steadily increasing and diversifying available-host population, there would be plenty of time for convergent evolution of the same beneficial mutation.
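The point about time and host numbers can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that each replication event is an independent trial with the same small probability of producing the specific mutation; this is a deliberate simplification. The function name and every number in it are hypothetical and are not estimates for SARS-CoV-2.

```python
def prob_independent_recurrence(per_replication_prob, n_replications):
    """Probability that a specific mutation arises at least once in n independent trials."""
    return 1.0 - (1.0 - per_replication_prob) ** n_replications


# Hypothetical values, chosen only to show the trend as the epidemic grows.
p = 1e-6  # assumed chance of this exact mutation per replication event
for n in (10**4, 10**6, 10**8):
    chance = prob_independent_recurrence(p, n)
    print(f"{n:>11,d} replications: {chance:.4f}")
```

Whatever the true per-replication probability is, the chance of at least one independent origin can only rise as the number of replication events grows, which is why convergence becomes harder to rule out as the outbreak continues.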

Second, phylogenetic trees are often used to try to study the origin of genetic variation, especially if there has been recurrent emergence of particular variants, each of which has subsequently diverged independently. This was Charles Darwin's idea when he talked about the tree as a model for evolution. However, Darwin's book also has a long chapter on hybridization, which cannot easily be studied using the tree model. This apparent contradiction did not concern Darwin, because his book is mostly about the continuity of evolutionary history, which was his main motivation for using the tree model. Hybridization is evidence for continuity, even though the tree model is too simple for studying it. The same argument applies to the study of introgression.

The same is true for processes like recombination, which is conceptually no different, although it occurs at the molecular level. As far as the new paper is concerned, its Figure 1, which shows a couple of phylogenetic trees, does not fit well with Figure 6, which is a set of alignments illustrating recombination. Why authors cannot see contradictions between different parts of their own work remains a mystery.

As a final note, the authors raise the specter of re-infection by the new SARS-CoV-2 variant. However, it is our developed immunity (i.e. production of antibodies) that protects us, epidemiologically. To allow re-infection, the virus would need to avoid these antibodies. Being more infectious does not automatically make a virus able to avoid antibodies. Nevertheless, I would not be surprised if we learn that some people become ill more than once. (NB. This is different from saying that people have multiple strains. Multiple infections do not necessarily result in multiple illnesses, because of the antibodies.) A bigger concern for new illnesses is likely to be the observed large variation in the amount of antibodies that people produce (more is better, of course).

Monday, May 11, 2020
