Monday, September 21, 2020

Herd immunity and the end of Covid-19

Following on from my previous posts about the SARS-CoV-2 virus, and Covid-19, the human disease that it causes, there are a number of miscellaneous topics that could also be discussed. Unfortunately, this is only a part of the post that I originally intended. I had written about some aspects of the pandemic that seem to be less well known. However, Blogger deleted the draft without warning, and this is the only part that I could recover.

Here, I talk about how the pandemic ends, as far as biology (rather than society) is concerned.
There is a lot of wishful thinking at the moment, that production of a vaccine will see the end of the pandemic, but the World Health Organization has warned that this may not be so. For example, they are apparently trying to develop a 5-year strategy for Europe, not a 5-month one. One of their officials, Hans Henri Kluge, has noted: "The end of the pandemic is the moment when we as a society learn how we can live with the pandemic."

Biologically, safety from pathogens involves what is called herd immunity. This refers to the proportion of the population who are not infectious, and thus are not spreading the pathogen (whether it is a virus, a bacterium, an apicomplexan, or a fungus). Lack of infectiousness can be achieved by:
  1. being resistant to the pathogen in the first place, perhaps due to past immunological events (eg. Coronavirus: How the common cold might protect you from COVID)
  2. becoming infected and then recovering, by producing antibodies or T-cells (eg. This trawler’s haul: Evidence that antibodies block the coronavirus)
  3. being vaccinated, which produces the same immune response as 2., by producing protective antibodies.

Note that 2. is not necessarily dangerous for most people, as reports show that anything up to half of the people who have antibodies to SARS-CoV-2 did not report clinical symptoms, or only mild symptoms. [Note also: lack of symptoms does not mean that you are not infectious.] However, the variation in human response has clearly been huge (see From ‘brain fog’ to heart damage, COVID-19’s lingering problems alarm scientists), in many cases resulting in cytokine storms, and death.

The main risk factors are also clear — age and gender (The coronavirus is most deadly if you are older and male — new data reveal the risks), and any pre-existing medical conditions, notably obesity (Individuals with obesity and COVID‐19: a global perspective on the epidemiology and biological relationships). Furthermore, we do not yet know how long any immune protection lasts — for example, we now have people who have been infected more than once (Researchers document first case of virus reinfection), although most have kept their antibodies for at least 4 months (Fyra av fem behåller antikroppar mot nya coronaviruset).

Nor do we yet know about the success or danger of 3., because it normally takes a couple of years of clinical trials before a vaccine is approved for use, and even then we can get it badly wrong (cf. the originally undetected side-effects of thalidomide). As far as health care is concerned, responsibility for treatment of any unfortunate outcomes from immunization is not at all clear. Furthermore, those nations that spend the most on healthcare per person may not be ranked highest for health outcomes and quality of care (see: What country spends the most on healthcare?). Therefore, it is hardly surprising that many people are concerned about taking any new vaccine (A Covid-19 vaccine problem: people who are afraid to get one), and that the World Health Organization is being much more cautious than many government leaders (Most people likely won't get a coronavirus vaccine until the middle of 2021).

Nevertheless, once herd immunity is achieved in my local population, I am relatively safe, irrespective of whether I have been vaccinated or not — there will be few infectious people around me, and so I am not very likely to catch the pathogen. Personally, I could wait a while to see how the myriad new vaccines affect people, as they have been rush-produced in a way that would not normally be accepted as safe for public use (what is called the Phase 3 trial takes time). After all, there seems to be an awful lot of politics involved, especially in the USA (The 943-dimensional chess of a trustworthy Covid-19 vaccine).

Some calculations

The point here is that the development of any epidemic is an interaction between infectivity, herd immunity and infection control. Let's consider some explicit numbers to make this clear (based on: Flockimmunitet på lägre nivå kan hejda smittan).

Infectivity refers to how the pathogen spreads among the at-risk population, usually described as the basic reproduction number (R0). If each infected individual infects 2-3 others, then the R0 value is c. 2.5 (each person infects 2.5 other people, on average). With R > 1 the epidemic must spread; if R = 1 then the number of new cases stays constant (there is no further growth); and if R < 1 then the infection slowly dies out (it stops instantly if R = 0).

Clearly, infectivity can be reduced by any infection control measure that reduces R. Some of these were listed in the previous section. These measures can easily reduce the initial R0 by one half, meaning that the epidemic spreads much more slowly, if R = 1.25.

Herd immunity comes into this by also reducing R. For example, if herd immunity reaches 60%, then only the remaining 40% of the people are susceptible to the infection. If we combine this 40% with the initial R0 = 2.5, then R = 2.5 × 0.4 = 1, and the epidemic no longer increases. That is, we now have it under control. Moreover, if we have managed to get to R = 1.25, then a herd immunity of only 20% is enough to reach R = 1 (1.25 × 0.8), and anything above that will cause the epidemic to decrease.
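As a minimal sketch of this arithmetic, assuming (as the calculations here do) that exposure is random and that infection control simply scales R0 by a constant factor, the effective R and the immune fraction needed to bring it down to 1 can be computed as follows. The function names and the control_factor parameter are my own, chosen for illustration only.

```python
def effective_r(r0: float, immune_fraction: float, control_factor: float = 1.0) -> float:
    """Effective R after infection control (scales R0) and herd immunity (removes susceptibles)."""
    return r0 * control_factor * (1.0 - immune_fraction)


def herd_immunity_threshold(r: float) -> float:
    """Immune fraction at which the effective R reaches 1 (0 if R is already at or below 1)."""
    return max(0.0, 1.0 - 1.0 / r)


# R0 = 2.5 with 60% of the population immune
print(round(effective_r(2.5, 0.60), 3))

# Immune fraction needed without control measures (R = 2.5)
print(round(herd_immunity_threshold(2.5), 3))

# Immune fraction needed once control measures have halved R0 (R = 1.25)
print(round(herd_immunity_threshold(1.25), 3))
```

The threshold is just 1 − 1/R, which is why halving R by infection control lowers the required level of immunity so sharply.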

Bhoj Raj Singh has a good slide presentation elaborating on this topic.

These calculations interact with the concept of relative risk, of course. The calculations so far assume that infection exposure is random in society, which is obviously too simple an idea. Some people are more socially active than others, are thus likely to be more exposed, and they will then quickly achieve significant herd immunity. Others find it difficult to self-isolate because of their work or social conditions, which also increases the development of herd immunity. All of this also helps more isolated people, of course, because they are not at risk of infection from those active groups with herd immunity.

We would thus expect herd immunity to develop first in cities (eg. Experts say Stockholm is close to achieving herd immunity ; A third of people tested in Bronx have coronavirus antibodies) and in poor communities (Herd immunity may be developing in Mumbai’s poorest areas), both of which seem to be the case for SARS-CoV-2.

Equally importantly, herd immunity cannot develop if we all hide from the virus. This has happened in New Zealand, for example, which has so far successfully quarantined itself from the rest of the world — they have not successfully fought the virus, they have instead successfully hidden from it. The issue is that the populace can never come out of hiding, and can thus never let anyone come into the country, not even returning New Zealanders. As an example, Hawaii had the same isolation advantage, and then lost it, just as expected (Hawaii is no longer safe from Covid-19), as also did Australia (Coronavirus (COVID-19) current situation and case numbers).

It is a classic question: which is better, fight or flight? In a pandemic, flight cannot lead to herd immunity, which is what we need in order to "learn how we can live with the pandemic".

So, where are we now? Well, a recent poll in the USA suggests that it is an even split about whether people will actually take a vaccine if offered soon (U.S. public now divided over whether to get Covid-19 vaccine). Will 50% be enough to ensure herd immunity in that country?

Monday, September 14, 2020

Exploring the oak phylogeny

Neighbor-nets are most versatile tools for exploratory data analysis (EDA), including in phylogenetics. They are not only fast to infer, but possibly also the most straightforward way of depicting the signal in one's data matrix. This makes them useful additions to any phylogenetic paper, because they give the reader (and peers and editors during review) a good idea of what the data can possibly show, and where there may be problems.

A nice example of this use is the Neighbor-net in a recent paper on Chinese oaks:
Yang J, Guo Y-F, Chen X-D, Zhang X, Ju M-M, Bai G-Q, Liu Z-L, Zhao G-F. Framework Phylogeny, Evolution and Complex Diversification of Chinese Oaks. Plants 2020: 1024.
[Note: The paper is, from a purely methodological point-of-view, pretty well done, but has probably not experienced any real peer-review.**]
Oaks (Quercus L.) are ideal models to assess patterns of plant diversity. We integrated the sequence data of five chloroplast and two nuclear loci from 50 Chinese oaks to explore the phylogenetic framework, evolution and diversification patterns of the Chinese oak’s lineage. The framework phylogeny strongly supports two subgenera Quercus and Cerris comprising four infrageneric sections Quercus, Cerris, Ilex and Cyclobalanopsis for the Chinese oaks.
None of this is new. My colleagues and I published an updated classification for oaks a few years ago (Denk et al. 2017) that took into account molecular phylogenies, and introduced the systematic concept referred to by Yang et al., and recently followed by a many-species global oak phylogenomic study (Hipp et al. 2020). All of this is based on nuclear data only, because any researcher who ever studies oak genetics soon realizes that the plastomes are largely decoupled from speciation processes, but are geographically highly constrained (eg. Simeone et al. 2016, Yan et al. 2019). This is the reason why oaks are indeed "ideal models to assess patterns of plant diversity" – they provide a worst-case scenario not the (trivial) best-case one.

As can be seen in the Yang et al. tree, members of section Ilex, a monophyletic lineage forming highly supported clades in trees based on nuclear data, are scattered all across the subgenus Cerris subtree. I have annotated a copy of this tree here.

Yang et al.'s fig. 1a, with some clades newly labeled for orientation

Because of the plastid incongruence, the subgenus Cerris subtree has a wrong root (section Cyclobalanopsis diverged before sister sections Cerris and Ilex split). Also, the reciprocally monophyletic, genetically coherent sections Cerris (green) and Cyclobalanopsis (blue) are embedded in the much more diverse Ilex 3 and Ilex 4 clades. The remaining Ilex species are placed in two early diverged clades, which I have labeled Ilex 1 and Ilex 2 in the above tree (note: the taxon set only includes Chinese oak species). The only indication the tree gives that we have a data conflict issue is the low support (gray circles represent branches with Maximum likelihood bootstrap support > 60).

The network

When interpreting the phylogenetic implications of a Neighbor-net, we have to keep in mind that it is not a phylogenetic network in the strict sense (ie. displaying an evolutionary history), but is instead a meta-phylogenetic graph: a summary of incompatible split patterns. Incompatibility can have different origins: reticulation, recombination, diffuse or poorly sorted signals, etc. Consequently, when looking at Neighbor-nets and their neighborhoods (Splits and neighborhoods in splits graphs), we need to keep in mind what kind of data we used to calculate the underlying distance matrix in the first place.
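To make "incompatible splits" concrete, here is a minimal sketch of the standard compatibility test: two splits of the same taxon set can sit on one tree only if at least one of the four pairwise intersections of their sides is empty. The five taxon labels are hypothetical and have nothing to do with the oak data.

```python
def compatible(side_a, side_b, taxa):
    """Return True if the splits side_a | rest and side_b | rest fit on a single tree."""
    taxa = frozenset(taxa)
    a1 = frozenset(side_a)
    a2 = taxa - a1
    b1 = frozenset(side_b)
    b2 = taxa - b1
    # Compatible if at least one of the four intersections is empty
    return any(not (x & y) for x in (a1, a2) for y in (b1, b2))


TAXA = {"A", "B", "C", "D", "E"}  # hypothetical taxon set

# AB | CDE and ABC | DE are nested, so both can occur in the same tree
print(compatible({"A", "B"}, {"A", "B", "C"}, TAXA))

# AB | CDE and BC | ADE contradict each other; a splits graph shows this as a box
print(compatible({"A", "B"}, {"B", "C"}, TAXA))
```

A tree contains only mutually compatible splits, whereas a Neighbor-net can display pairs like the second one side by side.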

If the data follows two incongruent trees ("phylogenies"), as in this case for the oaks, the Neighbor-net has a good chance of capturing the incompatible splits of both genealogies. Here is the graph from the paper.

Yang et al.'s fig. 1b.

The central inflated portion of the graph reflects the incongruence between the combined data sets: we have overlapping nuclear-informed and plastid-informed neighborhoods.

The authors' brackets (shown in black) refer to neighborhoods triggered by the two nuclear markers in the data set: these are neighborhoods reflecting the common origin and speciation within the oak lineages. We can even see that this signal, which is incompatible with all deep splits in the combined tree, is unambiguous in part of the data (the nuclear partitions): section Ilex spans out as a wide fan, but there is a relatively prominent edge bundle defining the corresponding neighborhood (the blue split).

The net shows additional, even more prominent edge bundles defining partly overlapping or distinct neighborhoods (the red splits). These neighborhoods are represented as clades in Yang et al.'s phylogenetic tree (fig.1a). They write (p. 11 of 20):
However, the conflict between the two datasets seems to be recovered by the neighbor-net method in this study, as the neighbor-net network based on combined plastid–nuclear data strongly shows the presence of two subgenera and four infrageneric species groups for the Chinese oak’s lineage (Figure 1b).
Interestingly, the authors nonetheless used the substantially incongruent combined data for downstream dating and trait mapping analysis (p. 7/20):
Bayesian evolutionary analyses provided a concordant infrageneric phylogeny for the Chinese oak’s lineage at the species level (Figure 2).
This uses a taxon-filtered, obviously constrained (fixed) topology, fitted to the current synopsis outlined in Denk et al. (2017). [Note: the supplement includes the extremely incongruent nuclear and plastid trees, each of which has further incongruence issues because they combine fast- and very slow-evolving sequence regions.]


More posts on oaks, plastid data and networks can be found here in the Genealogical World and in my Res.I.P. blog.

Cited papers

Denk T, Grimm GW, Manos PS, Deng M, Hipp AL. (2017) An updated infrageneric classification of the oaks: review of previous taxonomic schemes and synthesis of evolutionary patterns. In: Gil-Pelegrín E, Peguero-Pina JJ, and Sancho-Knapik D, eds. Oaks Physiological Ecology. Cham: Springer, pp. 13–38. Open access Pre-Print [major change: Ponticae and Virentes accepted as additional sections in final version].

Hipp AL, Manos PS, Hahn M, Avishai M, + 20 more authors. (2020) Genomic landscape of the global oak phylogeny. New Phytologist 226: 1198–1212. Open access.

Simeone MC, Grimm GW, Papini A, Vessella F, Cardoni S, Tordoni E, Piredda R, Franc A, Denk T. (2016) Plastome data reveal multiple geographic origins of Quercus Group Ilex. PeerJ 4:e1897. Open access.

Yan M, Liu R, Li Y, Hipp AL, Deng M, Xiong Y. (2019) Ancient events and climate adaptive capacity shaped distinct chloroplast genetic structure in the oak lineages. BMC Evolutionary Biology 19:202. Open access.

** The publisher, MDPI, thrives in the gray zone between predatory and accredited publishing. Originally included in the recently reactivated Beall's List (new homepage), it has been tentatively dropped (see the linked Wikipedia article; but see also this post by Mats Widgren). Personally, I have encountered articles published in MDPI journals only where the review process must have been, at least, strongly compromised. But it's always quick: Yang et al.'s paper was submitted July 24th, accepted August 12th, and published a day later. Three weeks is about the length of time that the editors of my first oak paper needed to find a peer reviewer at all.

Monday, September 7, 2020

Fossils and Networks 3 – (deleting and) adding one tip

In the last Fossils and Networks post, we explored the use of SuperNetworks to identify both safe and problematic branching patterns by removing one OTU and re-evaluating the analysis. Here, we'll take the opposite approach, and see what we can learn from adding one OTU to our analysis.

Breaking and supporting wrong branches

We start again with the artificial Felsenstein Zone matrix that results in a wrong AB clade. Here's the original true tree used to generate the matrix.

Because of convergent/parallel evolution in the modern taxa (genera O, A and B) and primitive characters of their fossil sisters, any phylogenetic inference method will find the wrong tree, with an A + B | rest split.

In the Felsenstein Zone, parsimony will always get the wrong tree due to long-branch attraction (LBA), while Maximum likelihood has a 50:50 chance to escape LBA. To break down the LBA between A and B, we need a fossil that is, from an evolutionary point of view, intermediate between D and B.
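The way convergence misleads parsimony can be sketched with Fitch's algorithm, which counts the minimum number of state changes a character needs on a given tree. The toy below is not the matrix used in this post: the two topologies and the single binary trait (derived in A and B, ancestral elsewhere) are hypothetical, chosen only to show why the tree grouping the convergent taxa scores better.

```python
def fitch(tree, states):
    """Return (possible ancestral states, number of changes) for a rooted binary tree.

    tree is a taxon name (leaf) or a pair (left, right); states maps taxon -> state.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one change is needed at this node
    return left_set | right_set, left_cost + right_cost + 1


def fitch_score(tree, states):
    """Minimum number of changes (parsimony steps) for one character on the tree."""
    return fitch(tree, states)[1]


# Hypothetical trait that evolved convergently in A and B
trait = {"O": 0, "A": 1, "B": 1, "C": 0, "D": 0}

lba_tree = ("O", (("A", "B"), ("C", "D")))    # groups the convergent taxa
other_tree = ("O", (("A", "C"), ("B", "D")))  # keeps them in separate lineages

print(fitch_score(lba_tree, trait), fitch_score(other_tree, trait))
```

On these toy data the tree with the AB clade explains the trait with fewer changes, so every convergent character adds support for the wrong grouping.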

If we add a fossil E that features 1 out of 3 derived traits found in the BD lineage (including the only synapomorphy of BD), we end up with two alternative parsimony trees: one with a wrong topology and the other the correct topology, as shown here.

By adding a fossil F featuring 2 out of 3 derived traits, we increase the number of most-parsimonious trees (MPTs) to three alternatives, all of which fall prey to A-B+F LBA, as shown next.

Convergent evolution is a problem for tree inference but selection bias and homoiologies are worse, involving accumulation of the same advanced trait within some but not all members of a lineage (Has homoiology been neglected in phylogenetics?). This is worse because the characters will enforce attraction between long-branching, highly evolved (more modern) taxa. A and B are siblings, but by enforcing an ABF clade, we will inevitably misinterpret the most primitive members of the ingroup, C and D. Hence, we may draw wrong conclusions about evolution in the A–F lineage.

Because E is virtually half-way evolved between D and F, and F is the next step towards B, the all-inclusive tree gets it right. We infer a single optimal tree, shown here.

PS: Also, in this case we could use any other optimality criterion (Maximum Likelihood, Least-squares, Minimum Evolution) and we would end up with the same tree.

Missing the important bits

That last observation is encouraging: the more fossils we include in our matrix and the better they reflect the evolutionary trends within a group (here from a D-like ancestor via E to F and B), the greater the chance of ending up with the true tree. There's only one drawback: in real-world data sets, we may miss exactly those traits in the fossil sample that are needed in order to infer (or stabilize) the true tree.

(Paleo-)Parsimonists have frequently argued that missing data are unproblematic, which is true in one sense, as shown in the above example. The commonly used strict consensus tree has no wrong branches, because it only has one, which is the trivial ingroup-outgroup split. The much less commonly used Adams consensus tree has one more branch, which is wrong: the ABF clade.
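Why the strict consensus can collapse to a single branch is easy to see in code: it keeps only those clades that occur in every input tree. The sketch below represents each rooted tree as a set of clades; the three trees are hypothetical stand-ins, not the MPTs from the example above.

```python
def clade(taxa: str) -> frozenset:
    """Turn a string of one-letter taxon labels into a clade (set of taxa)."""
    return frozenset(taxa)


def strict_consensus(trees):
    """Return the clades that are present in every tree of the sample."""
    trees = [set(t) for t in trees]
    shared = trees[0].copy()
    for tree in trees[1:]:
        shared &= tree
    return shared


# Three hypothetical rooted trees for an ingroup A-D (outgroup O left implicit)
mpts = [
    {clade("ABCD"), clade("AB"), clade("CD")},
    {clade("ABCD"), clade("AB"), clade("ABC")},
    {clade("ABCD"), clade("BD"), clade("ABD")},
]

consensus = strict_consensus(mpts)
print(sorted("".join(sorted(c)) for c in consensus))
```

Here only the ingroup clade ABCD survives, even though two of the three trees agree on AB. A Consensus network would keep that information instead of discarding it.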

As always in such cases, the strict Consensus network visualizes the MPT sample best (again exemplifying why we should stop using cladograms).

The price for not having false positives is that we cannot infer a most-parsimonious tree or a few alternative trees any more, but could easily end up with scores of them. Here, we have 41 MPTs for an 8-taxon dataset that include fairly wrong trees*, although some of them are closer to the true tree (green and olive edges in the strict Consensus network above). For large matrices, or matrices lacking tree-like signals, the number of MPTs can easily reach tens or hundreds of thousands. Lacking critical traits in E (14 out of 46 characters missing) and F (7 missing), we may escape LBA at the cost of decisiveness. If we do have those traits only in F but not E, we will enforce LBA between A and B.

Plus-1-trees (and SuperNetworks)

Before adding a taxon as an additional leaf to our tree, we may be interested in what that taxon does to our tree: can it trigger a topological change or does it fall in line? We will again take the dinosaur-to-bird-matrix of Hartman et al. (2019, PeerJ 7: e7247) as a real-world example. This includes everything from well-covered highly derived and most primitive taxa, to those that lack discriminatory signal in general (ie. are unresolved), plus the one or two rogue taxa, with ambiguous phylogenetic affinities creating topological conflict. (Note: the commonly reported strict consensus trees cannot distinguish between those two alternatives.)

The best-covered 15 taxa provide us with a single optimal tree that is in agreement with current opinion (shown below). However, this struggles to resolve the clade of modern birds because the extinct Lithornis is being attracted by Anas, the duck. When we remove Dromiceiomimus (as shown in Fossils and Networks 2), we end up with a putatively wrong Dromaeosauridae grade, because of LBA between the most distinct Dromaeosauridae, Velociraptor and Bambiraptor, and the distantly related (to flying dinosaurs) Allosaurus, Tyrannosaurus and the IGM 10042 skeleton.

Two of the Minus-1 trees generated for the last post of this series.

For our experiment, we will take this (partly) wrong tree, and add every other taxon included in the Hartman et al. (2019) matrix as 15th tip. We can then perform a branch-and-bound search to infer these 14-Plus-1 tree(s). When we browse through the inferred MPTs, we can see that many taxa fall in line with the wrong topology, including a few that, in addition, increase uncertainty for branches correctly resolved in the minus-Dromiceiomimus tree.
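The bookkeeping behind this experiment is simple: one taxon set per candidate, each consisting of the fixed backbone plus one added tip, which is then handed to the tree search. A minimal sketch follows, using a shortened, purely illustrative backbone and candidate list (the real backbone has 14 taxa, and the search itself is not shown).

```python
def plus_one_taxon_sets(backbone, all_taxa):
    """Yield (added taxon, taxon list) for every taxon that is not in the backbone."""
    backbone = list(backbone)
    for taxon in all_taxa:
        if taxon not in backbone:
            yield taxon, backbone + [taxon]


# Illustrative subsets only, not the full taxon lists of the matrix
backbone = ["Allosaurus", "Tyrannosaurus", "Velociraptor", "Bambiraptor", "Anas"]
matrix_taxa = backbone + ["Halszkaraptor", "Zhenyuanlong", "Compsognathus"]

for added, taxon_set in plus_one_taxon_sets(backbone, matrix_taxa):
    print(added, len(taxon_set))
```

Each taxon set would then be used to prune the full matrix before running the branch-and-bound search in the phylogenetic software of choice.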

Out of the 485 candidate trees, only 10** have a set of characters that can compensate for the missing Dromiceiomimus, leading to Plus-1 trees that show a Dromaeosauridae clade, as shown here.

Two of the ten Plus-1 trees, where the added tip saves the inference from LBA. Numbers give the amount of defined characters (scored traits). Both Halszkaraptor and Zhenyuanlong are classified as Dromaeosauridae, however only the better covered taxon is placed as sister to the Dromaeosauridae included in the original 14-taxon tree.

The presence of the deep-branching Compsognathus (Tyrannoraptora: ... :Neocoelurosauria: †Compsognathidae) triggers an Archaeopteryx-Dromaeosauridae clade.

In the case of the relatively deep-branching Garudimimus (... :Neocoelurosauria: Maniraptoriformes: †Ornithomimosauria: †Deinocheiridae) and Epidexipteryx (... : Maniraptoriformes: ... : : ... : Paraves: †Scansoriopterygidae), one or two of the two or three MPTs show the wrong grade; only the last one shows the clade.

Note: the relatively low number of scored traits for Epidexipteryx can avoid LBA leading to a Dromaeosauridae grade, but misplaces the taxon within the Plus-1 MPTs: its family, the Scansoriopterygidae, is considered to represent the sister lineage (Wikipedia, referring to Godefroit et al. 2013 Nature 498: 359–362) of the Eumaniraptora, which include the Dromaeosauridae as first-diverging branch.

We can also summarize the outcome, a collection of 640 Plus-1 MPTs, in the form of a z-closure SuperNetwork, as we did for the Minus-1 trees in the previous Fossils and Networks post (shown next).

This SuperNetwork is quite boxy, and may be only semi-comprehensive (I used only 20 runs, which took half a day). Matching 485 tips into a 14-taxon backbone tree is not the kind of tree sample that the SuperNetwork was originally designed for!

Only four edges, fat and blue, are without alternatives. In all other cases, the added tip triggered the creation of several alternatives: the highest dimension for the boxes is five, but most have four or fewer dimensions. Regarding our problem of saving the Dromaeosauridae clade, we can see that the topological change depends on very few characters, with Microraptor being very close to the divergence but a bit more bird-like (in a very broad sense), while the other two are much more derived.

Close-up on the Dromaeosauridae part of the network, with all tips labeled. Pie charts give the percentage of scored traits/missing data. * – Tips that saved the inference from LBA (see above).

Note the length of some of the colored edges, especially the light green ones, which represent edges reflecting a Dromaeosauridae clade. Other Dromaeosauridae taxa not only increase the diversity but may also create substantial topological ambiguity (bluish and greenish edge bundles; same color = same split) and branching bias.

Take-home message

Creating morphological supermatrixes makes a lot of sense, because it ensures normalization and facilitates universal comparability, which is crucial also for paleobiology. However, even more than molecular phylogenies, paleophylogenies are affected by character and taxon sampling. This is nothing new, and much debate has dealt with which parsimony strict consensus cladogram is the better one.

I suggest taking a new route. Instead of using morphological supermatrixes to infer trees (for this matrix, Hartman et al. found millions of equally optimal parsimony trees, further filtered by post-analysis, initial tree topology informed character weighting, as implemented in TNT), we should use them to generate subsets and engage in exploratory data analysis. This will pinpoint strengths and weaknesses of the data and its individual taxa. Rather than producing evolutionarily meaningless soft polytomies, one should study the reasons for any topological ambiguity. After all, one simple reason for unstable branching patterns may be that all so-far inferred trees are biased, only differently.

The SuperNetwork can assist us in putting together taxon sets that could allow not only a simple tree inference but also topology testing.
  • If we want to test the stability of, e.g., the Dromaeosauridae clade against taxon sampling, it will be of little use to include the most primitive (anything outside Maniraptora) and much more advanced taxa (Avialae including modern birds) of the 501-taxon matrix. On the one hand, the most primitive taxa will only increase the computational load, because our inferred tree not only optimizes branches we are interested in, but also irrelevant ones, using taxa that largely lack discriminative signal for the branches of interest or at all. On the other hand, the most derived taxa may bias the tree inference by providing strong terminal signals outcompeting potentially conflicting weak basal signals.
  • If we want to test the stability of the backbone phylogeny against adding taxa and entire lineages, we may prefer short-branched over long-branched taxa, in order to avoid (local) LBA (especially when we want to stick to parsimony). The terminal edges in the SuperNetwork indicate the minimum number of unique changes for each tip added to the 14-taxon tree. As seen also in our hypothetical example: E and F only break down the wrong AB clade because both are either identical (or very similar) to the last common ancestor of E+F+B and F+B, respectively.
In a future post, I'll come back to the issue of identifying taxa that are game changers, using a simple and quick tree-based approach: the so-called "evolutionary placement algorithm", first implemented in RAxML.

For any of you who really don't like networks, but still find no comfort in comb-like strict consensus cladograms either: just tick the SuperTree option when inferring the SuperNetwork. But only if your input trees converge to a shared topology. Otherwise the result may look like this:

A SuperTree based on the 640 Plus-1 MPTs.

* Somebody familiar with Consensus networks and morphological data partitions providing complex signal, can extract a phylogenetic hypothesis from this boxy network for the included taxa. In general, the distance along the network edges represents a phylogenetic distance, and thus gives a direct measure of how derived a taxon is.

For example, C and D are closer to the outgroup and placed close to the centre of the graph, which is exactly where a primitive ingroup taxon, with an ancestral morphology, would be placed. F is most likely a sister of B. The olive EF | rest split supports a potential common origin of E, F, and B (long green edge bundle). Hence, A can only represent a distant, strongly evolved sister lineage (both the alternative AB and ABF clade have less character support). Also, since the graph depicts E as least derived of the four (irrespective of the topological alternatives), its affinity to F and B has more value than the affinity between A and B, both being long-branched, and hence susceptible to LBA. D fits into the picture: the olive DE edge either (1) represents a common origin, which would make D an early member of the red lineage; or (2) reflects similarity due to shared primitive traits within the ingroup, which would make D an early member of an ABEF lineage. C, in contrast to D, has no clear affinities with any other ingroup member, and so can only be interpreted as an early, very primitive form with uncertain phylogenetic relationships. The (true tree) mutual monophyly of the red and blue ingroup lineages has very little character support in the matrix, and hence cannot possibly be resolved.

** Systematically they cover a range of maniraptoran ('hand hunters') families 'below' the Avialae ('flying' dinosaurs) including, in addition to two Dromaeosauridae (Halszkaraptor, Zhenyuanlong, trees shown above), members of †Alvarezsauroidea (Haplocheirus), †Caudipteridae (Caudipteryx), †Sinovenatorinae (Sinovenator), †Therizinosauroidea or related (Beipiaosaurus, Jianchangosaurus) and †Troodontidae (Gobivenator, Sinornithoides). Caihong is a member of the †Anchiornithidae, which Wikipedia flags as "Avialae ?". These OTUs show data coverage far above the median (74% missing), with 278 (Caihong) to 558 (Caudipteryx) defined characters (out of a total of 700).

Monday, August 31, 2020

Coronavirus patterns of spread

Following on from my previous posts about the SARS-CoV-2 virus, and Covid-19, the human disease that it causes, there are a number of miscellaneous topics that could also be discussed. So, here are a few topics about the spread of the pandemic, which may be of interest.

Networks of cases

I have so far not presented a phylogenetic network related to the current pandemic. I may one day do so, although collating the data I would like to use will not be easy. In the meantime, the folks over at Fluxus Engineering did publish a network of genomes back in April: Phylogenetic network analysis of SARS-CoV-2 genomes.

Network of SARS-CoV-2 genomes

The authors identified:
... three central variants distinguished by amino acid changes, which we have named A, B, and C, with A being the ancestral type according to the bat outgroup coronavirus. The A and C types are found in significant proportions outside East Asia, that is, in Europeans and Americans. In contrast, the B type is the most common type in East Asia, and its ancestral genome appears not to have spread outside East Asia without first mutating into derived B types, pointing to founder effects or immunological or environmental resistance against this type outside Asia.
Needless to say, their paper generated some controversy, with three published responses criticizing the methodology (these are shown at the link above). However, the Global Initiative on Sharing All Influenza Data (GISAID) uses an expanded version of their cladistic classification.

Networks can also be used much more locally, to illustrate spread, although in an epidemic this will almost always be tree-like rather than reticulating. Here is a recent example from China: Large SARS-CoV-2 outbreak caused by asymptomatic traveler. The authors comment about the wide spread from one individual:
An asymptomatic person infected with severe acute respiratory syndrome coronavirus 2 returned to Heilongjiang Province, China, after international travel. The traveler’s neighbor became infected and generated a cluster of >71 cases, including cases in 2 hospitals. Genome sequences of the virus were distinct from viral genomes previously circulating in China.

Different patterns of infection among communities

Pandemics are actually a series of local epidemics, and are therefore rarely simple things, in terms of when people become infected. For example, there are often a series of alternating "waves" of new cases, in response to the behavior of either the pathogen or the people themselves.

In the case of the Covid-19 disease, the virus has so far apparently produced a series of at least seven variant strains (Geographic and genomic distribution of SARS-CoV-2 mutations), but the waves are mainly the result of people's implementation of infection control measures. Depending on the pathogen, these measures can include: social distancing, fewer / smaller crowds (especially indoors), working from home, closing social venues such as restaurants and bars, as well as mass testing and infection tracking. Reducing the spread of breath aerosols also works well for SARS-CoV-2, including careful cleaning of surfaces, and wearing gloves and masks or visors.

So, early on in most epidemics, people get infected because they are not ready to deal with things; and the number of cases increases, as shown in the above graph of Covid-19 cases in the USA this year — this is the First Wave. The number of cases then usually decreases for a while, in response to the effectiveness of the control measures. However, if the measures do not remain effective, or the people get sick of implementing them, then the number of cases increases again, creating the Second Wave. The graph above makes it clear that for the USA the Second Wave has been much more serious than the First, in terms of the number of cases.

However, this picture is often much too simple, because the USA is a pretty big place. In this example, there are 50 main jurisdictions in the country, and there is no reason to expect any epidemic to proceed in the same way in every state and territory. Here are equivalent graphs for four different US states, each showing a different pattern of waves.

So, New York (and several other north-eastern states) got the SARS-CoV-2 virus early on, and most of the at-risk people got infected at that time, so that there has not yet been a Second Wave. Rhode Island, on the other hand, has actually had a small Second Wave. From here on in the north-east, infections are likely to be mostly local outbreaks (eg. New York city mayor says rise in Covid-19 cases in Brooklyn not a cluster), such as is now also being observed in Europe.

By contrast, Louisiana, the state with the highest percent of cases (per population) so far, had a relatively small First Wave, and it is the Second Wave that has been much more problematic for epidemic control. Even more extreme, Florida (and other states like California) had the virus spread much later, so that there was not really a First Wave at the same time as the other states, and it is the Second Wave that is producing the high percentage of infected people.

So, the country's pattern of pandemic spread is made up of a series of different sub-patterns of epidemics, with different jurisdictions having very different degrees of success in controlling virus spread. This matters very much for any national response to the pandemic, because it is not the same epidemic everywhere.

In a similar manner, deaths have been concentrated in those places that got the SARS-CoV-2 virus early on. We expect for most pandemics that the number of deaths will rise as the number of infection cases rises. This next graph shows the case rates (proportion of people infected) and death rates (proportion of people who have died) in each US state (each point represents one state, plus DC).

Covid-19 death rates in the states of the USA

The proportion of cases varies from a low in Vermont to a high in Louisiana, and the proportion of deaths rises along with this — 44% of the variation in deaths between states is correlated with the difference in case rate. However, there are four states in the north-east of the country (as labeled on the graph) where the death rate has been much higher than expected (about double). These states all got their virus infections early in the pandemic, so that one or more of these has been happening:
  • the deaths predominantly occurred before effective treatment strategies were developed;
  • the at-risk groups are now being protected more effectively; or
  • the currently predominant strains of the virus are less deadly than those circulating originally.
As I noted in my previous post: Isn't it about time we started behaving rationally in response to Covid-19? A rational response needs to take into account geographical variation in the current state of the pandemic. A one-size-fits-all response cannot be particularly effective in the face of large variation.
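
For readers who want to see where a figure like "44% of the variation" can come from, here is a minimal sketch. It assumes that the percentage is a squared Pearson correlation between case rates and death rates, which the text does not state explicitly. The function name shared_variation and the five data points are invented for illustration; they are not the state data behind the graph.

```python
def shared_variation(xs, ys):
    """Return the squared Pearson correlation (r^2) of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares of the deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no variation to share")
    return (sxy * sxy) / (sxx * syy)


# Made-up numbers, NOT the real per-state rates
case_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
death_rates = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"shared variation: {shared_variation(case_rates, death_rates):.0%}")
```

With one value per state, a result near 0.44 would correspond to the 44% quoted above. The states with unexpectedly high death rates are the points that sit furthest above the fitted trend.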

Comparing lock-downs to voluntary isolation

Many governments have responded to the spread of SARS-CoV-2 by instituting economic lock-downs as a form of quarantine, to keep their populace apart from each other. This is expected to be effective biologically, because the virus is spread by aerosol droplets, and keeping people apart reduces the risk of infection (eg. 1 m when breathing, 2 m when sneezing, 4 m when coughing).

However, lock-downs have not been universal. In particular, Sweden has become well-known for leaving social distancing as a voluntary exercise, although along with strict recommendations — see my post: Media misunderstandings about the coronavirus in Sweden for an explanation of the actual situation. The essential difference is between a government mandated and enforced response and a response based on social co-operation.

The economic consequences of lock-downs have been very serious, and we have constant media reports about how dire the situation has been for various industries. So, it is interesting to compare the spread of the virus in Sweden with the spread elsewhere, as a simple means of estimating how effective the lock-downs have been.

One possible comparison is with the United Kingdom. The pandemic started in both countries at the same time (first reports on 26-27 February), and the current total death rates (attributed to Covid-19) are similar (Sweden: 576 people per million, UK: 611 people per million). The case rates are quite different, however (Sweden: 8,305 people per million, UK: 4,897 people per million), and this might be attributed to the two different strategies. [Note: the USA also has a similar death rate (564 per million) but a much higher case rate (18,495 per million).]

Coronavirus case-rates for Sweden and the UK
Coronavirus death-rates for Sweden and the UK

For a meaningful comparison, we need to look at the rates, not the raw data, because the two populations are very different in size (Sweden: 10 million, UK: 68 million). These two graphs show the case rate and death rate through time for the two countries. The comparison is quite revealing. [Note: the saw-tooth patterns in the graphs come from the fact that medical reports in most countries are notably fewer on weekends.]
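
The point about rates versus raw data can be made concrete with a little arithmetic. The sketch below uses only the rounded populations and the death rates per million quoted in this post. The raw counts it prints are back-calculated from those rounded figures, so they are rough illustrations rather than official totals. The function names are my own.

```python
def per_million(count, population):
    """Convert a raw count into a rate per million inhabitants."""
    return count / population * 1_000_000


def approx_count(rate_per_million, population):
    """Invert the rate: the rough raw count implied by a per-million rate."""
    return rate_per_million * population / 1_000_000


# Rounded figures as quoted in the text
POPULATION = {"Sweden": 10_000_000, "UK": 68_000_000}
DEATHS_PER_MILLION = {"Sweden": 576, "UK": 611}

for country, population in POPULATION.items():
    rate = DEATHS_PER_MILLION[country]
    implied = approx_count(rate, population)
    print(f"{country}: {rate} per million, roughly {implied:,.0f} deaths in total")
```

The implied raw totals differ by a factor of about seven, simply because the UK has about seven times as many people, while the rates differ by only a few percent.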

As expected, the cases initially increased faster in Sweden. However, the case rates were very similar in the two countries by the last week of March; and they remained so until Sweden started serious virus-testing in late May. Just at the moment, the case-rates are similar again, although the UK has actually done twice as much virus testing as Sweden (240,000 tests per million people versus 110,000). Anyway, the two different government responses did not produce much difference in the number of cases for the first 3 months of the pandemic.

The death rates show quite a different pattern. The rates started off very similar, but by the end of March the UK actually had a higher death rate than Sweden. This situation was maintained until the end of May, after which Sweden had the higher rate until the end of July. Once again, the two countries are now very similar. Overall, the time-course of deaths is highly correlated between the two countries (79% shared variation), while the case rates are not (7%).

Of particular note here is that the differences in case rates have not resulted in differences in death rates. Apparently, Sweden's voluntary response has allowed a greater proportion of the population to become infected but this has not resulted in more deaths. I am fairly sure that the authorities will attribute this to the development of herd immunity (which I will talk about in my next post on the coronavirus) (WHO expert praises Swedish strategy - urges other countries to follow suit). [Note: a direct comparison with the USA would be pointless, given the geographical variation discussed above.]

The consequences are far-reaching. As but one example of the unfortunate consequences of the UK lock-down, you could read up on the fiasco concerning the final-year school exams (A coronavirus lesson about the modern state) — without a lock-down, Sweden avoided such problems for its young people.


There is a wealth of data in this pandemic, enough to keep data analysts busy for a very long time. I am sure that we will be inundated with reports for many years to come. In the meantime, like all pandemics, the geography of the local epidemics is a vital point in implementing effective control strategies.

Monday, August 24, 2020

Constructing rhyme networks (From rhymes to networks 5)

As is now happening for the summer, this little series on rhyme networks is also coming to its end. We have only two more blog posts to go, with this one discussing the construction of rhyme networks, and then one more post in September, discussing how rhyme networks can be analyzed.

A preliminary annotated collection of rhymed poetry in German

While my original plan was to have all of Goethe's Faust annotated by the end of this series, so that I could illustrate how to make rhyme analyses with a large dataset of rhyme patterns in a language other than Chinese, I now have to admit that this plan was way too ambitious.

Nevertheless, I have managed to assemble a larger collection of German rhymes from various pieces of literature, ranging from boring love poems to recent examples of German Hip-Hop; and all of the rhymes have been manually annotated by myself during recent months.

This little corpus currently consists of 336 German "œuvres" (the data collection itself has more poems and songs from different languages), which make up a total of 1,544 stanzas (deliberately excluding the refrains in songs). There are 3,950 words that rhyme in this collection; and together they occur 5,438 times in a total of 49,797 words written by 72 different authors. The following table summarizes major features of the German part of the database.

Aspect          Score
components        994
authors            72
poems             336
stanzas          1544
lines            8340
rhyme words      3950
words rhyming    5438
words total     49797

The whole collection, which is currently available under the working title "AntRhyme: Annotated Rhyme Database", can be inspected online, but due to copyright restrictions for texts from recent pop songs, not all of the poems can be displayed. In order to share the annotated rhymes along with the initial Python code that I wrote for this post, I have therefore created a version in which only the annotated rhyme words are provided, along with dummy words in which each character was replaced by a miscellaneous symbol. As a result, the song "Griechischer Wein" ("Greek wine") by Udo Jürgens from 1974 now looks as shown in the following figure.

Modeling rhymes with networks

As far as Chinese rhyme networks were concerned, I have always given the impression (and also truly thought this myself) that the reconstruction of a rhyme network is something rather trivial. Given a stanza in a given poem, all one has to do is to model the rhyme words in the stanza as nodes in the network, and then add connections for all of the words that rhyme with each other according to the annotation.
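
The simple model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input format (a stanza as a list of word and rhyme-label pairs) is hypothetical and is not the format of the AntRhyme data, the function name is my own, and the toy rhyme words are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_rhyme_network(stanzas):
    """Build node and edge counts from annotated stanzas.

    stanzas: a list of stanzas, each a list of (word, label) pairs in line
    order, where words sharing a label are annotated as rhyming.
    """
    nodes = Counter()  # how often each rhyme word occurs
    edges = Counter()  # in how many stanzas each pair rhymes
    for stanza in stanzas:
        groups = defaultdict(list)
        for word, label in stanza:
            nodes[word] += 1
            groups[label].append(word)
        for words in groups.values():
            # Connect every pair of distinct words in the same rhyme group;
            # sorting gives each undirected edge one canonical key.
            for a, b in combinations(sorted(set(words)), 2):
                edges[(a, b)] += 1
    return nodes, edges


# Invented toy stanza with the rhyme scheme a a b b
toy_stanza = [("Haus", "a"), ("Maus", "a"), ("Baum", "b"), ("Traum", "b")]
nodes, edges = build_rhyme_network([toy_stanza])
print(dict(edges))
```

No weighting or normalization is applied here; the counts are raw.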

While I still think that this simple rhyme network model is a very good starting point, there are certain non-trivial aspects that one needs to carefully consider when working with this kind of rhyme network. First, there is the question of weighting. In the first study that I devoted to Old Chinese poetry (List 2016), I weighted the nodes by counting their appearance, and I also weighted the edges by first counting how often they occurred. I then normalized this score in order to receive a more balanced weighting. The normalization would first count each rhyme pair only once, even if the same word occurred more than one time in the same stanza, and then apply a formula for normalization based on the number of words rhyming with each other within the same stanza (see ibid. 228 for details).

However, in the meantime, a young scholar Aison Bu has suggested an even better way of counting rhymes, in an email conversation with me. [The pandemic prevented us meeting in person at a conference in early April, so we could never follow this up.] Since rhyming is essentially linear, my original counting of all rhymes that are assigned to the same rhyme partition in a given stanza may essentially be misleading. Instead, Aison suggested counting only adjacent rhymes.

To provide a concrete example, consider the third stanza in the song "Griechischer Wein" by Udo Jürgens (shown above). Here, we have the rhyme group labeled as f, which occurs three times in the data, with the rhyme words Wind (wind), sind (they are), and Kind (child). The normalization procedure that I proposed in the study from 2016 would now construct a network in which all three words rhyme with each other. To normalize the edge weights, each individual edge weight would be modified by the factor 1 / (G-1), where G is the number of rhymes in the rhyme group in the stanza (3 in this case, as we have three words rhyming with each other). Aison's rhyme network construction, however, would only add two edges, one for Wind and sind, and one for sind and Kind, as they immediately follow each other in the verse. A specific normalization of the edge weights would not be needed in this case.
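
The difference between the two constructions can be sketched for the rhyme group f. The two function names are my own, and the first function simplifies the 2016 procedure by assuming that every word occurs only once in the group.

```python
from itertools import combinations


def all_pairs_edges(rhyme_words):
    """Connect every pair in the group, each edge weighted by 1 / (G - 1)."""
    g = len(rhyme_words)
    if g < 2:
        return {}
    weight = 1 / (g - 1)
    return {pair: weight for pair in combinations(rhyme_words, 2)}


def adjacent_edges(rhyme_words):
    """Connect only rhyme words that directly follow each other."""
    return {(a, b): 1 for a, b in zip(rhyme_words, rhyme_words[1:])}


# Rhyme group f from the third stanza, in order of appearance
group_f = ["Wind", "sind", "Kind"]

print(all_pairs_edges(group_f))  # three edges, including Wind and Kind
print(adjacent_edges(group_f))   # two edges, no direct Wind and Kind link
```

With G = 3 the normalized construction yields three edges of weight 1/2, so the edge weights attached to each word sum to 1. The adjacent construction yields two edges of weight 1 and never links Wind directly to Kind.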

A first rhyme network

Unfortunately, I have not had time so far to test Aison's idea, to draw only edges for adjacent rhymes when constructing rhyme networks. However, with the data for more than 300 German poems and songs assembled, I have had enough time to construct a first and very simple network of German rhyme data.

For this network, I disregarded all normalization issues, and just added an edge for each pair of words that would have been assigned to the same rhyme group in my rhyme annotation. This network resulted in a rather sparse collection of 994 connected components. This is in strong contrast to the Chinese poems I have analyzed in the past (List 2016, List 2020), which were all very close to small-world networks, with one huge connected component, and very few additional components. However, it would be too early to conclude that German rhyme networks are fundamentally different from Chinese ones, given that the data may just be too sparse for this kind of experiment.
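
Counting connected components, as in the figure of 994 reported above, needs nothing more than a graph traversal. The following sketch uses a plain depth-first search over an edge list. The function name and the toy edges are illustrative and are not taken from the code behind this post.

```python
from collections import defaultdict


def connected_components(edges, nodes=()):
    """Return the connected components of an undirected graph as sets of nodes."""
    adjacency = defaultdict(set)
    for node in nodes:
        adjacency[node]  # register isolated nodes as well
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in list(adjacency):
        if start in seen:
            continue
        # Depth-first search from an unvisited node collects one component
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Toy edge list: one chain of three words and one separate pair
toy_edges = [("Wind", "sind"), ("sind", "Kind"), ("Haus", "Maus")]
components = connected_components(toy_edges)
print(len(components), sorted(len(c) for c in components))
```

A network that is close to a small-world network, as described for the Chinese data, would show one very large component and only a few small ones; a sparse network shows many small ones.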

At this stage of the analysis, it is therefore important to carefully inspect the networks, in order to explore to what degree the network modeling or the data annotation could be further improved. When looking at the largest connected component, shown in the following figure, for example, it is clear that typical rhyme groups that we would expect to find separated in rhyme dictionaries do cluster together. We find -aut on the left, -aus and -auf on the right, with the word auch (also) as a very central rhyme word, as well as Frau (woman).

While these words can be defended as rhymes, given that they share the diphthong au, we also find some strange matches. Among these is the cluster with -ut on the bottom left, which links via Mut (courage) to Bauch (belly) and resolut (straightforward). Another example is the link between Frau and trauern (mourn). The former link is due to an annotation error in the poem "Freundesbrief an einen Melancholischen" ("Friendly letter to a melancholic") by Otto Julius Bierbaum (1921), where I wrongly annotated Bauch and auch to rhyme with resolut and Mut.

However, the second example is due to a modeling problem with rhymes that encompass more than one word. This pattern is very frequent in Hip-Hop texts, and I have not yet found a good way of handling it. In the case of Frau rhyming with trauern, the original text rhymes trauern with Frau an, the latter being a part of the sentence "schaut euch diese Frau an" ("look at this woman"). Since my conversion of the text to rhyme networks only considers the first part of multi-word rhymes as the word in question, it mistakenly displays the rhyme as one between trauern and Frau. The rhyme is also shown in its original form in the figure below.
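
The modeling problem can be reproduced in miniature. The first function below mimics the behavior described above, keeping only the first word of a multi-word rhyme. The second keeps the whole unit as one node, which is just one conceivable alternative and not a solution proposed in this post. Both function names are hypothetical.

```python
def first_word_node(rhyme_unit):
    """Use only the first word of a (possibly multi-word) rhyme as the node."""
    return rhyme_unit.split()[0]


def whole_unit_node(rhyme_unit):
    """Hypothetical alternative: keep the whole rhyme unit as a single node."""
    return " ".join(rhyme_unit.split())


# The rhyme pair as it occurs in the original text
rhyme_pair = ("trauern", "Frau an")

print(tuple(first_word_node(unit) for unit in rhyme_pair))
print(tuple(whole_unit_node(unit) for unit in rhyme_pair))
```

The first line of output shows the misleading edge between trauern and Frau. The second keeps Frau an apart from the plain word Frau, at the price of adding nodes that are phrases rather than words.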


The initial construction of German rhyme networks which I have presented in this post has shown some potential problems in the conversion of rhyme judgments to rhyme networks. First, we have to reckon with certain errors in the annotation (which seem to be inevitable when doing things manually). Second, certain aspects of the annotation, especially rhymes stretching over more than one word, need to be handled more carefully. Third, assuming that poetry is spoken, and spoken texts are realized in linear form, it may be useful to reconsider the current rhyme network construction, by which edges for rhyme examples are added for all possible combinations of rhyme words occurring in the same rhyme group. For the final post in this series next month, I hope that I will find time to address all of these problems in a satisfying way.


List, Johann-Mattis (2016) Using network models to analyze Old Chinese rhyme data. Bulletin of Chinese Linguistics 9.2: 218-241.

List, Johann-Mattis (2020) Improving data handling and analysis in the study of rhyme patterns. Cahiers de Linguistique Asie Orientale 49.1: 43-57.

For those of you interested in data and code that I used in this study, you can find them in this GitHub Gist.

Monday, August 17, 2020

Isn't it about time we started behaving rationally in response to Covid-19?

I have written a few blog posts recently about the current Covid-19 pandemic, caused by the arrival of the SARS-CoV-2 virus in our lives. This interests me as a biologist with some background in the study of pathogens (disease-causing organisms).
There have been two extreme responses to the current pandemic. There are all sorts of variants in between, of course, but I will start by characterizing the extremes, and then move on to some practical examples. The point here is that we need a reasoned response to this pandemic, based on the effect of the virus on people, and the make-up of the populations being affected. The current one-size-fits-all approach used by most governments is not going to work, long-term.

The future of having to live with the virus is becoming clearer. Actions can be individual, but they need to be co-ordinated, with each of the risk groups being treated appropriately. Even if you personally feel secure, those around you might experience risks very differently. An all-purpose set of mandated behaviors might work short-term, but we cannot continue to live that way. Behavior needs to make all risk groups feel safe at all times, by being targeted appropriately.


At one extreme, people are trying to hide from the virus. By this, I mean that they are trying to keep away from it. Obviously, many people are doing this individually, but whole countries have also been trying to do it, notably Australia and New Zealand, which are geographically isolated by virtue of being islands. At the other extreme, people are trying to "crush" the virus, like they are playing poker against some weak opponent.

The problem with the first extreme is that you can never come out of hiding, because the virus does not go away, it just sits there (like viruses do) until you finally come past, and then it will get you, after all. This is what the so-called Second Wave of infections is currently showing us. The First Wave of infections occurs because people do not know about the pathogen, and therefore catch it inadvertently. In response to the rapid increase in case rates, people go into self-quarantine, trying to prevent themselves from encountering the virus. This works, but they eventually get tired of doing it, and they come back out again — and that is the Second Wave of infections. It is nothing new as far as the virus is concerned, it simply reflects changing human behavior (out, in, out again).

A prime example of the other extreme is expressed by this recent New York Times article: Here's how to crush the virus until vaccines arrive, or even the Wall Street Journal: The treatment that could crush Covid. You can't crush a pandemic, as we know from the seemingly endless series of previous pandemics in recorded history, and presumably many more of them before we learned to write. Naturally, Wikipedia has a List of epidemics, for you to peruse.

However, at some stage, people are going to have to start treating the current pandemic like the influenza virus — a natural part of their environment, where they take standard precautions to minimize their risk. In response to the perennial threat of flu, old people take vaccines in winter, middle-aged people stay away from public transport during flu season, and young people simply get on with their lives (because a bit of flu will not kill them). These are rational responses, taken by people after evaluating the perceived risk of infection to themselves.

To do this for Covid-19 we need to consider what we have learned so far this year.

We need to learn

During the First Wave of any pandemic we need to over-react, while we find out how the new pathogen behaves and what effects it can have. So, we try everything from social distancing to lock-downs, to see what seems to work in practice. The objective is to reduce the rate of spread of the virus — in biological terms, we are trying to work out what things will flatten the curve (see: Coronavirus: What is 'flattening the curve,' and will it work?).

For example, one current debate is: do face-masks provide protection, in the community setting? They work in hospitals, for sure (Face masks really do matter: the scientific evidence is growing), but that is a specialist environment, where they are used by professionals in conjunction with other methods (hand scrubbing, special clothing, etc). We need to find out whether people can routinely wear face-masks properly, so that the masks do what they are designed to do. We may actually be better off with perspex visors, for example, which are also effective at preventing the spread of breath aerosols (which is the main problem), and they can be worn effectively even by a novice — and they do not make us all look like we are involved in a bank hold-up.

We also need different groups of people to try different approaches, to see how effective they are. If everyone does exactly the same thing, strictly following World Health Organization recommendations for example, then we do not learn much, as a global community. That is, a pandemic is simply a widespread (global) series of epidemics, one in each local area. Since countries are all different, culturally, this cultural diversity creates the ideal environment to maximize learning-by-doing, by treating the pandemic as a set of epidemics, to which we might respond differently.

For example, the Buddhist-dominated communities of South-East Asia have done things in a very community-cooperative manner (these people do not work alone, by choice); and they collectively have the lowest infection rates on the planet. The Muslim-dominated countries of the Middle East do not worry much about life threats (whether they die or not is the Will of Allah), and they collectively have the worst rates. The individual creed of Americans does not encourage them to act co-operatively (resulting in draconian government-mandated lock-downs), and so they also have a very high rate. Sweden is one of the few remaining socialist cultures, where governments give advice rather than issuing instructions (resulting in this case in co-operative self-quarantines), and they have a middling-to-high infection rate.

We learn many things about alternative effective actions from this cultural diversity. In particular, media criticism of the different national reactions to the pandemic is now dying down, as the critics slowly come to realize that uniformity always results in an all-or-none outcome.

What have we learned?

Okay, so after the First Wave we know that this new virus can do everything from: apparently nothing (there are plenty of people with antibodies who have never felt any symptoms of having had the virus), to creating flu-like symptoms (key symptoms: fever, cough, skin rash, loss of taste & smell), on to hospitalization (with usually c. 7 days to get rid of the symptoms but 5 weeks to get rid of the actual virus), or even intensive care (as a result of what is medically called a cytokine storm). For the elderly, and others with pre-existing medical conditions, the virus seems to be one thing too many for their body, the proverbial straw that breaks the camel's back — which can lead to death sooner rather than later.

So, not only does SARS-CoV-2 infection not mean death for the vast majority of people (globally, < 3.6% of reported infections have resulted in death), it does not even necessarily mean sickness at all (eg. a Swedish study showed that 46% of the study participants with antibodies had never reported clinical symptoms). This should mean something for our future responses.

Notably, in those countries where a significant Second Wave is now occurring, the new infections are often not resulting in deaths (except notably in Australia). This is a very important difference between the First and Second Waves, in most places. There is speculation that the SARS-CoV-2 variants currently widespread are less deadly than were those common at the beginning of the pandemic; but it is equally likely that those people who were most susceptible to the virus have already succumbed during the First Wave.

So, we now know about the risk groups, roughly, which is as good as we ever know such things; and we have a good idea about the outcomes of the various risks. This means we can start to do some reasoned things, as a pandemic response. The Second Wave is a perfect time to start treating the Covid-19 situation rationally.

The time for some new action?

This means that it is time to start targeting actions to the degree of risk for each person, rather than having over-arching actions that affect everyone equally. Our individual responses to the virus are not equal, so why are most government actions still predicated on the idea that we are all equal?

The point is, we have to respond to what we have learned about relative risks. For example, I have argued before that the biggest mistake Sweden has made was letting Covid-19 get into the aged-care facilities, which is where most of the country's deaths have now occurred. Has anyone learned from this mistake? Apparently not in the USA: Untested for Covid-19, nursing-home inspectors move through facilities. Come on people — get your act together.

The response to the First Wave always needs to assume equality, because anything else would be irresponsible, in the face of our initial ignorance. During the Second Wave, however, we are no longer quite so ignorant, and we can tailor our actions to suit the conditions. When are we going to start doing this?

In order to think about this question, it is worthwhile to consider a few topics that seem to be on the agenda, and look at some practical examples of three relevant situations.

Trying to hide

Any country that successfully hides from the virus has to keep hiding, forever. New Zealand has recently been crowing about having gone 100 days without a new coronavirus case. That record was destroyed this week (New Zealand on alert after 4 cases of COVID-19 emerge from unknown source); and it will get even worse on the day they allow the first visitor into their country. Their current Alert Level 3 response cannot change this — you cannot hide from a virus.

New Zealand's near neighbor, Australia, has demonstrated this point even more strongly. In one sense, the Australians understand quarantine, because it is a big part of keeping plant and animal diseases out of their country. For example, international visitors are regularly surprised to have biological products (notably wood) confiscated at the arrival airport — better safe than sorry.

So, dealing with Covid-19 should be straightforward for them — you just apply the same idea to the people, themselves. Sadly, it took them some time to realize that you have to take people straight from the airport to a quarantine hotel, if the quarantine strategy is to work. One of my nephews returned to Sydney (Australia) from Copenhagen (Denmark) at the beginning of the First Wave, and he had to make his own long way by public transport from the airport to the quarantine house that his father had arranged!

So, it should not be a surprise that quarantine has not been effective everywhere in Australia — one mistake is all it takes. This mistake was made in the quarantine hotels in Melbourne (Victoria), where the quarantine security turned out to be a joke (see: New coronavirus lockdown Melbourne amid sex, lies, quarantine hotel scandal). Perhaps the security guards should have read the earlier article on: Sex in the time of coronavirus.

The issue here is that Australians are no better than Americans at following government instructions — individual rights take precedence (see: Individual choice is a bad fit for Covid safety). Even my local newspaper here in Uppsala (Sweden) reported (Regel brott ger böter) the news that military personnel were sent to visit 3,000 Australians who were supposed to be in self-quarantine at home (due to having tested positive for the virus), and 800 of them (one-quarter!) were not at  home. I lived in Australia for 40 years, and this situation surprises me not at all.

So, hiding does not work, long-term, because you have to keep it up for too long to be practical for most people. The Second Wave in Victoria is actually worse than the First Wave, in terms of number of Covid-19 cases. The ensuing lock-down is now even worse than it has been in most other places (see: 'Very dead': army and police patrol the deserted streets of coronavirus-stricken Melbourne); and Victoria itself has been quarantined from the rest of the country.


We have all been told that the effect of Covid-19 is age-related; and the global data shows that this is true everywhere: the older you are, the more likely you are to be seriously affected. One outcome of this knowledge is that actions can be tailored to age groups. Notably, we can consider the idea that massively disrupting the lives of very young people may be doing them more harm than good, due to stress if nothing else (Lockdowns and school shutdowns may make youngsters sicker).

Most countries mandated the closure of schools, and instituted some form of working from home for the pupils. This move was predicated on the idea that children will catch the virus in the crowded schools, and bring the disease home to their elders. This scenario seemed to be the case, for example, in the early spread of SARS-CoV-2 in northern Italy.

Recent evidence, however, suggests that, while the youngsters do catch the virus, they are much less infectious than older people (see: COVID-19 study confirms low transmission in educational settings). We are talking about pre-teenagers here, not older children. This does not mean that they can't spread the virus (see: Latest research points to children carrying, transmitting coronavirus), but merely that this is a much lower risk.

It has therefore been suggested that a rational response would involve a trade-off between disrupting the lives of very young people versus the risk of viral spread (see: Why it’s (mostly) safe to reopen the schools). Notably, this issue was explicitly considered in Sweden, and during the First Wave it was decided to keep the junior schools open, but to close the senior schools (ie. high school). So, the younger children have all been trundling off to school every week-day, just as usual, the whole time. As far as I know, there has not been even one reported outbreak involving any of the open schools.

This is why I emphasize the importance of culturally diverse responses to a pandemic. In this case, the Swedes seem to have got it right; and everyone else could learn from this.

Young people

It is a different matter for somewhat older (but still young) people. The so-called Millennial generation has had a pretty tough time, especially financially. This is the second financial down-turn that they have experienced in a dozen years, just when they are trying to get themselves onto their own two feet (see: Millennials slammed by second financial crisis fall even further behind).

So, none of us should be surprised that these people are thoroughly sick of restrictive pandemic responses by now. Indeed, it is becoming widespread news that case rates are increasing among 20-29 year olds (or 15-25, depending on how people are grouped) (see: WHO urges young people to help control the spread of coronavirus). This has become particularly obvious in Europe (see: Coronavirus cases rise in Europe as youth hit beaches and bars), but also in North America (see: B.C. hospitalizations, deaths steady as latest wave hits mostly young people) and Australia (see: Coronavirus Australia: Why young people are spreading COVID-19).

This is not necessarily as bad as it might sound, because the effect of the virus is age-related, and these people will probably mostly be safe (but not all). The same thing is true for somewhat younger people — youth is a social time, and mandated restrictions about distancing may not be very effective (see: Why the teenage brain pushes young people to ignore virus restrictions).

Places like Japan and Spain are now cracking down on bars, and the like (eg. Spain cracks down on outdoor drinking, smoking in renewed push against COVID-19). If you want some survey data on what activities U.S. people currently feel comfortable doing, then check out: Weekly updates on consumers’ comfort level with various pastimes.

In this situation, Sweden has not been exempted; and recent coronavirus cases have become prevalent in the 20-29 year old group, just like elsewhere. Once again, this emphasizes that our knowledge cannot all come from one place. No-one gets it all right, but they may get some things right; and we should learn from both success and failure. This is the rational approach, not the one-size-fits-all approach.

Adding to this scenario, as I write this blog post, Europe is having a warm spell (up to 40 °C in the south), and my local newspaper has the headline: Chaos on Europe's beaches in the heatwave. All governments are warning about the need to continue keeping people apart, for those who wish to avoid infection. Fortunately, the summer holidays are nearing their end in the northern hemisphere.

Concluding comments

From the biological perspective, for the future to be bearable, we need to reach herd immunity, which refers to public safety in the presence of a pathogen. This is determined by the proportion of the (local) population that needs to become immunized (either by becoming infected or by being vaccinated) in order for the infection to stop spreading (see: A new understanding of herd immunity).
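To put a rough number on that proportion, here is a minimal sketch using the classic textbook approximation, in which the threshold is 1 - 1/R0 (R0 being the average number of people that one infected person infects in a fully susceptible population). This assumes that everyone mixes evenly and is equally susceptible, which is exactly the assumption that newer work on herd immunity questions. The function name and the R0 values are my own illustrative choices, not estimates for SARS-CoV-2.

```python
def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune for spread to decline.

    Uses the simple approximation 1 - 1/R0, which assumes an evenly mixing
    population. If R0 is 1 or less, the infection fades out on its own.
    """
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


# Illustrative (hypothetical) R0 values only, not measured estimates.
for r0 in (1.5, 2.0, 3.0):
    share = herd_immunity_threshold(r0)
    print(f"R0 = {r0}: about {share:.0%} of the population immune")
```

The point of the sketch is only that the threshold rises with infectiousness; variation between people in contacts and susceptibility can change the actual figure.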

We can achieve herd immunity by responding rationally based on the make-up of the population, in terms of the relative risks. At-risk groups need to be protected, while the rest of the people get on with their lives. For example, Stockholm, in Sweden, may now be getting close to herd immunity (or flock immunity, as the locals would call it), the Swedes having forgone the lock-downs imposed elsewhere, thus allowing immunity to arise naturally.

Herd immunity can be achieved without rationality, of course: we simply wait for the weakest people to die, and the rest are likely to be safe. You might not like the moral implications of doing this, but it is biologically effective, nonetheless. For example, India may end up with the world's worst case-rate for infections, given its population size and large degree of poverty in many areas (where social distancing is not feasible). However, its saving grace, in terms of deaths, may well be the consequent fact that poor people are usually young, because poor people do not live long in the first place. Herd immunity to SARS-CoV-2 is easy to achieve under these circumstances (see: Herd immunity seems to be developing in Mumbai’s poorest areas).

I vote for the rational approach, myself, among the many biological alternatives.