by Michelle Ziegler
Up until a few months ago there were only a few representative samples of the Yersinia pestis genome: important windows into its secrets, but windows nonetheless. In January a Chinese group remedied this situation by expanding the number of fully sequenced genomes from 15 to 133 (Cui et al, 2013). China supplied 107 genomes, selected from over 900 genotyped specimens collected since 1955 to represent bacterial and host diversity. In addition, 11 isolates from Mongolia, Myanmar (Burma), the former Soviet Union, and Madagascar were fully sequenced. For the analysis, the 15 previously sequenced genomes were added, bringing the total to 133, including the ancient specimens from 14th-century London.
The Core-Genome and the Pan-Genome
Even though Yersinia pestis is considered to have little genetic diversity, its genome is more elastic than that of any eukaryote (everything but bacteria). The bacterial genome can be divided into its core genome, found in all members of the species, and the accessory genome, made up of sequences found only in some strains. Plasmids are part of the accessory genome but not all of it; extra genes are also found on the bacterial chromosome. The core genome is 3.53 Mb long with 3450 genes; the accessory genome has 1.92 Mb with 1249 genes (including 451 on the six known plasmids) (Cui et al, 2013, Table S1). So the accessory genome contains about 27% of the genes found in the species (1249 of 4699). This may seem like a lot, but more promiscuous species like Escherichia coli (E. coli) have many more accessory genes than core genes. With E. coli, the more specimens are sequenced, the larger the accessory genome grows, with no end in sight.
Combining all of the genes found in Yersinia pestis (core and accessory genome), we have the pan-genome. The pan-genome is 5.46 Mb with 4699 genes (Cui et al, 2013). No one strain has all of these genes. So different strains do differ significantly in their functions, but as far as I know, there are no significant differences in human prognosis. Hopefully, future studies will cross-reference strain type or particular genes with human prognosis, transmission routes (% bubonic vs pneumonic), hosts, etc.
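For readers who like to see the bookkeeping, here is a minimal Python sketch of how the core, accessory and pan-genome relate, treating each strain as a set of gene names. The strain names and gene labels are hypothetical toy data; only the final two counts (3450 core and 1249 accessory genes) come from Cui et al (2013).

```python
# Toy illustration: core, accessory and pan-genome as set operations.
# Strain names and gene labels are hypothetical.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g4", "g6"},
}

def core_genome(gene_sets):
    """Genes present in every strain (intersection of all sets)."""
    return set.intersection(*list(gene_sets))

def pan_genome(gene_sets):
    """Every gene found in at least one strain (union of all sets)."""
    return set().union(*gene_sets)

def accessory_genome(gene_sets):
    """Genes found in some strains but not in all of them."""
    sets = list(gene_sets)
    return pan_genome(sets) - core_genome(sets)

core = core_genome(strains.values())
accessory = accessory_genome(strains.values())
pan = pan_genome(strains.values())
print(sorted(core))       # ['g1', 'g2', 'g3']
print(sorted(accessory))  # ['g4', 'g5', 'g6']

# The same arithmetic with the gene counts reported by Cui et al (2013).
core_genes, accessory_genes = 3450, 1249
pan_genes = core_genes + accessory_genes
print(pan_genes)                                    # 4699
print(round(100 * accessory_genes / pan_genes, 1))  # 26.6
```

Because the pan-genome is a union, every new strain can only add genes to it, which is why the E. coli accessory genome keeps growing as more specimens are sequenced.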
Using known and new SNPs, the phylogenetic tree has finally been fleshed out into a healthy-looking tree. We couldn’t keep the sickly Charlie Brown tree of the past forever! Even so, the tree below represents only the main branches.
To my mind, the most important aspect of the new tree is that nodes of increased diversity are much more apparent. The authors are most excited by node 7, where there is a four-way branch, adding two new branches (3.ANT1 and 4.ANT1) to the main stem of the tree. They refer to this diversity point as the ‘big bang’. This node gains the most attention because the 14th-century London genomes are just one step off node 7, down the 1.ANT1 branch. So it stands to reason that node 7 represents a period of diversification that produced the second pandemic. Yet, looking at their diagram, other nodes, such as node 12, show greater diversity. The 1.IN strains are intermediates on the same lineage between the second and third pandemics. Node 14 is the initial diversification that produced the third pandemic. Calling node 7 a ‘big bang’ seems to me to have more to do with its producing the second pandemic than with the diversity at the node itself. The new third and fourth branches (3.ANT and 4.ANT) are concentrated in Mongolia, which underlines the importance of doing such deep sequencing in other Central Asian regions. It is impossible to tell which host species these bursts of diversity occurred in, though it was almost certainly not humans. It’s not that diversity can’t be generated in humans, especially during pneumonic plague, but because plague is not endemic in humans, a new lineage must make it back to an animal reservoir to be preserved anywhere other than in ancient DNA.
As would be expected, biogeography shows related strains clustering by region, though they are fairly well mixed within the circled zone on the map above. Samples seem to follow ancient roads, although keep in mind that all of these strains were isolated within the last 60 years. I do wonder why they were not able to identify a route for the eastern branch two isolates. All of the branch two isolates appear to run along a fairly straight line from southwest to northeast China (extending trade route III to Manchuria). The 107 Chinese specimens were chosen from more than 900 strains, identified from 5000 isolates, for their genotypic diversity, host diversity and geography (Cui et al, 2013). It would have been interesting to see a map with all 5000 isolates on it as a measure of abundance (with or without typing).
The oldest strain, 0.PE7, is found only on the Qinghai-Tibet plateau in China, an area framed by the ancient trade routes along which most of the western strains are found. This led Cui et al (2013) to postulate the Qinghai-Tibet plateau as the origin of Yersinia pestis.
Unsteady Molecular Clocks
Estimating ages from genetics can be a very risky business. Estimating the years since the last common ancestor requires a steady molecular clock, measured in base changes per unit of time. In theory, the core genome of every lineage should have changed to the same degree since the common ancestor, but that is not the case at all: the number of SNPs in the Yersinia pestis core genome varies greatly between lineages. Even excluding the most divergent strain, from Angola (0.PE3), there is “a nearly 40 fold difference between the slowest and the fastest evolving branches” (Cui et al, 2013). An unsteady molecular clock was also suggested by earlier data from Madagascar, though the discussion was buried in the supplementary material (Morelli et al, 2010, pp. S10-S18). Mutator phenotypes do occur (Rajanna et al, 2013), though Cui et al (2013) assure us that none of these strains are mutators. On the other hand, a Georgian group suggests that the mutator phenotype, which comes down to a single point mutation, could naturally reverse (back-mutate), making the age of a lineage less predictable (Rajanna et al, 2013). The Chinese group concluded that the faster clock rates on some branches are due to a higher reproduction rate, probably reflecting more or larger epidemics in those lineages (Cui et al, 2013). The types of genetic changes (SNPs) indicate neutral evolution rather than selection, so the increased reproduction rate is not caused by the genetic changes themselves.
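A deliberately simple strict-clock calculation shows why a 40-fold spread in rates matters so much. This is my own toy illustration, not the dating method used in any of these papers; the SNP count and clock rates are hypothetical numbers.

```python
def years_since_split(snps_on_branch: float, snps_per_year: float) -> float:
    """Strict molecular clock: elapsed time = accumulated SNPs / clock rate."""
    if snps_per_year <= 0:
        raise ValueError("clock rate must be positive")
    return snps_on_branch / snps_per_year

# Hypothetical values, for illustration only.
branch_snps = 80              # SNPs counted on one branch
slow_rate = 0.02              # SNPs per genome per year on a slow branch
fast_rate = slow_rate * 40    # a branch evolving ~40x faster

slow_estimate = years_since_split(branch_snps, slow_rate)
fast_estimate = years_since_split(branch_snps, fast_rate)
print(f"slow clock: {slow_estimate:.0f} years")  # slow clock: 4000 years
print(f"fast clock: {fast_estimate:.0f} years")  # fast clock: 100 years
```

The same 80 SNPs translate into either four millennia or one century depending on which rate is assumed, which is why an unsteady clock makes divergence dates so fragile.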
While I understand that calculating divergence dates is an important exercise for people who focus on phylogenetics, it is not useful for understanding historical plague. The estimates are not solid or specific enough to date historical events on their own. Predictions are just that; all of these groups have been proven wrong too often, sometimes by their own later work. Most importantly, it appears that molecular dating will eventually be trumped by ancient DNA analysis with an archaeological and/or documentary context. As far as I’m concerned, the Angola strain is a genetic and geographic outlier of uncertain provenance. We don’t know important factors such as how long it was kept in active culture before it was made into a stock, or the conditions of storage. Both of these can affect mutation rates and the molecular clock (Rajanna et al, 2013). I’m sure the Angola strain’s story is interesting, but it is unlikely to be useful for understanding the whole species unless it turns up in ancient DNA.
Gaining and Losing Diversity
Returning to the starburst points on the tree, called polytomies, where multiple lineages share the same ancestor, we find some of the most valuable information in the new phylogenetic tree. Epidemics (and presumably epizootics) are believed to involve a higher reproduction rate than enzootic plague. Since the number of mutations per unit of time is directly tied to the reproduction rate, increased reproduction predicts a faster clock and, therefore, more genetic diversity. The team predicts that “higher clock rates are an indicator of epidemic disease, even in the absence of historical evidence” (Cui et al, 2013). It is unclear how an epidemic can be distinguished from an epizootic by genetics alone; we know from modern observations that not all epizootics spill over into the human population. Yet major polytomies can at least be used to estimate how many bursts of growth the bacterium has gone through in China. We should see other polytomies as other Central Asian regions are sequenced more extensively.
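The link between reproduction and clock rate can be sketched with a toy model in which each year adds SNPs in proportion to how much the bacterium replicated. All numbers are hypothetical assumptions, and, like the genetic data themselves, the model cannot tell an epidemic from an epizootic: it only knows quiet (enzootic) years and outbreak years.

```python
# Hypothetical toy model: SNPs accrue per replication, so years with more
# bacterial replication (outbreaks: epidemics or epizootics) add more SNPs.
ENZOOTIC_SNPS_PER_YEAR = 1   # hypothetical baseline
OUTBREAK_MULTIPLIER = 10     # hypothetical: 10x more replication in an outbreak year

def snps_accumulated(years):
    """years: sequence of 'enzootic' or 'outbreak' labels, one per year."""
    total = 0
    for kind in years:
        if kind == "enzootic":
            total += ENZOOTIC_SNPS_PER_YEAR
        elif kind == "outbreak":
            total += ENZOOTIC_SNPS_PER_YEAR * OUTBREAK_MULTIPLIER
        else:
            raise ValueError(f"unknown year type: {kind!r}")
    return total

quiet_branch = ["enzootic"] * 100
busy_branch = ["enzootic"] * 90 + ["outbreak"] * 10

for name, branch in [("quiet", quiet_branch), ("busy", busy_branch)]:
    snps = snps_accumulated(branch)
    print(name, snps, snps / len(branch))  # apparent SNPs per year
# quiet 100 1.0
# busy 190 1.9
```

Both branches span the same 100 years, yet ten outbreak years nearly double the apparent clock rate on the busy branch, which is the kind of signal Cui et al read as a trace of past epidemics.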
While these polytomies show a starburst of new lineages, there is also a loss of diversity during every epidemic. Most of the new lineages produced during an epidemic (or epizootic) will die out (become extinct) when the epidemic ends. If the changes are truly neutral, then which lineages survive to endure in the reservoir will be completely random (as will the number of surviving lineages). We should also remember that clinical isolates collected during an epidemic, and ancient DNA, can preserve lineages that later went extinct (and this is normal). In the four individuals sequenced from 14th-century East Smithfield, two different clones were found, the second derived from the first. Both of these clones may survive only in ancient DNA, not in any living specimen. The more time that passes, the greater the likelihood that minor lineages will become extinct. This tends to make the earlier sections of the phylogenetic tree look cleaner by stripping off side branches.
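The idea that neutral lineages survive by chance alone can be illustrated with a standard Wright-Fisher drift model. This is my choice of toy model, not an analysis from Cui et al or Vogler et al; the number of lineages, the population size and the number of generations are hypothetical, with the small population standing in for the bottleneck after an epidemic ends.

```python
import random
from collections import Counter

def neutral_survivors(n_lineages, pop_size, generations, seed=None):
    """Wright-Fisher drift with no selection: each generation, every
    individual inherits the lineage of a parent drawn uniformly at random
    from the previous generation. Returns surviving lineages and their sizes."""
    rng = random.Random(seed)
    # Start with the population split evenly among the new lineages.
    population = [i % n_lineages for i in range(pop_size)]
    for _ in range(generations):
        population = [rng.choice(population) for _ in range(pop_size)]
    return Counter(population)

survivors = neutral_survivors(n_lineages=20, pop_size=50, generations=200, seed=1)
print(len(survivors), "of 20 lineages survive:", dict(survivors))
```

Re-running with different seeds changes both which lineages survive and how many, even though no lineage is fitter than any other; that is the pattern expected if the SNPs are truly neutral.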
Another recent study, by Vogler et al (2013), supports the Chinese group's scenario on a finer scale, following the 9-year epidemic in the port town of Mahajanga, Madagascar, from 1991 to 1999. Over nearly a decade, the incidence of plague can be compared with the genetic diversity, and Yersinia pestis evolution can be plotted with great precision. In the lower diagram, clones are color-coded by year of isolation. From 1995 to 1999 it is possible to see the next year’s primary clone emerge in the previous year’s epidemic, which supports local cycling within the city. At the same time, most of the diversity generated is not represented in later outbreaks.
The hosts of these 107 strains give us a glimpse of the host diversity of Yersinia pestis within China (Cui et al, 2013). The figure to the right gives an indication of strain diversity within each host but does not tell us abundance or location within China. What jumps out at me is that humans and marmots have the most strain diversity. The high strain diversity in humans, including 0.PE7, the strain closest to the most recent common ancestor, suggests to the Chinese team that Yersinia pestis has been pathogenic to humans since it evolved (Cui et al, 2013). In other words, there was no point in its evolution at which it gained the ability to infect humans. The few strains that cannot infect humans are hypothesized to have lost that ability, possibly as a result of purifying selection for voles as hosts. It is interesting that the 1.ORI strains of the third pandemic are found only in humans, rats and mice. We have to be careful about taking this figure to represent the abundance or importance of a particular host. The great gerbil, Rhombomys opimus, is a primary host throughout Central Asia yet is represented by only one strain in this figure.
Studies published this winter have moved us significantly down the road toward fleshing out our picture of Yersinia pestis. The genetic survey of Y. pestis in China provides a firm foundation to build on as more ancient DNA becomes available and extensive sequencing is done in other regions. Madagascar continues to be the best laboratory for plague ecology and epidemiology, while the Georgian study begins to address unintended evolution within the laboratory, which may shed light on Y. pestis in the wild. I’ll return to these papers soon as I continue to examine Y. pestis from different perspectives and ruminate on answers to other questions.
Cui, Y., Yu, C., Yan, Y., Li, D., Li, Y., Jombart, T., Weinert, L., Wang, Z., Guo, Z., Xu, L., Zhang, Y., Zheng, H., Qin, N., Xiao, X., Wu, M., Wang, X., Zhou, D., Qi, Z., Du, Z., Wu, H., Yang, X., Cao, H., Wang, H., Wang, J., Yao, S., Rakin, A., Li, Y., Falush, D., Balloux, F., Achtman, M., Song, Y., Wang, J., & Yang, R. (2013). Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis. Proceedings of the National Academy of Sciences, 110 (2), 577-582. DOI: 10.1073/pnas.1205750110
Morelli, G., Song, Y., Mazzoni, C. J., Eppinger, M., Roumagnac, P., Wagner, D. M., Feldkamp, M., Kusecek, B., Vogler, A. J., Li, Y., Cui, Y., Thomson, N. R., Jombart, T., Leblois, R., Lichtner, P., Rahalison, L., Petersen, J. M., Balloux, F., Keim, P., Wirth, T., Ravel, J., Yang, R., Carniel, E., & Achtman, M. (2010). Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity. Nature Genetics, 42 (12), 1140-1143. PMID: 21037571
Rajanna, C., Ouellette, G., Rashid, M., Zemla, A., Karavis, M., Zhou, C., Revazishvili, T., Redmond, B., McNew, L., Bakanidze, L., Imnadze, P., Rivers, B., Skowronski, E. W., O’Connell, K. P., Sulakvelidze, A., & Gibbons, H. S. (2013). A Strain of Yersinia pestis With a Mutator Phenotype from the Republic of Georgia. FEMS Microbiology Letters. PMID: 23521061
Vogler, A., Chan, F., Nottingham, R., Andersen, G., Drees, K., Beckstrom-Sternberg, S., Wagner, D., Chanteau, S., & Keim, P. (2013). A Decade of Plague in Mahajanga, Madagascar: Insights into the Global Maritime Spread of Pandemic Plague. mBio, 4 (1). DOI: 10.1128/mBio.00623-12
Nice post, enjoyed it! The mutation rate discussion is really interesting. If I have to be a bit picky, when you say: “its genome is more elastic than any eukaryote (everything but bacteria)” you have rather scandalously missed out the archaea, which form the third domain of life along with eukarya and bacteria!
“any eukaryote (everything but bacteria)”, don’t forget our Archaea friends.
Ok, ok … guilty as charged. I should have written everything but prokaryotes, but then I would have had to explain prokaryote.