Which animal did the novel coronavirus come from? | Live Science

As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. We thank all authors who have kindly deposited and shared genome data on GISAID. All custom code used in the manuscript is available at https://github.com/plemey/SARSCoV2origins. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. The difficulty in inferring reliable evolutionary histories for coronaviruses is that their high recombination rate (refs. 48,49) violates the assumption of standard phylogenetic approaches because different parts of the genome have different histories. Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable loop region and CTD region excluding the variable loop. The shaded region corresponds to the S protein.

Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage - Nature

However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2.
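The whole-genome identity figures quoted above (91.02% and 90.55%) are pairwise percent identities. The source does not say how gaps or ambiguous bases were handled, so the following is only a minimal sketch under one common convention: the two sequences are already aligned to equal length, and any column with a gap in either sequence is skipped. The function name `percent_identity` is illustrative, not taken from the original work.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption (not from the source): gap columns ('-') are excluded from
    both the numerator and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned pair: 5 comparable columns, 4 of them identical.
print(percent_identity("ACGT-A", "ACGAAA"))
```

Other conventions (counting gaps as mismatches, or ignoring ambiguity codes such as N) give slightly different percentages, which is one reason identity values reported by different studies are not always directly comparable.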
Individual sequences such as RpShaanxi2011, Guangxi GX2013 and two sequences from Zhejiang Province (CoVZXC21/CoVZC45), as previously shown (refs. 22,25), have strong phylogenetic recombination signals because they fall on different evolutionary lineages (with bootstrap support >80%) depending on what region of the genome is being examined. We thank originating laboratories at South China Agricultural University (Y. Shen, L. Xiao and W. Chen; no. Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n = 27 sequences spread over 4 years (MERS-CoV) and n = 27 sequences spread over 49 years (HCoV-OC43). The research leading to these results received funding (to A.R. Among the 68 sequences in the aligned sarbecovirus sequence set, 67 show evidence of mosaicism (all Dunn–Šidák-corrected P < 4 × 10⁻⁴; 3SEQ, ref. 14), indicating involvement in homologous recombination either directly with identifiable parentals or in their deeper shared evolutionary history, that is, due to shared ancestral recombination events. 2 Lack of root-to-tip temporal signal in SARS-CoV-2. The Pango dynamic nomenclature is a popular system for classifying and naming genetically distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. A third approach attempted to minimize the number of regions removed while also minimizing signals of mosaicism and homoplasy.
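The mosaicism P values above are described as Dunn-Šidák corrected. As a reminder of what that correction does, here is a short sketch of the standard formula, p_adj = 1 - (1 - p)^m for m comparisons. How 3SEQ counts comparisons internally is not stated in the text, so `m` below is a hypothetical input and the function names are illustrative.

```python
def sidak_adjust(p: float, m: int) -> float:
    """Dunn-Sidak adjusted P value for one of m independent comparisons."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if m < 1:
        raise ValueError("m must be a positive integer")
    return 1.0 - (1.0 - p) ** m


def sidak_threshold(alpha: float, m: int) -> float:
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Hypothetical numbers: a raw P value of 0.01 tested across 2 comparisons.
print(sidak_adjust(0.01, 2))
```

The correction is slightly less conservative than Bonferroni (which would multiply p by m), and the two agree closely when p is small.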
Biazzo et al. obtained the genome sequences of 10 SARS-CoV-2 virus strains through nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software; the strains were assigned to the B.1 lineage, indicating that SARS-CoV-2 was widely spread in Europe (Biazzo et al., 2021). Due to the absence of temporal signal in the sarbecovirus datasets, we used informative prior distributions on the evolutionary rate to estimate divergence dates. All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. Region A has been shortened to A′ (5,017 nt) based on potential recombination signals within the region. 36) gives a putative recombination-free alignment that we call non-recombinant alignment 3 (NRA3) (see Methods). Sequence similarity. EPI_ISL_410721) and Beijing Institute of Microbiology and Epidemiology (W.-C. Cao, T.T.-Y.L., N. Jia, Y.-W. Zhang, J.-F. Jiang and B.-G. Jiang, nos. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). The time-calibrated phylogeny represents a maximum clade credibility tree inferred for NRR1. Below, we report divergence time estimates based on the HCoV-OC43-centred rate prior for NRR1, NRR2 and NRA3 and summarize corresponding estimates for the MERS-CoV-centred rate priors in Extended Data Fig. Indeed, the rates reported by these studies are in line with the short-term SARS rates that we estimate (Fig. Concatenated region ABC is NRR1.
With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges (ref. 57). a, Breakpoints identified by 3SEQ illustrated by percentage of sequences (out of 68) that support a particular breakpoint position. We compiled a set of 69 SARS-CoV genomes including 58 sampled from humans and 11 sampled from civets and raccoon dogs. 11,12,13,22,28), a signal that suggests recombination, the divergence patterns in the S protein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. Even before the COVID-19 pandemic, pangolins had been making headlines. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humans, a polybasic cleavage site insertion in the S protein, has not yet been seen in another close bat relative of the SARS-CoV-2 virus. Calibration of priors can be performed using other coronaviruses (SARS-CoV, MERS-CoV and HCoV-OC43), but estimated rates vary with the timescale of sample collection. The assumption of long-term purifying selection would imply that coronaviruses are in endemic equilibrium with their natural host species, horseshoe bats, to which they are presumably well adapted. In the variable-loop region, RaTG13 diverges considerably with the TMRCA, now outside that of SARS-CoV-2 and the Pangolin Guangdong 2019 ancestor, suggesting that RaTG13 has acquired this region from a more divergent and undetected bat lineage. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13.
SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. When the genomic data included both coding and non-coding regions we used a single GTR+Γ substitution model; for concatenated coding genes we partitioned the alignment by codon position and specified an independent GTR+Γ model for each partition with a separate gamma model to accommodate inter-site rate variation. The boxplots show divergence time estimates (posterior medians) for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV virus (blue) from their most closely related bat virus. The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus ... MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. All authors contributed to analyses and interpretations. Genomic surveillance has been a hallmark of the COVID-19 pandemic: in contrast to other pandemics, it has allowed the evolution and worldwide spread of the virus to be tracked almost in real time (4). We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. This is not surprising for diverse viral populations with relatively deep evolutionary histories. The coverage threshold and consensus sequence generation threshold were set to 20 and 90, respectively.
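The last sentence above gives a coverage threshold of 20 and a consensus threshold of 90 without stating units or the tool that applies them. The sketch below shows one plausible reading, flagged as an assumption: a position is called only if at least 20 reads cover it and at least 90% of those reads agree on the base; otherwise it is masked as N. The pileup representation and the function name `call_consensus` are hypothetical.

```python
from collections import Counter

MIN_COVERAGE = 20        # assumed to mean minimum read depth per position
MIN_AGREEMENT_PCT = 90   # assumed to mean percent of reads supporting the call


def call_consensus(pileup: list[str]) -> str:
    """Build a consensus string from per-position base observations.

    `pileup` holds one string per genome position, each character being the
    base reported by one read at that position (a simplified, hypothetical
    input format).
    """
    consensus = []
    for bases in pileup:
        depth = len(bases)
        if depth < MIN_COVERAGE:
            consensus.append("N")  # too little coverage to call
            continue
        base, count = Counter(bases).most_common(1)[0]
        # Integer comparison avoids floating-point issues at exactly 90%.
        if 100 * count >= MIN_AGREEMENT_PCT * depth:
            consensus.append(base)
        else:
            consensus.append("N")  # reads disagree too much
    return "".join(consensus)


# Four toy positions: full agreement, low depth, exactly 90%, and 85%.
print(call_consensus(["A" * 20, "A" * 19, "A" * 18 + "C" * 2, "A" * 17 + "C" * 3]))
```

Masking uncertain positions rather than guessing matters for lineage assignment, because a wrongly called base at a lineage-defining site can move a genome to a different lineage.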
The 2009 influenza pandemic and subsequent outbreaks of MERS-CoV (2012), H7N9 avian influenza (2013), Ebola virus (2014) and Zika virus (2015) were met with rapid sequencing and genomic characterization. The DRAGEN COVID Lineage App on BaseSpace Sequence Hub performs k-mer based detection; map/align and variant calling; consensus sequence generation; and lineage/clade analysis using Pangolin and NextClade. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others (refs. 11,22). The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. Sorting these breakpoint-free regions (BFRs) by length results in two segments >5 kb: an ORF1a subregion spanning nucleotides (nt) 3,625–9,150 and the first half of ORF1b spanning nt 13,291–19,628 (sequence numbering given in Source Data, https://github.com/plemey/SARSCoV2origins). We demonstrate that the sarbecoviruses circulating in horseshoe bats have complex recombination histories as reported by others (refs. 15,20,21,22,23,24,25,26). These shy, quirky but cute mammals are among the most heavily trafficked yet least understood animals in the world. To estimate non-synonymous over synonymous rate ratios for the concatenated coding genes, we used the empirical Bayes 'Renaissance counting' procedure (ref. 67).
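The step of sorting breakpoint-free regions (BFRs) by length and keeping those longer than 5 kb can be sketched as follows. Only the two region coordinates (3,625–9,150 and 13,291–19,628) come from the text; the other breakpoints and the alignment length of 30,000 are made-up values chosen for illustration, and the convention that a breakpoint marks the first position of a new region is an assumption.

```python
def breakpoint_free_regions(breakpoints: list[int], alignment_length: int):
    """Split 1..alignment_length at the breakpoints; return regions longest first.

    Each region is (start, end, length) with 1-based inclusive coordinates.
    Assumed convention: a breakpoint is the first position of a new region.
    """
    starts = [1] + sorted(set(breakpoints))
    ends = [s - 1 for s in starts[1:]] + [alignment_length]
    regions = [(s, e, e - s + 1) for s, e in zip(starts, ends) if e >= s]
    return sorted(regions, key=lambda r: r[2], reverse=True)


def long_regions(regions, min_length: int = 5000):
    """Keep regions strictly longer than min_length nucleotides."""
    return [r for r in regions if r[2] > min_length]


# Illustrative breakpoints; only the two long regions match the text.
toy_breakpoints = [3625, 9151, 13291, 19629, 22000, 25000, 28000]
for start, end, length in long_regions(breakpoint_free_regions(toy_breakpoints, 30000)):
    print(f"nt {start:,}-{end:,} ({length:,} nt)")
```

In a real analysis the breakpoint list would come from the recombination-detection output rather than being typed in by hand.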
4 we compare these divergence time estimates to those obtained using the MERS-CoV-centred rate priors for NRR1, NRR2 and NRA3. a–c, Root-to-tip (RtT) divergence as a function of sampling time for the three coronavirus evolutionary histories unfolding over different timescales (HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c)). One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. For the HCoV-OC43, MERS-CoV and SARS datasets we specified flexible skygrid coalescent tree priors. Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. Region B showed no PI signals within the region, except one including sequence SC2018 (Sichuan), and thus this sequence was also removed from the set. DOI: https://doi.org/10.1038/s41564-020-0771-4. Its origin and direct ancestral viruses have not been ... In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk (refs. 9,10).
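The root-to-tip panels described above plot each sequence's genetic distance from the tree root against its sampling date. A common way to read such a plot is an ordinary least-squares fit: the slope is a crude rate estimate (substitutions per site per year), the x-intercept a crude root date, and R² a rough indicator of temporal signal. The sketch below uses that generic approach; it is not the Bayesian rate estimation performed in BEAST, and the function name and toy data are illustrative.

```python
def root_to_tip_regression(dates: list[float], distances: list[float]):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared). Dates are decimal years and
    distances are substitutions per site.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need at least two paired observations")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    if sxx == 0:
        raise ValueError("all sampling dates are identical; no temporal spread")
    slope = sxy / sxx
    intercept = mean_d - slope * mean_t
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Toy data with a perfectly clock-like signal.
slope, intercept, r2 = root_to_tip_regression(
    [2000.0, 2001.0, 2002.0, 2003.0], [0.001, 0.002, 0.003, 0.004]
)
print(slope, r2, -intercept / slope)  # rate, fit quality, implied root date
```

A slope near zero or a low R² is what "lack of temporal signal" looks like in this kind of plot, and it is the situation in which informative rate priors become necessary for dating.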