Identifying and prioritizing potential human-infecting viruses from their genome sequences

Mollentze, N., Babayan, S. A. and Streicker, D. G. (2021) Identifying and prioritizing potential human-infecting viruses from their genome sequences. PLoS Biology, 19(9), e3001390. (doi: 10.1371/journal.pbio.3001390) (PMID:34582436) (PMCID:PMC8478193)

249630.pdf - Published Version
Available under License Creative Commons Attribution.



Determining which animal viruses may be capable of infecting humans is currently intractable at the time of their discovery, precluding prioritization of high-risk viruses for early investigation and outbreak preparedness. Given the increasing use of genomics in virus discovery and the otherwise sparse knowledge of the biology of newly discovered viruses, we developed machine learning models that identify candidate zoonoses solely using signatures of host range encoded in viral genomes. Within a dataset of 861 viral species with known zoonotic status, our approach outperformed models based on the phylogenetic relatedness of viruses to known human-infecting viruses (area under the receiver operating characteristic curve [AUC] = 0.773), distinguishing high-risk viruses within families that contain a minority of human-infecting species and identifying putatively undetected or so far unrealized zoonoses. Analyses of the underpinnings of model predictions suggested the existence of generalizable features of viral genomes that are independent of virus taxonomic relationships and that may preadapt viruses to infect humans. Our model reduced a second set of 645 animal-associated viruses that were excluded from training to 272 high and 41 very high-risk candidate zoonoses and showed significantly elevated predicted zoonotic risk in viruses from nonhuman primates, but not other mammalian or avian host groups. A second application showed that our models could have identified Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) as a relatively high-risk coronavirus strain and that this prediction required no prior knowledge of zoonotic Severe Acute Respiratory Syndrome (SARS)-related coronaviruses. Genome-based zoonotic risk assessment provides a rapid, low-cost approach to enable evidence-driven virus surveillance and increases the feasibility of downstream biological and ecological characterization of viruses.
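The abstract reports model performance as an area under the receiver operating characteristic curve (AUC). As a reading aid, the following is a minimal sketch of how an AUC can be computed from binary labels and predicted scores, using the pairwise-ranking definition of the metric. It is not the authors' code: the function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC: the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one.
    Tied scores count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = known human-infecting, 0 = not known to infect humans.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.5, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # prints "AUC = 0.800"
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking, which gives a scale for the reported value of 0.773.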

Item Type: Articles
Glasgow Author(s) Enlighten ID: Babayan, Dr Simon and Streicker, Professor Daniel and Mollentze, Dr Nardus
Creator Roles:
Streicker, D. G.: Conceptualization, Data curation, Writing – review and editing
Babayan, S. A.: Conceptualization, Writing – review and editing
Mollentze, N.: Data curation, Formal analysis, Visualization, Writing – original draft, Writing – review and editing
Authors: Mollentze, N., Babayan, S. A., and Streicker, D. G.
College/School:
College of Medical Veterinary and Life Sciences > School of Infection & Immunity
College of Medical Veterinary and Life Sciences > School of Infection & Immunity > Centre for Virus Research
College of Medical Veterinary and Life Sciences > School of Biodiversity, One Health & Veterinary Medicine
Journal Name: PLoS Biology
Publisher: Public Library of Science
ISSN (Online): 1545-7885
Published Online: 28 September 2021
Copyright Holders: Copyright © 2021 Mollentze et al.
First Published: First published in PLoS Biology 19(9): e3001390
Publisher Policy: Reproduced under a Creative Commons License
Data DOI: 10.5281/zenodo.4271479


Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
307106 | | Epidemiology meets biotechnology: preventing viral emergence from bats | Daniel Streicker | Wellcome Trust (WELLCOTR) | 217221/Z/19/Z | Institute of Biodiversity, Animal Health and Comparative Medicine
656551 | | Arbovirus interactions with arthropod hosts | Alain Kohl | Medical Research Council (MRC) | MC_UU_12014/8 | MVLS III - CENTRE FOR VIRUS RESEARCH
 | | Viral Genomics and Bioinformatics | Andrew Davison | Medical Research Council (MRC) | MC_UU_12014/12 | III-MRC-GU Centre for Virus Research