NCBI’s virus discovery codeathon: building “FIVE” —the Federated Index of Viral Experiments API index

Martí-Carreras, J. et al. (2020) NCBI’s virus discovery codeathon: building “FIVE” —the Federated Index of Viral Experiments API index. Viruses, 12(12), 1424. (doi: 10.3390/v12121424) (PMID:33322070) (PMCID:PMC7764237)

Text: 230681.pdf - Published Version (2MB)
Available under License Creative Commons Attribution.

Abstract

Viruses represent important test cases for data federation due to their genome size and the rapid increase in sequence data in publicly available databases. However, some consequences of previously decentralized (unfederated) data are lack of consensus or comparisons between feature annotations. Unifying or displaying alternative annotations should be a priority both for communities with robust entry representation and for nascent communities with burgeoning data sources. To this end, during this three-day continuation of the Virus Hunting Toolkit codeathon series (VHT-2), a new integrated and federated viral index was elaborated. This Federated Index of Viral Experiments (FIVE) integrates pre-existing and novel functional and taxonomy annotations and virus–host pairings. Variability in the context of viral genomic diversity is often overlooked in virus databases. As a proof-of-concept, FIVE was the first attempt to include viral genome variation for HIV, the most well-studied human pathogen, through viral genome diversity graphs. As per the publication of this manuscript, FIVE is the first implementation of a virus-specific federated index of such scope. FIVE is coded in BigQuery for optimal access of large quantities of data and is publicly accessible. Many projects of database or index federation fail to provide easier alternatives to access or query information. To this end, a Python API query system was developed to enhance the accessibility of FIVE.

Item Type: Articles
Keywords: Data federation, CRISPR, protein domain, metagenomics, virus, genome graphs, HIV-1.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Modha, Ms Sejal
Creator Roles:
Modha, S.: Software, Validation, Formal analysis, Data curation, Writing – original draft, Writing – review and editing, Visualization
Authors: Martí-Carreras, J., Gener, A. R., Miller, S. D., Brito, A. F., Camacho, C. E., Connor, R., Deboutte, W., Glickman, C., Kristensen, D. M., Meyer, W. K., Modha, S., Norris, A. L., Saha, S., Belford, A. K., Biederstedt, E., Brister, J. R., Buchmann, J. P., Cooley, N. P., Edwards, R. A., Javkar, K., Muchow, M., Muralidharan, H. S., Pepe-Ranney, C., Shah, N., Shakya, M., Tisza, M. J., Tully, B. J., Vanmechelen, B., Virta, V. C., Weissman, J. L., Zalunin, V., Efremov, A., and Busby, B.
College/School: College of Medical Veterinary and Life Sciences > School of Infection & Immunity
Journal Name: Viruses
Publisher: MDPI
ISSN: 1999-4915
ISSN (Online): 1999-4915
Published Online: 10 December 2020
Copyright Holders: Copyright © 2020 The Authors
First Published: First published in Viruses 12(12): 1424
Publisher Policy: Reproduced under a Creative Commons License
