The utility of data transformation for alignment, de novo assembly and classification of short read virus sequences

Tapinos, A., Constantinides, B., Phan, M. V. T., Kouchaki, S., Cotten, M. and Robertson, D. L. (2019) The utility of data transformation for alignment, de novo assembly and classification of short read virus sequences. Viruses, 11(5), 394. (doi: 10.3390/v11050394) (PMID:31035503) (PMCID:PMC6563281)

203701.pdf - Published Version
Available under License Creative Commons Attribution.


Abstract

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our ability to generate biological sequence data and our ability to fully exploit it. Comparable analytical challenges arise in other data-intensive fields involving sequential data, such as signal processing, where dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analysis. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. By leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach achieved accuracy comparable to that of existing approaches and demonstrated its suitability for sequences originating from diverse virus populations. We assessed our methodology using both synthetic and real viral pathogen sequences. Our results show that highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
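The abstract does not state which numerical transformation or compression method is used. As a minimal sketch only, assuming a purine/pyrimidine cumulative walk as the numeric mapping and Piecewise Aggregate Approximation (PAA) as the dimensionality-reduction step (both illustrative choices, not necessarily the authors' method), short reads can be turned into a handful of numbers and compared with Euclidean distance. The names `to_signal`, `paa` and `compressed_distance` are hypothetical.

```python
import numpy as np

# Hypothetical numeric mapping (an assumption, not taken from the paper):
# purines (A, G) step up, pyrimidines (C, T) step down.
STEP = {"A": 1.0, "G": 1.0, "C": -1.0, "T": -1.0}


def to_signal(seq: str) -> np.ndarray:
    """Map a DNA string to a cumulative purine/pyrimidine walk (a 1-D signal)."""
    steps = np.array([STEP[base] for base in seq.upper()], dtype=float)
    return np.cumsum(steps)


def paa(signal: np.ndarray, segments: int) -> np.ndarray:
    """Piecewise Aggregate Approximation: the mean of each of `segments`
    near-equal-length chunks, reducing len(signal) values to `segments` values."""
    if segments <= 0 or segments > len(signal):
        raise ValueError("segments must be between 1 and len(signal)")
    return np.array([chunk.mean() for chunk in np.array_split(signal, segments)])


def compressed_distance(a: str, b: str, segments: int) -> float:
    """Euclidean distance between the PAA representations of two equal-length reads."""
    if len(a) != len(b):
        raise ValueError("reads must have equal length")
    return float(np.linalg.norm(paa(to_signal(a), segments) - paa(to_signal(b), segments)))


# Usage: a read with one substitution should sit closer to the reference
# than an unrelated read, even after compressing 16 bases to 4 numbers.
ref = "ACGTACGTGGCCAATT"
read_close = "ACGTACGTGGCCAATA"  # single substitution at the final base
read_far = "TTTTGGGGAAAACCCC"
print(compressed_distance(ref, read_close, 4), compressed_distance(ref, read_far, 4))
```

Comparing four segment means instead of sixteen bases is where a speed-up would come from; the cost is information loss. This particular mapping cannot distinguish A from G (or C from T), and averaging smooths over local differences, so the choice of transformation and compression level governs how much accuracy is retained.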

Item Type: Articles
Additional Information: This work has been supported by the Wellcome Trust [097820/Z/11/B], the BBSRC [BB/H012419/1 and BB/M001121/1; BC by a BBSRC DTP studentship] and the VIROGENESIS project, which receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Robertson, Professor David and Cotten, Professor Matthew
Authors: Tapinos, A., Constantinides, B., Phan, M. V. T., Kouchaki, S., Cotten, M., and Robertson, D. L.
College/School: College of Medical, Veterinary and Life Sciences > Institute of Infection, Immunity and Inflammation
Journal Name: Viruses
Publisher: MDPI
ISSN: 1999-4915
ISSN (Online): 1999-4915
Copyright Holders: Copyright © 2019 The Authors
First Published: First published in Viruses 11:394
Publisher Policy: Reproduced under a Creative Commons License
