A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns

Kouchaki, S., Tapinos, A. and Robertson, D. (2019) A signal processing method for alignment-free metagenomic binning: multi-resolution genomic binary patterns. Scientific Reports, 9, 2159. (doi: 10.1038/s41598-018-38197-9) (PMID:30770850) (PMCID:PMC6377666)

203697.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Algorithms in bioinformatics use textual representations of genetic information, sequences of the characters A, T, G and C represented computationally as strings or sub-strings. Signal and related image processing methods offer a rich source of alternative descriptors as they are designed to work in the presence of noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing, to extract local ‘texture’ changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors and their ‘texture’ compared efficiently using machine learning algorithms that perform dimensionality reduction, to capture eigengenome information, and clustering (here using randomized singular value decomposition and BH-tSNE). The intuition behind our method is that the MLBP feature vectors permit sequence comparisons without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method, MLBP, thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
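
To make the idea of a local binary pattern on sequence data concrete, the following is a minimal sketch and not the authors' implementation (see the MrGBP repository for that). It assumes a hypothetical integer mapping of nucleotides (NUC_TO_INT), a simple "neighbour >= centre" comparison, and window sizes chosen only for illustration; none of these details are stated in the abstract. Each read becomes a concatenation of normalised pattern histograms, one per window size, which is the kind of fixed-length vector that could then be passed to dimensionality reduction and clustering.

```python
from collections import Counter

# Hypothetical numeric mapping; the abstract does not specify one.
NUC_TO_INT = {"A": 1, "C": 2, "G": 3, "T": 4}


def lbp_codes(signal, half_window):
    """Return 1-D local binary pattern codes for a numeric signal.

    Each position is compared with half_window neighbours on either side;
    a neighbour that is >= the centre value sets one bit of the code.
    """
    codes = []
    for i in range(half_window, len(signal) - half_window):
        centre = signal[i]
        neighbours = signal[i - half_window:i] + signal[i + 1:i + 1 + half_window]
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= centre:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(sequence, half_window):
    """Normalised histogram of pattern codes for one window size."""
    # Characters outside A, C, G, T (e.g. N) are skipped in this sketch.
    signal = [NUC_TO_INT[c] for c in sequence.upper() if c in NUC_TO_INT]
    codes = lbp_codes(signal, half_window)
    n_bins = 1 << (2 * half_window)  # one bit per neighbour
    counts = Counter(codes)
    total = len(codes) or 1  # avoid division by zero on very short reads
    return [counts.get(b, 0) / total for b in range(n_bins)]


def multi_resolution_features(sequence, half_windows=(1, 2, 3)):
    """Concatenate histograms over several window sizes (illustrative values)."""
    features = []
    for half_window in half_windows:
        features.extend(lbp_histogram(sequence, half_window))
    return features


if __name__ == "__main__":
    vector = multi_resolution_features("ACGTACGTACGT")
    print(len(vector))  # 4 + 16 + 64 bins
```

Because the vector length depends only on the window sizes and not on read length, reads or contigs of different lengths can be compared directly in this feature space.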

Item Type:Articles
Additional Information:S.K. and A.T. were supported by the VIROGENESIS project. The VIROGENESIS project receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 634650. A.T. was also supported by BBSRC project grant, BB/M001121/1. We would like to thank Bede Constantinides for help with metagenomics data analysis.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Robertson, Professor David
Authors: Kouchaki, S., Tapinos, A., and Robertson, D.
College/School:College of Medical Veterinary and Life Sciences > School of Infection & Immunity
College of Medical Veterinary and Life Sciences > School of Infection & Immunity > Centre for Virus Research
Journal Name:Scientific Reports
Publisher:Nature Research
ISSN:2045-2322
ISSN (Online):2045-2322
Copyright Holders:Copyright © 2019 The Authors
First Published:First published in Scientific Reports 9:2159
Publisher Policy:Reproduced under a Creative Commons License
