Enlighten Publications

Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data

Schirmer, M., D’Amore, R., Ijaz, U. Z., Hall, N. and Quince, C. (2016) Illumina error profiles: resolving fine-scale variation in metagenomic sequencing data. BMC Bioinformatics, 17, 125. (doi: 10.1186/s12859-016-0976-y) (PMID:26968756) (PMCID:PMC4787001)

Text
117498.pdf - Published Version
Available under License Creative Commons Attribution.
1MB

Abstract

Background: Illumina’s sequencing platforms are currently the most utilised sequencing systems worldwide. The technology has evolved rapidly over recent years and provides high throughput at low cost, with increasing read lengths and true paired-end reads. However, data from any sequencing technology contain noise, and our understanding of the peculiarities and sequencing errors encountered in Illumina data has lagged behind this rapid development.

Results: We conducted a systematic investigation of errors and biases in Illumina data based on the largest collection of in vitro metagenomic data sets to date. We evaluated the Genome Analyzer II, HiSeq and MiSeq and tested state-of-the-art low-input library preparation methods. Analysing in vitro metagenomic sequencing data allowed us to determine biases directly associated with the actual sequencing process. The position- and nucleotide-specific analysis revealed a substantial bias related to motifs (3mers preceding errors) ending in “GG”. On average, the top three motifs were linked to 16% of all substitution errors. Furthermore, a preferential incorporation of ddGTPs was recorded. We hypothesise that all of these biases are related to the engineered polymerase and ddNTPs, which are intrinsic to any sequencing-by-synthesis method. We show that quality-score-based error removal strategies can on average remove 69% of the substitution errors; however, the motif bias remains.

Conclusion: Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence; detecting them within metagenomes is therefore vital. Current error removal techniques are not designed to target the peculiarities encountered in Illumina sequencing data and other sequencing-by-synthesis methods, causing biases to persist and potentially affect any conclusions drawn from the data. In order to develop effective diagnostic and therapeutic approaches, we need to be able to identify systematic sequencing errors and distinguish these errors from true genetic variation.
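
As an illustration of what a "motif" means in the abstract (the 3mer immediately preceding a substitution error), the following is a minimal sketch and not the authors' pipeline. It assumes an ungapped alignment of one read against its reference; the function name `preceding_motifs` and the toy sequences are hypothetical and do not come from the paper.

```python
from collections import Counter


def preceding_motifs(reference: str, read: str, k: int = 3) -> Counter:
    """Tally the k-mers that immediately precede each substitution error.

    Assumption: `reference` and `read` are already aligned without gaps,
    so position i in one corresponds to position i in the other.
    """
    if len(reference) != len(read):
        raise ValueError("expects an ungapped alignment of equal length")
    motifs = Counter()
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        # A substitution error is a mismatch; skip errors too close to the
        # start of the read to have a full k-mer in front of them.
        if ref_base != read_base and i >= k:
            motifs[reference[i - k:i]] += 1
    return motifs


# Toy data (hypothetical): three substitutions, each preceded by "xGG".
reference = "ACGGTTAGGCATCGGA"
read = "ACGGATAGGTATCGGC"

counts = preceding_motifs(reference, read)
gg_share = sum(n for motif, n in counts.items() if motif.endswith("GG")) / sum(counts.values())
print(counts.most_common(3), gg_share)
```

On real data the tally would be accumulated over all aligned reads, and the share of errors attributable to the most frequent motifs could then be compared with the figure reported in the abstract.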

Item Type:	Articles
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	Ijaz, Dr Umer
Authors:	Schirmer, M., D’Amore, R., Ijaz, U. Z., Hall, N., and Quince, C.
College/School:	College of Science and Engineering > School of Engineering > Infrastructure and Environment
Journal Name:	BMC Bioinformatics
Publisher:	BioMed Central
ISSN:	1471-2105
ISSN (Online):	1471-2105
Copyright Holders:	Copyright © 2016 Schirmer et al.
First Published:	First published in BMC Bioinformatics 17:125
Publisher Policy:	Reproduced under a Creative Commons License

Funder and Project Information

Project Code	Award No	Project Name	Principal Investigator	Funder's Name	Funder Ref	Lead Dept
50335	1	Pioneering the genomics era of environmental microbiology	Christopher Quince	Engineering & Physical Sciences Research Council (EPSRC)	EP/H003851/1	ENG - ENGINEERING INFRASTRUCTURE & ENVIR
65277	1	Understanding microbial community through in situ environmental 'omic data synthesis	Umer Ijaz	Natural Environment Research Council (NERC)	NE/L011956/1	ENG - ENGINEERING INFRASTRUCTURE & ENVIR

Deposit and Record Details

ID Code:	117498
Depositing User:	Dr Umer Ijaz
Datestamp:	30 Mar 2016 10:46
Last Modified:	20 Dec 2018 11:32
Date of acceptance:	2 March 2016
Date of first online publication:	March 2016
Date Deposited:	30 March 2016
Data Availability Statement:	Yes