Evaluating the performance of tools used to call minority variants from whole genome short-read data

Said Mohammed, K., Kibinge, N., Prins, P., Agoti, C. N., Cotten, M., Nokes, D.J., Brand, S. and Githinji, G. (2018) Evaluating the performance of tools used to call minority variants from whole genome short-read data. Wellcome Open Research, 3, 21. (doi: 10.12688/wellcomeopenres.13538.2) (PMID:30483597) (PMCID:PMC6234735)

195119.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Background: High-throughput whole genome sequencing facilitates investigation of minority virus sub-populations from virus-positive samples. Minority variants are useful in understanding within- and between-host diversity and population dynamics, and can potentially assist in elucidating person-to-person transmission pathways. Several minority variant callers have been developed to describe low-frequency sub-populations from whole genome sequence data. These callers differ in the bioinformatics and statistical methods they use to discriminate sequencing errors from low-frequency variants.

Methods: We evaluated the diagnostic performance of, and concordance between, published minority variant callers used to identify minority variants from whole-genome sequence data from virus samples. We used the ART-Illumina read simulation tool to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict, and VarScan2. The variant callers’ agreement in identifying known variants was quantified using two measures: concordance accuracy and inter-caller concordance.

Results: The variant callers differed in their ability to identify minority variants from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of variants, although it was characterised by variable sensitivity and precision, in addition to a false positive rate that was high relative to the other minority variant callers and varied with sample coverage. LoFreq was the most conservative caller.

Conclusions: We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistency in the quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low-frequency variants.
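The kind of evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not define "concordance accuracy" or "inter-caller concordance", so the sketch assumes sensitivity and precision against the spiked-in truth set for per-caller performance and Jaccard overlap for pairwise agreement. The helper names and all variant positions are hypothetical, invented only to show the calculation; they are not results from the study.

```python
from collections import Counter
from itertools import combinations

# A variant is keyed by (genome position, alternate base).
# Every value below is invented for illustration; none comes from the paper.
truth = {(100, "A"), (250, "G"), (400, "T"), (900, "C")}  # spiked-in variants
calls = {
    "FreeBayes": {(100, "A"), (250, "G"), (400, "T"), (900, "C"), (55, "T"), (610, "G")},
    "LoFreq": {(100, "A"), (250, "G")},
    "VarDict": {(100, "A"), (250, "G"), (400, "T"), (55, "T")},
    "VarScan2": {(100, "A"), (250, "G"), (400, "T"), (610, "G")},
}


def sensitivity_precision(called, known):
    """Return (sensitivity, precision) of one caller against the truth set."""
    true_positives = len(called & known)
    sensitivity = true_positives / len(known) if known else 0.0
    precision = true_positives / len(called) if called else 0.0
    return sensitivity, precision


def jaccard(a, b):
    """Pairwise agreement between two call sets (assumed measure: Jaccard index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def consensus(call_sets, min_callers=3):
    """Keep only variants reported by at least `min_callers` tools."""
    counts = Counter(v for called in call_sets.values() for v in called)
    return {v for v, n in counts.items() if n >= min_callers}


for name, called in calls.items():
    sens, prec = sensitivity_precision(called, truth)
    print(f"{name}: sensitivity={sens:.2f} precision={prec:.2f}")

for left, right in combinations(calls, 2):
    print(f"{left} vs {right}: Jaccard={jaccard(calls[left], calls[right]):.2f}")

print("Consensus (>=3 callers):", sorted(consensus(calls, 3)))
```

In this toy data, requiring agreement from at least three callers drops the calls made by only one or two tools, which is the filtering effect the conclusions refer to; the threshold is exposed as `min_callers` so that other cut-offs can be compared.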

Item Type:Articles
Additional Information:Version 2; peer review: 2 approved. The work was funded by the Wellcome Trust Senior Investigator Award to Prof D. James Nokes [102975]. In addition, this work was supported through the DELTAS Africa Initiative [DEL-15-003]. The DELTAS Africa Initiative is an independent funding scheme of the African Academy of Sciences (AAS)'s Alliance for Accelerating Excellence in Science in Africa (AESA) and is supported by the New Partnership for Africa's Development Planning and Coordinating Agency (NEPAD Agency) with funding from the Wellcome Trust [107769] and the UK government. The views expressed in this publication are those of the author(s) and not necessarily those of AAS, NEPAD Agency, Wellcome Trust or the UK government.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Cotten, Professor Matthew
Creator Roles:
Cotten, M.: Investigation, Writing – review and editing
Authors: Said Mohammed, K., Kibinge, N., Prins, P., Agoti, C. N., Cotten, M., Nokes, D.J., Brand, S., and Githinji, G.
College/School:College of Medical Veterinary and Life Sciences > School of Infection & Immunity
College of Medical Veterinary and Life Sciences > School of Infection & Immunity > Centre for Virus Research
Journal Name:Wellcome Open Research
Publisher:F1000Research
ISSN:2398-502X
ISSN (Online):2398-502X
Published Online:05 March 2018
Copyright Holders:Copyright © 2018 Said Mohammed K et al.
First Published:First published in Wellcome Open Research 3: 21
Publisher Policy:Reproduced under a Creative Commons License
Data DOI:10.7910/DVN/ZIO43M
