Hunt, M., Newbold, C., Berriman, M. and Otto, T. D. (2014) A comprehensive evaluation of assembly scaffolding tools. Genome Biology, 15(3), R42. (doi: 10.1186/gb-2014-15-3-r42) (PMID:24581555) (PMCID:PMC4053845)
Full text: Published Version (PDF, 564 kB), available under a Creative Commons Attribution License.
Abstract
Background: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.

Results: Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remain unidentified when using real data.

Conclusions: The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity.
Item Type: | Articles |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Otto, Professor Thomas |
Authors: | Hunt, M., Newbold, C., Berriman, M., and Otto, T. D. |
College/School: | College of Medical, Veterinary and Life Sciences > School of Infection & Immunity |
Journal Name: | Genome Biology |
Publisher: | BioMed Central |
ISSN: | 1465-6906 |
ISSN (Online): | 1474-760X |
Published Online: | 28 August 2013 |
Copyright Holders: | Copyright © 2014 The Authors |
First Published: | First published in Genome Biology 15(3):R42 |
Publisher Policy: | Reproduced under a Creative Commons License |