Using set theory to reduce redundancy in pathway sets

Stoney, R. A., Schwartz, J.-M., Robertson, D. L. and Nenadic, G. (2018) Using set theory to reduce redundancy in pathway sets. BMC Bioinformatics, 19, 386. (doi: 10.1186/s12859-018-2355-3) (PMID:30340461) (PMCID:PMC6194563)

Full text: 203715.pdf (Published Version), available under a Creative Commons Attribution license.

Background: The consolidation of pathway databases, such as KEGG, Reactome and ConsensusPathDB, has generated widespread biological interest; however, pathway redundancy impedes the use of these consolidated datasets. Attempts to reduce this redundancy have focused on visualizing pathway overlap or merging pathways, but the resulting pathways may be of heterogeneous sizes and cover multiple biological functions. Efforts have also been made to deal with redundancy in pathway data by consolidating enriched pathways into a number of clusters or concepts. We present an alternative approach, which generates pathway subsets capable of covering all of the genes present within either pathway databases or enrichment results, while substantially reducing redundancy.

Results: We propose a method that uses set cover to reduce pathway redundancy without merging pathways. The proposed approach considers three objectives: removal of pathway redundancy, control of pathway size and coverage of the gene set. By applying set cover to the ConsensusPathDB dataset we were able to produce a reduced set of pathways representing 100% of the genes in the original data set with 74% less redundancy, or 95% of the genes with 88% less redundancy. We also developed an algorithm to simplify enrichment data and applied it to a set of enriched osteoarthritis pathways, revealing that, within the top ten pathways, five were redundant subsets of more enriched pathways. Applying set cover to the enrichment results removed these redundant pathways, allowing more informative pathways to take their place.

Conclusion: Our method provides an alternative approach for handling pathway redundancy, while ensuring that the pathways are of homogeneous size and that gene coverage is maximised. Pathways are not altered from their original form, allowing biological knowledge regarding the data set to be directly applicable. We demonstrate the ability of the algorithms to prioritise redundancy reduction, pathway size control or gene set coverage. The application of set cover to pathway enrichment results produces an optimised summary of the pathways that best represent the differentially regulated gene set.
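
As a rough illustration of the idea summarised above, the following is a minimal sketch of a greedy set cover over pathway gene sets. It is not the authors' implementation: the standard greedy heuristic, the function names, the `coverage` and `max_size` parameters, the toy pathways and the redundancy measure (mean number of selected pathways per covered gene) are all assumptions made here for illustration only.

```python
from typing import Dict, List, Optional, Set


def greedy_set_cover(
    pathways: Dict[str, Set[str]],
    coverage: float = 1.0,
    max_size: Optional[int] = None,
) -> List[str]:
    """Select unmodified pathways until a fraction `coverage` of genes is covered.

    Assumed heuristic: repeatedly pick the pathway that adds the most
    uncovered genes. `max_size` (hypothetical) excludes large pathways.
    """
    candidates = {
        name: set(genes)
        for name, genes in pathways.items()
        if max_size is None or len(genes) <= max_size
    }
    universe = set().union(*candidates.values())
    target = coverage * len(universe)
    covered: Set[str] = set()
    chosen: List[str] = []
    while len(covered) < target and candidates:
        # Ties are broken alphabetically so the result is deterministic.
        best = max(sorted(candidates), key=lambda n: len(candidates[n] - covered))
        gain = candidates[best] - covered
        if not gain:
            break
        chosen.append(best)
        covered |= gain
        del candidates[best]
    return chosen


def redundancy(pathways: Dict[str, Set[str]], chosen: List[str]) -> float:
    """Assumed measure: gene memberships divided by unique genes covered."""
    memberships = sum(len(pathways[name]) for name in chosen)
    unique = len(set().union(*(pathways[name] for name in chosen)))
    return memberships / unique if unique else 0.0


# Toy data (hypothetical pathway and gene names).
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2"},
    "C": {"g3", "g4"},
    "D": {"g5"},
    "E": {"g4", "g5", "g6"},
}

full = greedy_set_cover(toy)                   # cover every gene
partial = greedy_set_cover(toy, coverage=0.6)  # trade coverage for fewer pathways
small = greedy_set_cover(toy, max_size=3)      # only pathways with <= 3 genes
print(full, redundancy(toy, full))
print(partial, small)
```

In this toy example, lowering `coverage` or setting `max_size` shifts the balance between the three objectives named in the abstract (redundancy, pathway size and gene coverage); the selected pathways themselves are never merged or altered.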

Item Type: Articles
Additional Information: This work has been supported by the Biotechnology and Biological Sciences Research Council DTP [BB/J014478/1].
Glasgow Author(s) Enlighten ID: Robertson, Professor David
Authors: Stoney, R. A., Schwartz, J.-M., Robertson, D. L., and Nenadic, G.
College/School: College of Medical Veterinary and Life Sciences > School of Infection & Immunity; College of Medical Veterinary and Life Sciences > School of Infection & Immunity > Centre for Virus Research
Journal Name: BMC Bioinformatics
ISSN (Online): 1471-2105
Copyright Holders: Copyright © 2018 The Authors
First Published: First published in BMC Bioinformatics 19:386
Publisher Policy: Reproduced under a Creative Commons License
