MCbiclust: A novel algorithm to discover large-scale functionally related gene sets from massive transcriptomics data collections

Bentham, R. B., Bryson, K. and Szabadkai, G. (2017) MCbiclust: A novel algorithm to discover large-scale functionally related gene sets from massive transcriptomics data collections. Nucleic Acids Research, 45(15), pp. 8712-8730. (doi: 10.1093/nar/gkx590) (PMID:28911113) (PMCID:PMC5587796)

Full text: Published Version, available under a Creative Commons Attribution licence.

Abstract

The potential to understand fundamental biological processes from gene expression data has grown in parallel with the recent explosion of the size of data collections. However, to exploit this potential, novel analytical methods are required, capable of discovering large co-regulated gene networks. We found current methods limited in the size of correlated gene sets they could discover within biologically heterogeneous data collections, hampering the identification of multi-gene controlled fundamental cellular processes such as energy metabolism, organelle biogenesis and stress responses. Here we describe a novel biclustering algorithm called Massively Correlated Biclustering (MCbiclust) that selects samples and genes from large datasets with maximally correlated gene expression, allowing regulation of complex networks to be examined. The method has been evaluated using synthetic data and applied to large bacterial and cancer cell datasets. We show that the large biclusters discovered, so far elusive to identification by existing techniques, are biologically relevant and thus MCbiclust has great potential in the analysis of transcriptomics data to identify large-scale unknown effects hidden within the data. The identified massive biclusters can be used to develop improved transcriptomics-based diagnosis tools for diseases caused by altered gene expression, or used for further network analysis to understand genotype-phenotype correlations.
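The idea of selecting samples over which a gene set is maximally correlated can be illustrated with a small sketch. The Python code below is not the authors' implementation and does not use the MCbiclust package; it is a minimal illustration that assumes a simple stochastic greedy swap search and a score defined as the mean absolute pairwise Pearson correlation between genes. The function names (`mean_abs_correlation`, `greedy_sample_search`, `make_toy_data`), the toy data and all parameter values are hypothetical.

```python
import numpy as np


def mean_abs_correlation(expr, sample_idx):
    """Mean absolute pairwise Pearson correlation between genes (rows),
    computed only over the chosen samples (columns)."""
    corr = np.corrcoef(expr[:, sample_idx])
    upper = np.triu_indices_from(corr, k=1)  # each gene pair counted once
    return float(np.mean(np.abs(corr[upper])))


def greedy_sample_search(expr, subset_size, n_iter=500, seed=0):
    """Illustrative stochastic greedy search (an assumption, not the
    published algorithm): start from a random sample subset, then keep
    a random single-sample swap only if it raises the score."""
    n_samples = expr.shape[1]
    if not 3 <= subset_size < n_samples:
        raise ValueError("subset_size must be >= 3 and < number of samples")
    rng = np.random.default_rng(seed)
    current = rng.choice(n_samples, size=subset_size, replace=False)
    best = mean_abs_correlation(expr, current)
    for _ in range(n_iter):
        outside = np.setdiff1d(np.arange(n_samples), current)
        candidate = current.copy()
        candidate[rng.integers(subset_size)] = rng.choice(outside)
        score = mean_abs_correlation(expr, candidate)
        if score > best:
            current, best = candidate, score
    return np.sort(current), best


def make_toy_data(seed=1):
    """Hypothetical toy matrix: 20 genes x 60 samples of noise, with a
    shared signal (random sign per gene) planted in the first 15 samples."""
    rng = np.random.default_rng(seed)
    expr = rng.normal(size=(20, 60))
    signal = 3.0 * rng.normal(size=15)
    signs = rng.choice([-1.0, 1.0], size=20)
    expr[:, :15] = np.outer(signs, signal) + 0.3 * rng.normal(size=(20, 15))
    return expr


expr = make_toy_data()
subset, score = greedy_sample_search(expr, subset_size=10)
print("selected samples:", subset)
print("mean absolute correlation:", round(score, 3))
```

Using the absolute value of the correlation lets both positively and negatively co-regulated genes contribute to the score. Because the search only ever accepts improving swaps, the final score can never be lower than that of the random starting subset, but it is not guaranteed to find the best possible subset.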

Item Type: Articles
Additional Information: University College London COMPLeX/British Heart Foundation Fund [SP/08/004]; Biochemical and Biophysical Research Council [BB/L020874/1]; Wellcome Trust [097815/Z/11/Z], UK; Association for Cancer Research (AIRC) [IG13447], Italy. Funding for open access charge: Wellcome Trust.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Bryson, Dr Kevin
Authors: Bentham, R. B., Bryson, K., and Szabadkai, G.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Nucleic Acids Research
Publisher: Oxford University Press
ISSN: 0305-1048
ISSN (Online): 1362-4962
Copyright Holders: Copyright: © The Author(s) 2017
First Published: First published in Nucleic Acids Research 45(15): 8712–8730
Publisher Policy: Reproduced under a Creative Commons licence
