The latent process decomposition of cDNA microarray data sets

Rogers, S., Girolami, M., Campbell, C. and Breitling, R. (2005) The latent process decomposition of cDNA microarray data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2(2), pp. 143-156. (doi:10.1109/TCBB.2005.29)

Publisher's URL: http://dx.doi.org/10.1109/TCBB.2005.29

Abstract

We present a new computational technique (a software implementation, data sets, and supplementary information are available at http://www.enm.bris.ac.uk/lpd/) which enables the probabilistic analysis of cDNA microarray data, and we demonstrate its effectiveness in identifying features of biomedical importance. A hierarchical Bayesian model, called latent process decomposition (LPD), is introduced in which each sample in the data set is represented as a combinatorial mixture over a finite set of latent processes, which are expected to correspond to biological processes. Parameters in the model are estimated using efficient variational methods. This type of probabilistic model is most appropriate for the interpretation of measurement data generated by cDNA microarray technology. For determining informative substructure in such data sets, the proposed model has several important advantages over the standard use of dendrograms. First, it can objectively assess the optimal number of sample clusters. Second, it represents samples and gene expression levels using a common set of latent variables (dendrograms cluster samples and gene expression values separately, which amounts to two distinct reduced space representations). Third, in contrast to standard cluster models, observations are not assigned to a single cluster; thus, for example, gene expression levels are modeled via combinations of the latent processes identified by the algorithm. We show that this new method compares favorably with alternative cluster analysis methods. To illustrate its potential, we apply the proposed technique to several microarray data sets for cancer. For these data sets, it successfully decomposes the data into known subtypes and indicates possible further taxonomic subdivision, in addition to highlighting, in a wholly unsupervised manner, the importance of certain genes which are known to be medically significant. To demonstrate its wider applicability, we also report its performance on a microarray data set for yeast.

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Breitling, Professor Rainer and Rogers, Dr Simon and Girolami, Prof Mark
Authors: Rogers, S., Girolami, M., Campbell, C., and Breitling, R.
Subjects: Q Science > QA Mathematics > QA76 Computer software
College/School: College of Medical Veterinary and Life Sciences > Institute of Molecular Cell and Systems Biology; College of Science and Engineering > School of Computing Science
Journal Name: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publisher: Institute of Electrical and Electronics Engineers
ISSN: 1545-5963
Copyright Holders: Copyright © 2005 Institute of Electrical and Electronics Engineers
First Published: First published in IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(2):143-156
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher