Infinite factorization of multiple non-parametric views

Rogers, S., Klami, A., Sinkkonen, J., Girolami, M. and Kaski, S. (2010) Infinite factorization of multiple non-parametric views. Machine Learning, 79(1-2), pp. 201-226. (doi:10.1007/s10994-009-5155-1)

Combined analysis of multiple data sources has increasing application interest, in particular for distinguishing shared and source-specific aspects. We extend this rationale of classical canonical correlation analysis into a flexible, generative and non-parametric clustering setting, by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities of the sources. The lower-level clusters arise from hierarchical Dirichlet Processes, inducing an infinite-dimensional contingency table between the views. The commonalities between the sources are modeled by an infinite block model of the contingency table, interpretable as non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. Cluster analysis of co-expression is a standard simple way of screening for co-regulation, and the two-view analysis extends the approach to distinguishing between pre- and post-translational regulation.
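
The contingency table between the views mentioned above can be illustrated with a small finite sketch. This is not the authors' model: the paper uses infinite mixtures from hierarchical Dirichlet Processes and places a block-model prior on the table, whereas the sketch below assumes hard cluster labels are already available for each view and only counts co-occurrences. The function name `contingency_table` and the example labels are hypothetical.

```python
import numpy as np

def contingency_table(z1, z2):
    """Count items for each pair (cluster in view 1, cluster in view 2).

    z1, z2: integer cluster labels (starting at 0), one per item,
    assumed to come from separate clusterings of the two views.
    """
    z1 = np.asarray(z1)
    z2 = np.asarray(z2)
    table = np.zeros((z1.max() + 1, z2.max() + 1), dtype=int)
    # Unbuffered add so repeated (row, col) pairs are all counted.
    np.add.at(table, (z1, z2), 1)
    return table

# Hypothetical labels for six genes: expression view and protein view.
expr_clusters = [0, 0, 1, 1, 2, 2]
prot_clusters = [0, 0, 0, 1, 1, 1]
table = contingency_table(expr_clusters, prot_clusters)
print(table)
```

Rows index clusters in the first view and columns index clusters in the second. Cells with large counts mark groups of items that cluster together in both views, which is the kind of structure a block model of the table is meant to capture.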

Item Type: Articles
Glasgow Author(s) Enlighten ID: Rogers, Dr Simon and Girolami, Prof Mark
Authors: Rogers, S., Klami, A., Sinkkonen, J., Girolami, M., and Kaski, S.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics; H Social Sciences > HA Statistics
College/School: College of Science and Engineering > School of Computing Science
Research Group: Inference
Journal Name: Machine Learning
Published Online: 13 November 2009
Copyright Holders: Copyright © 2009 Springer
First Published: First published in Machine Learning 79(1-2):201-226
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher. The original publication is available at

Project Code: 399341
Award No: -
Project Name: Stochastic modelling and statistical inference of gene regulatory pathways - integrating multiple sources of data
Principal Investigator: Ernst Wit
Funder's Name: Engineering & Physical Sciences Research Council (EPSRC)
Funder Ref: EP/C010620/1
Lead Dept: Statistics