Variable selection in regression mixture modeling for the discovery of gene regulatory networks

Gupta, M. and Ibrahim, J.G. (2007) Variable selection in regression mixture modeling for the discovery of gene regulatory networks. Journal of the American Statistical Association, 102(479), pp. 867-880. (doi: 10.1198/016214507000000068)

Full text not currently available from Enlighten.

Abstract

The profusion of genomic data through genome sequencing and gene expression microarray technology has facilitated statistical research in determining gene interactions regulating a biological process. Current methods generally consist of a two-stage procedure: clustering gene expression measurements and searching for regulatory “switches”, typically short, conserved sequence patterns (motifs) in the DNA sequence adjacent to the genes. This process often leads to misleading conclusions as incorrect cluster selection may lead to missing important regulatory motifs or making many false discoveries. Treating cluster memberships as known, rather than estimated, introduces bias into analyses and prevents uncertainty about cluster parameters from being taken into account. Further, there is underutilization of the available data, as the sequence information is ignored for purposes of expression clustering and vice versa. We propose a way to address these issues by combining gene clustering and motif discovery in a unified framework, a mixture of hierarchical regression models, with unknown components representing the latent gene clusters, and genomic sequence features linked to the resultant gene expression through a multivariate hierarchical regression. We demonstrate a Monte Carlo method for simultaneous variable selection (for motifs) and clustering (for genes). The selection of the number of components in the mixture is addressed by computing the analytically intractable Bayes factor through a novel multistage mixture importance sampling approach. This methodology is used to analyze a yeast cell cycle dataset to determine an optimal set of motifs that discriminates between groups of genes and simultaneously finds the most significant gene clusters.

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Gupta, Professor Mayetri
Authors: Gupta, M., and Ibrahim, J.G.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Journal Name: Journal of the American Statistical Association
ISSN: 0162-1459
