Discovery of conserved sequence patterns using a stochastic dictionary model

Gupta, M. and Liu, J.S. (2003) Discovery of conserved sequence patterns using a stochastic dictionary model. Journal of the American Statistical Association, 98(461), pp. 55-66. (doi: 10.1198/016214503388619094)

Abstract

Detection of unknown patterns from a randomly generated sequence of observations is a problem arising in fields ranging from signal processing to computational biology. Here we focus on the discovery of short recurring patterns (called motifs) in DNA sequences that represent binding sites for certain proteins in the process of gene regulation. What makes this a difficult problem is that these patterns can vary stochastically. We describe a novel data augmentation strategy for detecting such patterns in biological sequences based on an extension of a “dictionary” model. In this approach, we treat conserved patterns and individual nucleotides as stochastic words generated according to probability weight matrices and the observed sequences generated by concatenations of these words. By using a missing-data approach to find these patterns, we also address other related problems, including determining widths of patterns, finding multiple motifs, handling low-complexity regions, and finding patterns with insertions and deletions. The issue of selecting appropriate models is also discussed. However, the flexibility of this model is also accompanied by a high degree of computational complexity. We demonstrate how dynamic programming-like recursions can be used to improve computational efficiency.
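
To make the idea of stochastic words and dynamic programming-like recursions concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each word is a list of per-position probability columns (a probability weight matrix), that single nucleotides are modelled as one width-1 word, and that each word has a usage probability. The names word_prob, sequence_likelihood, background, motif and usage are hypothetical. The forward recursion sums over all ways of segmenting the sequence into words without enumerating them.

```python
from typing import Dict, List

Column = Dict[str, float]   # probabilities of A, C, G, T at one position
Word = List[Column]         # a stochastic word: one column per position


def word_prob(segment: str, pwm: Word) -> float:
    """Probability that the stochastic word `pwm` emits `segment`."""
    p = 1.0
    for base, column in zip(segment, pwm):
        p *= column[base]
    return p


def sequence_likelihood(seq: str, words: List[Word], usage: List[float]) -> float:
    """Sum over all segmentations of `seq` into words (forward recursion).

    forward[i] is the total probability of generating seq[:i] as a
    concatenation of words; the last word ending at i may be any word
    whose width fits.
    """
    n = len(seq)
    forward = [0.0] * (n + 1)
    forward[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for i in range(1, n + 1):
        total = 0.0
        for pwm, rho in zip(words, usage):
            w = len(pwm)
            if w <= i:
                total += rho * word_prob(seq[i - w:i], pwm) * forward[i - w]
        forward[i] = total
    return forward[n]


# Hypothetical toy model: a uniform single-nucleotide word and a width-2 motif.
background: Word = [{"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}]
motif: Word = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
]
usage = [0.8, 0.2]  # assumed word-usage probabilities

likelihood = sequence_likelihood("AC", [background, motif], usage)
```

The recursion costs time proportional to the sequence length times the total width of all words, whereas the number of possible segmentations grows exponentially with sequence length. A full treatment along the lines of the abstract would also sample the hidden segmentation and update the weight matrices, which this sketch omits.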

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Gupta, Professor Mayetri
Authors: Gupta, M., and Liu, J.S.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Journal Name: Journal of the American Statistical Association
ISSN: 0162-1459
ISSN (Online): 1537-274X
Published Online: 31 December 2011
