Dean, N. and Raftery, A. E. (2010) Latent class analysis variable selection. Annals of the Institute of Statistical Mathematics, 62(1), pp. 11-35. (doi: 10.1007/s10463-009-0258-9)
Publisher's URL: http://dx.doi.org/10.1007/s10463-009-0258-9
Abstract
We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
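The abstract describes two ingredients: a BIC-based comparison of a "clustering" and a "non-clustering" model for each candidate variable, and a headlong search over variables. The sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes the "larger is better" BIC convention, 2·loglik − p·log(n), common in the model-based clustering literature, and counts free parameters of a latent class model with G classes as (G − 1) mixing proportions plus (d_k − 1) category probabilities per variable per class. The names `lca_num_params`, `bic`, `headlong_search`, `bic_diff`, the thresholds `upper`/`lower`, and the toy scores are hypothetical; fitting the two competing models (and defining the exact non-clustering model) is left to a user-supplied `bic_diff`, since the abstract does not spell it out.

```python
import math
from typing import Callable, List, Sequence


def lca_num_params(n_classes: int, n_categories: Sequence[int]) -> int:
    """Free parameters of a latent class model: (G - 1) mixing proportions
    plus (d_k - 1) category probabilities for each variable k in each class."""
    return (n_classes - 1) + n_classes * sum(d - 1 for d in n_categories)


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """BIC in the assumed 'larger is better' form: 2*loglik - p*log(n)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def headlong_search(
    candidates: Sequence[str],
    bic_diff: Callable[[str, List[str]], float],  # hypothetical: BIC(clustering) - BIC(non-clustering)
    upper: float = 0.0,   # hypothetical inclusion threshold
    lower: float = 0.0,   # hypothetical exclusion threshold
    max_iter: int = 100,
) -> List[str]:
    """Generic headlong search: accept the FIRST move that passes a threshold
    instead of scanning for the best move at every step."""
    selected: List[str] = []
    remaining = list(candidates)
    for _ in range(max_iter):
        changed = False
        # Inclusion: add the first remaining variable whose BIC difference exceeds `upper`.
        for v in remaining:
            if bic_diff(v, selected) > upper:
                selected.append(v)
                remaining.remove(v)  # safe: we break out of the loop immediately
                changed = True
                break
        # Exclusion: drop the first selected variable whose BIC difference,
        # given the other selected variables, falls below `lower`.
        for v in list(selected):
            others = [s for s in selected if s != v]
            if bic_diff(v, others) < lower:
                selected.remove(v)
                remaining.append(v)
                changed = True
                break
        if not changed:  # no move passed either threshold: stop
            break
    return selected


# Usage with a toy scoring function (hypothetical numbers, not fitted models):
# "x1" and "x2" carry clustering information, every other variable does not.
TOY_SCORES = {"x1": 12.0, "x2": 8.0}


def toy_bic_diff(variable: str, selected: List[str]) -> float:
    return TOY_SCORES.get(variable, -5.0)


if __name__ == "__main__":
    print(lca_num_params(2, [2, 2, 2]))                               # 7
    print(headlong_search(["n1", "x1", "n2", "x2"], toy_bic_diff))   # ['x1', 'x2']
```

The headlong strategy trades optimality of each individual step for far fewer model fits, which matters when every evaluation of `bic_diff` requires fitting latent class models, as with hundreds of SNPs. Choosing `upper >= lower` prevents a variable that has just been added from being removed immediately in the same iteration.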
Item Type: | Articles |
---|---|
Keywords: | ALLOCATION, Bayes factor, BIC, Categorical data, Classification, Feature selection, IMPROVEMENTS, MODEL, Model-based clustering, MODELS, SELECTION, Single nucleotide polymorphism (SNP) |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Dean, Dr Nema |
Authors: | Dean, N., and Raftery, A. E. |
College/School: | College of Science and Engineering > School of Mathematics and Statistics > Statistics |
Journal Name: | Annals of the Institute of Statistical Mathematics |
Publisher: | Springer Verlag |
ISSN: | 0020-3157 |
Copyright Holders: | Copyright © 2009, The Institute of Statistical Mathematics, Tokyo |
First Published: | First published in Annals of the Institute of Statistical Mathematics 62(1):11-35 |
Publisher Policy: | Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com |