Latent class analysis variable selection

Dean, N. and Raftery, A. E. (2010) Latent class analysis variable selection. Annals of the Institute of Statistical Mathematics, 62(1), pp. 11-35. (doi: 10.1007/s10463-009-0258-9)

Publisher's URL: http://dx.doi.org/10.1007/s10463-009-0258-9

Abstract

We propose a method for selecting variables in latent class analysis, which is the most common model-based clustering method for discrete data. The method assesses a variable's usefulness for clustering by comparing two models, given the clustering variables already selected. In one model the variable contributes information about cluster allocation beyond that contained in the already selected variables, and in the other model it does not. A headlong search algorithm is used to explore the model space and select clustering variables. In simulated datasets we found that the method selected the correct clustering variables, and also led to improvements in classification performance and in accuracy of the choice of the number of classes. In two real datasets, our method discovered the same group structure with fewer variables. In a dataset from the International HapMap Project consisting of 639 single nucleotide polymorphisms (SNPs) from 210 members of different groups, our method discovered the same group structure with a much smaller number of SNPs.
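
The search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the threshold of zero, the iteration cap, and the toy scores are all assumptions made for illustration. The `evidence` callable is a stand-in for whatever comparison of the two models is used (for example a BIC difference, which would require fitting latent class models that are not shown here). The sketch only shows the "headlong" idea of accepting the first candidate whose evidence clears a threshold instead of scanning for the best one.

```python
from typing import Callable, List, Sequence


def headlong_search(
    variables: Sequence[str],
    evidence: Callable[[str, List[str]], float],
    threshold: float = 0.0,   # assumed cut-off, not taken from the paper
    max_iter: int = 100,      # assumed safety cap to guarantee termination
) -> List[str]:
    """Alternate inclusion and removal steps, taking the first qualifying variable."""
    selected: List[str] = []
    for _ in range(max_iter):
        changed = False
        # Inclusion step: add the first unselected variable whose evidence
        # for clustering, given the selected set, exceeds the threshold.
        for v in variables:
            if v not in selected and evidence(v, selected) > threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the first selected variable whose evidence,
        # given the other selected variables, does not exceed the threshold.
        for v in list(selected):
            others = [s for s in selected if s != v]
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break
    return selected


def toy_evidence(candidate: str, selected: List[str]) -> float:
    """Hypothetical scores standing in for a model comparison on real data."""
    base = {"a": 5.0, "b": 3.0, "c": -1.0, "d": 4.0}
    # Assumption for the toy example: "d" carries no information beyond "a".
    if candidate == "d" and "a" in selected:
        return -2.0
    return base[candidate]


result = headlong_search(["a", "b", "c", "d"], toy_evidence)
print(result)  # prints ['a', 'b']
```

In this toy run, "c" is never selected because its score is negative, and "d" is rejected once "a" is in the selected set, which mirrors the idea of judging a variable by what it adds beyond the variables already chosen.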

Item Type: Articles
Keywords: ALLOCATION, Bayes factor, BIC, Categorical data, Classification, Feature selection, IMPROVEMENTS, MODEL, Model-based clustering, MODELS, SELECTION, Single nucleotide polymorphism (SNP)
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Dean, Dr Nema
Authors: Dean, N., and Raftery, A. E.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Journal Name: Annals of the Institute of Statistical Mathematics
Publisher: Springer Verlag
ISSN: 0020-3157
Copyright Holders: Copyright © 2009, The Institute of Statistical Mathematics, Tokyo
First Published: First published in Annals of the Institute of Statistical Mathematics 62(1):11-35
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher. The original publication is available at www.springerlink.com
