Flynt, A. and Dean, N. (2016) A survey of popular R packages for cluster analysis. Journal of Educational and Behavioral Statistics, 41(2), pp. 205-225. (doi: 10.3102/1076998616631743)
Text: 153580.pdf - Accepted Version (528kB)
Abstract
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data, or a mixture of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The methods are compared and contrasted, and then illustrated on example data. The discussion gives links to information on other available libraries for different clustering methods and on extensions beyond basic clustering methods. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.R
| Item Type: | Articles |
|---|---|
| Status: | Published |
| Refereed: | Yes |
| Glasgow Author(s) Enlighten ID: | Dean, Dr Nema |
| Authors: | Flynt, A., and Dean, N. |
| College/School: | College of Science and Engineering > School of Mathematics and Statistics > Statistics |
| Journal Name: | Journal of Educational and Behavioral Statistics |
| Publisher: | SAGE Publications |
| ISSN: | 1076-9986 |
| ISSN (Online): | 1935-1054 |
| Published Online: | 01 April 2016 |
| Copyright Holders: | Copyright © 2016 AERA |
| First Published: | First published in Journal of Educational and Behavioral Statistics 41(2): 205-225 |
| Publisher Policy: | Reproduced in accordance with the publisher copyright policy |