A survey of popular R packages for cluster analysis

Flynt, A. and Dean, N. (2016) A survey of popular R packages for cluster analysis. Journal of Educational and Behavioral Statistics, 41(2), pp. 205-225. (doi: 10.3102/1076998616631743)

153580.pdf - Accepted Version



Cluster analysis is a set of statistical methods for discovering new group or class structure when exploring datasets. This article reviews the following popular libraries and commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. These packages and functions cover a variety of cluster analysis methods for continuous data, categorical data, or a mixture of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The methods are compared and contrasted, then illustrated on example data. The discussion gives links to information on other available libraries for different clustering methods and on extensions beyond basic clustering methods. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.R
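
As a rough illustration of the basic usage mentioned in the abstract, the two stats functions it names, kmeans and hclust, might be called as in the sketch below. This is not the article's own code (that is at the URL above). The built-in iris data, the choice of three clusters, the Ward linkage and the variable names are all assumptions made here for illustration only.

```r
# Minimal sketch using only base R (the stats package).
# Assumption: iris is used purely as a convenient built-in dataset;
# it is not the example data from the article.
set.seed(1)                        # kmeans uses random starting centres
x <- scale(iris[, 1:4])            # standardize the four continuous variables

# k-means with 3 clusters (an illustrative choice) and 25 random restarts
km <- kmeans(x, centers = 3, nstart = 25)

# Agglomerative hierarchical clustering on Euclidean distances, Ward linkage
hc <- hclust(dist(x), method = "ward.D2")
hc_labels <- cutree(hc, k = 3)     # cut the dendrogram into 3 groups

# Cross-tabulate the two partitions to see how far they agree
agreement <- table(kmeans = km$cluster, hclust = hc_labels)
print(agreement)
```

Both functions work on continuous data only. For categorical or mixed data, the abstract points to the poLCA and clustMD libraries instead.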

Item Type: Articles
Glasgow Author(s) Enlighten ID: Dean, Dr Nema
Authors: Flynt, A., and Dean, N.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Journal Name: Journal of Educational and Behavioral Statistics
Publisher: SAGE Publications
ISSN (Online): 1935-1054
Published Online: 01 April 2016
Copyright Holders: Copyright © 2016 AERA
First Published: First published in Journal of Educational and Behavioral Statistics 41(2): 205-225
Publisher Policy: Reproduced in accordance with the publisher copyright policy
