Decision trees in epidemiological research

Venkatasubramaniam, A., Wolfson, J., Mitchell, N., Barnes, T., JaKa, M. and French, S. (2017) Decision trees in epidemiological research. Emerging Themes in Epidemiology, 14, 11. (doi: 10.1186/s12982-017-0064-4) (PMID:28943885) (PMCID:PMC5607590)

149514.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Background: In many studies, it is of interest to identify population subgroups that are relatively homogeneous with respect to an outcome. The nature of these subgroups can provide insight into effect mechanisms and suggest targets for tailored interventions. However, identifying relevant subgroups can be challenging with standard statistical methods.

Main text: We review the literature on decision trees, a family of techniques for partitioning the population, on the basis of covariates, into distinct subgroups who share similar values of an outcome variable. We compare two decision tree methods, the popular Classification and Regression tree (CART) technique and the newer Conditional Inference tree (CTree) technique, assessing their performance in a simulation study and using data from the Box Lunch Study, a randomized controlled trial of a portion size intervention. Both CART and CTree identify homogeneous population subgroups and offer improved prediction accuracy relative to regression-based approaches when subgroups are truly present in the data. An important distinction between CART and CTree is that the latter uses a formal statistical hypothesis testing framework in building decision trees, which simplifies the process of identifying and interpreting the final tree model. We also introduce a novel way to visualize the subgroups defined by decision trees. Our novel graphical visualization provides a more scientifically meaningful characterization of the subgroups identified by decision trees.

Conclusions: Decision trees are a useful tool for identifying homogeneous subgroups defined by combinations of individual characteristics. While all decision tree techniques generate subgroups, we advocate the use of the newer CTree technique due to its simplicity and ease of interpretation.
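
Illustration (not part of the published abstract): the partitioning idea described above can be sketched in a few lines. The example below is a minimal, CART-style sketch only: it searches for the single binary split on one covariate that minimizes the within-subgroup sum of squared errors of the outcome. It is not the authors' code and does not implement CTree, which selects splits through statistical hypothesis tests instead. The function names (`sse`, `best_split`), the covariates (`age`, `bmi`), the outcome (`intake`), and all data values are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, outcome):
    """Find the binary split that minimizes within-subgroup SSE.

    rows    : list of dicts, one per participant (hypothetical data layout)
    outcome : name of the outcome key; all other keys are treated as covariates
    Returns (covariate, threshold, total_sse), or None if no split is possible.
    """
    best = None
    covariates = [k for k in rows[0] if k != outcome]
    for cov in covariates:
        levels = sorted({r[cov] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels, levels[1:]):
            threshold = (lo + hi) / 2
            left = [r[outcome] for r in rows if r[cov] <= threshold]
            right = [r[outcome] for r in rows if r[cov] > threshold]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (cov, threshold, total)
    return best


# Made-up toy data: the outcome differs sharply between younger and older rows.
rows = [
    {"age": 25, "bmi": 22, "intake": 500},
    {"age": 30, "bmi": 31, "intake": 510},
    {"age": 35, "bmi": 27, "intake": 495},
    {"age": 45, "bmi": 23, "intake": 700},
    {"age": 50, "bmi": 30, "intake": 710},
    {"age": 55, "bmi": 26, "intake": 690},
]

split = best_split(rows, "intake")
print(split[0], split[1])  # covariate and threshold of the chosen split
```

On this toy data the chosen split is on `age` at a threshold of 40, separating the rows into two subgroups with similar outcome values. A full tree would apply the same search recursively within each subgroup and would need a stopping or pruning rule, which this sketch omits.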

Item Type: Articles
Additional Information: Data analyzed in this paper are from the Box Lunch Study, which was supported by a grant from NIH/NIDDK R01DK 081714.
Keywords: Decision trees, predictors, subgroup heterogeneity.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Venkatasubramaniam, Ashwini
Authors: Venkatasubramaniam, A., Wolfson, J., Mitchell, N., Barnes, T., JaKa, M., and French, S.
College/School: College of Science and Engineering
Journal Name: Emerging Themes in Epidemiology
Publisher: BioMed Central
ISSN: 1742-7622
ISSN (Online): 1742-7622
Copyright Holders: Copyright © 2017 The Authors
First Published: First published in Emerging Themes in Epidemiology 14: 11
Publisher Policy: Reproduced under a Creative Commons license
