Predicting a diagnosis of ankylosing spondylitis using primary care health records–a machine learning approach

Kennedy, J., Kennedy, N., Cooksey, R., Choy, E., Siebert, S. , Rahman, M. and Brophy, S. (2023) Predicting a diagnosis of ankylosing spondylitis using primary care health records–a machine learning approach. PLoS ONE, 18(3), e0279076. (doi: 10.1371/journal.pone.0279076) (PMID:37000839) (PMCID:PMC10065228)

Available under License Creative Commons Attribution.

Abstract

Ankylosing spondylitis (AS) is the second most common cause of inflammatory arthritis. However, confirming a diagnosis (via x-rays) can take a decade from symptom onset. The aim of this study was to use machine learning methods to develop a profile of the characteristics of people who are likely to be given a diagnosis of AS in future. The Secure Anonymised Information Linkage databank was used. Patients with AS were identified using their routine data and matched with controls who had no record of a diagnosis of AS or axial spondyloarthritis. Data were analysed separately for men and women. The models were developed using feature/variable selection and principal component analysis to build decision trees. The decision tree with the highest average F value was selected and validated with a test dataset. The model for men indicated that lower back pain, uveitis, and NSAID use under age 20 are associated with AS development. The model for women showed an older age of symptom presentation compared to men, with back pain and multiple pain relief medications. The models showed good prediction (positive predictive value 70%-80%) in test data, but in the general population, where prevalence is very low (0.09% of the population in this dataset), the positive predictive value would be very low (0.33%-0.25%). Machine learning can be used to help profile and understand the characteristics of people who will develop AS and, in test datasets with artificially high prevalence, will perform well. However, when applied to a general population with low prevalence rates, such as that in primary care, the positive predictive value for even the best model would be 1.4%. Multiple models may be needed to narrow down the population over time to improve the predictive value and therefore reduce the time to diagnosis of AS.
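
The dependence of positive predictive value (PPV) on prevalence described in the abstract follows from Bayes' theorem. The sketch below is illustrative only: the sensitivity and specificity values are hypothetical placeholders, not figures reported by the study, and the 50% prevalence for a matched test set is an assumption. Only the 0.09% population prevalence comes from the abstract.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive prediction) via Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical operating point, chosen for illustration (NOT from the paper).
SENSITIVITY = 0.75
SPECIFICITY = 0.75

scenarios = [
    ("matched test set (assumed 50% prevalence)", 0.5),
    ("general population (0.09% prevalence, from the abstract)", 0.0009),
]

for label, prevalence in scenarios:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV = {ppv:.2%}")
```

With the classifier held fixed, PPV drops sharply as prevalence falls, because false positives from the much larger unaffected group outnumber the true positives. This is the effect the abstract describes when moving from test data to primary care.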

Item Type: Articles
Additional Information: This work was supported by UCB Pharma, Health Data Research UK, and the infrastructure support of the National Centre for Population Health and Wellbeing and the SAIL Databank.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Siebert, Professor Stefan
Creator Roles:
Siebert, S.: Conceptualization, Validation, Writing – review and editing
Authors: Kennedy, J., Kennedy, N., Cooksey, R., Choy, E., Siebert, S., Rahman, M., and Brophy, S.
College/School: College of Medical Veterinary and Life Sciences > School of Infection & Immunity
Research Centre: College of Medical Veterinary and Life Sciences > School of Infection & Immunity > Centre for Immunobiology
Journal Name: PLoS ONE
Publisher: Public Library of Science
ISSN: 1932-6203
ISSN (Online): 1932-6203
Copyright Holders: Copyright © 2023 Kennedy et al.
First Published: First published in PLoS ONE 18(3): e0279076
Publisher Policy: Reproduced under a Creative Commons License
