Development of a semi-automated database for adult congenital heart disease patients

Verma, S. et al. (2022) Development of a semi-automated database for adult congenital heart disease patients. Canadian Journal of Cardiology, 38(10), pp. 1634-1640. (doi: 10.1016/j.cjca.2022.05.022) (PMID:35661703)

271891.pdf - Accepted Version
Restricted to Repository staff only until 31 May 2023.
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Abstract

Background: Databases for congenital heart disease (CHD) are effective in delivering accessible datasets ready for statistical inference. To date, however, data collection has been labour- and time-intensive and has required substantial financial support to ensure sustainability. Here we propose and pilot a semi-automated technique for extracting data from clinic letters to populate a clinical database.
Methods: PDF-formatted clinic letters stored in a local folder underwent data extraction, pre-processing, and analysis through a series of algorithms. Specific patient information (diagnoses, diagnostic complexity, interventions, arrhythmia, medications, and demographic data) was processed into text files and structured data tables, which were used to populate a database. A pre-defined data validation schema was used to verify and accommodate the information populating the database. Unsupervised learning, in the form of a dimensionality reduction technique, was used to project the data into two dimensions and visualise their intrinsic structure in relation to the diagnosis, medication, and intervention lists and the ESC classification of disease complexity. Ninety-three randomly selected letters were manually reviewed for accuracy.
Results: 1409 consecutive outpatient clinic letters were used to populate the Scottish Adult Congenital Cardiac Database. Mean patient age was 35.4 years and 47.6% of patients were female; 698 (49.5%) had moderately complex lesions, 369 (26.1%) greatly complex lesions, and 284 (20.1%) mildly complex lesions. Individual diagnoses were successfully extracted from 96.95% of letters and demographic data from 100% of letters. Data extraction, database upload, data analysis, and visualisation took 571 seconds (about 9.5 minutes). Compared with manual data extraction, the algorithm's accuracy was 94% for diagnoses, 93% for interventions, and 93% for medications.
Conclusions: Semi-automated data extraction from clinic letters into a database can be achieved with a high degree of accuracy and efficiency.
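The abstract describes the pipeline only at a high level. The following is a minimal sketch of the general idea behind two of its steps, turning letter text into a structured record and checking that record against a pre-defined validation schema, and is not the authors' implementation. It assumes the text has already been extracted from the PDF, that letters use a simple "Heading:" layout with "-" bullet items, and that the complexity lookup, validation rules, and all names (`COMPLEXITY_BY_DIAGNOSIS`, `SCHEMA`, `extract_record`, `validate`) are hypothetical illustrations.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, illustrative subset of an ESC-style complexity lookup
# (the paper's actual diagnosis lists are not given in the abstract).
COMPLEXITY_BY_DIAGNOSIS = {
    "bicuspid aortic valve": "mild",
    "coarctation of the aorta": "moderate",
    "tetralogy of fallot": "moderate",
    "fontan": "great",
}
COMPLEXITY_RANK = {"mild": 1, "moderate": 2, "great": 3}


@dataclass
class PatientRecord:
    age: Optional[int] = None
    sex: Optional[str] = None
    diagnoses: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    complexity: Optional[str] = None


def split_sections(text: str) -> Dict[str, List[str]]:
    """Group '-' bullet lines under the most recent 'Heading:' line (assumed layout).

    Lines that are neither headings nor bullets (e.g. narrative prose) are ignored.
    """
    sections: Dict[str, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"([A-Za-z ]+):\s*(.*)", line)
        if header:
            current = header.group(1).strip().lower()
            value = header.group(2).strip()
            sections[current] = [value] if value else []
        elif line.startswith("-") and current is not None:
            sections[current].append(line.lstrip("- ").strip())
    return sections


def extract_record(text: str) -> PatientRecord:
    """Map the parsed sections onto a structured record."""
    sections = split_sections(text)
    record = PatientRecord()
    age_values = sections.get("age", [])
    if age_values and age_values[0].isdigit():
        record.age = int(age_values[0])
    sex_values = sections.get("sex", [])
    if sex_values:
        record.sex = sex_values[0].lower()
    record.diagnoses = sections.get("diagnoses", [])
    record.medications = sections.get("medications", [])
    # Assign the highest complexity level among the recognised diagnoses.
    matched = [
        level
        for diagnosis in record.diagnoses
        for term, level in COMPLEXITY_BY_DIAGNOSIS.items()
        if term in diagnosis.lower()
    ]
    if matched:
        record.complexity = max(matched, key=COMPLEXITY_RANK.__getitem__)
    return record


# Hypothetical validation rules standing in for the paper's pre-defined schema.
SCHEMA = {
    "age": lambda v: isinstance(v, int) and 16 <= v <= 120,
    "sex": lambda v: v in {"female", "male"},
    "complexity": lambda v: v in COMPLEXITY_RANK,
    "diagnoses": lambda v: isinstance(v, list) and len(v) > 0,
}


def validate(record: PatientRecord) -> List[str]:
    """Return the names of fields that fail their rule (an empty list means valid)."""
    return [name for name, rule in SCHEMA.items() if not rule(getattr(record, name))]


letter = """
Age: 34
Sex: Female
Diagnoses:
- Repaired tetralogy of Fallot
- Coarctation of the aorta
Medications:
- Bisoprolol 2.5 mg
"""
record = extract_record(letter)
print(record.complexity, validate(record))  # prints: moderate []
```

Records that fail validation (for example an out-of-range age) would be flagged by field name for manual review rather than silently uploaded; this mirrors, in simplified form, the role the abstract gives to the validation schema.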

Item Type:Articles
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Alkan, Muhammet and Walker, Dr Niki and Anagnostopoulos, Dr Christos and Veldtman, Professor Gruschen and Deligianni, Dr Fani and Danton, Mr Mark and Swan, Dr Lorna
Authors: Verma, S., Alkan, M., Deligianni, F., Anagnostopoulos, C., Diller, G., Walker, L., Johnston, F. C., Danton, M., Walker, H., Swan, L., Hunter, A., McGuire, A., Dawes, M., Stott, S., Lyndsey, M., Walker, N., and Veldtman, G.
College/School:College of Medical Veterinary and Life Sciences > School of Cardiovascular & Metabolic Health
College of Science and Engineering > School of Computing Science
Journal Name:Canadian Journal of Cardiology
Publisher:Elsevier
ISSN:0828-282X
ISSN (Online):1916-7075
Published Online:31 May 2022
Copyright Holders:Copyright © 2022 Elsevier
First Published:First published in Canadian Journal of Cardiology 38(10): 1634-1640
Publisher Policy:Reproduced in accordance with the copyright policy of the publisher
