Supervised and unsupervised language modelling in Chest X-Ray radiological reports

Drozdov, I., Forbes, D., Szubert, B., Hall, M., Carlin, C. and Lowe, D. J. (2020) Supervised and unsupervised language modelling in Chest X-Ray radiological reports. PLoS ONE, 15(3), e0229963. (doi: 10.1371/journal.pone.0229963) (PMID:32155219) (PMCID:PMC7064166)

211895.pdf - Published Version
Available under License Creative Commons Attribution.



Chest radiography (CXR) is the most commonly used imaging modality, and deep neural network (DNN) algorithms have shown promise in the effective triage of normal and abnormal radiograms. Typically, DNNs require large quantities of expertly labelled training exemplars, which in clinical contexts is a major bottleneck to effective modelling, as both considerable clinical skill and time are required to produce high-quality ground truths. In this work, we evaluate thirteen supervised classifiers using two large free-text corpora and demonstrate that bi-directional long short-term memory (BiLSTM) networks with an attention mechanism effectively identify Normal, Abnormal, and Unclear CXR reports in internal (n = 965 manually labelled reports, f1-score = 0.94) and external (n = 465 manually labelled reports, f1-score = 0.90) testing sets using a relatively small number of expert-labelled training observations (n = 3,856 annotated reports). Furthermore, we introduce a general unsupervised approach that accurately distinguishes Normal and Abnormal CXR reports in a large unlabelled corpus. We anticipate that the results presented in this work can be used to automatically extract standardized clinical information from free-text CXR radiological reports, facilitating the training of clinical decision support systems for CXR triage.

Item Type: Articles
Additional Information: Funding: This work is supported by Bering Limited and the Industrial Center for AI Research in Digital diagnostics (iCAIRD) which is funded by the Data to Early Diagnosis and Precision Medicine strand of the government’s Industrial Strategy Challenge Fund, managed and delivered by Innovate UK on behalf of UK Research and Innovation (UKRI) [Project number 104690].
Glasgow Author(s) Enlighten ID: Carlin, Dr Christopher and Lowe, Dr David
Creator Roles:
Carlin, C.: Formal analysis, Investigation, Methodology, Validation, Writing – original draft, Writing – review and editing
Lowe, D. J.: Conceptualization, Funding acquisition, Investigation, Resources, Writing – original draft, Writing – review and editing
Authors: Drozdov, I., Forbes, D., Szubert, B., Hall, M., Carlin, C., and Lowe, D. J.
College/School: College of Medical Veterinary and Life Sciences > School of Medicine, Dentistry & Nursing
Journal Name: PLoS ONE
Publisher: Public Library of Science
ISSN (Online): 1932-6203
Copyright Holders: Copyright © 2020 Drozdov et al.
First Published: First published in PLoS ONE 15(3):e0229963
Publisher Policy: Reproduced under a Creative Commons license


Project Code: 304546
Award No:
Project Name: I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics
Principal Investigator: Keith Muir
Funder's Name: Innovate UK (INNOVATE)
Funder Ref: 104690
Lead Dept: NP - Stroke & Brain Imaging