Improving Image Representations via MoCo Pre-Training for Multimodal CXR Classification

Dalla Serra, F., Jacenkow, G., Deligianni, F., Dalton, J. and O’Neil, A. Q. (2022) Improving Image Representations via MoCo Pre-Training for Multimodal CXR Classification. In: 26th UK Conference on Medical Image Understanding and Analysis (MIUA 2022), University of Cambridge, 27-29 July 2022, pp. 623-635. ISBN 9783031120527 (doi: 10.1007/978-3-031-12053-4_46)

Full text: 273110.pdf (Accepted Version, 2MB)

Abstract

Multimodal learning, here defined as learning from multiple input data types, has exciting potential for healthcare. However, current techniques rely on large multimodal datasets being available, which is rarely the case in the medical domain. In this work, we focus on improving the extracted image features that are fed into multimodal image-text Transformer architectures, evaluating on a medical multimodal classification task with dual inputs of chest X-ray images (CXRs) and the indication text passages in the corresponding radiology reports. We demonstrate that self-supervised Momentum Contrast (MoCo) pre-training of the image representation model on a large set of unlabelled CXR images improves multimodal performance compared to supervised ImageNet pre-training. MoCo yields a 0.6% absolute improvement in AUROC-macro when using the full MIMIC-CXR training set, and a 5.1% improvement when limited to 10% of the training data. To the best of our knowledge, this is the first demonstration of MoCo image pre-training for multimodal learning in medical imaging.
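
For readers unfamiliar with Momentum Contrast, the two ingredients of generic MoCo pre-training can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names, the toy vectors, and the default values for the momentum `m` and the `temperature` are assumptions taken from the general MoCo formulation, and none of them are stated in this abstract.

```python
import math


def momentum_update(query_params, key_params, m=0.999):
    """Move key-encoder parameters slowly toward the query encoder:
    k <- m * k + (1 - m) * q. (m=0.999 is an assumed default.)"""
    return [m * k + (1.0 - m) * q for q, k in zip(query_params, key_params)]


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(query, positive_key, queue, temperature=0.07):
    """Contrastive (InfoNCE) loss for one query embedding.

    The positive key is the embedding of another augmented view of the
    same image; `queue` holds embeddings of other images (negatives).
    The loss is the negative log-softmax of the positive logit.
    """
    q = l2_normalize(query)
    logits = [dot(q, l2_normalize(positive_key)) / temperature]
    logits += [dot(q, l2_normalize(neg)) / temperature for neg in queue]
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return log_sum_exp - logits[0]


# Toy usage with hypothetical 2-D embeddings.
aligned = info_nce_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = info_nce_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

In this toy setting the loss is small when the query matches its positive key and large when it matches a queued negative instead, which is the signal that drives the image encoder during pre-training on unlabelled images.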

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Deligianni, Dr Fani and Dalton, Dr Jeff and Dalla Serra, Francesco
Authors: Dalla Serra, F., Jacenkow, G., Deligianni, F., Dalton, J., and O’Neil, A. Q.
College/School: College of Science and Engineering > School of Computing Science
ISSN: 0302-9743
ISBN: 9783031120527
Published Online: 25 July 2022
Copyright Holders: Copyright © 2022 The Authors
First Published: First published in Lecture Notes in Computer Science 13413: 623-635
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher