Enlighten Publications

Optimizing Vision Transformers for Medical Image Segmentation

Liu, Q., Kaul, C., Wang, J., Anagnostopoulos, C., Murray-Smith, R. and Deligianni, F. (2023) Optimizing Vision Transformers for Medical Image Segmentation. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023), Rhodes, Greece, 4-10 June 2023, ISBN 9781728163277 (doi: 10.1109/ICASSP49357.2023.10096379)

Text
292461.pdf - Accepted Version
Available under License Creative Commons Attribution.
1MB

Abstract

For medical image semantic segmentation (MISS), Vision Transformers have emerged as strong alternatives to convolutional neural networks thanks to their inherent ability to capture long-range correlations. However, existing research uses off-the-shelf Vision Transformer blocks based on linear projections and feature processing, which lack the spatial and local context needed to refine organ boundaries. Furthermore, Transformers do not generalize well on small medical imaging datasets and rely on large-scale pre-training due to limited inductive biases. To address these problems, we demonstrate the design of a compact and accurate Transformer network for MISS, CS-Unet, which introduces convolutions in a multi-stage design for hierarchically enhancing the spatial and local modeling ability of Transformers. This is mainly achieved by our well-designed Convolutional Swin Transformer (CST) block, which merges convolutions with Multi-Head Self-Attention and Feed-Forward Networks to provide inherent localized spatial context and inductive biases. Experiments demonstrate that CS-Unet, without pre-training, outperforms other counterparts by large margins on multi-organ and cardiac datasets with fewer parameters and achieves state-of-the-art performance. Our code is available on GitHub.
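The general idea of pairing a convolution with self-attention can be illustrated with a small sketch. This is not the authors' CST block: it assumes a 1D token sequence, a fixed smoothing kernel in place of learned convolution weights, a single attention head with no learned projections, no shifted windows and no Feed-Forward Network. All function names (`depthwise_conv1d`, `self_attention`, `conv_attention_block`) are hypothetical and chosen for illustration only.

```python
import math


def depthwise_conv1d(tokens, kernel):
    """Mix each token with its neighbours, channel by channel (zero padding).

    Assumption: a fixed kernel stands in for learned convolution weights.
    """
    n, half, dim = len(tokens), len(kernel) // 2, len(tokens[0])
    out = []
    for i in range(n):
        mixed = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence count as zero
                for c in range(dim):
                    mixed[c] += w * tokens[j][c]
        out.append(mixed)
    return out


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def conv_attention_block(tokens, kernel=(0.25, 0.5, 0.25)):
    """Local mixing by convolution, then global attention, then a residual add."""
    local = depthwise_conv1d(tokens, kernel)        # local spatial context
    attended = self_attention(local, local, local)  # long-range context
    return [[x + y for x, y in zip(t, a)] for t, a in zip(tokens, attended)]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
output = conv_attention_block(tokens)
```

Here the convolution runs before attention, so queries, keys and values already carry neighbourhood information. That is one simple way to inject a locality bias; the paper's block may combine the components differently.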

Item Type:	Conference Proceedings
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	Murray-Smith, Professor Roderick and Anagnostopoulos, Dr Christos and Liu, Qianying and Deligianni, Dr Fani and Kaul, Dr Chaitanya
Authors:	Liu, Q., Kaul, C., Wang, J., Anagnostopoulos, C., Murray-Smith, R., and Deligianni, F.
College/School:	College of Science and Engineering > School of Computing Science
ISBN:	9781728163277
Published Online:	05 May 2023
Copyright Holders:	Copyright © 2023 IEEE
First Published:	First published in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023)
Publisher Policy:	Reproduced in accordance with the publisher copyright policy

Funder and Project Information

Project Code	Project Name	Principal Investigator	Funder's Name	Funder Ref	Lead Dept
315206	Privacy-Preserved Human Motion Analysis for Healthcare Applications	Fani Deligianni	Engineering and Physical Sciences Research Council (EPSRC)	EP/W01212X/1	Computing Science
315626	Privacy-preserved Human Motion Analysis	Fani Deligianni	The Royal Society (ROYSOC)	RGS\R2\212199	Computing Science
304546	I-CAIRD: Industrial Centre for AI Research in Digital Diagnostics	Keith Muir	Innovate UK (INNOVATE)	104690	Stroke & Brain Imaging

Deposit and Record Details

ID Code:	292461
Depositing User:	Miss Leigh Bunton
Datestamp:	20 Feb 2023 09:40
Last Modified:	04 Oct 2023 16:13
Date of acceptance:	17 February 2023
Date of first online publication:	5 May 2023
Date Deposited:	20 February 2023
Data Availability Statement:	No