MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

Li, J., Dai, H., Han, H. and Ding, Y. (2023) MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving. In: IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR 2023), Vancouver, Canada, 18-22 June 2023, pp. 21694-21704. ISBN 9798350301298 (doi: 10.1109/CVPR52729.2023.02078)


Abstract

LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. Popular LiDAR-only methods suffer from inferior segmentation of small and distant objects due to insufficient laser points, while robust multi-modal solutions remain under-explored. We investigate three crucial inherent difficulties: modality heterogeneity, limited intersection of sensor fields of view, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion GF-Phase, cross-modal feature completion, and semantic-based feature fusion SF-Phase on all visible points. Multi-modal data augmentation is reinvigorated by applying asymmetric transformations to the LiDAR point cloud and the multi-camera images individually, which benefits model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on the nuScenes, Waymo, and SemanticKITTI datasets. Under malfunctioning multi-camera input and multi-frame point cloud input, MSeg3D still shows robustness and improves over the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
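The asymmetric augmentation mentioned in the abstract decouples the transformations applied to each modality instead of tying both sensors to one shared transform. Below is a minimal Python sketch of that idea, assuming an N x C numpy point cloud (xyz in the first three columns) and H x W x 3 camera images; the function names and parameter ranges are illustrative assumptions, not the authors' implementation in lidarseg3d.

    # Sketch of asymmetric multi-modal augmentation: the LiDAR point cloud and
    # each camera image draw their own random transforms independently.
    # All names and ranges here are hypothetical, for illustration only.
    import numpy as np

    def augment_lidar(points: np.ndarray) -> np.ndarray:
        """LiDAR-only geometric augmentation: rotation, global scaling, flip."""
        angle = np.random.uniform(-np.pi / 4, np.pi / 4)
        cos_a, sin_a = np.cos(angle), np.sin(angle)
        rot = np.array([[cos_a, -sin_a, 0.0],
                        [sin_a,  cos_a, 0.0],
                        [0.0,    0.0,   1.0]])
        xyz = points[:, :3] @ rot.T
        xyz *= np.random.uniform(0.95, 1.05)      # global scaling
        if np.random.rand() < 0.5:                # random flip about the x-axis
            xyz[:, 1] = -xyz[:, 1]
        return np.concatenate([xyz, points[:, 3:]], axis=1)

    def augment_image(image: np.ndarray) -> np.ndarray:
        """Image-only 2D/photometric augmentation: horizontal flip, brightness jitter."""
        if np.random.rand() < 0.5:
            image = image[:, ::-1, :]             # horizontal flip
        gain = np.random.uniform(0.8, 1.2)        # brightness jitter
        return np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

    def augment_sample(points, images):
        """Asymmetric augmentation: each modality is transformed independently.
        In a full pipeline the point-to-pixel projection indices would also be
        updated to match the image-side transform (e.g., mirroring u-coordinates
        after a horizontal flip)."""
        return augment_lidar(points), [augment_image(img) for img in images]

Because the two modalities are augmented separately, each training sample sees a more diverse combination of LiDAR and image transformations than a single shared augmentation would allow, which is the benefit the abstract attributes to this scheme.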

Item Type:Conference Proceedings
Additional Information:This work was supported by the National Key Research and Development Program of China (2018YFE0183900) and YUNJI Technology Co. Ltd.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Dai, Dr Hang
Authors: Li, J., Dai, H., Han, H., and Ding, Y.
College/School:College of Science and Engineering > School of Computing Science
ISSN:2575-7075
ISBN:9798350301298
Copyright Holders:Copyright © 2023, IEEE
First Published:First published in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher Policy:Reproduced in accordance with the publisher copyright policy