A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Ge, Y. et al. (2023) A comprehensive multimodal dataset for contactless lip reading and acoustic analysis. Scientific Data, 10(1), 895. (doi: 10.1038/s41597-023-02793-w) (PMID:38092796) (PMCID:PMC10719268)

Full text: 309266.pdf - Published Version (3MB)
Available under License Creative Commons Attribution.

Abstract

Small-scale motion detection using non-invasive remote sensing techniques has recently garnered significant interest in the field of speech recognition. Our dataset paper aims to facilitate the enhancement and restoration of speech information from diverse data sources for speakers. In this paper, we introduce a novel multimodal dataset based on Radio Frequency, visual, text, audio, laser and lip landmark information, also called RVTALL. Specifically, the dataset consists of 7.5 GHz Channel Impulse Response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency modulated continuous wave (FMCW) data from millimeter wave (mmWave) radar, visual and audio information, lip landmarks and laser data, offering a unique multimodal approach to speech recognition research. Meanwhile, a depth camera is adopted to record the landmarks of the subject’s lip and voice. Approximately 400 minutes of annotated speech profiles are provided, which are collected from 20 participants speaking 5 vowels, 15 words, and 16 sentences. The dataset has been validated and has potential for the investigation of lip reading and multimodal speech recognition.

Item Type:Articles
Additional Information:This work was supported in part by Engineering and Physical Sciences Research Council (EPSRC) grants EP/T021020/1, EP/T021063/1 and EP/W003228/1, and by the RSE SAPHIRE grant.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Tang, Mr Chong and Wang, Mr Jingyan and Ge, Mr Yao and Chen, Zikang and Abbasi, Professor Qammer and Li, Mr Haobo and Imran, Professor Muhammad and Faccio, Professor Daniele and Cooper, Professor Jonathan and Li, Dr Wenda
Authors: Ge, Y., Tang, C., Li, H., Chen, Z., Wang, J., Li, W., Cooper, J., Chetty, K., Faccio, D., Imran, M., and Abbasi, Q. H.
College/School:College of Science and Engineering > School of Engineering > Biomedical Engineering
College of Science and Engineering > School of Engineering > Electronics and Nanoscale Engineering
College of Science and Engineering > School of Physics and Astronomy
Journal Name:Scientific Data
Publisher:Nature Research
ISSN:2052-4463
ISSN (Online):2052-4463
Copyright Holders:Copyright © The Author(s) 2023
First Published:First published in Scientific Data 10(1):895
Publisher Policy:Reproduced under a Creative Commons license


Project Code: 307829
Award No:
Project Name: Quantum-Inspired Imaging for Remote Monitoring of Health & Disease in Community Healthcare
Principal Investigator: Jonathan Cooper
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/T021020/1
Lead Dept: ENG - Biomedical Engineering