A Data Driven Approach to Audiovisual Speech Mapping

Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P. and Hussain, A. (2016) A Data Driven Approach to Audiovisual Speech Mapping. In: 8th International Conference on Brain Inspired Cognitive Systems (BICS 2016), Beijing, China, 28-30 Nov 2016, pp. 331-342. ISBN 9783319496856 (doi: 10.1007/978-3-319-49685-6_30)

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach to estimating audio speech acoustics from temporal visual information alone, without recourse to linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
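The visual-to-audio mapping described in the abstract can be sketched as a simple regression MLP: a window of prior visual feature frames is flattened into one input vector and mapped to a single audio frame. This is a minimal illustrative sketch, assuming a one-hidden-layer network; the frame counts, feature dimensions, and layer sizes below are placeholder assumptions, not the configuration reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# 5 prior visual frames, each a 50-coefficient 2D-DCT vector,
# mapped to one 23-bin log filterbank audio frame.
N_FRAMES, DCT_DIM, FBANK_DIM, HIDDEN = 5, 50, 23, 100

def init_mlp(n_in, n_hidden, n_out):
    """One-hidden-layer MLP with small random initial weights."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Map a flattened window of visual frames to one audio frame."""
    h = np.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    return h @ params["W2"] + params["b2"]        # linear output (regression)

# A window of prior visual frames, flattened into a single input vector.
visual_window = rng.normal(size=N_FRAMES * DCT_DIM)
mlp = init_mlp(N_FRAMES * DCT_DIM, HIDDEN, FBANK_DIM)
audio_frame = forward(mlp, visual_window)
print(audio_frame.shape)  # one estimated log filterbank frame
```

In practice the weights would be trained (e.g. by minimising the squared error between estimated and true log filterbank frames); the sketch only shows the shape of the input-to-output mapping.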

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Whitmer, Dr William
Authors: Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P., and Hussain, A.
College/School: College of Medical, Veterinary and Life Sciences > School of Health & Wellbeing > MRC/CSO SPHSU
Published Online: 13 November 2016
Copyright Holders: Copyright © 2016 Springer International Publishing
First Published: First published in Lecture Notes in Computer Science 10023:331-342
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
