A Data Driven Approach to Audiovisual Speech Mapping

Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P. and Hussain, A. (2016) A Data Driven Approach to Audiovisual Speech Mapping. In: 8th International Conference on Brain Inspired Cognitive Systems (BICS 2016), Beijing, China, 28-30 Nov 2016, pp. 331-342. ISBN 9783319496856 (doi: 10.1007/978-3-319-49685-6_30)

The use of visual information in audio speech processing has attracted significant recent interest. This paper presents a data-driven approach to estimating audio speech acoustics from temporal visual information alone, without recourse to linguistic features such as phonemes and visemes. Audio (log filterbank) and visual (2D-DCT) features are extracted, and various MLP configurations and datasets are evaluated to identify optimal results, showing that, given a sequence of prior visual frames, a reasonably accurate estimate of the corresponding audio frame can be produced.
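The visual-to-audio mapping described in the abstract can be sketched as a simple regression MLP: a window of prior visual feature frames is flattened into one input vector and mapped to a single audio frame. This is a minimal illustrative sketch, assuming a one-hidden-layer network; the frame counts, feature dimensions, and layer sizes below are placeholder assumptions, not the configuration reported in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# 5 prior visual frames, each a 50-coefficient 2D-DCT vector,
# mapped to one 23-bin log filterbank audio frame.
N_FRAMES, DCT_DIM, FBANK_DIM, HIDDEN = 5, 50, 23, 100

def init_mlp(n_in, n_hidden, n_out):
    """One-hidden-layer MLP with small random initial weights."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    """Map a flattened window of visual frames to one audio frame."""
    h = np.tanh(x @ params["W1"] + params["b1"])  # hidden layer
    return h @ params["W2"] + params["b2"]        # linear output (regression)

# A window of prior visual frames, flattened into a single input vector.
visual_window = rng.normal(size=N_FRAMES * DCT_DIM)
mlp = init_mlp(N_FRAMES * DCT_DIM, HIDDEN, FBANK_DIM)
audio_frame = forward(mlp, visual_window)
print(audio_frame.shape)  # one estimated log filterbank frame
```

In practice the weights would be trained (e.g. by minimising the squared error between estimated and true log filterbank frames); the sketch only shows the shape of the input-to-output mapping.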

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Whitmer, Dr William
Authors: Abel, A., Marxer, R., Barker, J., Watt, R., Whitmer, B., Derleth, P., and Hussain, A.
College/School: College of Medical, Veterinary and Life Sciences > School of Health & Wellbeing > MRC/CSO SPHSU
Published Online: 13 November 2016
Copyright Holders: Copyright © 2016 Springer International Publishing
First Published: First published in Lecture Notes in Computer Science 10023:331-342
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
