Tri-modal speech recognition for noisy and variable lighting conditions

Anderson, S., Fong, A.C.M. and Tang, J. (2013) Tri-modal speech recognition for noisy and variable lighting conditions. In: 31st IEEE International Conference on Consumer Electronics (ICCE2013), Las Vegas NV, USA, 11-14 Jan 2013, pp. 72-73. (doi: 10.1109/ICCE.2013.6486800)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1109/ICCE.2013.6486800

Abstract

Automatic speech recognition (ASR) has found widespread applications in consumer products. Often, ASR performance can be compromised in noisy environments. Previous research has shown that adding visual cues can improve the performance of ASR, particularly in noisy environments. However, audiovisual (AV) ASR is not robust against changing lighting conditions, which are often encountered by end users of consumer products. Since thermal imaging is highly invariant to changing lighting conditions, we propose a tri-modal ASR involving thermal imaging and audiovisual (TAV) data for consumer applications. Experimental results demonstrate the applicability of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only rates and +11.8% over AV rates.
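The paper does not specify its fusion scheme in this abstract, but a common way to combine modalities in multimodal ASR is late fusion of per-modality scores. The sketch below is purely illustrative: the function name, the fixed weights, and the use of weighted log-likelihood sums are assumptions, not details from the paper.

```python
# Hypothetical late-fusion sketch for tri-modal (thermal + audio + visual) ASR.
# Each modality supplies a per-word log-likelihood; weights are illustrative
# assumptions, not values reported in the paper.

def fuse_tav(audio_scores, visual_scores, thermal_scores,
             weights=(0.5, 0.25, 0.25)):
    """Return the word with the highest weighted sum of modality scores."""
    wa, wv, wt = weights
    fused = {}
    for word in audio_scores:
        fused[word] = (wa * audio_scores[word]
                       + wv * visual_scores[word]
                       + wt * thermal_scores[word])
    # Pick the vocabulary entry with the best fused score.
    return max(fused, key=fused.get)
```

In noisy conditions the audio weight would typically be lowered relative to the visual and thermal streams, since those modalities are unaffected by acoustic noise.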

Item Type: Conference Proceedings
Additional Information: Print ISBN: 9781467313612
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Fong, Dr Alvis Cheuk Min
Authors: Anderson, S., Fong, A.C.M., and Tang, J.
College/School: College of Science and Engineering > School of Computing Science