Robust tri-modal automatic speech recognition for consumer applications

Anderson, S., Fong, A.C.M. and Tang, J. (2013) Robust tri-modal automatic speech recognition for consumer applications. IEEE Transactions on Consumer Electronics, 59(2), pp. 352-360. (doi: 10.1109/TCE.2013.6531117)

Full text not currently available from Enlighten.


Commercial automatic speech recognition (ASR) began to appear in the late 1980s and can offer a more natural means of accepting user input than methods such as typing on keyboards or touch screens. This is a particularly important consideration for small consumer devices such as smartphones. In many practical situations, however, ASR performance can be significantly compromised by ambient noise and variable lighting conditions. Previous research has shown that adding visual cues to standard ASR can mitigate the effects of ambient noise. However, audiovisual (AV) ASR is not robust against variable lighting conditions, which users of consumer devices often encounter. Since thermal imaging is invariant to changing lighting conditions, the authors propose a tri-modal thermal-audiovisual (TAV) ASR system using adaptations of established techniques such as MT, DCT and MFCC. Experimental results demonstrate the robustness of this approach over a range of signal-to-noise ratios: tri-modal TAV recognition rates were +39.2% over audio-only ASR and +11.8% over AV ASR. The authors believe that robust ASR will lead to improved user experiences.
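The abstract names the DCT as one of the adapted techniques but does not describe how it is applied. The sketch below shows one generic way DCT coefficients are often used as visual features in AV ASR, and it is not the authors' adaptation. It assumes the following:

* A frame region around the mouth (visible or thermal) is available as a small 2-D grayscale array.
* An orthonormal 2-D DCT-II is applied to that region.
* The first few low-frequency coefficients in zigzag order are kept as a compact feature vector.

The names `dct_features`, `roi` and `num_coeffs` are hypothetical and chosen only for illustration.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols[v][u] holds coefficient (u, v); transpose back to row-major order.
    return [list(r) for r in zip(*cols)]

def zigzag_indices(h, w):
    """(row, col) pairs in JPEG-style zigzag order, lowest frequencies first."""
    order = []
    for s in range(h + w - 1):
        diag = [(i, s - i) for i in range(h) if 0 <= s - i < w]
        # Alternate traversal direction on each anti-diagonal.
        order.extend(diag if s % 2 else reversed(diag))
    return order

def dct_features(roi, num_coeffs=10):
    """Hypothetical helper: keep the first num_coeffs zigzag-ordered DCT coefficients."""
    coeffs = dct_2d(roi)
    h, w = len(roi), len(roi[0])
    return [coeffs[i][j] for i, j in zigzag_indices(h, w)[:num_coeffs]]
```

Because the transform is orthonormal, it preserves the region's energy. Keeping only the low-frequency coefficients retains the coarse shape of the mouth area while discarding fine detail. Under these assumptions, the same extractor could be applied unchanged to a visible-light region and a thermal region, which gives per-frame visual vectors that a recognizer could combine with audio features such as MFCCs.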

Item Type: Articles
Glasgow Author(s) Enlighten ID: Fong, Dr Alvis Cheuk Min
Authors: Anderson, S., Fong, A.C.M., and Tang, J.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
Journal Name: IEEE Transactions on Consumer Electronics
ISSN (Online): 1558-4127
