Emotion Recognition from Speech to Improve Human-Robot Interaction

Zhu, C. and Ahmad, W. (2019) Emotion Recognition from Speech to Improve Human-Robot Interaction. In: 4th Cyber Science and Technology Congress (CyberSciTech 2019), Fukuoka, Japan, 5-8 August 2019, pp. 370-375. ISBN 9781728130248 (doi: 10.1109/DASC/PiCom/CBDCom/CyberSciTech.2019.00076)



Speech emotion recognition (SER) has become one of the significant approaches to improving human-robot interaction. In this paper, two methods are proposed that take into account the size of the database alongside other aspects of the models. The first model, intended mainly for relatively small databases, applies the K-nearest neighbors (KNN) algorithm to 1-30 Gammatone frequency cepstral coefficients (GTCCs) and achieves 95.3% overall recognition accuracy on the Berlin Emotional Speech database (EMODB). The second model targets relatively large databases: it adopts 1-30 GTCCs, delta 1-30 GTCCs, delta-delta 1-30 GTCCs, spectral features, and prosodic features as the feature set, and uses long short-term memory (LSTM) as the classifier. An overall accuracy of 87.5% is achieved with this model on the Chinese emotional speech database (CASIA).
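The first model classifies an utterance by comparing its GTCC feature vector against labelled training utterances. A minimal sketch of that K-nearest-neighbors step is below; the 30-dimensional vectors are random stand-ins for the 1-30 GTCCs described in the abstract (real use would extract GTCCs from audio), and the emotion labels and cluster centres are illustrative assumptions, not the paper's data.

```python
import math
import random
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=5):
    """Majority vote over the k training utterances nearest to the query.

    train: list of (feature_vector, emotion_label) pairs
    query: feature vector for the utterance to classify
    """
    neighbours = sorted(train, key=lambda pair: euclidean(pair[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Synthetic demo data: one well-separated 30-dim cluster per emotion,
# mirroring the 30 GTCCs per utterance (hypothetical, for illustration).
random.seed(0)
emotions = ["anger", "happiness", "sadness", "neutral"]
centres = {e: [random.gauss(i, 1.0) for _ in range(30)]
           for i, e in enumerate(emotions)}
train = [([c + random.gauss(0, 0.3) for c in centres[e]], e)
         for e in emotions for _ in range(20)]

query = [c + random.gauss(0, 0.3) for c in centres["sadness"]]
print(knn_predict(train, query, k=5))
```

In practice the paper's pipeline would replace the synthetic vectors with GTCCs computed from the EMODB recordings; the classification step itself is unchanged.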

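The second model augments the framewise GTCCs with their first- and second-order time derivatives (delta and delta-delta coefficients). A sketch of the standard regression-based delta computation is below; the toy GTCC frames are assumptions for illustration, and a delta-delta is simply the delta of the deltas.

```python
def delta(frames, N=2):
    """Regression-based delta coefficients over a sequence of frames.

    frames: list of per-frame feature vectors (e.g. 30 GTCCs per frame)
    N: half-window size of the regression (2 is a common default)
    """
    T = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        # Clamp window indices at the sequence edges.
        out.append([
            sum(n * (frames[min(t + n, T - 1)][i] - frames[max(t - n, 0)][i])
                for n in range(1, N + 1)) / denom
            for i in range(dim)
        ])
    return out

# Toy "GTCC" frames: 3 coefficients rising linearly by 1.0 per frame,
# so interior delta values should come out as exactly 1.0.
gtcc = [[float(t)] * 3 for t in range(6)]
d1 = delta(gtcc)       # delta GTCCs
d2 = delta(d1)         # delta-delta GTCCs
print(d1[2], d2[2])
```

The LSTM classifier would then consume the concatenation of the static GTCCs, these deltas, and the spectral and prosodic features per frame.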
Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Ahmad, Dr Wasim
Authors: Zhu, C., and Ahmad, W.
College/School: College of Science and Engineering > School of Engineering > Systems Power and Energy
