Predicting co-verbal gestures: a deep and temporal modeling approach

Chiu, C.-C., Morency, L.-P. and Marsella, S. (2015) Predicting co-verbal gestures: a deep and temporal modeling approach. In: 15th International Intelligent Virtual Agents Conference (IVA 2015), Delft, The Netherlands, 26-28 Aug 2015, pp. 152-166. ISBN 9783319219950 (doi: 10.1007/978-3-319-21996-7_17)

Full text not currently available from Enlighten.


Gestures during spoken dialog play a central role in human communication. As a consequence, models of gesture generation are a key challenge in research on virtual humans, embodied agents capable of face-to-face interaction with people. Machine learning approaches to gesture generation must take into account the conceptual content of utterances, the physical properties of speech signals, and the physical properties of the gestures themselves. To address this challenge, we propose a gestural sign scheme to facilitate supervised learning and present the DCNF model, which jointly learns deep neural networks and second-order linear-chain temporal contingency. The approach captures the mapping between speech and gestures while taking into account temporal relations among gestures. Our experiments on a human co-verbal gesture dataset show significant improvement over previous work on gesture prediction. A generalization experiment on handwriting recognition also shows that DCNFs outperform state-of-the-art approaches.
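The combination described in the abstract — a deep network producing per-frame scores, with a second-order linear chain over the output labels — can be illustrated with a minimal decoding sketch. The snippet below is not the paper's implementation; it is a toy numpy example (random weights, synthetic features, hypothetical dimensions) showing how second-order Viterbi decoding combines neural unary scores with transition scores over label triples:

```python
import numpy as np

rng = np.random.default_rng(0)

K, T_len, D = 3, 6, 4            # gesture-sign labels, frames, feature dim (toy sizes)
X = rng.normal(size=(T_len, D))  # synthetic per-frame speech features

# Hypothetical one-hidden-layer network producing unary label scores per frame.
W1 = rng.normal(size=(D, 8)); b1 = np.zeros(8)
W2 = rng.normal(size=(8, K)); b2 = np.zeros(K)
unary = np.tanh(X @ W1 + b1) @ W2 + b2       # shape (T_len, K)

# Second-order transition scores over label triples (y_{t-2}, y_{t-1}, y_t).
trans = rng.normal(size=(K, K, K))

def viterbi2(unary, trans):
    """Second-order Viterbi: dynamic programming over label pairs (a, b)."""
    T_len, K = unary.shape
    # delta[a, b] = best score of any prefix ending with labels (a, b)
    delta = unary[0][:, None] + unary[1][None, :]
    back = []
    for t in range(2, T_len):
        # scores[a, b, c] = delta[a, b] + trans[a, b, c] + unary[t, c]
        scores = delta[:, :, None] + trans + unary[t][None, None, :]
        back.append(scores.argmax(axis=0))   # best a for each pair (b, c)
        delta = scores.max(axis=0)           # new delta over (b, c)
    # Recover the best final pair, then trace back through the pointers.
    b, c = np.unravel_index(delta.argmax(), delta.shape)
    labels = [b, c]
    for bp in reversed(back):
        labels.insert(0, bp[labels[0], labels[1]])
    return [int(y) for y in labels]

path = viterbi2(unary, trans)
print(path)  # one label per frame, length T_len
```

In the actual DCNF the network weights and transition parameters are learned jointly; here they are random, since the sketch only demonstrates how the second-order chain constrains the label sequence at decoding time.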

Item Type: Conference Proceedings
Additional Information: First published in Lecture Notes in Computer Science 9238: 152-166
Glasgow Author(s) Enlighten ID: Marsella, Professor Stacy
Authors: Chiu, C.-C., Morency, L.-P., and Marsella, S.
College/School: College of Medical, Veterinary and Life Sciences > School of Psychology & Neuroscience
ISSN (Online): 0302-9743
