Multimodal Deep Learning Framework for Mental Disorder Recognition

Zhang, Z., Lin, W., Liu, M. and Mahmoud, M. (2020) Multimodal Deep Learning Framework for Mental Disorder Recognition. In: 2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), Buenos Aires, Argentina, 16-20 Nov 2020, pp. 344-350. ISBN 9781728130798 (doi: 10.1109/FG47880.2020.00033)

Full text not currently available from Enlighten.


Current methods for mental disorder recognition mostly depend on clinical interviews and self-reported scores, which can be highly subjective. Building an automatic recognition system can help in the early detection of symptoms and provide insights into biological markers for diagnosis. It is, however, a challenging task, as it requires taking into account indicators from different modalities, such as facial expressions, gestures, acoustic features and verbal content. To address this issue, we propose a general-purpose multimodal deep learning framework in which multiple modalities (acoustic, visual and textual features) are processed individually while the cross-modality correlation is still considered. Specifically, a Multimodal Deep Denoising Autoencoder (multi-DDAE) is designed to obtain multimodal representations of audio-visual features, followed by Fisher Vector encoding, which produces session-level descriptors. For the textual modality, Paragraph Vector (PV) is proposed to embed the transcripts of interview sessions into document representations capturing cues related to mental disorders. Following an early fusion strategy, the audio-visual and textual features are then fused before being fed to a Multitask Deep Neural Network (DNN) as the final classifier. Our framework is evaluated on the automatic detection of two mental disorders, bipolar disorder (BD) and depression, using two datasets: the Bipolar Disorder Corpus (BDC) and the Extended Distress Analysis Interview Corpus (E-DAIC), respectively. Our experimental results showed performance comparable to the state of the art in BD and depression detection, demonstrating effective multimodal representation learning and the capability to generalise across different mental disorders.
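The pipeline described above (per-modality encoding with denoising, a shared layer for cross-modality correlation, and early fusion of audio-visual and textual descriptors before the classifier) can be sketched in miniature. This is an illustrative, dependency-free sketch only: the layer shapes, weights, noise rate, and function names are assumptions for exposition, not the paper's actual architecture or hyperparameters.

```python
import random

def corrupt(x, p=0.2, seed=0):
    """Masking noise used by denoising autoencoders:
    zero each input feature independently with probability p."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else v for v in x]

def linear(x, weights, bias):
    """Dense layer: y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def multimodal_encode(audio, visual, enc_a, enc_v, enc_shared):
    """Multi-DDAE-style encoder (hypothetical shapes): each modality is
    encoded from a corrupted input by its own branch, then a shared
    layer over the concatenated codes captures cross-modality correlation."""
    h_a = linear(corrupt(audio), *enc_a)     # audio-specific branch
    h_v = linear(corrupt(visual), *enc_v)    # visual-specific branch
    return linear(h_a + h_v, *enc_shared)    # joint representation

def early_fusion(audio_visual_desc, text_embedding):
    """Early fusion: concatenate the session-level audio-visual descriptor
    with the document embedding before the final classifier sees them."""
    return list(audio_visual_desc) + list(text_embedding)
```

In the full framework, the joint audio-visual codes would additionally be aggregated per session via Fisher Vector encoding, and `text_embedding` would come from a Paragraph Vector model; both steps are omitted here for brevity.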

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Mahmoud, Dr Marwa
Authors: Zhang, Z., Lin, W., Liu, M., and Mahmoud, M.
College/School: College of Science and Engineering > School of Computing Science
Published Online: 18 January 2021
