A comprehensive study on mid-level representation and ensemble learning for emotional analysis of video material

Acar, E., Hopfgartner, F. and Albayrak, S. (2017) A comprehensive study on mid-level representation and ensemble learning for emotional analysis of video material. Multimedia Tools and Applications, 76(9), pp. 11809-11837. (doi:10.1007/s11042-016-3618-5)

119609.pdf - Accepted Version



In today’s society, where audio-visual content such as professionally edited and user-generated videos is ubiquitous, automatic analysis of this content is a crucial capability. Within this context, there is extensive ongoing research on understanding the semantics (i.e., facts) of videos, such as the objects or events they contain. However, little research has been devoted to understanding the emotional content of videos. In this paper, we address this issue and introduce a system that performs emotional content analysis of professionally edited and user-generated videos. We concentrate on both the representation and the modeling aspects. Videos are represented using mid-level audio-visual features. More specifically, audio and static visual representations are automatically learned from raw data using convolutional neural networks (CNNs). In addition, dense-trajectory-based motion features and SentiBank domain-specific features are incorporated. By means of ensemble learning and fusion mechanisms, each video is classified into one of a set of predefined emotion categories. Results obtained on the VideoEmotion dataset and a subset of the DEAP dataset show that (1) higher-level representations perform better than low-level features, (2) among audio features, mid-level learned representations perform better than mid-level handcrafted ones, (3) incorporating motion and domain-specific information leads to a notable performance gain, and (4) ensemble learning is superior to multi-class support vector machines (SVMs) for video affective content analysis.
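The abstract does not specify which fusion mechanism the paper uses. As a purely illustrative sketch of the general idea of combining per-modality classifier outputs, the following assumes a simple weighted average of class-probability vectors (decision-level fusion). The function names `late_fusion` and `predict`, the modality names, and all scores are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional


def late_fusion(
    scores_by_modality: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-modality class probabilities by a weighted average.

    Assumption: each modality (e.g. audio, static visual, motion) has its
    own classifier that outputs one probability per emotion category.
    """
    if not scores_by_modality:
        raise ValueError("at least one modality is required")
    if weights is None:
        # Default to equal weights for every modality.
        weights = {name: 1.0 for name in scores_by_modality}

    lengths = {len(scores) for scores in scores_by_modality.values()}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same classes")
    n_classes = lengths.pop()

    total_weight = sum(weights[name] for name in scores_by_modality)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = [0.0] * n_classes
    for name, scores in scores_by_modality.items():
        w = weights[name] / total_weight  # normalize so weights sum to 1
        for i, score in enumerate(scores):
            fused[i] += w * score
    return fused


def predict(fused: List[float]) -> int:
    """Return the index of the emotion category with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical scores for three emotion categories from three modalities.
example_scores = {
    "audio": [0.6, 0.3, 0.1],
    "visual": [0.2, 0.5, 0.3],
    "motion": [0.1, 0.7, 0.2],
}
equal_weight_label = predict(late_fusion(example_scores))
audio_heavy_label = predict(
    late_fusion(example_scores, {"audio": 4.0, "visual": 1.0, "motion": 1.0})
)
```

With equal weights the visual and motion scores outvote the audio score; giving audio a larger weight changes the fused decision, which illustrates why the choice of fusion weights matters in such a scheme.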

Item Type: Articles
Additional Information: The research leading to these results has received funding from the European Community FP7 under grant agreement number 261743 (NoE VideoSense).
Glasgow Author(s) Enlighten ID: Hopfgartner, Dr Frank
Authors: Acar, E., Hopfgartner, F., and Albayrak, S.
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
College/School: College of Arts > School of Humanities > Information Studies
Journal Name: Multimedia Tools and Applications
Publisher: Springer US
ISSN (Online): 1573-7721
Published Online: 10 June 2016
Copyright Holders: Copyright © 2016 Springer US
First Published: First published in Multimedia Tools and Applications 76(9):11809–11837
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
