Understanding affective content of music videos through learned representations

Acar, E., Hopfgartner, F. and Albayrak, S. (2014) Understanding affective content of music videos through learned representations. In: MMM'14: 20th Anniversary International Conference on Multimedia Modeling, Dublin, Ireland, 6-10 Jan 2014, pp. 303-314. ISBN 9783319041131 (doi:10.1007/978-3-319-04114-8_26)

Abstract

Given the ever-growing volume of available multimedia data, automatically annotating multimedia content with the feeling(s) expected to arise in users is a challenging problem. To address this problem, the emerging research field of video affective analysis aims at exploiting human emotions. In this field, where no dominant feature representation has emerged yet, choosing discriminative features for the effective representation of video segments is a key issue in designing video affective content analysis algorithms. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted low-level features. We exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the RGB space to build higher level audio and visual representations. We use the learned representations for the affective classification of music video clips. We choose multi-class support vector machines (SVMs) for classifying video clips into four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results on a subset of the DEAP dataset (76 music video clips) show that a significant improvement is obtained when higher level representations are used instead of low-level features for video affective content analysis.
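
The four-category labelling mentioned in the abstract can be illustrated with a minimal sketch. It assumes valence and arousal ratings on a 1 to 9 scale split at the midpoint 5, with ties counted as "high"; the function name, label strings, threshold and tie rule are all hypothetical choices made for illustration and are not taken from the paper.

```python
def va_quadrant(valence: float, arousal: float, midpoint: float = 5.0) -> str:
    """Map a (valence, arousal) rating pair to one of four VA quadrants.

    Assumption: ratings at or above `midpoint` count as "high". The paper's
    actual thresholding rule is not stated in the abstract.
    """
    high_valence = valence >= midpoint
    high_arousal = arousal >= midpoint

    if high_arousal and high_valence:
        return "high-arousal/high-valence"
    if high_arousal and not high_valence:
        return "high-arousal/low-valence"
    if not high_arousal and high_valence:
        return "low-arousal/high-valence"
    return "low-arousal/low-valence"


if __name__ == "__main__":
    # Hypothetical ratings for four clips, one per quadrant.
    clips = {"clip_a": (7.2, 6.1), "clip_b": (2.5, 7.8),
             "clip_c": (6.4, 3.0), "clip_d": (3.1, 2.2)}
    for name, (v, a) in clips.items():
        print(name, va_quadrant(v, a))
```

Labels produced this way would serve as the four target classes for a multi-class classifier such as the SVMs the abstract describes.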

Item Type: Conference Proceedings
Additional Information: Published in volume 8325 of the series Lecture Notes in Computer Science, pp. 303-314.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Hopfgartner, Dr Frank
Authors: Acar, E., Hopfgartner, F., and Albayrak, S.
College/School: College of Arts > School of Humanities > Humanities Advanced Technology and Information Institute (HATII)
Publisher: Springer Verlag
ISBN: 9783319041131
Publisher Policy: Reproduced in accordance with the publisher policy