Enlighten Publications


Understanding affective content of music videos through learned representations

Acar, E., Hopfgartner, F. and Albayrak, S. (2014) Understanding affective content of music videos through learned representations. In: MMM'14: 20th Anniversary International Conference on Multimedia Modeling, Dublin, Ireland, 6-10 Jan 2014, pp. 303-314. ISBN 9783319041131 (doi: 10.1007/978-3-319-04114-8_26)

Full text: 101297.pdf (Accepted Version, 1 MB)

Abstract

Given the ever-growing amount of available multimedia data, automatically annotating multimedia content with the feeling(s) it is expected to evoke in users is a challenging problem. The emerging research field of video affective analysis addresses this problem by exploiting human emotions. Because no dominant feature representation has yet emerged in this field, choosing discriminative features to represent video segments effectively is a key issue in designing video affective content analysis algorithms. Most existing affective content analysis methods either use low-level audio-visual features or generate hand-crafted higher-level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to learn mid-level representations from automatically extracted low-level features. We exploit the audio and visual modalities of videos by using Mel-Frequency Cepstral Coefficients (MFCC) and color values in the RGB space to build higher-level audio and visual representations. We use the learned representations for the affective classification of music video clips, choosing multi-class support vector machines (SVMs) to classify video clips into four affective categories that correspond to the four quadrants of the Valence-Arousal (VA) space. Results on a subset of the DEAP dataset (76 music video clips) show that using higher-level representations instead of low-level features yields a significant improvement in video affective content analysis.
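To illustrate the four-class target described in the abstract, the sketch below maps a clip's valence and arousal ratings to one of the four VA-space quadrants. It assumes ratings on a 1-9 scale split at a midpoint of 5. The function name `va_quadrant`, the label strings, and the threshold are hypothetical choices made for this illustration and are not taken from the paper, which may define its classes differently.

```python
# Hypothetical midpoint of an assumed 1-9 rating scale; not taken from the paper.
MIDPOINT = 5.0


def va_quadrant(valence: float, arousal: float, midpoint: float = MIDPOINT) -> str:
    """Map a (valence, arousal) rating pair to one of four VA-space quadrants.

    Returns one of "HVHA", "LVHA", "LVLA", "HVLA" (high/low valence,
    high/low arousal). Ratings exactly at the midpoint count as "low",
    which is an arbitrary tie-break chosen for this sketch.
    """
    valence_part = "HV" if valence > midpoint else "LV"
    arousal_part = "HA" if arousal > midpoint else "LA"
    return valence_part + arousal_part


if __name__ == "__main__":
    # Example ratings (made up for illustration, not DEAP data).
    for v, a in [(7.2, 6.8), (3.0, 7.5), (2.5, 2.0), (6.0, 4.0)]:
        print(v, a, va_quadrant(v, a))
```

For example, a clip rated valence 3.0 and arousal 7.5 falls in the low-valence, high-arousal quadrant ("LVHA"). Labels of this kind would serve as the targets for the multi-class SVM; the CNN feature learning itself is not shown.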

Item Type:	Conference Proceedings
Additional Information:	Published in Lecture Notes in Computer Science, vol. 8325, pp. 303-314.
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	Hopfgartner, Dr Frank
Authors:	Acar, E., Hopfgartner, F., and Albayrak, S.
College/School:	College of Arts & Humanities > School of Humanities > Information Studies
Publisher:	Springer Verlag
ISBN:	9783319041131
Publisher Policy:	Reproduced in accordance with the publisher policy


Deposit and Record Details

ID Code:	101297
Depositing User:	Dr Frank Hopfgartner
Datestamp:	23 Jan 2015 15:58
Last Modified:	24 Sep 2021 02:25
Date of first online publication:	January 2014
Date Deposited:	16 March 2016