Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues

Acar, E., Hopfgartner, F. and Albayrak, S. (2013) Violence detection in Hollywood movies by the fusion of visual and mid-level audio cues. In: 21st ACM International Conference on Multimedia (MM '13), Barcelona, Spain, 21-25 Oct 2013, pp. 717-720. ISBN 9781450324045 (doi:10.1145/2502081.2502187)


Publisher's URL: http://dx.doi.org/10.1145/2502081.2502187

Abstract

Detecting violent scenes in movies is an important video content understanding task, e.g., for providing automated youth protection services. One key issue in designing violence detection algorithms is the choice of discriminative features. In this paper, we employ mid-level audio features and compare their discriminative power against that of low-level audio and visual features. We fuse these mid-level audio cues with low-level visual ones at the decision level to further improve violence detection performance. We use Mel-Frequency Cepstral Coefficients (MFCC) as audio features and average motion as visual features. To learn a violence model, we use two-class support vector machines (SVMs). Our experimental results on detecting violent video shots in Hollywood movies show that mid-level audio features are more discriminative and yield more precise results than low-level ones. Detection performance is further improved by fusing the mid-level audio cues with low-level visual ones using SVM-based decision fusion.
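
The abstract names the two learning stages but not their details, so the following is only a minimal sketch of SVM-based decision fusion under stated assumptions. It assumes that each modality (mid-level audio, average motion) already has its own two-class classifier producing one real-valued score per shot, and that a second two-class SVM is trained on the stacked [audio, visual] scores. As a self-contained stand-in for a library SVM such as LIBSVM, it swaps in a linear SVM trained by full-batch subgradient descent on the L2-regularised hinge loss. The paper's actual kernel, score type and parameters are not given here; the names `train_linear_svm`, `stack_scores`, `predict`, `lam`, `lr` and the score values are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def stack_scores(audio: Sequence[float], visual: Sequence[float]) -> List[Vector]:
    """Decision-level fusion input (assumed): one [audio, visual] vector per shot."""
    return [[a, v] for a, v in zip(audio, visual)]


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    lr: float = 0.1,
    epochs: int = 500,
) -> Tuple[Vector, float]:
    """Two-class linear SVM (labels in {-1, +1}) trained by full-batch
    subgradient descent on lam/2 * ||w||^2 + mean hinge loss.
    This is a stand-in for a library SVM, not the paper's exact setup."""
    n, dim = len(X), len(X[0])
    w: Vector = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regulariser
        grad_b = 0.0                     # the bias is not regularised
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge loss is active for this shot
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Sequence[float]) -> int:
    """Return +1 (violent) or -1 (non-violent) from the fused decision value."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1


# Hypothetical per-shot scores from the two first-level classifiers.
audio_scores = [1.2, -1.2, 0.9, -0.9, 1.5, -1.5]
visual_scores = [0.8, -0.8, 0.3, -0.3, 1.1, -1.1]
labels = [1, -1, 1, -1, 1, -1]

w, b = train_linear_svm(stack_scores(audio_scores, visual_scores), labels)
print([predict(w, b, x) for x in stack_scores(audio_scores, visual_scores)])
# prints [1, -1, 1, -1, 1, -1]
```

With these mirror-symmetric toy scores the bias stays at zero and both fusion weights stay positive, so every training shot is classified correctly. In a real pipeline the first-level scores would usually be computed on held-out shots, so that the fusion SVM does not learn from overconfident training-set outputs.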

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Hopfgartner, Dr Frank
Authors: Acar, E., Hopfgartner, F., and Albayrak, S.
College/School: College of Arts > School of Humanities > Humanities Advanced Technology and Information Institute (HATII)
Publisher: ACM
ISBN: 9781450324045
