Acar, E., Hopfgartner, F. and Albayrak, S. (2013) Detecting violent content in Hollywood movies by mid-level audio representations. In: 11th International Workshop on Content-Based Multimedia Indexing (CBMI), Veszprem, Hungary, 17-19 Jun 2013, pp. 73-78. (doi: 10.1109/CBMI.2013.6576556)
Full text not currently available from Enlighten.
Publisher's URL: http://dx.doi.org/10.1109/CBMI.2013.6576556
Abstract
Detecting violent content in movies, e.g., to provide automated youth protection services, is a valuable video content analysis functionality. Choosing discriminative features for the representation of video segments is a key issue in designing violence detection algorithms. In this paper, we employ mid-level audio features based on a Bag-of-Audio Words (BoAW) method using Mel-Frequency Cepstral Coefficients (MFCCs). BoAW representations are constructed with two different methods: the vector quantization-based (VQ-based) method and the sparse coding-based (SC-based) method. We choose two-class support vector machines (SVMs) for classifying video shots as violent or non-violent. Our experiments on detecting violent video shots in Hollywood movies show that the mid-level audio features provide promising results. Additionally, we establish that the SC-based method outperforms the VQ-based one. More importantly, in terms of average precision, the SC-based method outperforms the unimodal submissions in the MediaEval Violent Scenes Detection (VSD) task, with the exception of one vision-based method.
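To illustrate the VQ-based BoAW idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each video shot is already represented as a list of MFCC frame vectors and that a codebook of audio words is available (in practice it would typically be learned, for example by clustering training frames; here it is hand-picked). The function name `vq_boaw`, the toy 2-D frames, and the 3-word codebook are all hypothetical. The SC-based variant and the SVM classification step are not shown.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def vq_boaw(frames, codebook):
    """Return an L1-normalized histogram of nearest-codeword assignments.

    frames   -- list of feature vectors (stand-ins for MFCC frames of one shot)
    codebook -- list of codeword vectors (the "audio words"); assumed given
    """
    counts = [0] * len(codebook)
    for frame in frames:
        # Hard assignment: pick the index of the closest codeword.
        nearest = min(range(len(codebook)),
                      key=lambda k: dist(frame, codebook[k]))
        counts[nearest] += 1
    total = len(frames)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / total for c in counts]


# Toy data (assumption): 2-D "frames" and a 3-word codebook.
# Real MFCC vectors have more dimensions and real codebooks far more words.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
frames = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

histogram = vq_boaw(frames, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

The resulting fixed-length histogram is the kind of shot-level vector that a two-class classifier such as an SVM could take as input, regardless of how many frames the shot contains.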
Item Type: | Conference Proceedings |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Hopfgartner, Dr Frank |
Authors: | Acar, E., Hopfgartner, F., and Albayrak, S. |
College/School: | College of Arts & Humanities > School of Humanities > Information Studies |
Publisher: | Springer Verlag |
ISSN: | 1949-3983 |