Comparison of Balancing Techniques for Multimedia IR over Imbalanced Datasets

Bermejo, P., Hopfgartner, F., Gamez, J., Callejon, J. and Jose, J. (2009) Comparison of Balancing Techniques for Multimedia IR over Imbalanced Datasets. In: 24th International Symposium on Computer and Information Sciences, 2009. ISCIS 2009, Guzelyurt, Turkey, 14-16 Sep 2009, pp. 674-679. ISBN 9781424450213 (doi:10.1109/ISCIS.2009.5291904)

39579.pdf - Accepted Version



A promising method to improve the performance of information retrieval systems is to approach retrieval tasks as a supervised classification problem. Previous user interactions, e.g. gathered from a thorough log file analysis, can be used to train classifiers which aim to infer the relevance of retrieved documents based on user interactions. A problem with this approach is, however, the large imbalance ratio between relevant and non-relevant documents in the collection. In standard test collections used in academic evaluation frameworks such as TREC, non-relevant documents outnumber relevant documents by far. In this work, we address this imbalance problem in the multimedia domain. We focus on the logs of two multimedia user studies which are highly imbalanced. We compare a naïve solution of randomly deleting documents belonging to the majority class with various balancing algorithms coming from different fields: data classification and text classification. Our experiments indicate that all algorithms improve on the classification performance obtained by just deleting at random from the dominant class.
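
The naïve baseline named in the abstract, randomly deleting documents from the majority class, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name random_undersample, the fixed seed, and the toy relevance labels are assumptions made only for illustration, and the other balancing algorithms compared in the paper are not shown.

```python
import random
from collections import Counter


def random_undersample(samples, labels, seed=0):
    """Randomly drop items from larger classes until every class
    has as many items as the smallest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    # The minority class size is the target size for every class.
    target = min(len(items) for items in by_class.values())

    balanced = []
    for label, items in by_class.items():
        kept = rng.sample(items, target)  # sampling without replacement
        balanced.extend((sample, label) for sample in kept)

    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


# Toy data (assumed): 2 relevant documents (label 1), 8 non-relevant (label 0).
docs = [f"doc{i}" for i in range(10)]
relevance = [1, 1] + [0] * 8

X, y = random_undersample(docs, relevance)
print(Counter(y))  # both classes now have the same number of documents
```

Because the minority class is kept whole and only majority-class items are discarded, this baseline throws away potentially useful non-relevant examples, which is the limitation the balancing algorithms in the comparison try to address.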

Item Type:Conference Proceedings
Glasgow Author(s) Enlighten ID:Jose, Professor Joemon and Hopfgartner, Dr Frank
Authors: Bermejo, P., Hopfgartner, F., Gamez, J., Callejon, J., and Jose, J.
Subjects:Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School:College of Arts > School of Humanities > Information Studies
College of Science and Engineering > School of Computing Science
Copyright Holders:Copyright © 2009 IEEE
First Published:First published in Computer and Information Sciences, 2009:
Publisher Policy:Reproduced in accordance with the copyright policy of the publisher
