On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval

Rodriguez Perez, J. A. and Jose, J. M. (2015) On Microblog Dimensionality and Informativeness: Exploiting Microblogs' Structure and Dimensions for Ad-Hoc Retrieval. In: ICTIR 2015: 5th ACM SIGIR International Conference on the Theory of Information Retrieval, Northampton, MA, USA, 27-30 Sep 2015, pp. 211-220. ISBN 9781450338332 (doi: 10.1145/2808194.2809466)

Full text not currently available from Enlighten.

Abstract

In recent years, microblog services such as Twitter have gained increasing popularity, leading to active research on how to effectively exploit its content. Microblog documents such as tweets differ in morphology with respect to more traditional documents such as web pages. Particularly, tweets are considerably shorter (140 characters) than web documents and contain contextual tags regarding the topic (hashtags), intended audience (mentions) of the document as well as links to external content(URLs). Traditional and state of the art retrieval models perform rather poorly in capturing the relevance of tweets, since they have been designed under very different conditions. In this work, we define a microblog document as a high-dimensional entity and study the structural differences between those documents deemed relevant and those non-relevant. Secondly we experiment with enhancing the behaviour of the best observed performing retrieval model by means of a re-ranking approach that accounts for the relative differences in these dimensions amongst tweets. Additionally we study the interactions between the different dimensions in terms of their order within the documents by modelling relevant and non-relevant tweets as state machines. These state machines are then utilised to produce scores which in turn are used for re-ranking. Our evaluation results show statistically significant improvements over the baseline in terms of precision at different cut-off points for both approaches. These results confirm that the relative presence of the different dimensions within a document and their ordering are connected with the relevance of microblogs.

Item Type:Conference Proceedings
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Jose, Professor Joemon and Rodriguez Perez, Dr Jesus
Authors: Rodriguez Perez, J. A., and Jose, J. M.
College/School:College of Science and Engineering > School of Computing Science
ISBN:9781450338332
Related URLs:

University Staff: Request a correction | Enlighten Editors: Update this record