On single-pass indexing with MapReduce

McCreadie, R., Macdonald, C. and Ounis, I. (2009) On single-pass indexing with MapReduce. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 742-743.

Full text not currently available from Enlighten.

Publisher's URL: http://www.sigir2009.org/

Abstract

Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm in terms of the MapReduce programming model. We then experiment with this adaptation, in the context of the Hadoop MapReduce implementation. In particular, we explore the scale of improvements that can be achieved when using firstly more processing hardware and secondly larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling corpus size or number of processing machines. This suggests that the proposed indexing implementation is viable to support upcoming large-scale corpora.
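
As a rough illustration of how indexing can be phrased in MapReduce terms, the following is a minimal in-memory sketch in Python. It is not the authors' implementation, and it does not use Hadoop: the function names (map_document, reduce_postings, build_index), the whitespace tokenisation, and the toy corpus are all assumptions made for this example. The map step emits one posting per distinct term in a document, and the reduce step merges the postings for each term into a sorted posting list.

```python
from collections import Counter, defaultdict


def map_document(doc_id, text):
    """Map step (hypothetical): emit (term, (doc_id, term_frequency)) pairs."""
    # Naive tokenisation by whitespace; a real indexer would do far more.
    counts = Counter(text.lower().split())
    for term, tf in counts.items():
        yield term, (doc_id, tf)


def reduce_postings(term, postings):
    """Reduce step (hypothetical): merge postings for one term, sorted by doc_id."""
    return term, sorted(postings)


def build_index(corpus):
    """Run map, a simulated shuffle, and reduce over a dict of doc_id -> text."""
    # Shuffle phase, simulated in memory: group map output by term.
    grouped = defaultdict(list)
    for doc_id, text in corpus.items():
        for term, posting in map_document(doc_id, text):
            grouped[term].append(posting)
    # Each term's group is reduced independently, which is what lets
    # a real MapReduce framework spread the work over many machines.
    return dict(reduce_postings(term, postings) for term, postings in grouped.items())


# Toy corpus, invented for illustration only.
corpus = {1: "map reduce index", 2: "index the index"}
index = build_index(corpus)
```

Here `index["index"]` holds one posting per document containing the term, each recording the document identifier and the term's frequency in that document.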

Item Type: Articles
Additional Information: ISBN: 9781605584836
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh and McCreadie, Mr Richard
Authors: McCreadie, R., Macdonald, C., and Ounis, I.
Subjects: Q Science > QA Mathematics > QA76 Computer software
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: ACM
