Mccreadie, R., Macdonald, C. and Ounis, I. (2009) On single-pass indexing with MapReduce. Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 742-743.
Full text not currently available from Enlighten.
Publisher's URL: http://www.sigir2009.org/
Abstract
Indexing is an important Information Retrieval (IR) operation, which must be parallelised to support large-scale document corpora. We propose a novel adaptation of the state-of-the-art single-pass indexing algorithm in terms of the MapReduce programming model. We then experiment with this adaptation, in the context of the Hadoop MapReduce implementation. In particular, we explore the scale of improvements that can be achieved when using firstly more processing hardware and secondly larger corpora. Our results show that indexing speed increases in a close to linear fashion when scaling corpus size or number of processing machines. This suggests that the proposed indexing implementation is viable to support upcoming large-scale corpora.
Item Type: | Articles |
---|---|
Additional Information: | isbn: 9781605584836 |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Mccreadie, Dr Richard and Macdonald, Professor Craig and Ounis, Professor Iadh |
Authors: | Mccreadie, R., Macdonald, C., and Ounis, I. |
Subjects: | Q Science > QA Mathematics > QA76 Computer software |
College/School: | College of Science and Engineering > School of Computing Science |
Journal Name: | Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval |
Publisher: | ACM |