On inverted index compression for search engine efficiency

Catena, M., Macdonald, C. and Ounis, I. (2014) On inverted index compression for search engine efficiency. Lecture Notes in Computer Science, 8416, pp. 359-371. (doi:10.1007/978-3-319-06028-6_30)

Text: 93572.pdf - Accepted Version (204kB)

Publisher's URL: http://dx.doi.org/10.1007/978-3-319-06028-6_30

Abstract

Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we experiment with different modern integer compression algorithms, integrating these into a modern IR system. Through comprehensive experiments conducted on two large, widely used document corpora and large query sets, our results show the benefit of compression for different types of posting information to the space- and time-efficiency of the search engine. Overall, we find that the simple Frame of Reference compression scheme results in the best query response times for all types of posting information. Moreover, we observe that the frequency and position posting information in Web corpora that have large volumes of anchor text are more challenging to compress, yet compression is beneficial in reducing average query response times.
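
For readers unfamiliar with the Frame of Reference scheme named in the abstract, the following is a minimal textbook-style sketch of the idea applied to document id gaps. It is not the implementation evaluated in the paper: the function names, the tuple layout of a block, and the default block size of 128 are all assumptions made for illustration, and a Python integer stands in for a packed bit buffer.

```python
from typing import List, Tuple

# Hypothetical block layout: (reference, bit_width, count, packed_bits)
Block = Tuple[int, int, int, int]

def to_gaps(doc_ids: List[int]) -> List[int]:
    """Turn sorted document ids into gaps between consecutive ids."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def from_gaps(gaps: List[int]) -> List[int]:
    """Invert to_gaps with a running sum."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids

def for_encode(values: List[int], block_size: int = 128) -> List[Block]:
    """Frame of Reference: per block, store the minimum as the reference
    and pack every (value - reference) with the same fixed bit width."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        ref = min(chunk)
        # Bits needed for the largest offset in this block (0 if all equal).
        width = (max(chunk) - ref).bit_length()
        packed = 0
        for i, v in enumerate(chunk):
            packed |= (v - ref) << (i * width)
        blocks.append((ref, width, len(chunk), packed))
    return blocks

def for_decode(blocks: List[Block]) -> List[int]:
    """Unpack each block by masking out fixed-width offsets."""
    values = []
    for ref, width, count, packed in blocks:
        mask = (1 << width) - 1
        for i in range(count):
            values.append(ref + ((packed >> (i * width)) & mask))
    return values

if __name__ == "__main__":
    doc_ids = [3, 7, 8, 15, 16, 20, 1000, 1003]
    blocks = for_encode(to_gaps(doc_ids), block_size=4)
    assert from_gaps(for_decode(blocks)) == doc_ids
```

Because every offset in a block shares one bit width, decoding needs only shifts and masks, which is one commonly cited reason such schemes decode quickly. The cost is that a single large gap widens the whole block.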

Item Type: Articles
Additional Information: Proceedings of 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, 13-16 April, 2014. ISBN: 9783319060279
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh
Authors: Catena, M., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Lecture Notes in Computer Science
Publisher: Springer Verlag
ISSN: 0302-9743
ISSN (Online): 1611-3349
Copyright Holders: Copyright © 2014 Springer
First Published: First published in Lecture Notes in Computer Science 8416:359-371
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
