Efficient query processing for scalable web search

Tonellotto, N., Macdonald, C. and Ounis, I. (2018) Efficient query processing for scalable web search. Foundations and Trends in Information Retrieval, 12(4-5), pp. 319-500. (doi: 10.1561/1500000057)

174036.pdf - Accepted Version



Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to evolve rapidly, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to gain efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also covering the latest trends in the efficient query processing literature, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are accompanied by illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trend of applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met.
Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time search, energy efficiency, and modern hardware and software architectures.
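The abstract contrasts exhaustive term-at-a-time (TAAT) processing with document-at-a-time (DAAT) processing under dynamic pruning such as WAND. The following is a minimal sketch, not the survey's own code, assuming a toy in-memory index (the hypothetical `INDEX`) whose postings carry precomputed, non-negative per-term scores, and using hypothetical function names `taat` and `wand`. The `wand` function follows the basic WAND pivoting idea: documents are only fully scored when the sum of the per-term score upper bounds can exceed the current top-k threshold. BMW refines the same loop with per-block upper bounds, which is not shown here.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy index: term -> [(docid, score), ...] sorted by docid.
# Scores are assumed precomputed and non-negative (e.g. per-term BM25 contributions).
INDEX = {
    "efficient": [(1, 1.5), (3, 0.5), (7, 2.25), (9, 1.0)],
    "query": [(2, 1.0), (3, 2.5), (7, 1.0), (8, 0.5)],
    "processing": [(3, 1.0), (5, 3.0), (9, 0.25)],
}


def taat(postings, k):
    """Exhaustive term-at-a-time: traverse each posting list in turn, one accumulator per doc."""
    acc = defaultdict(float)
    for plist in postings.values():  # one whole posting list per term
        for docid, s in plist:
            acc[docid] += s
    return sorted(acc.items(), key=lambda x: (-x[1], x[0]))[:k]


@dataclass
class Cursor:
    plist: list  # [(docid, score), ...] sorted by docid
    ub: float  # term upper bound: the maximum score in the list
    pos: int = 0

    def done(self):
        return self.pos >= len(self.plist)

    def docid(self):
        return self.plist[self.pos][0]

    def next_geq(self, target):
        # Linear scan for clarity; compressed indexes usually skip ahead with skip pointers.
        while not self.done() and self.docid() < target:
            self.pos += 1


def wand(postings, k):
    """DAAT with WAND-style pruning; returns (top-k, number of docs fully scored)."""
    assert k > 0
    cursors = [Cursor(pl, max(s for _, s in pl)) for pl in postings.values() if pl]
    heap, scored = [], 0  # min-heap of (score, docid) holding the current top k
    while True:
        cursors = [c for c in cursors if not c.done()]
        if not cursors:
            break
        cursors.sort(key=Cursor.docid)
        theta = heap[0][0] if len(heap) == k else 0.0  # score a doc must beat to enter
        # Pivot: first cursor at which the prefix sum of upper bounds exceeds theta.
        total, pivot = 0.0, None
        for i, c in enumerate(cursors):
            total += c.ub
            if total > theta:
                pivot = i
                break
        if pivot is None:  # no remaining document can enter the top k
            break
        pivot_doc = cursors[pivot].docid()
        if cursors[0].docid() == pivot_doc:
            # Cursors are sorted, so every cursor up to the pivot sits on pivot_doc.
            score = 0.0
            for c in cursors:
                if c.docid() == pivot_doc:
                    score += c.plist[c.pos][1]
                    c.pos += 1
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Any doc before pivot_doc scores at most the pre-pivot bounds (<= theta): skip it.
            for c in cursors[:pivot]:
                c.next_geq(pivot_doc)
    top = sorted(heap, key=lambda x: (-x[0], x[1]))
    return [(d, s) for s, d in top], scored


if __name__ == "__main__":
    print(taat(INDEX, 3))  # [(3, 4.0), (7, 3.25), (5, 3.0)]
    print(wand(INDEX, 3))  # ([(3, 4.0), (7, 3.25), (5, 3.0)], 6)
```

Because this pruning is safe, both functions return the same top 3 on the toy data; the difference is that `wand` fully scores only 6 of the 7 candidate documents, skipping document 8 once the threshold has risen. The scores are chosen to be exactly representable in binary, so summing them in different orders gives identical floats.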

Item Type: Articles
Additional Information: Nicola Tonellotto acknowledges the partial support by the BIGDATAGRAPES (grant agreement No. 780751) project, which has received funding from the European Union’s Horizon 2020 research and innovation framework, within the Information and Communication Technologies work programme.
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Ounis, Professor Iadh
Authors: Tonellotto, N., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Foundations and Trends in Information Retrieval
Publisher: Now Publishers
ISSN (Online): 1554-0677
Copyright Holders: Copyright © 2018 The Authors
First Published: Foundations and Trends in Information Retrieval 12(4-5):319-500
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
