Tonellotto, N. and Macdonald, C. (2020) Using an inverted index synopsis for query latency and performance prediction. ACM Transactions on Information Systems, 38(3), 29. (doi: 10.1145/3389795)
Text: 212648.pdf - Accepted Version (6MB)
Abstract
Predicting the latency of a query in a search engine has important benefits, for instance, in allowing the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing its effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, achieving accurate predictions of query latencies is difficult. We propose the use of index synopses, which are stochastic samples of the full index, for attaining accurate timing predictions. We experiment using the TREC ClueWeb09 collection and a large set of real user queries, and find that small index synopses make it possible to estimate properties of the larger index very accurately, including the sizes of posting list unions and intersections. We then demonstrate that index synopses facilitate two key use cases: first, for query efficiency prediction, we show that predicting query latencies on the full index and classifying long-running queries can be accurately achieved using index synopses; second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a synopsis index post-retrieval predictor than a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample of a larger index for predicting the properties of the larger index.
Item Type: | Articles |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Macdonald, Professor Craig |
Authors: | Tonellotto, N., and Macdonald, C. |
College/School: | College of Science and Engineering > School of Computing Science |
Journal Name: | ACM Transactions on Information Systems |
Publisher: | ACM Press |
ISSN: | 1046-8188 |
ISSN (Online): | 1558-2868 |
Published Online: | 13 May 2020 |
Copyright Holders: | Copyright © 2020 Association for Computing Machinery |
First Published: | First published in ACM Transactions on Information Systems 38(3): 29 |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |