Using an inverted index synopsis for query latency and performance prediction

Tonellotto, N. and Macdonald, C. (2020) Using an inverted index synopsis for query latency and performance prediction. ACM Transactions on Information Systems, 38(3), 29. (doi: 10.1145/3389795)

Full Text:212648.pdf - Accepted Version (6MB)

Abstract

Predicting the latency of a query on a search engine has important benefits, for instance, allowing the search engine to adjust its configuration to address long-running queries without unnecessarily sacrificing effectiveness. However, for the dynamic pruning techniques that underlie many commercial search engines, accurately predicting query latencies is difficult. We propose the use of index synopses, which are stochastic samples of the full index, for attaining accurate timing predictions. We experiment using the TREC ClueWeb09 collection and a large set of real user queries, and find that small index synopses make it possible to estimate properties of the larger index very accurately, including the sizes of posting list unions and intersections. We then demonstrate that index synopses facilitate two key use cases: first, for query efficiency prediction, we show that predicting query latencies on the full index and classifying long-running queries can be achieved accurately using index synopses; second, for query performance prediction, we show that the effectiveness of queries can be estimated more accurately using a synopsis index post-retrieval predictor than a pre-retrieval predictor. Overall, our experiments demonstrate the value of such a stochastic sample of a larger index for predicting the properties of that index.
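
The idea of estimating posting list union and intersection sizes from a sampled index can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the synopsis is built by keeping each document independently with a fixed probability, and that sizes measured on the synopsis are scaled back up by that probability. The names `build_synopsis`, `estimate_sizes`, and `rate` are hypothetical and chosen for illustration only.

```python
import random


def build_synopsis(index, num_docs, rate, seed=0):
    """Sample documents uniformly at random and restrict each posting list to them.

    `index` maps a term to the set of document ids containing it.
    `rate` is the assumed sampling probability per document (hypothetical parameter).
    """
    rng = random.Random(seed)
    kept = {doc for doc in range(num_docs) if rng.random() < rate}
    return {term: postings & kept for term, postings in index.items()}


def estimate_sizes(synopsis, terms, rate):
    """Estimate full-index intersection and union sizes for the query terms.

    Sizes are computed on the synopsis and divided by the sampling rate.
    """
    lists = [synopsis.get(term, set()) for term in terms]
    if not lists:
        return 0.0, 0.0
    intersection = set.intersection(*lists)
    union = set.union(*lists)
    return len(intersection) / rate, len(union) / rate


# Toy index: term "a" occurs in even documents, term "b" in multiples of three.
NUM_DOCS = 100_000
toy_index = {
    "a": set(range(0, NUM_DOCS, 2)),
    "b": set(range(0, NUM_DOCS, 3)),
}
toy_synopsis = build_synopsis(toy_index, NUM_DOCS, rate=0.1)
est_intersection, est_union = estimate_sizes(toy_synopsis, ["a", "b"], rate=0.1)
```

On this toy index the true intersection and union sizes are 16,667 and 66,667 documents, so the two estimates can be compared against them directly. How the paper constructs its synopses and turns such estimates into latency predictions is described in the full text, not in this sketch.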

Item Type:Articles
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:Macdonald, Professor Craig
Authors: Tonellotto, N., and Macdonald, C.
College/School:College of Science and Engineering > School of Computing Science
Journal Name:ACM Transactions on Information Systems
Publisher:ACM Press
ISSN:1046-8188
ISSN (Online):1558-2868
Published Online:13 May 2020
Copyright Holders:Copyright © 2020 Association for Computing Machinery
First Published:First published in ACM Transactions on Information Systems 38(3): 29
Publisher Policy:Reproduced in accordance with the publisher copyright policy
