Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations

Cahsai, A., Anagnostopoulos, C., Ntarmos, N. and Triantafillou, P. (2019) Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations. In: 2018 IEEE International Conference on Big Data, Seattle, WA, USA, 10-13 Dec 2018, pp. 653-662. ISBN 9781538650356 (doi: 10.1109/BigData.2018.8621943)

172710.pdf - Accepted Version (1MB)

Abstract

The state-of-the-art approaches for scalable kNN query processing use big data parallel/distributed platforms (e.g., Hadoop and Spark) and storage engines (e.g., HDFS and NoSQL stores). On top of these they build tree-based indexing methods for efficient query processing. However, data sizes keep growing, and it is now not uncommon to reach several petabytes. At that scale the storage cost of tree-based index structures becomes exceptionally high. In this work, we propose a novel perspective on organising multivariate (mv) datasets. The main idea relies on probabilistic transformations of the data space, from which we derive a Space Transformation Organisation Structure (STOS) for mv data organisation. STOS facilitates query processing as if the underlying datasets were uniformly distributed. This approach has three significant advantages. First, STOS has a minute memory footprint, many orders of magnitude smaller than the indexes in related work. Second, unlike in related work, the required memory grows very slowly with dataset size, so the approach scales significantly better. Third, STOS is relatively efficient to compute and takes less time to build than traditional indexes. The approach is accompanied by a distributed, coordinator-based query processing method, so that overall query processing times are lower than those of state-of-the-art index-based methods. We conducted extensive experiments with real and synthetic datasets of different sizes to substantiate and quantify the performance advantages of our proposal.
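The abstract does not describe how STOS is constructed. As a rough illustration of the general idea it names, the sketch below transforms a skewed data space so that it can be treated as if uniformly distributed. It applies a per-dimension empirical-CDF (probability integral) transform and then lays an equal-width grid over the transformed space. This is a hypothetical sketch, not the authors' method. The function names (`fit_marginal_summaries`, `to_uniform`, `grid_cell`, `cell_counts`), the 64-value per-dimension summary, and the assumption of independent dimensions are all illustrative choices.

```python
import bisect
import random
from collections import Counter


def fit_marginal_summaries(points, n_quantiles=64):
    """Keep a small sorted sample (about n_quantiles values) per dimension.

    Hypothetical stand-in for a compact per-dimension distribution summary;
    its size depends on n_quantiles, not directly on len(points).
    """
    summaries = []
    for d in range(len(points[0])):
        column = sorted(p[d] for p in points)
        step = max(1, len(column) // n_quantiles)
        summaries.append(column[::step])
    return summaries


def to_uniform(point, summaries):
    """Map each coordinate to its approximate empirical CDF value in [0, 1]."""
    return tuple(bisect.bisect_right(q, x) / len(q) for x, q in zip(point, summaries))


def grid_cell(u, cells_per_dim):
    """Equal-width grid cell index for a point already scaled into [0, 1]."""
    return tuple(min(int(v * cells_per_dim), cells_per_dim - 1) for v in u)


def cell_counts(points, cells_per_dim=4, transform=True):
    """Count points per grid cell, with or without the CDF transform."""
    if transform:
        summaries = fit_marginal_summaries(points)
        scaled = [to_uniform(p, summaries) for p in points]
    else:
        # Plain min-max scaling of the raw (skewed) coordinates, for contrast.
        dims = range(len(points[0]))
        lo = [min(p[d] for p in points) for d in dims]
        hi = [max(p[d] for p in points) for d in dims]
        scaled = [tuple((p[d] - lo[d]) / (hi[d] - lo[d]) for d in dims) for p in points]
    return Counter(grid_cell(u, cells_per_dim) for u in scaled)


if __name__ == "__main__":
    rng = random.Random(0)
    # Skewed synthetic 2-D data with independent exponential marginals.
    data = [(rng.expovariate(1.0), rng.expovariate(1.0)) for _ in range(10_000)]
    raw = cell_counts(data, transform=False)
    flat = cell_counts(data, transform=True)
    print("raw grid:         largest cell", max(raw.values()), "of", len(data))
    print("transformed grid: cell sizes from", min(flat.values()), "to", max(flat.values()))
```

On the raw data, most points fall into a single corner cell. After the transform, the 16 cells hold roughly equal shares. Two caveats apply. A per-dimension transform only flattens the joint distribution when the dimensions are close to independent. It also does not preserve Euclidean distances, so an exact kNN method built on such a layout would still need to verify candidates in the original space.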

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Cahsai, Mr Atoshum and Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter and Ntarmos, Dr Nikos
Authors: Cahsai, A., Anagnostopoulos, C., Ntarmos, N., and Triantafillou, P.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781538650356
