Rank join queries in NoSQL databases

Ntarmos, N., Patlakas, I. and Triantafillou, P. (2014) Rank join queries in NoSQL databases. Proceedings of the VLDB Endowment, 7(7), pp. 493-504.

89359.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.


Publisher's URL: http://www.vldb.org/pvldb/vol7.html


Rank (i.e., top-k) join queries play a key role in modern analytics tasks. However, despite their importance and unlike centralized settings, they have been completely overlooked in cloud NoSQL settings. We attempt to fill this gap: We contribute a suite of solutions and study their performance comprehensively. Baseline solutions are offered using SQL-like languages (such as Hive and Pig), based on MapReduce jobs. We first provide solutions that are based on specialized indices, which may themselves be accessed using either MapReduce or coordinator-based strategies. The first index-based solution is based on inverted indices, which are accessed with MapReduce jobs. The second index-based solution adapts a popular centralized rank-join algorithm. We further contribute a novel statistical structure comprising histograms and Bloom filters, which forms the basis for the third index-based solution. We provide (i) MapReduce algorithms showing how to build these indices and statistical structures, (ii) algorithms to allow for online updates to these indices, and (iii) query processing algorithms utilizing them. We implemented all algorithms in Hadoop (HDFS) and HBase and tested them on TPC-H datasets of various scales, utilizing different queries on tables of various sizes and different score-attribute distributions. We ported our implementations to Amazon EC2 and "in-house" lab clusters of various scales. We provide performance results for three metrics: query execution time, network bandwidth consumption, and dollar-cost for query execution.
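
For readers unfamiliar with the term, the following minimal sketch illustrates what a rank (top-k) join computes. It is not one of the algorithms from the paper: the function name `naive_rank_join`, the `(join_key, score)` tuple layout, and the use of a sum as the score-aggregation function are all assumptions made for illustration. It joins two small in-memory relations on a key, scores each join result, and keeps only the k best.

```python
import heapq
from collections import defaultdict


def naive_rank_join(left, right, k, score=lambda a, b: a + b):
    """Return the k highest-scoring join results as (score, join_key) pairs.

    left, right: iterables of (join_key, score) tuples (hypothetical layout).
    score: aggregate of the two input scores (sum is an assumption here).
    """
    # Group the scores of the right relation by join key.
    by_key = defaultdict(list)
    for key, s in right:
        by_key[key].append(s)

    # Enumerate every join result lazily, then keep only the k best.
    results = (
        (score(ls, rs), key)
        for key, ls in left
        for rs in by_key.get(key, ())
    )
    return heapq.nlargest(k, results)


left = [("a", 9), ("b", 5), ("c", 7)]
right = [("a", 8), ("b", 10), ("d", 10), ("a", 1)]
top2 = naive_rank_join(left, right, k=2)
```

This version scans both inputs in full and scores every join result before selecting the top k, which is the kind of exhaustive work that index-based and statistics-based approaches generally try to reduce.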

Item Type: Articles
Glasgow Author(s) Enlighten ID: Triantafillou, Professor Peter and Ntarmos, Dr Nikos
Authors: Ntarmos, N., Patlakas, I., and Triantafillou, P.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Proceedings of the VLDB Endowment
Journal Abbr.: PVLDB
Publisher: VLDB Endowment Inc.
ISSN (Online): 1047-7349
Copyright Holders: Copyright © 2014 VLDB Endowment
First Published: First published in Proceedings of the VLDB Endowment 7(7):493-504
Publisher Policy: Reproduced under a Creative Commons License
