Performance and scalability of indexed subgraph query processing methods

Katsarou, F., Ntarmos, N. and Triantafillou, P. (2015) Performance and scalability of indexed subgraph query processing methods. Proceedings of the VLDB Endowment, 8(12), pp. 1566-1577.

107199.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

428kB

Publisher's URL: http://www.vldb.org/pvldb/vol8.html

Abstract

Graph data management systems have become very popular as graphs are the natural data model for many applications. One of the main problems addressed by these systems is subgraph query processing; i.e., given a query graph, return all graphs that contain the query. The naive method for processing such queries is to perform a subgraph isomorphism test against each graph in the dataset. This obviously does not scale, as subgraph isomorphism is NP-complete. Thus, many indexing methods have been proposed to reduce the number of candidate graphs that have to undergo the subgraph isomorphism test. In this paper, we identify a set of key factors (parameters) that influence the performance of related methods: namely, the number of nodes per graph, the graph density, the number of distinct labels, the number of graphs in the dataset, and the query graph size. We then conduct comprehensive and systematic experiments that analyze the sensitivity of the various methods to the values of the key parameters. Our aims are twofold: first, to derive conclusions about the algorithms’ relative performance, and, second, to stress-test all algorithms, deriving insights as to their scalability, and to highlight how both performance and scalability depend on the above factors. We choose six well-established indexing methods, namely Grapes, CT-Index, GraphGrepSX, gIndex, Tree+∆, and gCode, as representative approaches of the overall design space, including the most recent and best performing methods. We report on their index construction time and index size, and on query processing performance in terms of time and false positive ratio. We employ both real and synthetic datasets. Specifically, four real datasets of different characteristics are used: AIDS, PDBS, PCM, and PPI. In addition, we generate a large number of synthetic graph datasets, enabling us to systematically study the algorithms’ performance and scalability versus the aforementioned key parameters.
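
The filter-then-verify scheme described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper and does not implement any of the six evaluated methods: the `Graph` class, the label-count filter standing in for an index, and the false positive ratio computed as (candidates - answers) / candidates are all assumptions made for illustration. Containment is taken here to mean a label-preserving, injective mapping that preserves the query's edges.

```python
from collections import Counter


class Graph:
    """Undirected, node-labelled graph (hypothetical helper type)."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)              # node id -> label
        self.adj = {v: set() for v in self.labels}
        for u, v in edges:
            self.adj[u].add(v)
            self.adj[v].add(u)


def contains(graph, query):
    """Backtracking subgraph isomorphism test (exponential in the worst case)."""
    q_nodes = list(query.labels)

    def extend(mapping):
        if len(mapping) == len(q_nodes):
            return True
        q = q_nodes[len(mapping)]
        used = set(mapping.values())
        for g in graph.labels:
            if g in used or graph.labels[g] != query.labels[q]:
                continue
            # Every already-mapped neighbour of q must be adjacent to g.
            if all(mapping[n] in graph.adj[g]
                   for n in query.adj[q] if n in mapping):
                mapping[q] = g
                if extend(mapping):
                    return True
                del mapping[q]
        return False

    return extend({})


def label_filter(dataset, query):
    """Toy stand-in for an index: keep graphs with enough nodes of each label."""
    need = Counter(query.labels.values())
    candidates = []
    for gid, g in dataset.items():
        have = Counter(g.labels.values())
        if all(have[label] >= count for label, count in need.items()):
            candidates.append(gid)
    return candidates


def answer_query(dataset, query):
    """Filter first, then verify each surviving candidate."""
    candidates = label_filter(dataset, query)
    answers = [gid for gid in candidates if contains(dataset[gid], query)]
    # Assumed definition: fraction of candidates that fail verification.
    fp_ratio = (len(candidates) - len(answers)) / len(candidates) if candidates else 0.0
    return answers, candidates, fp_ratio


# Small made-up dataset: a triangle, a path, and a single edge.
dataset = {
    "g1": Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)]),
    "g2": Graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)]),
    "g3": Graph({0: "A", 1: "B"}, [(0, 1)]),
}
query = Graph({0: "A", 1: "C"}, [(0, 1)])  # a single A-C edge

answers, candidates, fp_ratio = answer_query(dataset, query)
```

In this toy run the filter discards `g3` without any isomorphism test because it has no node labelled `C`, while `g2` survives filtering but fails verification, which is what a false positive means in this setting. Dropping the filter and verifying every graph would correspond to the naive method.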

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Katsarou, Foteini and Triantafillou, Professor Peter and Ntarmos, Dr Nikos
Authors: Katsarou, F., Ntarmos, N., and Triantafillou, P.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Proceedings of the VLDB Endowment
Publisher: VLDB Endowment Inc.
ISSN: 1047-7349
ISSN (Online): 1047-7349
Copyright Holders: Copyright © 2015 VLDB Endowment
First Published: First published in Proceedings of the VLDB Endowment 8(12):1566-1577
Publisher Policy: Reproduced under a Creative Commons License
