Large-scale data exploration using explanatory regression functions

Savva, F., Anagnostopoulos, C., Triantafillou, P. and Kolomvatsos, K. (2020) Large-scale data exploration using explanatory regression functions. ACM Transactions on Knowledge Discovery from Data, 14(6), 76. (doi: 10.1145/3410448)

Text: 220291.pdf - Accepted Version (3MB)

Abstract

Analysts wishing to explore multivariate data spaces typically issue queries involving selection operators, i.e., range or equality predicates, which define data subspaces of potential interest. They then use aggregation functions, the results of which determine a subspace’s interestingness for further exploration and deeper analysis. However, Aggregate Query (AQ) results are scalars and convey limited information about the queried subspaces, which restricts their explanatory value in exploratory analysis. Analysts have no way of identifying how these results are derived or how they change w.r.t. query (input) parameter values. We address this shortcoming by contributing a novel explanation mechanism, based on machine learning, that helps analysts explore and understand data subspaces. We explain AQ results using functions obtained by solving a three-fold joint optimization problem; these functions take the form of explainable piecewise-linear regression functions. A key feature of the proposed solution is that the explanation functions are estimated using past executed queries. These queries provide a coarse-grained overview of the underlying aggregate function (generating the AQ results) to be learned. Explanations for future, previously unseen AQs can be computed without accessing the underlying data and can be used to further explore the queried data subspaces without issuing more queries to the backend analytics engine. We evaluate the explanation accuracy and efficiency through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
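
As a rough illustration of the idea in the abstract, the sketch below fits a two-segment piecewise-linear function to a log of past aggregate queries and then evaluates it for an unseen query without touching any data. It is not the paper's three-fold optimization. It is a simple stand-in that tries every split of the log and fits each side by ordinary least squares. The query log, the single "radius" query parameter, and the names fit_line, fit_piecewise and explain are all hypothetical.

```python
from statistics import mean


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def squared_error(points, line):
    """Sum of squared residuals of the points against one line."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def fit_piecewise(log, min_points=2):
    """Try every split of the sorted log; keep the two-segment fit with least error."""
    log = sorted(log)
    best = None
    for i in range(min_points, len(log) - min_points + 1):
        left, right = log[:i], log[i:]
        lines = (fit_line(left), fit_line(right))
        err = squared_error(left, lines[0]) + squared_error(right, lines[1])
        if best is None or err < best[0]:
            # Place the knot halfway between the two neighbouring queries.
            knot = (log[i - 1][0] + log[i][0]) / 2
            best = (err, knot, lines)
    _, knot, lines = best
    return knot, lines


def explain(model, radius):
    """Estimate the aggregate for an unseen query using only the fitted function."""
    knot, (left, right) = model
    slope, intercept = left if radius <= knot else right
    return slope * radius + intercept


# Hypothetical log of past queries: (query radius, aggregate result).
query_log = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0),
             (5.0, 45.0), (6.0, 47.0), (7.0, 49.0), (8.0, 51.0)]

model = fit_piecewise(query_log)
print(model)                # knot and the (slope, intercept) of each segment
print(explain(model, 2.5))  # estimate for a radius that was never queried
```

The slopes of the two segments are what carry the explanation in this toy setting: they show how quickly the aggregate result changes as the query parameter grows on either side of the knot.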

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Kolomvatsos, Dr Kostas and Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter and Savva, Mr Fotis
Authors: Savva, F., Anagnostopoulos, C., Triantafillou, P., and Kolomvatsos, K.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: ACM Transactions on Knowledge Discovery from Data
Publisher: Association for Computing Machinery
ISSN: 1556-4681
ISSN (Online): 1556-472X
Copyright Holders: Copyright © 2020 Association for Computing Machinery
First Published: First published in ACM Transactions on Knowledge Discovery from Data 14(6):76
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher

Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
300982 | | Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics | Roderick Murray-Smith | Engineering and Physical Sciences Research Council (EPSRC) | EP/R018634/1 | Computing Science
301654 | | Intelligent Applications over Large Scale Data Streams | Christos Anagnostopoulos | European Commission (EC) | 745829 | Computing Science