Large-scale predictive modeling and analytics through regression queries in data management systems

Anagnostopoulos, C. and Triantafillou, P. (2020) Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics, 9(1), pp. 17-55. (doi: 10.1007/s41060-018-0163-5)

175039.pdf - Published Version
Available under License Creative Commons Attribution.



Regression analytics has been the standard approach to modeling the relationship between input and output variables, while recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute with a novel predictive analytics model and an associated statistical learning methodology, which are efficient, scalable and accurate in discovering piecewise linear dependencies among variables by observing only regression queries and their answers issued to a DMS. We focus on in-DMS piecewise linear regression and specifically in predicting the answers to mean-value aggregate queries, identifying and delivering the piecewise linear dependencies between variables to regression queries and predicting the data dependent variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover a piecewise linear data function approximation over the underlying data only through query–answer pairs that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated and compared with exact solution and near-perfect approximations of the underlying relationships among variables achieving orders of magnitude improvement in analytics processing.

Item Type:Articles
Additional Information:This work is funded by the EU H2020 GNFUV Project RAWFIE-OC2-EXP-SCI (Grant#645220), under the EC FIRE+ initiative.
Glasgow Author(s) Enlighten ID:Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter
Authors: Anagnostopoulos, C., and Triantafillou, P.
College/School:College of Science and Engineering > School of Computing Science
Journal Name:International Journal of Data Science and Analytics
ISSN (Online):2364-4168
Published Online:27 December 2018
Copyright Holders:Copyright © 2018 The Authors
First Published:First published in International Journal of Data Science and Analytics 9(1): 17-55
Publisher Policy:Reproduced under a Creative Commons License

University Staff: Request a correction | Enlighten Editors: Update this record

Project CodeAward NoProject NamePrincipal InvestigatorFunder's NameFunder RefLead Dept
300982Exploiting Closed-Loop Aspects in Computationally and Data Intensive AnalyticsRoderick Murray-SmithEngineering and Physical Sciences Research Council (EPSRC)EP/R018634/1Computing Science