Anagnostopoulos, C. and Triantafillou, P. (2020) Large-scale predictive modeling and analytics through regression queries in data management systems. International Journal of Data Science and Analytics, 9(1), pp. 17-55. (doi: 10.1007/s41060-018-0163-5)
Text: 175039.pdf - Published Version. Available under License Creative Commons Attribution. 4MB
Abstract
Regression analytics has been the standard approach to modeling the relationship between input and output variables, while recent trends aim to incorporate advanced regression analytics capabilities within data management systems (DMS). Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute with a novel predictive analytics model and an associated statistical learning methodology, which are efficient, scalable and accurate in discovering piecewise linear dependencies among variables by observing only regression queries and their answers issued to a DMS. We focus on in-DMS piecewise linear regression and specifically in predicting the answers to mean-value aggregate queries, identifying and delivering the piecewise linear dependencies between variables to regression queries and predicting the data dependent variables within specific data subspaces defined by analysts and data scientists. Our goal is to discover a piecewise linear data function approximation over the underlying data only through query–answer pairs that is competitive with the best piecewise linear approximation to the ground truth. Our methodology is analyzed, evaluated and compared with exact solution and near-perfect approximations of the underlying relationships among variables achieving orders of magnitude improvement in analytics processing.
Item Type: | Articles |
---|---|
Additional Information: | This work is funded by the EU H2020 GNFUV Project RAWFIE-OC2-EXP-SCI (Grant#645220), under the EC FIRE+ initiative. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter |
Authors: | Anagnostopoulos, C., and Triantafillou, P. |
College/School: | College of Science and Engineering > School of Computing Science |
Journal Name: | International Journal of Data Science and Analytics |
Publisher: | Springer |
ISSN: | 2364-415X |
ISSN (Online): | 2364-4168 |
Published Online: | 27 December 2018 |
Copyright Holders: | Copyright © 2018 The Authors |
First Published: | First published in International Journal of Data Science and Analytics 9(1): 17-55 |
Publisher Policy: | Reproduced under a Creative Commons License |