Predictive intelligence of reliable analytics in distributed computing environments

Anagnostopoulos, C. and Kolomvatsos, K. (2020) Predictive intelligence of reliable analytics in distributed computing environments. Applied Intelligence, 50, pp. 3219-3238. (doi: 10.1007/s10489-020-01712-5)

Full text: 213238.pdf - Published Version (5MB)
Available under License Creative Commons Attribution.

Abstract

Lack of knowledge in the underlying data distribution in distributed large-scale data can be an obstacle when issuing analytics & predictive modelling queries. Analysts find themselves having a hard time finding analytics/exploration queries that satisfy their needs. In this paper, we study how exploration query results can be predicted in order to avoid the execution of ‘bad’/non-informative queries that waste network, storage, financial resources, and time in a distributed computing environment. The proposed methodology involves clustering of a training set of exploration queries along with the cardinality of the results (score) they retrieved and then using query-centroid representatives to proceed with predictions. After the training phase, we propose a novel refinement process to increase the reliability of predicting the score of new unseen queries based on the refined query representatives. Comprehensive experimentation with real datasets shows that more reliable predictions are acquired after the proposed refinement method, which increases the reliability of the closest centroid and improves predictability under the right circumstances.
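
The clustering-and-centroid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each exploration query is already encoded as a numeric vector, that plain k-means stands in for the clustering step, and that each centroid carries the mean score (result cardinality) of its training queries. The function names fit_query_centroids and predict_score are hypothetical, the toy data is made up for illustration, and the refinement process proposed in the paper is not reproduced here.

```python
import math
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def fit_query_centroids(queries, scores, k, iterations=20, seed=0):
    """Cluster query vectors with plain k-means and attach a mean score to each centroid."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct training queries chosen at random.
    centroids = [list(q) for q in rng.sample(queries, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for q in queries:
            groups[nearest(q, centroids)].append(q)
        for i, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    # Mean score of the training queries assigned to each final centroid.
    totals, counts = [0.0] * k, [0] * k
    for q, s in zip(queries, scores):
        i = nearest(q, centroids)
        totals[i] += s
        counts[i] += 1
    centroid_scores = [t / c if c else 0.0 for t, c in zip(totals, counts)]
    return centroids, centroid_scores


def predict_score(query, centroids, centroid_scores):
    """Predict the score of an unseen query from its closest centroid."""
    return centroid_scores[nearest(query, centroids)]


# Toy training set: two well-separated groups of 2-D query vectors and their scores.
queries = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
scores = [10, 12, 14, 100, 110, 120]

centroids, centroid_scores = fit_query_centroids(queries, scores, k=2)
print(predict_score((0.5, 0.5), centroids, centroid_scores))    # 12.0
print(predict_score((10.5, 10.5), centroids, centroid_scores))  # 110.0
```

With a predictor of this kind, a query whose predicted score is too low to be informative could be skipped before it is executed, which is the saving in network, storage, and time that the abstract describes.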

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Kolomvatsos, Dr Kostas and Anagnostopoulos, Dr Christos
Authors: Anagnostopoulos, C., and Kolomvatsos, K.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Applied Intelligence
Publisher: Springer
ISSN: 0924-669X
ISSN (Online): 1573-7497
Published Online: 14 May 2020
Copyright Holders: Copyright © The Author(s) 2020
First Published: First published in Applied Intelligence 50:3219–3238
Publisher Policy: Reproduced under a Creative Commons license

Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
301654 | | Intelligent Applications over Large Scale Data Streams | Christos Anagnostopoulos | European Commission (EC) | 745829 | Computing Science
300982 | | Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics | Roderick Murray-Smith | Engineering and Physical Sciences Research Council (EPSRC) | EP/R018634/1 | Computing Science