SuRF: Identification of Interesting Data Regions with Surrogate Models

Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2020) SuRF: Identification of Interesting Data Regions with Surrogate Models. In: 36th IEEE International Conference on Data Engineering (IEEE ICDE), Dallas, TX, USA, 20-24 April 2020, pp. 1321-1332. ISBN 9781728129037 (doi:10.1109/ICDE48307.2020.00118)

Text: 209812.pdf - Accepted Version (1MB)

Abstract

Several data mining tasks focus on repeatedly inspecting multidimensional data regions summarized by a statistic. The value of this statistic (e.g., region-population sizes, order moments) is used to classify the region’s interestingness. These regions can be naively extracted from the entire dataspace; however, this is extremely time-consuming and demanding of compute resources. This paper studies the reverse problem: analysts provide a cut-off value for a statistic of interest, and in turn our proposed framework efficiently identifies multidimensional regions whose statistic exceeds (or is below) the given cut-off value, according to the user’s needs. However, as data dimensions and size increase, such a task inevitably becomes laborious and costly. To alleviate this cost, our solution, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest. It then makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality. The accuracy, efficiency, and scalability of our approach are demonstrated with experiments using synthetic and real-world datasets and through comparison with other methods.
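
The two-stage idea in the abstract (learn a surrogate from past region evaluations, then search the surrogate instead of the data) can be illustrated with a small sketch. This is not the authors' implementation: the 1-D data, the count statistic, the k-nearest-neighbour surrogate, the plain elitist evolutionary loop (which is not multi-modal and may settle on a single region), and every function and parameter name below are assumptions chosen only for illustration.

```python
import random


def true_count(data, center, half):
    """Exact statistic: number of points in [center - half, center + half]."""
    return sum(1 for x in data if center - half <= x <= center + half)


def build_history(data, rng, n=500):
    """Simulate past region evaluations as (center, half, statistic) tuples."""
    history = []
    for _ in range(n):
        center, half = rng.uniform(0, 10), rng.uniform(0.1, 1.0)
        history.append((center, half, true_count(data, center, half)))
    return history


def knn_surrogate(history, k=5):
    """Return a predictor that averages the statistic of the k nearest past regions."""
    def predict(center, half):
        nearest = sorted(
            history,
            key=lambda h: (h[0] - center) ** 2 + (h[1] - half) ** 2,
        )[:k]
        return sum(h[2] for h in nearest) / len(nearest)
    return predict


def find_regions(predict, cutoff, rng, pop_size=30, generations=40):
    """Evolve (center, half) candidates; keep those predicted to reach the cutoff."""
    def fitness(region):
        return predict(*region)

    # Hypothetical search bounds: centers in [0, 10], half-lengths in [0.1, 1.0].
    pop = [(rng.uniform(0, 10), rng.uniform(0.1, 1.0)) for _ in range(pop_size)]
    for _ in range(generations):
        # Gaussian mutation, clamped to the search bounds.
        children = [
            (min(10.0, max(0.0, c + rng.gauss(0, 0.3))),
             min(1.0, max(0.1, h + rng.gauss(0, 0.05))))
            for c, h in pop
        ]
        # Elitist selection: the best pop_size of parents and children survive.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
    return [r for r in pop if predict(*r) >= cutoff]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two dense clusters plus uniform background noise (synthetic, for the demo only).
    data = ([rng.gauss(2, 0.3) for _ in range(200)]
            + [rng.gauss(7, 0.3) for _ in range(200)]
            + [rng.uniform(0, 10) for _ in range(100)])
    predict = knn_surrogate(build_history(data, rng))
    for center, half in find_regions(predict, cutoff=60, rng=rng)[:5]:
        print(round(center, 2), round(half, 2), true_count(data, center, half))
```

Note that the search only ever calls the surrogate; the data is touched once, when the historical evaluations are built. That is the source of the cost saving the abstract describes, although the real system's models and optimizer differ from this sketch.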

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter and Savva, Mr Fotis
Authors: Savva, F., Anagnostopoulos, C., and Triantafillou, P.
College/School: College of Science and Engineering > School of Computing Science
ISSN: 2375-026X
ISBN: 9781728129037
Copyright Holders: Copyright © 2020 IEEE
First Published: First published in 2020 IEEE 36th International Conference on Data Engineering (ICDE)
Publisher Policy: Reproduced in accordance with the publisher copyright policy


Project Code | Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
301654 |  | Intelligent Applications over Large Scale Data Streams | Christos Anagnostopoulos | European Commission (EC) | 745829 | Computing Science
190906 |  | EPSRC 2015 DTP | Mary Beth Kneafsey | Engineering and Physical Sciences Research Council (EPSRC) | EP/M508056/1 | Research and Innovation Services
172865 |  | EPSRC DTP 16/17 and 17/18 | Tania Galabova | Engineering and Physical Sciences Research Council (EPSRC) | EP/N509668/1 | Research and Innovation Services
300982 |  | Exploiting Closed-Loop Aspects in Computationally and Data Intensive Analytics | Roderick Murray-Smith | Engineering and Physical Sciences Research Council (EPSRC) | EP/R018634/1 | Computing Science