Savva, F., Anagnostopoulos, C. and Triantafillou, P. (2020) SuRF: Identification of Interesting Data Regions with Surrogate Models. In: 36th IEEE International Conference on Data Engineering (IEEE ICDE), Dallas, TX, USA, 20-24 April 2020, pp. 1321-1332. ISBN 9781728129037 (doi: 10.1109/ICDE48307.2020.00118)
Text: 209812.pdf - Accepted Version (1MB)
Abstract
Several data mining tasks focus on repeatedly inspecting multidimensional data regions summarized by a statistic. The value of this statistic (e.g., region-population sizes, order moments) is used to classify the region’s interestingness. These regions can be naively extracted from the entire dataspace; however, this is extremely time-consuming and demanding of compute resources. This paper studies the reverse problem: analysts provide a cut-off value for a statistic of interest and, in turn, our proposed framework efficiently identifies multidimensional regions whose statistic exceeds (or is below) the given cut-off value, according to the user’s needs. However, as data dimensions and size increase, such a task inevitably becomes laborious and costly. To alleviate this cost, our solution, coined SuRF (SUrrogate Region Finder), leverages historical region evaluations to train surrogate models that learn to approximate the distribution of the statistic of interest. It then makes use of evolutionary multi-modal optimization to effectively and efficiently identify regions of interest regardless of data size and dimensionality. The accuracy, efficiency, and scalability of our approach are demonstrated with experiments using synthetic and real-world datasets and compared with other methods.
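The workflow summarized in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the k-nearest-neighbour surrogate, the plain (mu + lambda) evolutionary loop (used here in place of a true multi-modal optimizer), the box encoding of a region, and every function name are assumptions made for illustration only. The statistic is taken to be the region-population size, one of the examples the abstract mentions.

```python
import random

def region_count(data, region):
    """Exact statistic: number of points inside an axis-aligned box.

    A region is a tuple (c_1..c_d, h_1..h_d): centre followed by half-widths.
    """
    d = len(region) // 2
    center, half = region[:d], region[d:]
    return sum(
        all(abs(p[i] - center[i]) <= half[i] for i in range(d)) for p in data
    )

def knn_surrogate(history, k=5):
    """Build a predictor averaging the statistic of the k nearest past regions."""
    def predict(region):
        nearest = sorted(
            history,
            key=lambda h: sum((a - b) ** 2 for a, b in zip(h[0], region)),
        )[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def random_region(rng, d):
    """Sample a box with centre in [0, 1]^d and half-widths in [0.05, 0.25]."""
    return tuple(rng.random() for _ in range(d)) + tuple(
        rng.uniform(0.05, 0.25) for _ in range(d)
    )

def mutate(rng, region, step=0.05):
    """Gaussian perturbation; half-widths are kept positive."""
    d = len(region) // 2
    moved = [x + rng.gauss(0, step) for x in region]
    return tuple(moved[:d]) + tuple(max(0.01, h) for h in moved[d:])

def find_regions(predict, cutoff, d, rng, pop_size=30, generations=20):
    """Evolve candidate regions toward a high predicted statistic,
    then return those whose prediction meets the cut-off."""
    population = [random_region(rng, d) for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(rng, r) for r in population]
        ranked = sorted(population + children, key=predict, reverse=True)
        population = ranked[:pop_size]
    return [r for r in population if predict(r) >= cutoff]

rng = random.Random(0)
# Synthetic 2-D data: a dense cluster near (0.7, 0.3) on a uniform background.
data = [(rng.gauss(0.7, 0.05), rng.gauss(0.3, 0.05)) for _ in range(500)]
data += [(rng.random(), rng.random()) for _ in range(500)]
# "Historical" region evaluations: past queries paired with their exact statistic.
past_queries = [random_region(rng, 2) for _ in range(300)]
history = [(r, region_count(data, r)) for r in past_queries]
predict = knn_surrogate(history)
found = find_regions(predict, cutoff=200, d=2, rng=rng)
print(len(found), "candidate regions with predicted count >= 200")
```

Once the surrogate is built, the search only calls `predict`, so its cost depends on the number of stored past evaluations and not on the size of `data`. This is the property the abstract points to when it says regions are identified regardless of data size. In this sketch the returned regions satisfy the cut-off only according to the surrogate; checking them against `region_count` would be a separate verification step.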
Item Type: | Conference Proceedings |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Savva, Mr Fotis and Anagnostopoulos, Dr Christos and Triantafillou, Professor Peter |
Authors: | Savva, F., Anagnostopoulos, C., and Triantafillou, P. |
College/School: | College of Science and Engineering > School of Computing Science |
ISSN: | 2375-026X |
ISBN: | 9781728129037 |
Copyright Holders: | Copyright © 2020 IEEE |
First Published: | First published in 2020 IEEE 36th International Conference on Data Engineering (ICDE) |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |