SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows

Koch, J., Thamsen, L., Schmidt, F. and Kao, O. (2018) SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows. In: 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Taipei, Taiwan, 18-20 Dec 2017, pp. 156-163. ISBN 9781538631515 (doi: 10.1109/PDCAT.2017.00034)


Abstract

Distributed dataflow systems such as Apache Spark allow the execution of iterative programs at large scale on clusters. In production use, programs are often recurring and have strict latency requirements. Yet, choosing appropriate resource allocations is difficult as runtimes are dependent on hard-to-predict factors, including failures, cluster utilization and dataset characteristics. Offline runtime prediction helps to estimate resource requirements, but cannot take into account inherent variance due to, for example, changing cluster states. We present SMiPE, a system estimating the progress of iterative dataflows by matching a running job to previous executions based on similarity, capturing properties such as convergence, hardware utilization and runtime. SMiPE is not limited to a specific framework due to its black-box approach and is able to adapt to changing cluster states reflected in the current job's statistics. SMiPE automatically adapts its similarity matching to algorithm-specific profiles by training parameters on the job history. We evaluated SMiPE with three iterative Spark jobs and nine datasets. The results show that SMiPE is effective in choosing useful historic runs and predicts runtimes with a mean relative error of 9.1% to 13.1%.
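The matching idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not SMiPE's implementation. The following are all assumptions made for illustration:

- the feature set (per-iteration runtime, a per-iteration convergence value, CPU utilization);
- the relative-difference distance;
- the choice of a single nearest historic run;
- the linear rescaling of that run's remaining time;
- every name (`Run`, `distance`, `estimate`, `mean_relative_error`).

The fixed `weights` tuple stands in for the similarity parameters that SMiPE trains on the job history.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Per-iteration statistics of one execution (hypothetical feature set)."""
    iter_runtimes: list[float]  # seconds spent in each iteration
    convergence: list[float]    # e.g. residual or number of updated elements per iteration
    cpu_util: list[float]       # mean CPU utilization per iteration, 0..1


def distance(a: Run, b: Run, k: int, weights: tuple[float, float, float]) -> float:
    """Weighted distance between two runs over their first k iterations.

    Each feature is compared as a mean relative difference so that features
    with different units can be summed; the weights are placeholders for
    parameters that would be trained on the job history.
    """
    def rel_diff(xs: list[float], ys: list[float]) -> float:
        return sum(abs(x - y) / max(abs(x), abs(y), 1e-9)
                   for x, y in zip(xs[:k], ys[:k])) / k

    w_rt, w_conv, w_cpu = weights
    return (w_rt * rel_diff(a.iter_runtimes, b.iter_runtimes)
            + w_conv * rel_diff(a.convergence, b.convergence)
            + w_cpu * rel_diff(a.cpu_util, b.cpu_util))


def estimate(current: Run, history: list[Run],
             weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> tuple[float, float]:
    """Return (predicted total runtime, progress fraction) for a running job."""
    k = len(current.iter_runtimes)
    if k == 0:
        raise ValueError("need at least one completed iteration")
    # Only historic runs that got at least as far as the current job are comparable.
    candidates = [r for r in history if len(r.iter_runtimes) >= k]
    if not candidates:
        raise ValueError("no comparable historic run")
    best = min(candidates, key=lambda r: distance(current, r, k, weights))

    elapsed = sum(current.iter_runtimes)
    hist_elapsed = sum(best.iter_runtimes[:k])
    hist_total = sum(best.iter_runtimes)
    # Scale the matched run's remaining time by how much slower or faster the
    # current job has been so far, so the current cluster state is reflected.
    scale = elapsed / hist_elapsed
    predicted_total = elapsed + scale * (hist_total - hist_elapsed)
    return predicted_total, elapsed / predicted_total


def mean_relative_error(predicted: list[float], actual: list[float]) -> float:
    """Mean of |predicted - actual| / actual over all predictions."""
    return sum(abs(p - a) / a for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    run_a = Run([10, 10, 10, 10], [100, 50, 25, 12], [0.8] * 4)
    run_b = Run([20] * 6, [100, 80, 60, 40, 20, 10], [0.5] * 6)
    current = Run([12, 12], [100, 50], [0.8, 0.8])  # two iterations done so far
    total, progress = estimate(current, [run_a, run_b])
    print(f"predicted total {total:.1f} s, progress {progress:.2f}")
    # prints "predicted total 48.0 s, progress 0.50"
```

In the example, the running job converges like `run_a` but each iteration takes 20% longer. The sketch therefore matches `run_a` and stretches its remaining 20 s to 24 s.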

Item Type: Conference Proceedings
Additional Information: Funding: This work has been supported through grants by the German Science Foundation (DFG) as FOR 1306 Stratosphere and by the German Ministry for Education and Research (BMBF) as Berlin Big Data Center BBDC (funding mark 01IS14013A).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Thamsen, Dr Lauritz
Authors: Koch, J., Thamsen, L., Schmidt, F., and Kao, O.
College/School: College of Science and Engineering > School of Computing Science
Publisher: IEEE
ISBN: 9781538631515
