Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs

Thamsen, L., Verbitskiy, I., Nedelkoski, S., Tran, V. T., Meyer, V., Xavier, M. G., Kao, O. and De Rose, C. A. F. (2020) Hugo: A Cluster Scheduler that Efficiently Learns to Select Complementary Data-Parallel Jobs. In: Euro-Par 2019 International Workshops, Göttingen, Germany, 26-30 Aug 2019, pp. 519-530. ISBN 9783030483401 (doi: 10.1007/978-3-030-48340-1_40)

Abstract

Distributed data processing systems like MapReduce, Spark, and Flink are popular tools for analysis of large datasets with cluster resources. Yet, users often overprovision resources for their data processing jobs, while the resource usage of these jobs also typically fluctuates considerably. Therefore, multiple jobs usually get scheduled onto the same shared resources to increase the resource utilization and throughput of clusters. However, job runtimes and the utilization of shared resources can vary significantly depending on the specific combinations of co-located jobs. This paper presents Hugo, a cluster scheduler that continuously learns how efficiently jobs share resources, considering metrics for the resource utilization and interference among co-located jobs. The scheduler combines offline grouping of jobs with online reinforcement learning to provide a scheduling mechanism that efficiently generalizes from specific monitored job combinations yet also adapts to changes in workloads. Our evaluation of a prototype shows that the approach can reduce the runtimes of exemplary Spark jobs on a YARN cluster by up to 12.5%, while resource utilization is increased and waiting times can be bounded.

Item Type: Conference Proceedings
Additional Information: Funding: This work has been supported through grants by the German Ministry for Education and Research (BMBF; funding mark 01IS14013A and 01IS18025A).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Thamsen, Dr Lauritz
Authors: Thamsen, L., Verbitskiy, I., Nedelkoski, S., Tran, V. T., Meyer, V., Xavier, M. G., Kao, O., and De Rose, C. A. F.
College/School: College of Science and Engineering > School of Computing Science
Publisher: Springer
ISBN: 9783030483401
Published Online: 29 May 2020