Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications

Scheinert, D., Becker, S., Will, J., Englaender, L. and Thamsen, L. (2024) Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications. In: 2023 IEEE International Conference on Big Data (Big Data), Sorrento, Italy, 15-18 Dec 2023, pp. 2339-2345. ISBN 9798350324457 (doi: 10.1109/BigData59044.2023.10386195)

Text: 310190.pdf - Accepted Version (392kB)

Abstract

Performance modeling can help to improve the resource efficiency of clusters and distributed dataflow applications, yet the available modeling data is often limited. Collaborative approaches to performance modeling, characterized by the sharing of performance data or models, have been shown to improve resource efficiency, but there has been little focus on actual data sharing strategies and implementation in production environments. This missing building block holds back the realization of proposed collaborative solutions. In this paper, we envision, design, and evaluate a peer-to-peer performance data sharing approach for collaborative performance modeling of distributed dataflow applications. Our proposed data distribution layer enables access to performance data in a decentralized manner, thereby facilitating collaborative modeling approaches and allowing for improved prediction capabilities and hence increased resource efficiency. In our evaluation, we assess our approach with regard to deployment, data replication, and data validation, through experiments with a prototype implementation and simulation, demonstrating feasibility and allowing discussion of potential limitations and next steps.

Item Type: Conference Proceedings
Additional Information: This work has been supported through grants by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) as C5 (grant 506529034) and as FONDA (Project 414984028, SFB 1404). The experimental setup code is available at https://github.com/mcd01/test-plans
Keywords: Scalable data analytics, distributed dataflows, performance modeling, data sharing, resource management.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Thamsen, Dr Lauritz
Authors: Scheinert, D., Becker, S., Will, J., Englaender, L., and Thamsen, L.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9798350324457
Copyright Holders: Copyright © 2023, IEEE
First Published: First published in 2023 IEEE International Conference on Big Data (BigData)
Publisher Policy: Reproduced in accordance with the publisher copyright policy