Visually Programming Dataflows for Distributed Data Analytics

Thamsen, L., Renner, T., Byfeld, M., Paeschke, M., Schröder, D. and Böhm, F. (2017) Visually Programming Dataflows for Distributed Data Analytics. In: 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 05-08 Dec 2016, pp. 2276-2285. ISBN 9781467390057 (doi: 10.1109/BigData.2016.7840860)

Abstract

Distributed dataflow systems like Spark and Flink allow users to analyze large datasets using clusters of computers. These frameworks provide automatic program parallelization and manage distributed workers, including worker failures. Moreover, they provide high-level programming abstractions and execute programs efficiently. Yet the programming abstractions remain textual, while the dataflow model is essentially a graph of transformations. Thus, there is a mismatch between the presented abstraction and the underlying model. One can also argue that developing dataflow programs with these textual abstractions requires a needless amount of coding and of coding skill. A dedicated programming environment could instead allow users to construct dataflow programs more interactively and visually. In this paper, we therefore investigate how visual programming can make the development of parallel dataflow programs more accessible. In particular, we built a prototypical visual programming environment for Flink, which we call Flision. Flision provides a graphical user interface for creating dataflow programs, a code generation engine that generates code for Flink, and seamless deployment to a connected cluster. Users of this environment can effectively create jobs by dragging, dropping, and visually connecting operator components. To evaluate the applicability of this approach, we interviewed ten potential users. Our impressions from this qualitative user testing strengthened our belief that visual programming can be a valuable tool for users of scalable data analysis tools.
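The abstract does not describe how Flision's code generation engine works internally. As a rough illustration only, the Python sketch below models a visual program as a graph of operator boxes and connecting arrows, then emits a Flink program in the Scala DataSet API by visiting the operators in topological order, so that every operator is generated after its inputs. The names `Operator`, `Dataflow`, `TEMPLATES` and `generate_scala` are hypothetical and invented for this example; they are not Flision's API. The Scala templates are simplified, support only single-input operators, and assume operator IDs are valid Scala identifiers. The sketch requires Python 3.9+ for `graphlib`.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library since Python 3.9


@dataclass
class Operator:
    """A hypothetical operator component, i.e. one box on the canvas."""
    op_id: str       # also used as the Scala value name, so it must be a valid identifier
    kind: str        # one of the keys in TEMPLATES below
    param: str = ""  # operator argument, pasted into the template verbatim


@dataclass
class Dataflow:
    """The visual program: operator boxes plus the arrows connecting them."""
    operators: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (upstream op_id, downstream op_id)

    def add(self, op: Operator) -> Operator:
        self.operators[op.op_id] = op
        return op

    def connect(self, upstream: Operator, downstream: Operator) -> None:
        self.edges.append((upstream.op_id, downstream.op_id))


# Hypothetical, simplified templates: each operator kind maps to one Flink Scala DataSet call.
TEMPLATES = {
    "source": 'env.readTextFile("{param}")',
    "flatMap": "{input}.flatMap {{ {param} }}",
    "map": "{input}.map {{ {param} }}",
    "groupBy": "{input}.groupBy({param})",
    "sum": "{input}.sum({param})",
    "sink": '{input}.writeAsText("{param}")',
}


def generate_scala(flow: Dataflow, job_name: str) -> str:
    """Emit a Flink Scala program by visiting operators in dependency order."""
    preds = {op_id: [] for op_id in flow.operators}
    for upstream, downstream in flow.edges:
        preds[downstream].append(upstream)

    body = ["    val env = ExecutionEnvironment.getExecutionEnvironment"]
    # static_order() yields every operator after all of its inputs (CycleError on cycles).
    for op_id in TopologicalSorter(preds).static_order():
        op, inputs = flow.operators[op_id], preds[op_id]
        expected = 0 if op.kind == "source" else 1  # this sketch has no joins or unions
        if len(inputs) != expected:
            raise ValueError(f"{op_id}: expected {expected} input(s), got {len(inputs)}")
        expr = TEMPLATES[op.kind].format(param=op.param, input=inputs[0] if inputs else "")
        body.append(f"    {expr}" if op.kind == "sink" else f"    val {op_id} = {expr}")
    body.append(f'    env.execute("{job_name}")')

    return "\n".join(
        ["import org.apache.flink.api.scala._", "", f"object {job_name} {{",
         "  def main(args: Array[String]): Unit = {", *body, "  }", "}"]
    )


# Usage: a word-count job a user might assemble by dragging and connecting boxes.
flow = Dataflow()
lines = flow.add(Operator("lines", "source", "input.txt"))
words = flow.add(Operator("words", "flatMap", '_.toLowerCase.split(" ")'))
pairs = flow.add(Operator("pairs", "map", "(_, 1)"))
grouped = flow.add(Operator("grouped", "groupBy", "0"))
counts = flow.add(Operator("counts", "sum", "1"))
sink = flow.add(Operator("sink", "sink", "counts.txt"))
for a, b in [(lines, words), (words, pairs), (pairs, grouped), (grouped, counts), (counts, sink)]:
    flow.connect(a, b)

scala_src = generate_scala(flow, "WordCount")
print(scala_src)
```

Traversing the graph in topological order is what lets a generator of this kind accept boxes in whatever order the user placed them on the canvas: the emitted program still defines each intermediate dataset before it is used, and a cyclic drawing is rejected by `graphlib` rather than producing invalid code.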

Item Type: Conference Proceedings
Additional Information: Funding: This work has been supported through grants by the German Science Foundation (DFG) as FOR 1306 Stratosphere and by the German Ministry for Education and Research (BMBF) as Berlin Big Data Center BBDC (funding mark 01IS14013A).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Thamsen, Dr Lauritz
Authors: Thamsen, L., Renner, T., Byfeld, M., Paeschke, M., Schröder, D., and Böhm, F.
College/School: College of Science and Engineering > School of Computing Science
Publisher: IEEE
ISBN: 9781467390057

University Staff: Request a correction | Enlighten Editors: Update this record