Thamsen, L., Renner, T., Byfeld, M., Paeschke, M., Schröder, D. and Böhm, F. (2017) Visually Programming Dataflows for Distributed Data Analytics. In: 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 05-08 Dec 2016, pp. 2276-2285. ISBN 9781467390057 (doi: 10.1109/BigData.2016.7840860)
Abstract
Distributed dataflow systems like Spark and Flink allow users to analyze large datasets using clusters of computers. These frameworks provide automatic program parallelization and manage distributed workers, including worker failures. Moreover, they provide high-level programming abstractions and execute programs efficiently. Yet, the programming abstractions remain textual while the dataflow model is essentially a graph of transformations. Thus, there is a mismatch between the presented abstraction and the underlying model. One can also argue that developing dataflow programs with these textual abstractions requires an unnecessary amount of coding and coding skill. A dedicated programming environment could instead allow constructing dataflow programs more interactively and visually. In this paper, we therefore investigate how visual programming can make the development of parallel dataflow programs more accessible. In particular, we built a prototypical visual programming environment for Flink, which we call Flision. Flision provides a graphical user interface for creating dataflow programs, a code generation engine that generates code for Flink, and seamless deployment to a connected cluster. Users of this environment can effectively create jobs by dragging, dropping, and visually connecting operator components. To evaluate the applicability of this approach, we interviewed ten potential users. Our impressions from this qualitative user testing strengthened our belief that visual programming can be a valuable tool for users of scalable data analysis tools.
Item Type: | Conference Proceedings |
---|---|
Additional Information: | Funding: This work has been supported through grants by the German Science Foundation (DFG) as FOR 1306 Stratosphere and by the German Ministry for Education and Research (BMBF) as Berlin Big Data Center BBDC (funding mark 01IS14013A). |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Thamsen, Dr Lauritz |
Authors: | Thamsen, L., Renner, T., Byfeld, M., Paeschke, M., Schröder, D., and Böhm, F. |
College/School: | College of Science and Engineering > School of Computing Science |
Publisher: | IEEE |
ISBN: | 9781467390057 |