Aura: A Flexible Dataflow Engine for Scalable Data Processing

Herb, T., Thamsen, L., Renner, T. and Kao, O. (2016) Aura: A Flexible Dataflow Engine for Scalable Data Processing. In: 9th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, pp. 117-126. ISBN 9783319395890 (doi: 10.1007/978-3-319-39589-0_9)

Abstract

This paper describes Aura, a parallel dataflow engine for the analysis of large-scale datasets on commodity clusters. Aura allows users to compose program plans from relational operators and second-order functions, provides automatic program parallelization and optimization, and offers a scalable and efficient runtime. Furthermore, Aura provides dedicated support for control flow, allowing advanced analysis programs to be executed as a single dataflow job. This way, it is not necessary to express, for example, data preprocessing, iterative algorithms, or even logic that depends on the outcome of a preceding dataflow as multiple separate jobs. The entire dataflow program is instead handled as one job by the engine, which makes it possible to keep intermediate results in memory and to consider the entire program during plan optimization, for example to re-use partitions.

Item Type: Conference Proceedings
Additional Information: Funding: This work has been supported through grants by the German Science Foundation (DFG) as FOR 1306 Stratosphere and by the German Ministry for Education and Research as Berlin Big Data Center BBDC (funding mark 01IS14013A).
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Thamsen, Dr Lauritz
Authors: Herb, T., Thamsen, L., Renner, T., and Kao, O.
College/School: College of Science and Engineering > School of Computing Science
Publisher: Springer
ISBN: 9783319395890
