Herb, T., Thamsen, L., Renner, T. and Kao, O. (2016) Aura: A Flexible Dataflow Engine for Scalable Data Processing. In: 9th International Workshop on Parallel Tools for High Performance Computing, Dresden, Germany, pp. 117-126. ISBN 9783319395890 (doi: 10.1007/978-3-319-39589-0_9)
Text: 268129.pdf - Accepted Version (restricted to repository staff only, 390kB)
Abstract
This paper describes Aura, a parallel dataflow engine for analyzing large-scale datasets on commodity clusters. Aura lets users compose program plans from relational operators and second-order functions, parallelizes and optimizes programs automatically, and provides a scalable and efficient runtime. Furthermore, Aura provides dedicated support for control flow, allowing advanced analysis programs to be executed as a single dataflow job. This way, it is not necessary to express, for example, data preprocessing, iterative algorithms, or even logic that depends on the outcome of a preceding dataflow as multiple separate jobs. Instead, the engine handles the entire dataflow program as one job, which makes it possible to keep intermediate results in memory and to consider the entire program during plan optimization, for example to re-use partitions.
Item Type: | Conference Proceedings |
---|---|
Additional Information: | Funding: This work has been supported through grants by the German Science Foundation (DFG) as FOR 1306 Stratosphere and by the German Ministry for Education and Research as Berlin Big Data Center BBDC (funding mark 01IS14013A). |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Thamsen, Dr Lauritz |
Authors: | Herb, T., Thamsen, L., Renner, T., and Kao, O. |
College/School: | College of Science and Engineering > School of Computing Science |
Publisher: | Springer |
ISBN: | 9783319395890 |