Alexandrov, A., Kunft, A., Katsifodimos, A., Schüler, F., Thamsen, L., Kao, O., Herb, T. and Markl, V. (2015) Implicit Parallelism Through Deep Language Embedding. In: International Conference on Management of Data (SIGMOD/PODS '15), Melbourne, Australia, 31 May - 04 Jun 2015, pp. 47-61. ISBN 9781450327589 (doi: 10.1145/2723372.2750543)
Text
268125.pdf - Accepted Version Restricted to Repository staff only 504kB |
Abstract
The appeal of MapReduce has spawned a family of systems that implement or extend it. In order to enable parallel collection processing with User-Defined Functions (UDFs), these systems expose extensions of the MapReduce programming model as library-based dataflow APIs that are tightly coupled to their underlying runtime engine. Expressing data analysis algorithms with complex data and control flow structure using such APIs reveals a number of limitations that impede programmer's productivity. In this paper we show that the design of data analysis languages and APIs from a runtime engine point of view bloats the APIs with low-level primitives and affects programmer's productivity. Instead, we argue that an approach based on deeply embedding the APIs in a host language can address the shortcomings of current data analysis languages. To demonstrate this, we propose a language for complex data analysis embedded in Scala, which (i) allows for declarative specification of dataflows and (ii) hides the notion of data-parallelism and distributed runtime behind a suitable intermediate representation. We describe a compiler pipeline that facilitates efficient data-parallel processing without imposing runtime engine-bound syntactic or semantic restrictions on the structure of the input programs. We present a series of experiments with two state-of-the-art systems that demonstrate the optimization potential of our approach.
Item Type: | Conference Proceedings |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Thamsen, Dr Lauritz |
Authors: | Alexandrov, A., Kunft, A., Katsifodimos, A., Schüler, F., Thamsen, L., Kao, O., Herb, T., and Markl, V. |
College/School: | College of Science and Engineering > School of Computing Science |
Publisher: | ACM |
ISBN: | 9781450327589 |
University Staff: Request a correction | Enlighten Editors: Update this record