Generating Performance Portable Code Using Rewrite Rules: from High-Level Functional Expressions to High-Performance OpenCL Code

Steuwer, M., Fensch, C., Lindley, S. and Dubach, C. (2015) Generating Performance Portable Code Using Rewrite Rules: from High-Level Functional Expressions to High-Performance OpenCL Code. In: ICFP 2015 Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming, Vancouver, BC, Canada, 31 Aug - 02 Sep 2015, pp. 205-217. ISBN 9781450336697 (doi:10.1145/2784731.2784754)

Abstract

Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort, resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance, or is written in a high-level, possibly functional, language to achieve portability at the expense of performance. We propose a novel approach aiming to combine high-level programming, code portability, and high performance. Starting from a high-level functional expression, we apply a simple set of rewrite rules to transform it into a low-level functional representation, close to the OpenCL programming model, from which OpenCL code is generated. Our rewrite rules define a space of possible implementations, which we automatically explore to generate hardware-specific OpenCL implementations. We formalize our system with a core dependently-typed λ-calculus along with a denotational semantics, which we use to prove the correctness of the rewrite rules. We test our design in practice by implementing a compiler which generates high-performance imperative OpenCL code. Our experiments show that we can automatically derive hardware-specific implementations from simple functional high-level algorithmic expressions, offering performance on a par with highly tuned code for multicore CPUs and GPUs written by experts.
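
As an illustration of the idea that rewrite rules span a space of equivalent implementations, the following is a minimal Python sketch. It is not taken from the paper or its compiler: the helper names (`split`, `join`, `map_plain`, `map_split_join`, `candidates`) and the chunk sizes are assumptions chosen for this example. It shows one semantics-preserving rewrite, replacing a flat map by a split, a nested map, and a join, and enumerates a few variants that all compute the same result.

```python
# Hypothetical sketch: one rewrite rule and the small "space" of
# equivalent implementations it induces. Names are illustrative only.

def split(n, xs):
    """Chunk xs into consecutive pieces of length n (the last may be shorter)."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]

def join(xss):
    """Flatten one level of nesting."""
    return [x for xs in xss for x in xs]

def map_plain(f, xs):
    """High-level form: apply f to every element."""
    return [f(x) for x in xs]

def map_split_join(f, n, xs):
    """Rewritten form: join after mapping (map f) over chunks of size n."""
    return join([map_plain(f, chunk) for chunk in split(n, xs)])

def candidates(f, xs, chunk_sizes=(1, 2, 4, 8)):
    """Enumerate a few implementations that should agree on their result."""
    yield "map", map_plain(f, xs)
    for n in chunk_sizes:
        yield f"split-join({n})", map_split_join(f, n, xs)

if __name__ == "__main__":
    data = list(range(10))
    for name, result in candidates(lambda x: x * x, data):
        print(name, result)
```

In a setting like the one the abstract describes, each variant would correspond to a different mapping onto hardware (for example, chunks assigned to groups of threads), and an automatic search would pick the fastest one per device; here the variants differ only in how the input is chunked.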

Item Type: Conference Proceedings
Additional Information: This work was supported by a HiPEAC collaboration grant, EPSRC (grant number EP/K034413/1), the Royal Academy of Engineering, Google and Oracle.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Steuwer, Dr Michel
Authors: Steuwer, M., Fensch, C., Lindley, S., and Dubach, C.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450336697
Copyright Holders: Copyright © 2015 ACM
First Published: First published in ICFP 2015 Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming: 205-217
Publisher Policy: Reproduced in accordance with the publisher copyright policy
