Steuwer, M., Haidl, M., Breuer, S. and Gorlatch, S. (2014) High-level programming of stencil computations on multi-GPU systems using the SkelCL library. Parallel Processing Letters, 24(3), 1441005. (doi: 10.1142/S0129626414410059)
Full text: 148974.pdf (Accepted Version, 1MB)
Abstract
The implementation of stencil computations on modern, massively parallel systems with GPUs and other accelerators currently relies on manually tuned coding using low-level approaches like OpenCL and CUDA. This makes development of stencil applications a complex, time-consuming, and error-prone task. We describe how stencil computations can be programmed in our SkelCL approach that combines high-level programming abstractions with competitive performance on multi-GPU systems. SkelCL extends the OpenCL standard by three high-level features: 1) pre-implemented parallel patterns (a.k.a. skeletons); 2) container data types for vectors and matrices; 3) an automatic data (re)distribution mechanism. We introduce two new SkelCL skeletons which specifically target stencil computations – MapOverlap and Stencil – and we describe their use for particular application examples, discuss their efficient parallel implementation, and report experimental results on systems with multiple GPUs. Our evaluation of three real-world applications shows that stencil code written with SkelCL is considerably shorter and offers performance competitive with hand-tuned OpenCL code.
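For readers unfamiliar with the term, a stencil computation updates each element of a grid from a fixed neighbourhood around it. The sketch below is a plain, sequential C++ illustration of that idea only; it does not use SkelCL or OpenCL. The names `Matrix`, `Neighbourhood`, `map_overlap`, and `five_point_mean` are hypothetical and are not taken from the paper or the library, and treating out-of-bounds neighbours as a fixed neutral value is an assumption made for this example.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical row-major matrix type (NOT the SkelCL container).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    // Read element (r, c); positions outside the matrix yield `neutral`.
    float at(long r, long c, float neutral) const {
        if (r < 0 || c < 0 ||
            r >= static_cast<long>(rows) || c >= static_cast<long>(cols)) {
            return neutral;
        }
        return data[static_cast<std::size_t>(r) * cols +
                    static_cast<std::size_t>(c)];
    }
};

// Accessor handed to the user function: get(dr, dc) reads the neighbour at
// row offset dr and column offset dc relative to the current element.
using Neighbourhood = std::function<float(int, int)>;

// Hypothetical sequential helper (an assumption for illustration): applies
// `f` to every element, giving it relative access to the neighbours.
Matrix map_overlap(const Matrix& in, float neutral,
                   const std::function<float(const Neighbourhood&)>& f) {
    Matrix out{in.rows, in.cols, std::vector<float>(in.rows * in.cols)};
    for (long r = 0; r < static_cast<long>(in.rows); ++r) {
        for (long c = 0; c < static_cast<long>(in.cols); ++c) {
            Neighbourhood get = [&](int dr, int dc) {
                return in.at(r + dr, c + dc, neutral);
            };
            out.data[static_cast<std::size_t>(r) * in.cols +
                     static_cast<std::size_t>(c)] = f(get);
        }
    }
    return out;
}

// Example stencil: mean of an element and its four direct neighbours.
float five_point_mean(const Neighbourhood& get) {
    return (get(0, 0) + get(-1, 0) + get(1, 0) + get(0, -1) + get(0, 1)) / 5.0f;
}
```

Calling `map_overlap(m, 0.0f, five_point_mean)` on a matrix `m` returns a new matrix of the same shape. Every output element depends only on the input, so the iterations are independent of each other, which is the property that makes stencils a natural fit for parallel execution on GPUs.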
Item Type: | Articles |
---|---|
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Steuwer, Dr Michel |
Authors: | Steuwer, M., Haidl, M., Breuer, S., and Gorlatch, S. |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
College/School: | College of Science and Engineering > School of Computing Science |
Journal Name: | Parallel Processing Letters |
Publisher: | World Scientific Publishing |
ISSN: | 0129-6264 |
ISSN (Online): | 1793-642X |
Copyright Holders: | Copyright © 2014 World Scientific Publishing Company |
First Published: | First published in Parallel Processing Letters 24(3): 1441005 |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |