Runtime Code Generation and Data Management for Heterogeneous Computing in Java

Fumero, J., Remmelg, T., Steuwer, M. and Dubach, C. (2015) Runtime Code Generation and Data Management for Heterogeneous Computing in Java. In: PPPJ '15 Proceedings of the Principles and Practices of Programming on The Java Platform, Melbourne, FL, USA, 08-11 Sep 2015, pp. 16-26. ISBN 9781450337120 (doi: 10.1145/2807426.2807428)

Text: 146606.pdf - Accepted Version (517kB)

Abstract

GPUs (Graphics Processing Units) and other accelerators are nowadays commonly found in desktop machines, mobile devices and even data centres. While these highly parallel processors offer high raw performance, they also dramatically increase program complexity, requiring extra effort from programmers. This results in difficult-to-maintain and non-portable code due to the low-level nature of the languages used to program these devices. This paper presents a high-level parallel programming approach for the popular Java programming language. Our goal is to revitalise the old Java slogan "Write once, run anywhere" in the context of modern heterogeneous systems. To enable the use of parallel accelerators from Java, we introduce a new API for heterogeneous programming based on array and functional programming. Applications written with our API can then be transparently accelerated on a device such as a GPU using our runtime OpenCL code generator. In order to ensure the highest level of performance, we present data management optimizations. Usually, data has to be translated (marshalled) between the Java representation and the representation accelerators use. This paper shows how marshalling affects runtime and presents a novel technique in Java to avoid this cost by implementing our own customised array data structure. Our design hides low-level data management from the user, making our approach applicable even for inexperienced Java programmers. We evaluated our technique using a set of applications from different domains, including mathematical finance and machine learning. We achieve speedups of up to 500x over sequential and multi-threaded Java code when using an external GPU.
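
The marshalling cost mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's API: the names MarshallingSketch, Point, FlatPointArray, marshal and rawBuffer are hypothetical, and the sketch assumes that a flat, natively ordered buffer is the layout an accelerator runtime would want. The first approach copies fields out of ordinary Java objects before every transfer; the second keeps the elements in flat form from the start, so no copy is needed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

public class MarshallingSketch {

    /** Ordinary Java object; an array of these holds references, not contiguous data. */
    public static final class Point {
        final float x, y;
        public Point(float x, float y) { this.x = x; this.y = y; }
    }

    /** Allocates an off-heap buffer of floats in the platform's byte order. */
    static FloatBuffer newDirectFloatBuffer(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Marshalling: copy every field into a flat buffer before a transfer (O(n) each time). */
    public static FloatBuffer marshal(Point[] points) {
        FloatBuffer flat = newDirectFloatBuffer(points.length * 2);
        for (int i = 0; i < points.length; i++) {
            flat.put(2 * i, points[i].x);
            flat.put(2 * i + 1, points[i].y);
        }
        return flat;
    }

    /** Hypothetical array type that stores its elements in flat form from the start. */
    public static final class FlatPointArray {
        private final FloatBuffer data;
        private final int length;

        public FlatPointArray(int length) {
            this.length = length;
            this.data = newDirectFloatBuffer(length * 2);
        }

        public void set(int i, float x, float y) {
            data.put(2 * i, x);
            data.put(2 * i + 1, y);
        }

        public float x(int i) { return data.get(2 * i); }
        public float y(int i) { return data.get(2 * i + 1); }
        public int length() { return length; }

        /** No marshalling step: the backing buffer can be handed over as-is. */
        public FloatBuffer rawBuffer() { return data; }
    }

    /** Element-wise comparison of two buffers of equal capacity. */
    public static boolean sameContents(FloatBuffer a, FloatBuffer b) {
        if (a.capacity() != b.capacity()) return false;
        for (int i = 0; i < a.capacity(); i++) {
            if (a.get(i) != b.get(i)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int n = 4;
        Point[] objects = new Point[n];
        FlatPointArray flat = new FlatPointArray(n);
        for (int i = 0; i < n; i++) {
            objects[i] = new Point(i, 2f * i);
            flat.set(i, i, 2f * i);
        }
        // Both paths end with the same bytes; only the first one paid for a copy.
        System.out.println("layouts match: " + sameContents(marshal(objects), flat.rawBuffer()));
    }
}
```

In this sketch the user still reads and writes elements through ordinary accessor methods, which is one way a design could hide the flat layout from the programmer; how the paper's own data structure achieves this is described in the full text.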

Item Type: Conference Proceedings
Additional Information: The authors would like to thank Oracle Labs for their support of this work.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Steuwer, Dr Michel
Authors: Fumero, J., Remmelg, T., Steuwer, M., and Dubach, C.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450337120
Copyright Holders: Copyright © 2015 ACM
First Published: First published in PPPJ '15 Proceedings of the Principles and Practices of Programming on The Java Platform: 16-26
Publisher Policy: Reproduced in accordance with the publisher copyright policy
