The HdpH DSLs for scalable reliable computation

Maier, P., Stewart, R. J. and Trinder, P. (2014) The HdpH DSLs for scalable reliable computation. In: ACM SIGPLAN Haskell Symposium 2014, Gothenburg, Sweden, 4-5 Sep 2014, pp. 65-76. ISBN 9781450330411 (doi:10.1145/2633357.2633363)

Full text not currently available from Enlighten.

Abstract

The statelessness of functional computations facilitates both parallelism and fault recovery. Faults and non-uniform communication topologies are key challenges for emergent large-scale parallel architectures. We report on HdpH and HdpH-RS, a pair of Haskell DSLs designed to address these challenges for irregular task-parallel computations on large distributed-memory architectures. Both DSLs share an API combining explicit task placement with sophisticated work stealing. HdpH focuses on scalability by making placement and stealing topology-aware, whereas HdpH-RS delivers reliability by means of fault-tolerant work stealing. We present operational semantics for both DSLs and investigate conditions for semantic equivalence of HdpH and HdpH-RS programs, that is, conditions under which topology awareness can be transparently traded for fault tolerance. We detail how the DSL implementations realise topology awareness and fault tolerance. We report an initial evaluation of scalability and fault tolerance on a 256-core cluster and on up to 32K cores of an HPC platform.

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Trinder, Professor Phil and Maier, Dr Patrick
Authors: Maier, P., Stewart, R. J., and Trinder, P.
College/School: College of Science and Engineering > School of Computing Science
Research Group: Glasgow Parallelism Group
ISBN: 9781450330411

Project Code: 644791
Award No:
Project Name: Adaptive Just-In-Time Parallelisation (AJITPar)
Principal Investigator: Phil Trinder
Funder's Name: Engineering & Physical Sciences Research Council (EPSRC)
Funder Ref: EP/L000687/1
Lead Dept: COM - COMPUTING SCIENCE