Improving the network scalability of Erlang

Chechina, N., Li, H., Ghaffari, A., Thompson, S. and Trinder, P. (2016) Improving the network scalability of Erlang. Journal of Parallel and Distributed Computing, 90-91, pp. 22-34. (doi: 10.1016/j.jpdc.2016.01.002)

Text: 108603.pdf - Accepted Version (632kB)

Abstract

As the number of cores grows in commodity architectures so does the likelihood of failures. A distributed actor model potentially facilitates the development of reliable and scalable software on these architectures. Key components include lightweight processes which ‘share nothing’ and hence can fail independently. Erlang is not only increasingly widely used, but the underlying actor model has been a beacon for programming language design, influencing for example Scala, Clojure and Cloud Haskell. While the Erlang distributed actor model is inherently scalable, we demonstrate that it is limited by some pragmatic factors. We address two network scalability issues here: globally registered process names must be updated on every node (virtual machine) in the system, and any Erlang nodes that communicate maintain an active connection. That is, there is a fully connected O(n²) network of n nodes. We present the design, implementation, and initial evaluation of a conservative extension of Erlang, Scalable Distributed (SD) Erlang. SD Erlang partitions the global namespace and connection network using s_groups. An s_group is a set of nodes with its own process namespace and with a fully connected network within the s_group, but only individual connections outside it. As a node may belong to more than one s_group it is possible to construct arbitrary connection topologies like trees or rings. We present an operational semantics for the s_group functions, and outline the validation of conformance between the implementation and the semantics using the QuickCheck automatic testing tool. Our preliminary evaluation in comparison with distributed Erlang shows that SD Erlang dramatically improves network scalability even if the number of global operations is tiny (0.01%). Moreover, even in the absence of global operations the reduced connection maintenance overheads mean that SD Erlang scales better beyond 80 nodes (1920 cores).
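The connection-count argument in the abstract can be illustrated with a small sketch. It is not taken from the paper and does not use the SD Erlang API: the function names, node names, and the three-group ring layout are all hypothetical, and it counts only the connections implied by s_group membership, ignoring any individual connections a node opens outside its groups.

```python
from itertools import combinations


def full_mesh_connections(n):
    """Connections in a fully connected network of n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def s_group_connections(groups):
    """Distinct node pairs linked when every group is fully connected internally."""
    links = set()
    for group in groups:
        # Sort so each pair has one canonical orientation and is counted once.
        for a, b in combinations(sorted(group), 2):
            links.add((a, b))
    return links


# Hypothetical layout: six nodes in three overlapping groups forming a ring.
# Nodes n0, n2 and n4 each belong to two groups and act as the bridges.
groups = [{"n0", "n1", "n2"}, {"n2", "n3", "n4"}, {"n4", "n5", "n0"}]
nodes = set().union(*groups)

print(full_mesh_connections(len(nodes)))  # 15 connections when fully connected
print(len(s_group_connections(groups)))   # 9 connections with the grouped layout
```

With six nodes the saving is small, but the fully connected count grows quadratically in the number of nodes, whereas the grouped count grows with the number of groups when the group size is held fixed.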

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Chechina, Dr Natalia and Ghaffari, Mr Amir and Trinder, Professor Phil
Authors: Chechina, N., Li, H., Ghaffari, A., Thompson, S., and Trinder, P.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Journal of Parallel and Distributed Computing
Publisher: Elsevier
ISSN: 0743-7315
ISSN (Online): 1096-0848
Published Online: 08 February 2016
Copyright Holders: Copyright © 2016 Elsevier
First Published: First published in Journal of Parallel and Distributed Computing 90-91: 22-34
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher.


Project Code: 635431
Award No:
Project Name: RELEASE
Principal Investigator: Phil Trinder
Funder's Name: European Commission (EC)
Funder Ref: 287510
Lead Dept: COM - COMPUTING SCIENCE