Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy

Bangash, Y. A., Rana, T., Abbas, H., Imran, M. A. and Khan, A. A. (2019) Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy. IEEE Access, 7, pp. 10718-10733. (doi:10.1109/ACCESS.2019.2891264)

177062.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

Incast is a phenomenon in which multiple devices communicate with a single device at the same time. Multiple storage senders overflow either the switch buffer or the memory of the single receiver. This pattern forces all concurrent senders to stop and wait for buffer or memory to become available, and it leads to packet loss and retransmission, resulting in high latency. We present a software-defined technique that tackles this many-to-one communication pattern (Incast) in a data center storage cluster. The proposed method decouples the default TCP windowing mechanism from all storage servers and delegates it to the software-defined storage controller. It removes the TCP saw-tooth behavior, provides global flow awareness, and implements a dynamic fair-share buffer policy for the end-to-end I/O path. It considers all I/O stages (applications, device drivers, NICs, switches/routers, file systems, I/O schedulers, main memory, and physical disks) while achieving maximum I/O throughput. The policy, which is part of the proposed method, allocates a fair share of bandwidth to every storage server. Priority queues are incorporated to handle the most important data flows. In addition, the proposed method provides better manageability and maintainability than traditional storage networks, in which the data plane and control plane reside in the same device.
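
The abstract does not spell out the allocation rule, so the following is only an illustrative sketch of one common way a central controller could split a shared buffer fairly among senders: weighted max-min fairness (progressive filling). It is not the authors' algorithm. The function name `fair_share_windows`, the per-sender `demands`, and the `weights` used to stand in for flow priority are all hypothetical.

```python
def fair_share_windows(buffer_bytes, demands, weights=None):
    """Weighted max-min fair split of a shared buffer among senders.

    Assumption: this is a generic progressive-filling scheme, not the
    policy from the paper. `demands` maps sender -> bytes requested;
    `weights` maps sender -> relative priority (default 1.0 each).
    Returns sender -> bytes granted (a stand-in for a send window).
    """
    if weights is None:
        weights = {s: 1.0 for s in demands}
    alloc = {s: 0.0 for s in demands}
    active = {s for s, d in demands.items() if d > 0}
    remaining = float(buffer_bytes)

    while active and remaining > 0:
        # Buffer offered per unit of weight in this round.
        unit = remaining / sum(weights[s] for s in active)
        # Senders whose whole demand fits inside their weighted share.
        satisfied = {s for s in active if demands[s] <= unit * weights[s]}
        if not satisfied:
            # Everyone is bottlenecked: hand out the weighted shares and stop.
            for s in active:
                alloc[s] = unit * weights[s]
            break
        # Grant small demands in full, then redistribute what is left.
        for s in satisfied:
            alloc[s] = float(demands[s])
            remaining -= demands[s]
        active -= satisfied
    return alloc


if __name__ == "__main__":
    # Three storage senders competing for a 100-byte buffer.
    print(fair_share_windows(100, {"A": 10, "B": 50, "C": 80}))
```

In this sketch a sender that asks for less than its share is served in full, and the unused portion is redistributed among the remaining senders, so no sender is starved and the buffer is never oversubscribed. A higher weight gives a flow a proportionally larger share, which is one simple way to approximate priority handling.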

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Imran, Professor Muhammad
Authors: Bangash, Y. A., Rana, T., Abbas, H., Imran, M. A., and Khan, A. A.
College/School: College of Science and Engineering > School of Engineering > Systems Power and Energy
Journal Name: IEEE Access
Publisher: IEEE
ISSN: 2169-3536
ISSN (Online): 2169-3536
Published Online: 07 January 2019
Copyright Holders: Copyright © 2019 IEEE
First Published: First published in IEEE Access 7: 10718-10733
Publisher Policy: Reproduced under a Creative Commons License

Project Code: 3007250
Award No:
Project Name: Distributed Autonomous Resilient Emergency Management System (DARE)
Principal Investigator: Muhammad Imran
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: EP/P028764/1
Lead Dept: ENG - Systems Power & Energy