Transient and Permanent Error Control for High-End Multiprocessor Systems-on-Chip

Yu, Q., Cano, J., Flich, J. and Ampadu, P. (2012) Transient and Permanent Error Control for High-End Multiprocessor Systems-on-Chip. In: 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, Copenhagen, Denmark, 09-11 May 2012, pp. 169-176. ISBN 9781467309738 (doi:10.1109/NOCS.2012.27)

High-end MPSoC systems with built-in high-radix topologies achieve good performance because of improved connectivity and reduced network diameter. In such systems, fault-tolerance support is becoming a compulsory feature. In this work, we propose a combined method to address permanent and transient link and router failures in those systems. The LBDRhr mechanism is proposed to tolerate permanent link failures in some popular high-radix topologies. The increased router complexity may lead to more transient router errors than in routers using a simple XY routing algorithm. We exploit the inherent information redundancy (IIR) in the LBDRhr logic to manage transient errors in the network routers. Thorough analyses are provided to identify the appropriate internal nodes and the forbidden signal patterns for transient error detection. Simulation results show that the LBDRhr logic can tolerate all permanent failure combinations of long-range links and 80% of link failures on short-range links. Case studies show that, compared to triple modular redundancy, the error detection method based on the new IIR extraction method reduces power consumption by 33% and the residual error rate by up to two orders of magnitude. The impact of network topology on the efficiency of the detection mechanism is also examined.
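
The two detection styles contrasted in the abstract can be illustrated with a minimal sketch. It is not the LBDRhr logic from the paper: the one-hot port-select invariant, the function names, and the 4-bit signal width are all assumptions chosen for illustration. Triple modular redundancy triplicates the logic and votes, whereas a forbidden-pattern check only watches signals that already exist and flags combinations that fault-free logic can never produce.

```python
from itertools import product
from typing import List, Sequence, Tuple


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote over three redundant copies (classic TMR)."""
    return (a & b) | (a & c) | (b & c)


def is_forbidden(port_select: Sequence[int]) -> bool:
    """Return True when the pattern violates the assumed invariant.

    Assumption (hypothetical, not taken from the paper): exactly one
    output-port select bit is asserted in fault-free operation.
    """
    return sum(port_select) != 1


def forbidden_patterns(width: int) -> List[Tuple[int, ...]]:
    """Enumerate every signal pattern the checker would flag."""
    return [p for p in product((0, 1), repeat=width) if is_forbidden(p)]


if __name__ == "__main__":
    # TMR masks a single corrupted copy of a 4-bit value.
    print(bin(tmr_vote(0b1010, 0b1010, 0b0010)))
    # The pattern check needs no extra copies, only a small checker.
    print(len(forbidden_patterns(4)), "of 16 patterns are flagged")
```

The design difference the sketch aims to show is cost: voting requires three copies of the protected logic, while the pattern check reuses redundancy already present in the signals, which is consistent with the power saving the abstract reports.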

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Cano Reyes, Dr Jose
Authors: Yu, Q., Cano, J., Flich, J., and Ampadu, P.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Proceedings of the 2012 6th IEEE/ACM International Symposium on Networks-on-Chip, NoCS 2012
Published Online: 01 June 2012
Copyright Holders: Copyright © 2012 IEEE
First Published: First published in 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip: 169-176
Publisher Policy: Reproduced in accordance with the publisher copyright policy

