Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC

Bohm Agostini, N., Dong, S., Elmira, K., Marti, T. L., Cano, J., Abellán, J. L. and Kaeli, D. (2020) Design Space Exploration of Accelerators and End-to-End DNN Evaluation with TFLITE-SOC. In: 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), Porto, Portugal, 9-11 Sept. 2020, pp. 10-19. ISBN 9781728199245 (doi: 10.1109/SBAC-PAD49847.2020.00013)

226488.pdf - Accepted Version (842kB)

Abstract

Demand for faster machine learning (ML) processing in data centers is growing rapidly, and ML inference applications are increasingly migrating to edge devices. These trends have prompted both industry and academia to explore custom accelerators that optimize ML execution for performance and power. However, identifying which accelerator is best suited to a particular ML task is challenging, given the growing range of ML tasks, the number of target environments, and the limited number of integrated modeling tools. To address this, the computer architecture research community needs a common framework that can perform a comprehensive, uniform, and fair comparison across accelerator designs targeting a particular ML task. To this end, we propose TFLITE-SOC (System On Chip), a new framework that integrates a lightweight system modeling library (SystemC) for fast design space exploration of custom ML accelerators into the build/execution environment of TensorFlow Lite (TFLite), a popular framework for ML inference. With this approach, we can model and evaluate new accelerators developed in SystemC by leveraging the language’s hierarchical design capabilities, which speeds up design prototyping. Furthermore, any accelerator designed with TFLITE-SOC can be benchmarked for inference with any DNN model compatible with TFLite, enabling end-to-end DNN processing and detailed (i.e., per-layer) performance analysis. In addition to rapid prototyping, integrated benchmarking, and a range of platform configurations, TFLITE-SOC offers comprehensive performance analysis of accelerator occupancy and execution time breakdown, as well as a rich set of modules that new accelerators can use to implement scale-up studies and optimized memory transfer protocols.
We present our framework and demonstrate its utility by exploring the design space of a TPU-like systolic array and describing possible directions for optimization. Using a compression technique, we implement an optimization that reduces memory traffic between DRAM and on-device buffers. Compared to the baseline accelerator, our optimized design achieves up to 1.26x speedup on accelerated operations and up to 1.19x speedup on end-to-end DNN execution.
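To make the two ideas in the paragraph above concrete, here is a toy Python sketch. It is not TFLITE-SOC's SystemC code, and every name in it (`systolic_matmul`, `bitmask_compress`, `bitmask_decompress`) is hypothetical. It models an output-stationary systolic array, chosen for simplicity; the TPU itself is commonly described as weight-stationary, and the paper's exact dataflow is not stated in the abstract. It also includes a zero-value bitmask compression of an int8 tile. This is an assumed scheme used only to illustrate how compression shrinks DRAM-to-buffer traffic, because the abstract does not name the compression technique the authors used.

```python
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def systolic_matmul(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Toy output-stationary systolic array computing a @ b (hypothetical model).

    PE (i, j) owns output c[i][j]. Row i of `a` enters the left edge delayed
    by i cycles; column j of `b` enters the top edge delayed by j cycles.
    Every cycle, operands shift one PE right (a) or down (b), and each PE
    holding a valid pair accumulates their product.
    """
    m, k = len(a), len(a[0])
    assert len(b) == k, "inner dimensions must match"
    n = len(b[0])
    acc = [[0] * n for _ in range(m)]
    a_reg: List[List[Optional[int]]] = [[None] * n for _ in range(m)]
    b_reg: List[List[Optional[int]]] = [[None] * n for _ in range(m)]
    # PE (i, j) sees the pair with index t - i - j at cycle t, so the last
    # MAC happens at PE (m-1, n-1) on cycle m + n + k - 3.
    cycles = m + n + k - 2
    for t in range(cycles):
        new_a = [[None] * n for _ in range(m)]
        new_b = [[None] * n for _ in range(m)]
        for i in range(m):
            for j in range(n):
                # Left edge injects skewed rows of a; others take the left neighbor.
                new_a[i][j] = (a[i][t - i] if 0 <= t - i < k else None) if j == 0 else a_reg[i][j - 1]
                # Top edge injects skewed columns of b; others take the upper neighbor.
                new_b[i][j] = (b[t - j][j] if 0 <= t - j < k else None) if i == 0 else b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        for i in range(m):
            for j in range(n):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


def bitmask_compress(tile: List[int]) -> bytes:
    """Assumed scheme: one presence bit per int8 element, then the nonzero bytes."""
    mask = bytearray((len(tile) + 7) // 8)
    payload = bytearray()
    for idx, v in enumerate(tile):
        if v != 0:
            mask[idx // 8] |= 1 << (idx % 8)
            payload.append(v & 0xFF)  # store signed int8 as a raw byte
    return bytes(mask + payload)


def bitmask_decompress(blob: bytes, length: int) -> List[int]:
    """Inverse of bitmask_compress; `length` is the original element count."""
    mask_len = (length + 7) // 8
    mask, payload = blob[:mask_len], blob[mask_len:]
    out, p = [], 0
    for idx in range(length):
        if (mask[idx // 8] >> (idx % 8)) & 1:
            byte = payload[p]
            out.append(byte - 256 if byte > 127 else byte)  # back to signed int8
            p += 1
        else:
            out.append(0)
    return out


if __name__ == "__main__":
    c, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(c, cycles)  # [[19, 22], [43, 50]] 4
    tile = [0, 0, 5, 0, -3, 0, 0, 0, 0, 0, 0, 7]
    blob = bitmask_compress(tile)
    print(len(tile), len(blob))  # 12 5
    assert bitmask_decompress(blob, len(tile)) == tile
```

In this toy model, an M×K by K×N multiply takes M + N + K − 2 cycles, so array size and tile shape are natural design-space axes. The compressed tile moves 5 bytes instead of 12, but the saving depends entirely on operand sparsity. Compression only helps the memory-bound part of execution, and only offloaded operations benefit from the accelerator at all. By Amdahl's law, the end-to-end speedup (up to 1.19x) is therefore expected to stay below the per-operation speedup (up to 1.26x).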

Item Type: Conference Proceedings
Keywords: DNN accelerator framework, systolic array, memory compression, hardware-software co-design.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Cano Reyes, Dr Jose
Authors: Bohm Agostini, N., Dong, S., Elmira, K., Marti, T. L., Cano, J., Abellán, J. L., and Kaeli, D.
College/School: College of Science and Engineering > School of Computing Science
ISSN: 2643-3001
ISBN: 9781728199245
Copyright Holders: Copyright © 2020 IEEE
First Published: First published in 2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
Publisher Policy: Reproduced in accordance with the publisher copyright policy
