Shi, Y. et al. (2023) Train Faster, Perform Better: Modular Adaptive Training in Over-Parameterized Models. In: 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, Louisiana, USA, 10-16 Dec 2023.
Publisher's URL: https://proceedings.neurips.cc/paper_files/paper/2023/hash/516fd05dc408fd6d6374940a83930193-Abstract-Conference.html
Abstract
Despite their prevalence in deep-learning communities, over-parameterized models impose high computational costs for proper training. This work studies the fine-grained, modular-level learning dynamics of over-parameterized models to derive a more efficient and effective training strategy. Empirical evidence reveals that, when scaling down to network modules such as heads in self-attention models, we observe varying learning patterns implicitly associated with each module's trainability. To describe such modular-level learning capabilities, we introduce a novel concept dubbed modular neural tangent kernel (mNTK), and we demonstrate that the quality of a module's learning is tightly associated with its mNTK's principal eigenvalue λ_max. A large λ_max indicates that the module learns features with better convergence, while small ones may impact generalization negatively. Inspired by this discovery, we propose a novel training strategy termed Modular Adaptive Training (MAT), which selectively updates only those modules whose λ_max exceeds a dynamic threshold, concentrating the model on learning common features and ignoring inconsistent ones. Unlike most existing training schemes with a complete BP cycle across all network modules, MAT substantially reduces computation through its partial-updating strategy and can further improve performance. Experiments show that MAT nearly halves the computational cost of model training and achieves higher accuracy than the baselines.
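The core quantities in the abstract can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the toy two-module network, the finite-difference Jacobian, and the mean-of-λ_max threshold are all illustrative assumptions. The sketch only shows the idea that each module's mNTK is K = J Jᵀ (J being the Jacobian of the batch outputs with respect to that module's parameters), that λ_max is its principal eigenvalue, and that a MAT-style step would update only modules whose λ_max clears a dynamic threshold.

```python
import numpy as np

def forward(params, x):
    """Toy two-module network: module 0 is W1, module 1 is W2."""
    W1, W2 = params
    return W2 @ np.maximum(W1 @ x, 0.0)

def module_jacobian(params, X, idx, eps=1e-5):
    """Finite-difference Jacobian of all batch outputs w.r.t. module idx's weights.

    Returns an array of shape (batch * out_dim, n_module_params).
    """
    base = np.concatenate([forward(params, x) for x in X])
    cols = []
    for j in range(params[idx].size):
        pert = [p.copy() for p in params]
        pert[idx].ravel()[j] += eps          # perturb one weight of this module
        out = np.concatenate([forward(pert, x) for x in X])
        cols.append((out - base) / eps)
    return np.stack(cols, axis=1)

def mntk_lambda_max(params, X, idx):
    """Principal eigenvalue of the modular NTK, K = J J^T."""
    J = module_jacobian(params, X, idx)
    K = J @ J.T                              # mNTK over the mini-batch
    return np.linalg.eigvalsh(K)[-1]         # eigvalsh sorts ascending

rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
X = [rng.standard_normal(3) for _ in range(5)]

# MAT-style selection: update only modules whose lambda_max exceeds a
# dynamic threshold (here simply the mean across modules, an assumption).
lams = [mntk_lambda_max(params, X, i) for i in range(len(params))]
thresh = float(np.mean(lams))
active = [i for i, lam in enumerate(lams) if lam >= thresh]
```

A backward pass would then be run only for the modules in `active`, which is where the partial-update savings described in the abstract come from; the paper's actual threshold schedule and mNTK estimator are more involved than this sketch.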
Item Type: Conference Proceedings
Additional Information: This work was supported by National Natural Science Foundation of China under Grant No. 62090025, National Key R&D Program of China under Grant No. 2022YFB4400400, and China Postdoctoral Science Foundation No. 2022M720767.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Yang, Dr Xiaochen
Authors: Shi, Y., Chen, Y., Dong, M., Yang, X., Li, D., Wang, Y., Dick, R. P., Lv, Q., Zhao, Y., Yang, F., Lu, T., Gu, N., and Shang, L.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Copyright Holders: Copyright © The Author(s) 2023
First Published: First published in Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Publisher Policy: Reproduced with the permission of the publisher