Leimkuhler, B., Vlaar, T., Pouchon, T. and Storkey, A. (2021) Better Training using Weight-Constrained Stochastic Dynamics. In: 38th International Conference on Machine Learning (ICML 2021), 18-24 July 2021, pp. 6200-6211.
Full text: Published Version (PDF), available under License Creative Commons Attribution.
Publisher's URL: https://proceedings.mlr.press/v139/leimkuhler21a.html
Abstract
We employ constraints to control the parameter space of deep neural networks throughout training. The use of customised, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimisation schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and yield performance improvements in classification tasks.
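The abstract describes adding constraints to overdamped Langevin dynamics. A minimal sketch of that idea follows, under stated assumptions. It constrains a single weight vector to a sphere of fixed radius, which is one simple form of explicit weight normalization. The helper names `tangent_project` and `constrained_sgld_step` and the toy quadratic loss are hypothetical. Each update is a projected Euler-Maruyama step followed by renormalization. This is a generic retraction scheme, not the discretization given in the paper.

```python
import numpy as np


def tangent_project(w, v):
    """Remove the component of v along w, i.e. project v onto the
    tangent space of the sphere at w."""
    return v - (np.dot(w, v) / np.dot(w, w)) * w


def constrained_sgld_step(w, grad, h, temperature, radius, rng):
    """One overdamped Langevin (SGLD-style) step kept on ||w|| = radius.

    Hypothetical helper: Euler-Maruyama drift + noise, projected onto the
    tangent space, then renormalized back onto the sphere. This is an
    illustrative retraction, not the authors' constrained scheme.
    """
    noise = rng.standard_normal(w.shape)
    # Euler-Maruyama increment for dW = -grad f dt + sqrt(2 T) dB
    step = -h * grad + np.sqrt(2.0 * h * temperature) * noise
    # Move only along the sphere's tangent directions at w
    w_new = w + tangent_project(w, step)
    # Retract onto the constraint surface (norm of w_new >= norm of w, so no division by zero)
    return radius * w_new / np.linalg.norm(w_new)


# Usage: toy loss f(w) = 0.5 * ||w - target||^2 minimized on the unit sphere
rng = np.random.default_rng(0)
target = np.array([3.0, 0.0, 0.0])
w = np.array([0.0, 1.0, 0.0])
for _ in range(2000):
    grad = w - target  # gradient of the toy loss
    w = constrained_sgld_step(w, grad, h=0.01, temperature=1e-4, radius=1.0, rng=rng)
print(np.linalg.norm(w))  # constraint holds up to floating-point rounding
print(w)                  # should lie close to target / ||target|| = [1, 0, 0]
```

With a small temperature the iterates settle near the constrained minimizer, and the weight norm is fixed by construction rather than by a penalty term in the loss. This mirrors the abstract's point that no regularization term needs to be added to the objective.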
Item Type: | Conference Proceedings |
---|---|
Additional Information: | Benedict Leimkuhler is a fellow of the Alan Turing Institute which is supported by EPSRC grant EP/N510129/1. During the creation of this paper Timothee Pouchon was supported by the Swiss National Science Foundation, project P2ELP2 188037. Tiffany Vlaar is supported by The Maxwell Institute Graduate School in Analysis and its Applications, a Centre for Doctoral Training funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016508/01), the Scottish Funding Council, Heriot-Watt University and the University of Edinburgh. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Vlaar, Dr Tiffany |
Authors: | Leimkuhler, B., Vlaar, T., Pouchon, T., and Storkey, A. |
College/School: | College of Science and Engineering > School of Mathematics and Statistics > Mathematics |
ISSN: | 2640-3498 |
Copyright Holders: | Copyright © 2021 The Author(s) |
First Published: | First published in Proceedings of the 38th International Conference on Machine Learning 139:6200-6211 |
Publisher Policy: | Reproduced under a Creative Commons license |