Better Training using Weight-Constrained Stochastic Dynamics

Leimkuhler, B., Vlaar, T., Pouchon, T. and Storkey, A. (2021) Better Training using Weight-Constrained Stochastic Dynamics. In: 38th International Conference on Machine Learning (ICML2021), 18-24 July 2021, pp. 6200-6211.

Text: 310177.pdf - Published Version (7MB)
Available under License Creative Commons Attribution.

Publisher's URL: https://proceedings.mlr.press/v139/leimkuhler21a.html

Abstract

We employ constraints to control the parameter space of deep neural networks throughout training. The use of customised, appropriately designed constraints can reduce the vanishing/exploding gradients problem, improve smoothness of classification boundaries, control weight magnitudes and stabilize deep neural networks, and thus enhance the robustness of training algorithms and the generalization capabilities of neural networks. We provide a general approach to efficiently incorporate constraints into a stochastic gradient Langevin framework, allowing enhanced exploration of the loss landscape. We also present specific examples of constrained training methods motivated by orthogonality preservation for weight matrices and explicit weight normalizations. Discretization schemes are provided both for the overdamped formulation of Langevin dynamics and the underdamped form, in which momenta further improve sampling efficiency. These optimisation schemes can be used directly, without needing to adapt neural network architecture design choices or to modify the objective with regularization terms, and yield performance improvements in classification tasks.

Item Type: Conference Proceedings
Additional Information: Benedict Leimkuhler is a fellow of the Alan Turing Institute which is supported by EPSRC grant EP/N510129/1. During the creation of this paper Timothee Pouchon was supported by the Swiss National Science Foundation, project P2ELP2 188037. Tiffany Vlaar is supported by The Maxwell Institute Graduate School in Analysis and its Applications, a Centre for Doctoral Training funded by the UK Engineering and Physical Sciences Research Council (grant EP/L016508/01), the Scottish Funding Council, Heriot-Watt University and the University of Edinburgh.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Vlaar, Dr Tiffany
Authors: Leimkuhler, B., Vlaar, T., Pouchon, T., and Storkey, A.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Mathematics
ISSN: 2640-3498
Copyright Holders: Copyright © 2021 The Author(s)
First Published: First published in Proceedings of the 38th International Conference on Machine Learning 139:6200-6211
Publisher Policy: Reproduced under a Creative Commons license
