Normalization Layers Are All That Sharpness-Aware Minimization Needs

Müller, M., Vlaar, T., Rolnick, D. and Hein, M. (2024) Normalization Layers Are All That Sharpness-Aware Minimization Needs. In: 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, Louisiana, USA, 10-16 December 2023.

Text: 310174.pdf - Published Version (739kB)

Publisher's URL: https://proceedings.neurips.cc/paper_files/paper/2023/hash/da909fc3893d272f26fd9db82e09d954-Abstract-Conference.html

Abstract

Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness.
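
The idea in the abstract, restricting SAM's adversarial (ascent) step to the normalization parameters while still updating every parameter in the descent step, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The parameter names (`conv.weight`, `norm.weight`, `norm.bias`), the `is_norm_param` selector, the toy quadratic loss, and the choice to normalize the ascent direction by the gradient norm of the perturbed subset only are all assumptions made for illustration. In practice the gradients would come from an autodiff framework.

```python
import math


def sam_step(params, grad_fn, lr=0.1, rho=0.05, perturb=lambda name: True):
    """One SAM-style update on a dict of name -> list of floats.

    Only parameters for which `perturb(name)` is True are moved in the
    ascent step; all parameters are updated in the descent step.
    """
    grads = grad_fn(params)
    # L2 norm of the gradient, taken over the perturbed subset only
    # (an assumption of this sketch).
    norm = math.sqrt(
        sum(g * g for n, gs in grads.items() if perturb(n) for g in gs)
    )
    scale = rho / (norm + 1e-12)
    # Ascent step: w + rho * g / ||g|| on the selected subset only.
    perturbed = {
        n: [w + scale * g for w, g in zip(ws, grads[n])] if perturb(n) else list(ws)
        for n, ws in params.items()
    }
    # Descent step: gradient at the perturbed point, applied to the
    # ORIGINAL values of all parameters.
    sam_grads = grad_fn(perturbed)
    return {
        n: [w - lr * g for w, g in zip(ws, sam_grads[n])]
        for n, ws in params.items()
    }


def is_norm_param(name):
    # Hypothetical naming convention: affine normalization parameters
    # are those whose name starts with "norm.".
    return name.startswith("norm.")


def quadratic_grad(params):
    # Gradient of the toy loss 0.5 * sum(w^2), i.e. grad = w.
    return {n: list(ws) for n, ws in params.items()}


params = {"conv.weight": [1.0, -2.0], "norm.weight": [1.0], "norm.bias": [0.5]}
norm_only = sam_step(params, quadratic_grad, perturb=is_norm_param)
full_sam = sam_step(params, quadratic_grad)
```

With `perturb=is_norm_param`, the `conv.weight` entries receive a plain gradient step, because their gradient is evaluated at unperturbed values in this separable toy loss. Only the `norm.*` entries see the effect of the adversarial perturbation. In a real network the perturbed normalization parameters would also change the gradients of all other layers.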

Item Type: Conference Proceedings
Additional Information: We acknowledge support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (EXC number 2064/1, Project number 390727645), as well as from the Carl Zeiss Foundation in the project "Certification and Foundations of Safe Machine Learning Systems in Healthcare". We also thank the European Laboratory for Learning and Intelligent Systems (ELLIS) for supporting Maximilian Müller. We are grateful for support from the Canada CIFAR AI Chairs Program and US National Science Foundation award 1910864.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Vlaar, Dr Tiffany
Authors: Müller, M., Vlaar, T., Rolnick, D., and Hein, M.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Mathematics
Copyright Holders: Copyright © 2023 The Author(s)
First Published: First published in Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Publisher Policy: Reproduced in accordance with the publisher copyright policy
