Müller, M., Vlaar, T., Rolnick, D. and Hein, M. (2024) Normalization Layers Are All That Sharpness-Aware Minimization Needs. In: 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, Louisiana, USA, 10-16 December 2023.
Full text: 310174.pdf (Published Version, 739kB)
Publisher's URL: https://proceedings.neurips.cc/paper_files/paper/2023/hash/da909fc3893d272f26fd9db82e09d954-Abstract-Conference.html
Abstract
Sharpness-aware minimization (SAM) was proposed to reduce sharpness of minima and has been shown to enhance generalization performance in various settings. In this work we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters. This finding generalizes to different SAM variants and both ResNet (Batch Normalization) and Vision Transformer (Layer Normalization) architectures. We consider alternative sparse perturbation approaches and find that these do not achieve similar performance enhancement at such extreme sparsity levels, showing that this behaviour is unique to the normalization layers. Although our findings reaffirm the effectiveness of SAM in improving generalization performance, they cast doubt on whether this is solely caused by reduced sharpness.
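The idea in the abstract, restricting the adversarial (ascent) step of SAM to the affine normalization parameters while still updating all parameters in the descent step, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy scalar model `y = gamma * (w * x) + beta`, in which `gamma` and `beta` merely stand in for the scale and shift of a normalization layer; the names `loss_and_grads`, `sam_step` and `NORM_KEYS`, the data, and the values of `rho` and `lr` are all hypothetical and chosen only for illustration.

```python
import math


def loss_and_grads(params, data):
    """Mean squared-error loss of y = gamma * (w * x) + beta and its gradients."""
    w, gamma, beta = params["w"], params["gamma"], params["beta"]
    n = len(data)
    loss = 0.0
    grads = {"w": 0.0, "gamma": 0.0, "beta": 0.0}
    for x, t in data:
        err = gamma * w * x + beta - t
        loss += 0.5 * err * err / n
        grads["w"] += err * gamma * x / n
        grads["gamma"] += err * w * x / n
        grads["beta"] += err / n
    return loss, grads


def sam_step(params, data, perturb_keys, rho=0.05, lr=0.05):
    """One SAM-style update in which only `perturb_keys` are perturbed."""
    # 1) Gradient at the current point.
    _, grads = loss_and_grads(params, data)
    # 2) Ascent step of length rho, restricted to the selected parameters.
    #    Assumption: the gradient norm is taken over those parameters only.
    norm = math.sqrt(sum(grads[k] ** 2 for k in perturb_keys))
    perturbed = dict(params)
    if norm > 0.0:
        for k in perturb_keys:
            perturbed[k] = params[k] + rho * grads[k] / norm
    # 3) Gradient at the perturbed point, applied to ALL original parameters.
    _, adv_grads = loss_and_grads(perturbed, data)
    return {k: params[k] - lr * adv_grads[k] for k in params}


# Hypothetical stand-ins for the affine normalization parameters.
NORM_KEYS = ("gamma", "beta")

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]  # toy targets t = x + 1
params = {"w": 0.5, "gamma": 1.0, "beta": 0.0}

initial_loss, _ = loss_and_grads(params, data)
for _ in range(500):
    params = sam_step(params, data, NORM_KEYS)
final_loss, _ = loss_and_grads(params, data)
```

Passing all three keys as `perturb_keys` would correspond to a standard SAM-style step in this toy setting, and passing an empty tuple reduces the update to plain gradient descent. In a real network the selected keys would be the scale and shift parameters of the Batch Normalization or Layer Normalization layers.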
Item Type: | Conference Proceedings |
---|---|
Additional Information: | We acknowledge support from the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy (EXC number 2064/1, Project number 390727645), as well as from the Carl Zeiss Foundation in the project "Certification and Foundations of Safe Machine Learning Systems in Healthcare". We also thank the European Laboratory for Learning and Intelligent Systems (ELLIS) for supporting Maximilian Müller. We are grateful for support from the Canada CIFAR AI Chairs Program and US National Science Foundation award 1910864. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Vlaar, Dr Tiffany |
Authors: | Müller, M., Vlaar, T., Rolnick, D., and Hein, M. |
College/School: | College of Science and Engineering > School of Mathematics and Statistics > Mathematics |
Copyright Holders: | Copyright © 2023 The Author(s) |
First Published: | First published in Advances in Neural Information Processing Systems 36 (NeurIPS 2023) |
Publisher Policy: | Reproduced in accordance with the publisher copyright policy |