Generalization Bound of Gradient Descent for Non-Convex Metric Learning

Dong, M., Yang, X., Zhu, R., Wang, Y. and Xue, J.-H. (2020) Generalization Bound of Gradient Descent for Non-Convex Metric Learning. In: Thirty-Fourth Conference on Neural Information Processing Systems (NeurIPS 2020), 06-12 Dec 2020 (In Press).

Text: 225912.pdf - Accepted Version (1MB). Restricted to Repository staff only.

Publisher's URL: https://proceedings.neurips.cc/paper/2020/hash/6f5e4e86a87220e5d361ad82f1ebc335-Abstract.html

Abstract

Metric learning aims to learn a distance measure that can benefit distance-based methods such as the nearest neighbour (NN) classifier. While considerable efforts have been made to improve its empirical performance and analyze its generalization ability by focusing on the data structure and model complexity, an unresolved question is how choices of algorithmic parameters, such as the number of training iterations, affect metric learning as it is typically formulated as an optimization problem and nowadays more often as a non-convex problem. In this paper, we theoretically address this question and prove the agnostic Probably Approximately Correct (PAC) learnability for metric learning algorithms with non-convex objective functions optimized via gradient descent (GD); in particular, our theoretical guarantee takes the iteration number into account. We first show that the generalization PAC bound is a sufficient condition for agnostic PAC learnability and this bound can be obtained by ensuring the uniform convergence on a densely concentrated subset of the parameter space. We then show that, for classifiers optimized via GD, their generalizability can be guaranteed if the classifier and loss function are both Lipschitz smooth, and further improved by using fewer iterations. To illustrate and exploit the theoretical findings, we finally propose a novel metric learning method called Smooth Metric and representative Instance LEarning (SMILE), designed to satisfy the Lipschitz smoothness property and learned via GD with an early stopping mechanism for better discriminability and less computational cost of NN.
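The abstract's setting, a smooth non-convex metric-learning objective optimized by GD and stopped early, can be illustrated with a toy sketch. This is not the SMILE method: the rank-one metric d(x, y) = (v . (x - y))^2, the softplus pair loss, the margin, the patience-based stopping rule, the synthetic data, and every function name below are assumptions chosen only for illustration.

```python
import math
import random

def softplus(z):
    # Numerically stable log(1 + exp(z)); smooth, with derivative sigmoid(z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_pairs(points, labels):
    # All unordered pairs; sign +1 for same-class pairs, -1 otherwise.
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            diff = [a - b for a, b in zip(points[i], points[j])]
            sign = 1.0 if labels[i] == labels[j] else -1.0
            pairs.append((diff, sign))
    return pairs

def loss_and_grad(v, pairs, margin=1.0):
    # Hypothetical objective: mean softplus(sign * (d - margin)) with
    # d = (v . diff)^2, which is smooth but non-convex in v.
    total, grad = 0.0, [0.0] * len(v)
    for diff, sign in pairs:
        proj = sum(vk * dk for vk, dk in zip(v, diff))
        z = sign * (proj * proj - margin)
        total += softplus(z)
        coeff = sigmoid(z) * sign * 2.0 * proj  # chain rule through d
        for k, dk in enumerate(diff):
            grad[k] += coeff * dk
    n = len(pairs)
    return total / n, [g / n for g in grad]

def gd_early_stopping(train_pairs, val_pairs, dim, lr=0.1,
                      max_iters=500, patience=10, seed=0):
    # Plain GD on the training loss; keep the iterate with the lowest
    # validation loss and stop after `patience` steps without improvement.
    rng = random.Random(seed)
    v = [rng.uniform(-0.5, 0.5) for _ in range(dim)]
    best_v, best_val, best_iter, wait = list(v), float("inf"), 0, 0
    for t in range(1, max_iters + 1):
        _, grad = loss_and_grad(v, train_pairs)
        v = [vk - lr * gk for vk, gk in zip(v, grad)]
        val_loss, _ = loss_and_grad(v, val_pairs)
        if val_loss < best_val - 1e-6:
            best_v, best_val, best_iter, wait = list(v), val_loss, t, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_v, best_val, best_iter

def sample(n, rng):
    # Synthetic data: the class shifts the first coordinate; the second is noise.
    pts, labs = [], []
    for i in range(n):
        lab = i % 2
        pts.append([rng.gauss(3.0 * lab, 0.3), rng.gauss(0.0, 1.0)])
        labs.append(lab)
    return pts, labs

data_rng = random.Random(1)
train_pairs = make_pairs(*sample(20, data_rng))
val_pairs = make_pairs(*sample(12, data_rng))
best_v, best_val, best_iter = gd_early_stopping(train_pairs, val_pairs, dim=2)
print("stopped at iteration", best_iter, "validation loss", best_val)
```

Here the iteration count acts as an explicit algorithmic parameter: `gd_early_stopping` returns the iterate with the lowest held-out loss rather than the last one, which is one common way to implement early stopping.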

Item Type: Conference Proceedings
Status: In Press
Refereed: Yes
Glasgow Author(s) Enlighten ID: Yang, Ms Xiaochen
Authors: Dong, M., Yang, X., Zhu, R., Wang, Y., and Xue, J.-H.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics