Metric Learning for Categorical and Ambiguous Features: An Adversarial Approach

Yang, X., Dong, M., Guo, Y. and Xue, J.-H. (2021) Metric Learning for Categorical and Ambiguous Features: An Adversarial Approach. In: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2020), 14-18 Sep 2020, pp. 223-238. ISBN 9783030676605 (doi:10.1007/978-3-030-67661-2_14)

Text: 225910.pdf - Accepted Version



Metric learning learns a distance metric from data and has significantly improved the classification accuracy of distance-based classifiers such as k-nearest neighbors. However, metric learning has rarely been applied to categorical data, which are prevalent in the health and social sciences but inherently difficult to classify because of high feature ambiguity and small sample sizes. More specifically, ambiguity arises because the boundaries between ordinal or nominal levels are not always sharply defined. In this paper, we mitigate the impact of feature ambiguity by considering the worst-case perturbation of each instance and propose to learn the Mahalanobis distance through adversarial training. The geometric interpretation shows that our method dynamically divides the instance space into three regions and exploits the information in the “adversarially vulnerable” region. This information, which previous methods have not considered, makes our method better suited than those methods to small datasets. Moreover, we establish a generalization bound for a general form of adversarial training. The bound suggests that the sample complexity rate remains of the same order as that of standard training only if the Mahalanobis distance is regularized with the elementwise 1-norm. Experiments on ordinal and mixed ordinal-and-nominal datasets demonstrate the effectiveness of the proposed method under high feature ambiguity and small sample sizes.
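
For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of the standard Mahalanobis distance, sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M, and of the elementwise 1-norm of M (the sum of the absolute values of its entries). It is not the authors' implementation and does not cover the adversarial training or worst-case perturbation step. The function names, the two example instances, and the matrix values are all hypothetical and chosen only for illustration.

```python
import math


def mahalanobis_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    n = len(diff)
    # Quadratic form diff^T M diff, summed over all pairs of coordinates.
    quad = sum(diff[i] * M[i][j] * diff[j] for i in range(n) for j in range(n))
    # Clamp tiny negative values caused by floating-point round-off.
    return math.sqrt(max(quad, 0.0))


def elementwise_l1_norm(M):
    """Return the sum of absolute values of all entries of M."""
    return sum(abs(v) for row in M for v in row)


# Hypothetical instances with three ordinal features (illustrative values only).
x = [1.0, 3.0, 2.0]
y = [2.0, 3.0, 4.0]

# With the identity matrix, the Mahalanobis distance is the Euclidean distance.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# A made-up positive semidefinite matrix standing in for a learned metric.
learned = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 0.25]]

print(mahalanobis_distance(x, y, identity))
print(mahalanobis_distance(x, y, learned))
print(elementwise_l1_norm(learned))
```

In this sketch the made-up matrix weights the first feature more heavily and the third less heavily than the Euclidean metric does, which is the kind of reweighting a learned metric can express. The elementwise 1-norm is the regularization term the abstract refers to.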

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Yang, Dr Xiaochen
Authors: Yang, X., Dong, M., Guo, Y., and Xue, J.-H.
College/School: College of Science and Engineering > School of Mathematics and Statistics > Statistics
Published Online: 25 February 2021
Copyright Holders: Copyright © 2021 Springer Nature Switzerland AG
First Published: First published in Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12458: 223-238
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher