Guo, G., Ouyang, S., Yuan, F. and Wang, X. (2018) Approximating Word Ranking and Negative Sampling for Word Embedding. In: Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13-19 Jul 2018, pp. 4092-4098. ISBN 9780999241127 (doi: 10.24963/ijcai.2018/569)
Text: 165128.pdf - Accepted Version (431kB)
Abstract
CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques for generating word embeddings in NLP tasks. However, it fails to reach optimal performance because it treats all positive words uniformly and draws negative words from a simple sampling distribution. To resolve these issues, we propose OptRank, which optimizes word ranking and approximates negative sampling to improve word embeddings. Specifically, we first formalize word embedding as a ranking problem. Then, we weight the positive words by their ranks so that highly ranked words have more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, an approximation method is designed to compute word ranks efficiently. Empirical experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset with different sampling scales, especially when the sampled subset is small. The code and datasets can be obtained from https://github.com/ouououououou/OptRank.
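The three ideas named in the abstract (rank-based weighting of positive words, dynamic selection of negative words, and approximate rank computation) can be illustrated with a short sketch. This is not taken from the paper or its repository: the sampling-based rank estimate below is a generic WARP-style approximation, the dynamic sampler simply keeps the highest-scoring word from a small random candidate set, and `rank_weight` is a hypothetical weighting chosen only so that better-ranked words count more. All function names and parameters are assumptions made for illustration.

```python
import numpy as np


def exact_rank(scores, pos):
    """Number of words scored strictly higher than the positive word."""
    return int(np.sum(scores > scores[pos]))


def approx_rank(scores, pos, rng, max_trials=None):
    """Estimate the rank of `pos` by sampling instead of scanning the vocabulary.

    Assumption: a WARP-style estimate. Draw random non-positive words until
    one outscores the positive word; if that takes k draws, the rank is
    estimated as floor((V - 1) / k). Returns 0 if no violator is found.
    """
    vocab = len(scores)
    if max_trials is None:
        max_trials = vocab - 1
    for trial in range(1, max_trials + 1):
        neg = int(rng.integers(vocab - 1))
        if neg >= pos:  # shift the index so the positive word is never drawn
            neg += 1
        if scores[neg] > scores[pos]:
            return (vocab - 1) // trial
    return 0


def pick_dynamic_negative(scores, pos, rng, n_candidates=5):
    """Draw a few candidate negatives and keep the highest-scoring one.

    Assumption: "informative" is read here as "currently scored highest".
    """
    vocab = len(scores)
    cands = rng.choice(vocab - 1, size=n_candidates, replace=False)
    cands = np.where(cands >= pos, cands + 1, cands)  # skip the positive word
    return int(cands[np.argmax(scores[cands])])


def rank_weight(rank):
    """Hypothetical weight: larger for better-ranked (smaller rank) words."""
    return 1.0 / (1.0 + rank)
```

In a CBOW-style setup, `scores` would be the dot products between the averaged context vector and every output word vector. The sampled estimate avoids comparing the positive word against the full vocabulary on every update, which is the kind of cost an approximate rank computation is meant to reduce.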
Item Type: | Conference Proceedings |
---|---|
Additional Information: | This work was supported by the National Natural Science Foundation for Young Scientists of China under Grant Nos. 61702084, 61772125, and 61702090, and the Fundamental Research Funds for the Central Universities under Grant No. N161704001. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | YUAN, FAJIE |
Authors: | Guo, G., Ouyang, S., Yuan, F., and Wang, X. |
College/School: | College of Science and Engineering |
Publisher: | International Joint Conferences on Artificial Intelligence Organization |
ISBN: | 9780999241127 |
Copyright Holders: | Copyright © 2018 IJCAI |
First Published: | First published in the Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18): 4092-4098 |
Publisher Policy: | Reproduced with the permission of the Editor |