Approximating Word Ranking and Negative Sampling for Word Embedding

Guo, G., Ouyang, S., Yuan, F. and Wang, X. (2018) Approximating Word Ranking and Negative Sampling for Word Embedding. In: Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Stockholm, Sweden, 13-19 Jul 2018, pp. 4092-4098. ISBN 9780999241127 (doi: 10.24963/ijcai.2018/569)

Text: 165128.pdf - Accepted Version (431kB)

Abstract

CBOW (Continuous Bag-Of-Words) is one of the most commonly used techniques for generating word embeddings in NLP tasks. However, it falls short of optimal performance because it treats all positive words uniformly and draws negative words from a simple sampling distribution. To resolve these issues, we propose OptRank, which optimizes word ranking and approximates negative sampling to improve word embeddings. Specifically, we first formalize word embedding as a ranking problem. Then, we weight the positive words by their ranks so that highly ranked words carry more importance, and adopt a dynamic sampling strategy to select informative negative words. In addition, we design an approximation method to compute word ranks efficiently. Experiments show that OptRank consistently outperforms its counterparts on a benchmark dataset across different sampling scales, especially when the sampled subset is small. The code and datasets are available at https://github.com/ouououououou/OptRank.
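
The three ingredients named in the abstract (approximate rank computation, rank-based weighting of positive words, and dynamic negative sampling) can be illustrated with a small sketch. This is not the authors' implementation, which is available at the repository linked above. The function names `approx_rank`, `rank_weight` and `dynamic_negative` are hypothetical, and so are the specific choices: a sampling-based rank estimate of the form floor((V - 1) / k), the weight 1 / (1 + rank), and picking the highest-scoring of a few random candidates as the negative word. "Highly ranked" is read here as "scored near the top", which is an interpretation of the abstract.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def approx_rank(context, target, emb, rng, max_trials=50):
    """Estimate the rank of `target` without scoring the whole vocabulary.

    Assumption: words are sampled until one outscores the target; if that
    takes k trials, the rank is estimated as floor((V - 1) / k).
    Rank 0 means no higher-scoring word was found.
    """
    vocab_size = len(emb)
    target_score = dot(context, emb[target])
    for k in range(1, max_trials + 1):
        w = rng.randrange(vocab_size)
        if w != target and dot(context, emb[w]) > target_score:
            return (vocab_size - 1) // k
    return 0

def rank_weight(rank):
    """Assumed weighting: words nearer the top (small rank) weigh more."""
    return 1.0 / (1.0 + rank)

def dynamic_negative(context, target, emb, rng, n_candidates=5):
    """Assumed dynamic sampling: draw a few candidate words and keep the
    one the current model scores highest, i.e. the hardest negative."""
    candidates = []
    while len(candidates) < n_candidates:
        w = rng.randrange(len(emb))
        if w != target:
            candidates.append(w)
    return max(candidates, key=lambda w: dot(context, emb[w]))

# Toy usage with made-up 2-dimensional embeddings for a 5-word vocabulary.
rng = random.Random(0)
emb = [[1.0, 0.0], [0.8, 0.1], [0.1, 0.9], [-0.5, 0.2], [0.3, 0.3]]
context = [1.0, 0.0]                      # averaged context vector (assumed)
rank = approx_rank(context, 0, emb, rng)  # positive word is index 0
weight = rank_weight(rank)                # would scale the positive update
negative = dynamic_negative(context, 0, emb, rng)
```

In a training loop, `weight` would scale the gradient contribution of the positive word and `negative` would replace a word drawn from a fixed noise distribution. How OptRank itself combines these pieces is described in the paper.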

Item Type:Conference Proceedings
Additional Information:This work was supported by the National Natural Science Foundation for Young Scientists of China under Grant Nos. 61702084, 61772125 and 61702090, and by the Fundamental Research Funds for the Central Universities under Grant No. N161704001.
Status:Published
Refereed:Yes
Glasgow Author(s) Enlighten ID:YUAN, FAJIE
Authors: Guo, G., Ouyang, S., Yuan, F., and Wang, X.
College/School:College of Science and Engineering
Publisher:International Joint Conferences on Artificial Intelligence Organization
ISBN:9780999241127
Copyright Holders:Copyright © 2018 IJCAI
First Published:First published in the Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18): 4092-4098
Publisher Policy:Reproduced with the permission of the Editor
