Xin, X., Yuan, F., He, X. and Jose, J. M. (2018) Batch IS NOT Heavy: Learning Word Representations From All Samples. 56th Annual Meeting of the Association for Computational Linguistics, ACL 2018, Melbourne, Australia, 15-20 July 2018.
Text: 162396.pdf - Published Version, available under a Creative Commons Attribution license (466kB)
Publisher's URL: https://aclanthology.info/volumes/proceedings-of-the-56th-annual-meeting-of-the-association-for-computational-linguistics-volume-1-long-papers
Abstract
Stochastic Gradient Descent (SGD) with negative sampling is the most prevalent approach to learning word representations. However, it is known that sampling methods are biased, especially when the sampling distribution deviates from the true data distribution. Besides, SGD suffers from dramatic fluctuation due to the one-sample learning scheme. In this work, we propose AllVec, which uses batch gradient learning to generate word representations from all training samples. Remarkably, the time complexity of AllVec remains at the same level as SGD, being determined by the number of positive samples rather than all samples. We evaluate AllVec on several benchmark tasks. Experiments show that AllVec outperforms sampling-based SGD methods with comparable efficiency, especially for small training corpora.
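The complexity claim in the abstract, that a full-batch loss over all word–context pairs can cost no more than iterating the positive pairs, typically rests on an algebraic decoupling of the term summed over all pairs. The sketch below illustrates that general trick in NumPy; the variable names, the squared-dot-product form of the negative term, and the uniform weighting are illustrative assumptions, not details taken from the paper itself:

```python
import numpy as np

# Toy embedding matrices: |V| words, d-dimensional vectors (assumed setup).
rng = np.random.default_rng(0)
n_words, dim = 1000, 32
U = rng.normal(size=(n_words, dim)) * 0.1  # word vectors
V = rng.normal(size=(n_words, dim)) * 0.1  # context vectors

# Naive evaluation of sum over ALL |V|^2 (word, context) pairs of
# (u_w . v_c)^2 — this materializes the full score matrix: O(|V|^2 * d).
naive = np.sum((U @ V.T) ** 2)

# Decoupled evaluation: the same sum equals ||U V^T||_F^2
# = trace((U^T U)(V^T V)), computable from two d x d Gram matrices
# in O(|V| * d^2) — independent of the number of pairs.
fast = np.sum((U.T @ U) * (V.T @ V))

assert np.isclose(naive, fast)
```

With the all-pair term cached this way, only the positive pairs need an explicit pass, which is why batch learning over all samples can stay at the same cost level as sampling-based SGD.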
| Item Type: | Conference or Workshop Item |
|---|---|
| Status: | Published |
| Refereed: | Yes |
| Glasgow Author(s) Enlighten ID: | Jose, Professor Joemon and Xin, Xin and Yuan, Fajie |
| Authors: | Xin, X., Yuan, F., He, X., and Jose, J. M. |
| College/School: | College of Science and Engineering > School of Computing Science |
| Copyright Holders: | Copyright © 2018 ACL |
| First Published: | First published in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers) 2018:1853–1862 |
| Publisher Policy: | Reproduced under a Creative Commons License |
| Related URLs: | |