Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data

Fang, A., Macdonald, C., Ounis, I. and Habel, P. (2016) Using Word Embedding to Evaluate the Coherence of Topics from Twitter Data. In: SIGIR 2016, Pisa, Italy, 17-21 Jul 2016, ISBN 9781450340694 (doi: 10.1145/2911451.2914729)

119284.pdf - Accepted Version (313kB)

Abstract

Scholars often seek to understand topics discussed on Twitter using topic modelling approaches. Several coherence metrics have been proposed for evaluating the coherence of the topics generated by these approaches, including the pre-calculated Pointwise Mutual Information (PMI) of word pairs and the Latent Semantic Analysis (LSA) word representation vectors. As Twitter data contains abbreviations and a number of peculiarities (e.g. hashtags), it can be challenging to train effective PMI data or LSA word representation. Recently, Word Embedding (WE) has emerged as a particularly effective approach for capturing the similarity among words. Hence, in this paper, we propose new Word Embedding-based topic coherence metrics. To determine the usefulness of these new metrics, we compare them with the previous PMI/LSA-based metrics. We also conduct a large-scale crowdsourced user study to determine whether the new Word Embedding-based metrics better align with human preferences. Using two Twitter datasets, our results show that the WE-based metrics can capture the coherence of topics in tweets more robustly and efficiently than the PMI/LSA-based ones.
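
As a rough illustration of what a Word Embedding-based coherence metric can look like, the sketch below scores a topic by the mean pairwise cosine similarity of its top words' vectors. This is one common formulation and is an assumption here: the abstract does not specify the exact metrics the paper proposes. The function names (`cosine`, `topic_coherence`) and the two-dimensional `toy_embeddings` are hypothetical and stand in for vectors from a real trained embedding model.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_coherence(top_words, embeddings):
    """Mean pairwise cosine similarity over a topic's top words.

    Words missing from `embeddings` are skipped; a topic with fewer
    than two known words scores 0.0.
    """
    vectors = [embeddings[w] for w in top_words if w in embeddings]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical toy vectors, for illustration only (not from any trained model).
toy_embeddings = {
    "vote": [0.9, 0.1],
    "election": [0.8, 0.2],
    "pizza": [0.1, 0.9],
}

coherent_score = topic_coherence(["vote", "election"], toy_embeddings)
mixed_score = topic_coherence(["vote", "pizza"], toy_embeddings)
print(coherent_score > mixed_score)
```

Under this formulation, a topic whose top words point in similar directions in the embedding space scores higher than one that mixes unrelated words.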

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Habel, Dr Philip and Ounis, Professor Iadh
Authors: Fang, A., Macdonald, C., Ounis, I., and Habel, P.
College/School: College of Science and Engineering > School of Computing Science
College of Social Sciences > School of Social and Political Sciences > Politics
ISBN: 9781450340694
Copyright Holders: Copyright © 2016 Association for Computing Machinery
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher