Evaluating Similarity Metrics for Latent Twitter Topics

Wang, X., Fang, A., Ounis, I. and Macdonald, C. (2019) Evaluating Similarity Metrics for Latent Twitter Topics. In: 41st European Conference on Information Retrieval (ECIR 2019), Cologne, Germany, 14-18 Apr 2019, pp. 787-794. ISBN 9783030157128 (doi: 10.1007/978-3-030-15712-8_54)

174725.pdf - Accepted Version

Topic modelling approaches such as LDA, when applied to a tweet corpus, can often generate a topic model containing redundant topics. To evaluate the quality of a topic model in terms of redundancy, topic similarity metrics can be applied to estimate the similarity among the topics in that model. There are various topic similarity metrics in the literature, e.g. the Jensen-Shannon (JS) divergence-based metric. In this paper, we evaluate the performance of four distance/divergence-based topic similarity metrics and examine how they align with human judgements, including a newly proposed similarity metric that is based on computing word semantic similarity using word embeddings (WE). To obtain human judgements, we conduct a user study through crowdsourcing. Among various insights, our study shows that, in general, the cosine similarity (CS) and WE-based metrics perform better and appear to be complementary. However, we also find that the human assessors cannot easily distinguish between the distance/divergence-based and the semantic similarity-based metrics when identifying similar latent Twitter topics.
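
As an illustration of two of the metrics named in the abstract, the sketch below computes the Jensen-Shannon divergence and the cosine similarity between topics represented as word-probability distributions over a shared vocabulary. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the base-2 logarithm, and the three toy topics are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two topic-word vectors (higher = more similar)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; zero-probability terms of p are skipped."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, and within [0, 1] with base-2 logs (lower = more similar)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # midpoint distribution
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topics over a four-word vocabulary (illustrative values only).
topic_a = [0.5, 0.3, 0.1, 0.1]
topic_b = [0.4, 0.4, 0.1, 0.1]    # close to topic_a, i.e. a likely redundant topic
topic_c = [0.05, 0.05, 0.2, 0.7]  # concentrated on different words

for name, other in (("b", topic_b), ("c", topic_c)):
    print(f"a vs {name}: JS = {js_divergence(topic_a, other):.4f}, "
          f"CS = {cosine_similarity(topic_a, other):.4f}")
```

Under both metrics, the near-duplicate pair (a, b) should score as more similar than the pair (a, c): a lower JS divergence and a higher cosine similarity.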

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig; Wang, Xi; Fang, Mr Anjie; Ounis, Professor Iadh
Authors: Wang, X., Fang, A., Ounis, I., and Macdonald, C.
College/School: College of Science and Engineering > School of Computing Science
Copyright Holders: Copyright © 2019 Springer Nature Switzerland AG
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher