Evaluating Similarity Metrics for Latent Twitter Topics

Wang, X., Fang, A., Ounis, I. and Macdonald, C. (2018) Evaluating Similarity Metrics for Latent Twitter Topics. In: 41st European Conference on Information Retrieval (ECIR 2019), Cologne, Germany, 14-18 Apr 2019, (Accepted for Publication)

Text: 174725.pdf - Accepted Version (224kB)
Restricted to Repository staff only

Abstract

Topic modelling approaches such as LDA, when applied to a tweet corpus, can often generate a topic model containing redundant topics. To evaluate the quality of a topic model in terms of redundancy, topic similarity metrics can be applied to estimate the similarity among topics in a topic model. There are various topic similarity metrics in the literature, e.g. the Jensen-Shannon (JS) divergence-based metric. In this paper, we evaluate the performance of four distance/divergence-based topic similarity metrics, including a newly proposed similarity metric that is based on computing word semantic similarity using word embeddings (WE), and examine how they align with human judgements. To obtain human judgements, we conduct a user study through crowdsourcing. Among various insights, our study shows that in general the cosine similarity (CS) and WE-based metrics perform better and appear to be complementary. However, we also find that the human assessors cannot easily distinguish between the distance/divergence-based and the semantic similarity-based metrics when identifying similar latent Twitter topics.
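
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how two of them, the JS divergence and the cosine similarity, could be computed between two topics represented as word-probability distributions over a shared vocabulary. It is an illustration only and not the paper's implementation: the function names and the toy distributions are assumptions, and the WE-based metric is not shown.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical topics, 1 for disjoint ones."""
    # m is the pointwise average of the two distributions.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def cosine_similarity(p, q):
    """Cosine of the angle between the two topic vectors."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    norm_p = math.sqrt(sum(pi * pi for pi in p))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    return dot / (norm_p * norm_q)


# Hypothetical topics over a four-word vocabulary (toy values, not from the paper).
topic_a = [0.5, 0.3, 0.1, 0.1]
topic_b = [0.4, 0.4, 0.1, 0.1]  # close to topic_a, so likely redundant
topic_c = [0.0, 0.1, 0.2, 0.7]  # puts most of its mass on different words

print(js_divergence(topic_a, topic_b), cosine_similarity(topic_a, topic_b))
print(js_divergence(topic_a, topic_c), cosine_similarity(topic_a, topic_c))
```

In this sketch a low JS divergence or a high cosine similarity between two topics would flag them as candidates for redundancy; any threshold for doing so is a separate choice that the abstract does not specify.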

Item Type: Conference Proceedings
Status: Accepted for Publication
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh and Wang, Xi and Fang, Mr Anjie
Authors: Wang, X., Fang, A., Ounis, I., and Macdonald, C.
College/School: College of Science and Engineering
College of Science and Engineering > School of Computing Science