Enhancing First Story Detection using Word Embeddings

Moran, S., Mccreadie, R., Macdonald, C. and Ounis, I. (2016) Enhancing First Story Detection using Word Embeddings. In: SIGIR 2016, Pisa, Italy, 17-21 July 2016, pp. 821-824. ISBN 9781450340694 (doi:10.1145/2911451.2914719)

Text: 119287.pdf - Accepted Version (119kB)

Abstract

In this paper, we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system: a tweet can be flagged as a first story even if a related tweet, which uses different but synonymous words, has already been returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via word embeddings automatically mined from a background tweet corpus. Through experimentation on a large data stream comprising 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.
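
The idea of tweet expansion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the embedding table `EMBEDDINGS`, the helper names, the similarity threshold `min_sim`, and the word-level nearest-neighbour expansion are all assumptions made for illustration, and an exhaustive cosine search stands in for the LSH-based nearest-neighbour lookup used by the actual system.

```python
import math
from collections import Counter

# Hypothetical toy embeddings. A real system would mine these from a
# background tweet corpus; the vectors here are invented for illustration.
EMBEDDINGS = {
    "quake": (0.9, 0.1, 0.0),
    "earthquake": (0.88, 0.15, 0.02),
    "tremor": (0.85, 0.2, 0.05),
    "hits": (0.1, 0.9, 0.1),
    "strikes": (0.12, 0.88, 0.12),
    "tokyo": (0.0, 0.1, 0.95),
    "goal": (0.5, 0.5, -0.7),
}


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def neighbours(word, k=2, min_sim=0.95):
    """Up to k words whose embedding is within min_sim of the given word."""
    if word not in EMBEDDINGS:
        return []
    scored = [
        (cosine(EMBEDDINGS[word], vec), other)
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for sim, other in scored[:k] if sim >= min_sim]


def expand(tokens, k=2, min_sim=0.95):
    """Append embedding neighbours of each token (assumed expansion scheme)."""
    expanded = list(tokens)
    for token in tokens:
        for candidate in neighbours(token, k, min_sim):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def bow_cosine(a, b):
    """Cosine similarity between two token lists as bags of words."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(c * c for c in ca.values()))
    norm_b = math.sqrt(sum(c * c for c in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def novelty(tokens, seen):
    """1 minus the similarity to the closest earlier tweet (exhaustive, not LSH)."""
    return 1.0 - max((bow_cosine(tokens, s) for s in seen), default=0.0)


first = ["quake", "hits", "tokyo"]
second = ["earthquake", "strikes", "tokyo"]

# Same event, different words: the raw novelty score stays high.
raw_score = novelty(second, [first])

# After expansion the two tweets share vocabulary, so novelty drops.
expanded_score = novelty(expand(second), [expand(first)])
print(raw_score, expanded_score)
```

In this toy setting the second tweet looks largely novel before expansion, because it shares only one word with the first, and no longer looks novel once both tweets are expanded. A threshold on the novelty score would then decide whether a tweet is reported as a first story.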

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh and Moran, Mr Sean and Mccreadie, Mr Richard
Authors: Moran, S., Mccreadie, R., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450340694
Copyright Holders: Copyright © 2016 Association for Computing Machinery
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher


Project Code / Award No | Project Name | Principal Investigator | Funder's Name | Funder Ref | Lead Dept
651921 | Urban Big Data Research Centre | Piyushimita Thakuriah | Economic & Social Research Council (ESRC) | ES/L011921/1 | SPS - URBAN STUDIES
651922 | Urban Big Data Research Centre | Piyushimita Thakuriah | Economic & Social Research Council (ESRC) | ES/L011921/1 | SPS - URBAN STUDIES
624701 | SUPER | Iadh Ounis | European Commission (EC) | 606853 | COM - COMPUTING SCIENCE