On Refining Twitter Lists as Ground Truth Data for Multi-Community User Classification

Su, T., Fang, A., McCreadie, R., Macdonald, C. and Ounis, I. (2018) On Refining Twitter Lists as Ground Truth Data for Multi-Community User Classification. In: 40th European Conference on Information Retrieval (ECIR 2018), Grenoble, France, 25-29 Mar 2018, pp. 765-772. ISBN 9783319769400 (doi: 10.1007/978-3-319-76941-7_74)

154197.pdf - Accepted Version



To help scholars and businesses understand and analyse Twitter users, it is useful to have classifiers that can identify the communities that a given user belongs to, e.g. business or politics. Obtaining high-quality training data is an important step towards producing an effective multi-community classifier. An efficient approach for creating such ground truth data is to extract users from existing public Twitter lists, where those lists represent different communities, e.g. a list of journalists. However, ground truth datasets obtained using such lists can be noisy, since not all users that belong to a community are good training examples for that community. In this paper, we conduct a thorough failure analysis of a ground truth dataset generated using Twitter lists. We discuss how some categories of users collected from these public Twitter lists could negatively affect classification performance and therefore should not be used for training. Through experiments with 3 classifiers and 5 communities, we show that removing ambiguous users based on their tweets and profile can indeed result in a 10% increase in F1 performance.

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: McCreadie, Dr Richard and Su, Ting and Macdonald, Professor Craig and Fang, Mr Anjie and Ounis, Professor Iadh
Authors: Su, T., Fang, A., McCreadie, R., Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Published Online: 01 March 2018
Copyright Holders: Copyright © 2018 Springer International Publishing AG, part of Springer Nature
First Published: First published in Advances in Information Retrieval. ECIR 2018: 765-772
Publisher Policy: Reproduced in accordance with the publisher copyright policy
