Transfer Learning for Multi-language Twitter Election Classification

Yang, X., McCreadie, R., Macdonald, C. and Ounis, I. (2017) Transfer Learning for Multi-language Twitter Election Classification. In: The 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), Sydney, Australia, 31 Jul - 03 Aug 2017, pp. 341-348. ISBN 9781450349932 (doi: 10.1145/3110025.3110059)

Text: 147409.pdf - Accepted Version (412kB)

Abstract

Both politicians and citizens are increasingly embracing social media as a means to disseminate information and comment on various topics, particularly during significant political events, such as elections. Such commentary during elections is also of interest to social scientists and pollsters. To facilitate the study of social media during elections, there is a need to automatically identify posts that are topically related to those elections. However, current studies have focused on elections within English-speaking regions, and hence the resultant election content classifiers are only applicable for elections in countries where the predominant language is English. On the other hand, as social media is becoming more prevalent worldwide, there is an increasing need for election classifiers that can be generalised across different languages, without building a training dataset for each election. In this paper, based upon transfer learning, we study the development of effective and reusable election classifiers for use on social media across multiple languages. We combine transfer learning with different classifiers such as Support Vector Machines (SVM) and state-of-the-art Convolutional Neural Networks (CNN), which make use of word embedding representations for each social media post. We generalise the learned classifier models for cross-language classification by using a linear translation approach to map the word embedding vectors from one language into another. Experiments conducted over two election datasets in different languages show that without using any training data from the target language, linear translations outperform a classical transfer learning approach, namely Transfer Component Analysis (TCA), by 80% in recall and 25% in F1 measure.
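The linear translation step mentioned in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact formulation, so the code below assumes a common variant: a matrix W is fitted by least squares over embedding pairs for words known to be translations of each other, and W is then applied to map vectors from one embedding space into the other. The function names, dimensions, and synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_linear_translation(src_vecs, tgt_vecs):
    """Fit W by least squares so that src_vecs @ W approximates tgt_vecs.

    Assumption: row i of src_vecs and row i of tgt_vecs are the embeddings
    of a word and its translation (e.g. taken from a bilingual dictionary).
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W

def translate(vecs, W):
    """Map embedding vectors into the other language's embedding space."""
    return vecs @ W

# Synthetic stand-in for real embeddings: 50 word pairs,
# 5-dimensional source space, 4-dimensional target space.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(5, 4))
src = rng.normal(size=(50, 5))
tgt = src @ true_W  # noise-free, so the mapping is exactly recoverable

W = fit_linear_translation(src, tgt)
mapped = translate(src, W)
```

Under this assumption, a classifier trained on embeddings in one language could be applied to translated vectors from another language without target-language training data, which is the setting the abstract describes.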

Item Type: Conference Proceedings
Keywords: Word embedding, transfer learning, multi-language, Twitter, election classification, linear translation, convolutional neural network.
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Mccreadie, Dr Richard and Macdonald, Professor Craig and Yang, Dr Xiao and Ounis, Professor Iadh
Authors: Yang, X., McCreadie, R., Macdonald, C., and Ounis, I.
Subjects: Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781450349932
Copyright Holders: Copyright © 2017 Association for Computing Machinery
First Published: First published in The 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM): 341-348
Publisher Policy: Reproduced in accordance with the publisher copyright policy


Project Code: 646621
Award No:
Project Name: Explaining and Mitigating Electoral Violence
Principal Investigator: Sarah Birch
Funder's Name: Economic and Social Research Council (ESRC)
Funder Ref: ES/L016435/1
Lead Dept: SPS - POLITICS