Enlighten Publications

Word Embeddings are biased. But whose bias are they reflecting?

Petreski, D. and Hashim, I. C. (2023) Word Embeddings are biased. But whose bias are they reflecting? AI and Society, 38(2), pp. 975-982. (doi: 10.1007/s00146-022-01443-w)

Text
269188.pdf - Published Version
Available under License Creative Commons Attribution.
619kB

Abstract

From curriculum vitae parsing to web search and recommendation systems, Word2Vec and other word embedding techniques have an increasing presence in everyday interactions in human society. Biases, such as gender bias, have been thoroughly researched and shown to be present in word embeddings. Most of the research focuses on discovering and mitigating gender bias within the frame of the vector space itself. Nevertheless, whose bias is reflected in word embeddings has not yet been investigated. Besides discovering and mitigating gender bias, it is also important to examine whether a feminine-centric or a masculine-centric view is represented in the biases of word embeddings. This way, we will not only gain more insight into the origins of the aforementioned biases, but also present a novel approach to investigating biases in Natural Language Processing systems. Based on previous research in the social sciences and gender studies, we hypothesize that masculine-centric, otherwise known as androcentric, biases are dominant in word embeddings. To test this hypothesis, we used the largest publicly available English word association test data set. We compared the distances between cue words and the responses of male and female participants in a word embedding vector space. We found that the word embedding is biased towards a masculine-centric viewpoint, predominantly reflecting the worldviews of the male participants in the word association test data set. By conducting this research, we aimed to unravel another layer of bias to be considered when examining fairness in algorithms.
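
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only says that distances between cue words and participants' responses are compared, so the use of cosine similarity, the helper names (`cosine_similarity`, `mean_similarity`), and the toy three-dimensional vectors below are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_similarity(embedding, cue, responses):
    """Average similarity between a cue word and the responses present in the embedding."""
    sims = [
        cosine_similarity(embedding[cue], embedding[r])
        for r in responses
        if r in embedding
    ]
    return sum(sims) / len(sims) if sims else float("nan")


# Toy vectors invented for this example; a real study would load a trained embedding.
toy_embedding = {
    "river": [1.0, 0.0, 0.0],
    "water": [0.9, 0.1, 0.0],
    "boat": [0.8, 0.2, 0.0],
    "stone": [0.0, 1.0, 0.0],
    "bridge": [0.1, 0.9, 0.0],
}

# Hypothetical responses to the cue "river" from two participant groups.
male_responses = ["water", "boat"]
female_responses = ["stone", "bridge"]

male_score = mean_similarity(toy_embedding, "river", male_responses)
female_score = mean_similarity(toy_embedding, "river", female_responses)

# A positive gap means the embedding places the cue closer to the first group's responses.
gap = male_score - female_score
print(f"male: {male_score:.3f}  female: {female_score:.3f}  gap: {gap:.3f}")
```

In this toy setup the gap is positive by construction. With real data, the same score would be aggregated over many cue words before drawing any conclusion about whose associations the embedding reflects.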

Item Type:	Articles
Status:	Published
Refereed:	Yes
Glasgow Author(s) Enlighten ID:	Hashim, Mr Ibrahim
Authors:	Petreski, D., and Hashim, I. C.
College/School:	College of Medical Veterinary and Life Sciences > School of Psychology & Neuroscience; College of Social Sciences > School of Education
Journal Name:	AI and Society
Publisher:	Springer
ISSN:	0951-5666
ISSN (Online):	1435-5655
Published Online:	26 May 2022
Copyright Holders:	Copyright © 2022 The Authors
First Published:	First published in AI and Society 38(2): 975-982
Publisher Policy:	Reproduced under a Creative Commons License

Deposit and Record Details

ID Code:	269188
Depositing User:	Miss Valerie McCutcheon
Datestamp:	30 May 2022 11:25
Last Modified:	01 Jun 2023 15:16
Date of acceptance:	24 March 2022
Date of first online publication:	26 May 2022
Date Deposited:	14 April 2022
Data Availability Statement:	No