The Video Game Dialogue Corpus

Rennick, S. and Roberts, S. G. (2023) The Video Game Dialogue Corpus. Corpora (Accepted for Publication)

Text: 291596.pdf - Accepted Version (775kB)

Abstract

This paper presents the Video Game Dialogue Corpus, the first large-scale, consistently coded, open-source corpus of dialogue from video games. It contains over 6.2 million words of English dialogue from 50 games in the Role-Playing Game (RPG) genre. The games were produced between 1985 and 2020, are rated for children, teenagers, and adults, and span both “Western” and “Japanese” subgenres. The corpus design is described, including custom data formats for representing branching dialogue. We demonstrate the use of the corpus by comparing the dialogue of female and male characters, finding both reflections of gendered language seen in other media and patterns that seem specific to video games. We provide the source code for a “self-inflating corpus”: a pipeline that obtains the data, then processes and parses it into a standard format. This makes the corpus available for teaching and research purposes, providing the first such resource for empirical analysis of video game dialogue.

Item Type: Articles
Additional Information: S.R. was supported by a Swiss National Science Foundation grant (182847).
Status: Accepted for Publication
Refereed: Yes
Glasgow Author(s) Enlighten ID: Rennick, Dr Steph
Authors: Rennick, S., and Roberts, S. G.
College/School: College of Arts > School of Humanities > Philosophy
Journal Name: Corpora
Publisher: Edinburgh University Press
ISSN: 1749-5032
ISSN (Online): 1755-1676
Copyright Holders: Copyright © 2023 Edinburgh University Press
First Published: First published in Corpora 2023
Publisher Policy: Reproduced in accordance with the publisher copyright policy
