Multilingualism in Greater Poland court records (1386-1448): tagging discourse boundaries and code-switching

Włodarczyk, M., Kopaczyk, J. and Kozak, M. (2020) Multilingualism in Greater Poland court records (1386-1448): tagging discourse boundaries and code-switching. Corpora, 15(3), pp. 273-290. (doi: 10.3366/cor.2020.0200)

218506.pdf - Accepted Version



This paper introduces the Electronic Repository of Greater Poland Oaths, eROThA (1386–1448). The project digitises a diplomatic edition of medieval land court oaths recorded in Latin and Old Polish, and its result is a small, lightly tagged, specialised bilingual corpus. We present the background, aims, design and methodology of the project. We also discuss the problems and limitations inherent in turning a printed diplomatic edition into a machine-readable one equipped with a new interpretative layer that is sensitive to switches between Latin and Old Polish. Code-switched items are annotated automatically on the basis of the typographic characteristics of the printed edition. In addition, recurrent language and discourse boundary phenomena have been coded manually and flexibly to account for linguistically ambiguous or neutral forms. The project offers a fully multilingual corpus as well as customised Polish-only and Latin-only datasets, and it enables filtered metadata searches in the online front-end. Overall, the paper presents a methodology for constructing multilingual corpora in the context of the legal cultures of medieval Central Europe, and this methodology may be extrapolated to datasets originating in other periods and regions.

Item Type: Articles
Additional Information: The paper and the eROThA database were produced as a result of a research project funded by the National Science Centre in Poland (OPUS No. 2014/13/B/HS2/00644). The repository is found at:
Glasgow Author(s) Enlighten ID: Kopaczyk, Professor Joanna
Authors: Włodarczyk, M., Kopaczyk, J., and Kozak, M.
College/School: College of Arts & Humanities > School of Critical Studies > English Language and Linguistics
Journal Name: Corpora
Publisher: Edinburgh University Press
ISSN (Online): 1755-1676
Copyright Holders: Copyright © 2020 Edinburgh University Press
First Published: Corpora 15(3): 273-290
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
