Włodarczyk, M., Kopaczyk, J. and Kozak, M. (2020) Multilingualism in Greater Poland court records (1386-1448): tagging discourse boundaries and code-switching. Corpora, 15(3), pp. 273-290. (doi: 10.3366/cor.2020.0200)
Text: 218506.pdf - Accepted Version (950kB)
Abstract
This paper introduces the Electronic Repository of Greater Poland Oaths, eROThA (1386–1446), a digitisation project of a diplomatic edition of medieval land court oaths recorded in Latin and Old Polish, resulting in a small, lightly tagged specialised bilingual corpus. We present the background, aims, design and methodology of the project. We also discuss the problems and limitations entrenched in turning a printed diplomatic edition into a machine-readable diplomatic edition equipped with a new interpretative layer that is sensitive to the switches between Latin and Old Polish. In addition to the automatic annotation of code-switched items on the basis of typographic characteristics of the printed edition, flexible coding of recurrent language and discourse boundary phenomena has been introduced manually to account for linguistically ambiguous or neutral forms. The project offers a fully multilingual corpus, as well as customised Polish-only and Latin-only datasets, and enables filtered metadata searches in the online front-end. Overall, the report presents a methodology for constructing multilingual corpora in the context of legal cultures in medieval Central Europe that may be extrapolated to datasets originating in other periods and regions.
Item Type: | Articles |
---|---|
Additional Information: | The paper and the eROThA database were produced as a result of a research project funded by the National Science Centre in Poland (OPUS No. 2014/13/B/HS2/00644 https://projekty.ncn.gov.pl/index.php?s=11260). The repository is found at: https://rotha.ehum.psnc.pl/. |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Kopaczyk, Professor Joanna |
Authors: | Włodarczyk, M., Kopaczyk, J., and Kozak, M. |
College/School: | College of Arts & Humanities > School of Critical Studies > English Language and Linguistics |
Journal Name: | Corpora |
Publisher: | Edinburgh University Press |
ISSN: | 1749-5032 |
ISSN (Online): | 1755-1676 |
Copyright Holders: | Copyright © 2020 Edinburgh University Press |
First Published: | First published in Corpora 15(3):273-290 |
Publisher Policy: | Reproduced in accordance with the copyright policy of the publisher |