Enlighten Publications

KRYS I Corpus

Kim, Y., Ross, S. and Berninger, V. (2008) KRYS I Corpus. [Website]

Full text not currently available from Enlighten.

Abstract

The KRYS I corpus is a collection of over 6,300 documents labelled with their genre classes. It was constructed as part of a research initiative, driven by the Digital Curation Centre, to automate document genre classification. The work was carried out at the Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, between 2005 and 2008. The notion of genre is deeply embedded in the way humans organise information. Identifying the genre of a document helps to characterise the physical and conceptual structure of the text, which in turn helps to capture its style and to locate further information within it. Very few genre-labelled corpora have been available to the research community. Our corpus is made available here to fill this gap and to serve as a valuable resource for researchers in metadata extraction, digital curation, text classification, text mining, computational linguistics and pattern recognition.

Item Type:	Website
Status:	Published
Glasgow Author(s) Enlighten ID:	Kim, Dr Yunhyong and Ross, Professor Seamus
Authors:	Kim, Y., Ross, S., and Berninger, V.
Subjects:	Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
College/School:	College of Arts & Humanities > School of Humanities > Information Studies

Funder and Project Information

Project Code	Award No	Project Name	Principal Investigator	Funder's Name	Funder Ref	Lead Dept
37417	2	National Digital Curation Centre (NDCC)	Seamus Ross	Engineering and Physical Sciences Research Council (EPSRC)	GR/T07374/01 H35404	HU - INFORMATION STUDIES

Deposit and Record Details

ID Code:	153591
Depositing User:	Dr Yunhyong Kim
Datestamp:	15 Dec 2017 10:05
Last Modified:	14 Jun 2022 13:06
Date of first online publication:	1 November 2008