KRYS I Corpus

Kim, Y., Ross, S. and Berninger, V. (2008) KRYS I Corpus. [Website]

Full text not currently available from Enlighten.


The KRYS I corpus is a collection of over 6300 documents labelled with their genre classes. It was constructed as part of a research initiative, driven by the Digital Curation Centre, to automate document genre classification. The work was carried out at the Humanities Advanced Technology and Information Institute (HATII), University of Glasgow, between 2005 and 2008. The notion of genre is deeply embedded in the way humans organise information. Identifying the genre of a document helps to characterise the physical and conceptual structure of the text, which in turn helps to capture its style and to locate further information within it. Very few genre-labelled corpora have been available to the research community. Our corpus is made available here to fill this gap and to serve as a resource for researchers in metadata extraction, digital curation, text classification, text mining, computational linguistics, and pattern recognition.

Item Type: Website
Glasgow Author(s) Enlighten ID: Kim, Dr Yunhyong and Ross, Professor Seamus
Authors: Kim, Y., Ross, S., and Berninger, V.
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
College/School: College of Arts > School of Humanities > Information Studies


Project Code/Award No: 374172
Project Name: National Digital Curation Centre (NDCC)
Principal Investigator: Seamus Ross
Funder's Name: Engineering and Physical Sciences Research Council (EPSRC)
Funder Ref: GR/T07374/01 H35404
Lead Dept: HU - INFORMATION STUDIES