Searching for ground truth: a stepping stone in automating genre classification

Kim, Y. and Ross, S. (2007) Searching for ground truth: a stepping stone in automating genre classification. In: DELOS Conference on Digital Libraries, Tirrenia, Pisa, Italy, 13-14 February 2007, pp. 248-261. ISBN 9783540770879 (doi:10.1007/978-3-540-77088-6_24)

Publisher's URL: http://dx.doi.org/10.1007/978-3-540-77088-6_24

Abstract

This paper examines genre classification of documents and its role in enabling the effective automated management of digital documents by digital libraries and other repositories. We have previously presented genre classification as a valuable step toward achieving automated extraction of descriptive metadata for digital material. Here, we present results from experiments using human labellers, conducted to assist in genre characterisation and the prediction of obstacles which need to be overcome by an automated system, and to contribute to the process of creating a solid testbed corpus for extending automated genre classification and testing metadata extraction tools across genres. We also describe the performance of two classifiers based on image and stylistic modeling features in labelling the data resulting from the agreement of three human labellers across fifteen genre classes.

Item Type: Conference Proceedings
Keywords: information extraction, genre classification, automated metadata extraction, metadata, digital library, data management
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Kim, Dr Yunhyong and Ross, Professor Seamus
Authors: Kim, Y., and Ross, S.
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
College/School: College of Arts > School of Humanities > Information Studies
Research Group: Digital Curation Center
Publisher: Springer
ISSN: 1611-3349
ISBN: 9783540770879
Copyright Holders: Copyright © 2007 Springer
First Published: First published in Lecture Notes in Computer Science 4877:248-261
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher
