Kim, Y. and Ross, S. (2007) Variation of word frequencies across genre classification tasks. In: Thanos, C., Borri, F. and Launaro, A. (eds.) Second DELOS Conference on Digital Libraries: Pisa, Italy, 5-7 December 2007. Series: The DELOS network of excellence on digital libraries. GEIE-ERCIM: Sophia Antipolis, Nice, France. ISBN 9782912335364
Abstract
This paper examines automated genre classification of text documents and its role in enabling digital libraries and other repositories to manage digital documents effectively. Genre classification, which narrows down the possible structure of a document, is a valuable step towards the general automatic extraction of the semantic metadata essential to the efficient management and use of digital objects. Here we present an analysis of word frequencies across genre classes in an effort to understand the distinction between independent classification tasks. In particular, we report automated experiments on thirty-one genre classes to determine the relationship between word frequency metrics and their significance for classification in varying environments.
Field | Value |
---|---|
Item Type: | Book Section |
Status: | Published |
Glasgow Author(s) Enlighten ID: | Kim, Dr Yunhyong and Ross, Professor Seamus |
Authors: | Kim, Y., and Ross, S. |
Subjects: | Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
College/School: | College of Arts & Humanities > School of Humanities > Information Studies |
Publisher: | GEIE-ERCIM |
ISBN: | 9782912335364 |
Copyright Holders: | Copyright © 2007 The Authors |
Publisher Policy: | Reproduced with the permission of the authors |