Kim, Y. and Ross, S. (2007) Detecting family resemblance: automated genre classification. In: 20th International CODATA Conference, Beijing, 22-25 October 2006, S172-S183. (doi: 10.2481/dsj.6.S172)
Publisher's URL: http://dx.doi.org/10.2481/dsj.6.S172
Abstract
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material to support research. The paper compares the roles of visual layout, stylistic features, and language model features in clustering documents, and presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report, and Form) from a pool of documents drawn from the nineteen most popular genres found in our experimental data set.
Item Type: | Conference Proceedings |
---|---|
Keywords: | Automated genre classification, Metadata, Scientific information, Information management, Information extraction |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Kim, Dr Yunhyong and Ross, Professor Seamus |
Authors: | Kim, Y., and Ross, S. |
Subjects: | Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4450 Databases; Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources |
College/School: | College of Arts & Humanities > School of Humanities > Information Studies |
Research Group: | Digital Curation Centre |
Publisher: | International Council for Science |
ISSN: | 1683-1470 |
Copyright Holders: | Copyright © 2007 International Council for Science |
First Published: | First published in CODATA Data Science Journal 6:S172-S183 |
Publisher Policy: | Reproduced in accordance with the copyright policy of the publisher |