Detecting family resemblance: automated genre classification

Kim, Y. and Ross, S. (2007) Detecting family resemblance: automated genre classification. In: 20th International CODATA Conference, Beijing, 22-25 October 2006, S172-S183. (doi: 10.2481/dsj.6.S172)

This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising scientific data and in retrieving targeted material to support research. The paper compares the roles of visual layout, stylistic features and language model features in clustering documents. It then presents results in retrieving five selected genres (Scientific Article, Thesis, Periodicals, Business Report and Form) from a pool of documents drawn from the nineteen most popular genres found in our experimental data set.
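
The abstract frames genre retrieval as picking out documents of one target genre from a mixed pool using, among other things, language model features. The sketch below shows one common way such features can be used: one Laplace-smoothed unigram language model per genre, combined in a naive Bayes style decision. It is an illustrative assumption, not the authors' method. The function names (train_genre_models, classify, retrieve), the smoothing parameter alpha and the toy training data are hypothetical. PDF text extraction and the visual-layout and stylistic features discussed in the paper are not modelled.

```python
import math
from collections import Counter, defaultdict


def train_genre_models(labelled_docs, alpha=1.0):
    """Fit one Laplace-smoothed unigram language model per genre (illustrative only).

    labelled_docs: iterable of (genre, tokens) pairs; tokens is a list of words
    assumed to have been extracted from each document already.
    """
    counts = defaultdict(Counter)   # genre -> token frequencies
    doc_totals = Counter()          # genre -> number of training documents
    vocab = set()
    for genre, tokens in labelled_docs:
        counts[genre].update(tokens)
        doc_totals[genre] += 1
        vocab.update(tokens)
    n_docs = sum(doc_totals.values())
    models = {}
    for genre, freq in counts.items():
        total = sum(freq.values())
        # The extra +1 reserves probability mass for tokens never seen in training.
        denom = total + alpha * (len(vocab) + 1)
        models[genre] = {
            "prior": math.log(doc_totals[genre] / n_docs),
            "logp": {w: math.log((freq[w] + alpha) / denom) for w in vocab},
            "unseen": math.log(alpha / denom),
        }
    return models


def score(model, tokens):
    """Log prior plus log-likelihood of the tokens under one genre model."""
    return model["prior"] + sum(model["logp"].get(w, model["unseen"]) for w in tokens)


def classify(models, tokens):
    """Return the genre whose model assigns the document the highest score."""
    return max(models, key=lambda g: score(models[g], tokens))


def retrieve(models, pool, target):
    """Return the indices of pool documents classified as the target genre."""
    return [i for i, tokens in enumerate(pool) if classify(models, tokens) == target]


if __name__ == "__main__":
    # Hypothetical toy data; not taken from the paper's experimental data set.
    training = [
        ("Thesis", "chapter abstract supervisor declaration chapter".split()),
        ("Thesis", "thesis chapter acknowledgements supervisor".split()),
        ("Form", "name date signature tick box".split()),
        ("Form", "signature date applicant name".split()),
    ]
    models = train_genre_models(training)
    pool = ["chapter supervisor thesis".split(), "signature name date".split()]
    print(retrieve(models, pool, "Thesis"))  # → [0]
```

Treating retrieval of each selected genre as a separate "target versus everything else" decision over the whole pool mirrors the abstract's setup of retrieving five genres from documents spanning nineteen. A real system would need far larger training sets and, per the paper, would also weigh layout and stylistic evidence.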

Item Type: Conference Proceedings
Keywords: Automated genre classification, Metadata, Scientific information, Information management, Information extraction
Glasgow Author(s) Enlighten ID: Kim, Dr Yunhyong and Ross, Professor Seamus
Authors: Kim, Y., and Ross, S.
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4450 Databases
Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
College/School: College of Arts > School of Humanities > Information Studies
Research Group: Digital Curation Centre
Publisher: International Council for Science
Copyright Holders: Copyright © 2007 International Council for Science
First Published: First published in CODATA Data Science Journal 6:S172-S183
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher