Effective techniques for automatic extraction of Web publications

Fong, A.C.M., Hui, S.C. and Vu, H.L. (2002) Effective techniques for automatic extraction of Web publications. Online Information Review, 26(1), pp. 4-18. (doi: 10.1108/14684520210418347)

Full text not currently available from Enlighten.

Abstract

Research organisations and individual researchers increasingly share their research findings by providing lists of their published works on the World Wide Web. To facilitate the exchange of ideas, the lists often include links to published papers in Portable Document Format (PDF) or PostScript (PS) format. These publication Web sites are generally updated regularly to include new works. Manual monitoring of relevant Web sites is tedious, while commercial search engines and information monitoring systems are ineffective in finding and tracking scholarly publications. This paper analyses the characteristics of publication index pages and describes effective automatic extraction techniques that the authors have developed. The authors’ techniques combine lexical and syntactic analyses with heuristics. The proposed techniques have been implemented and tested on more than 14,000 Web pages, achieving consistently high success rates of around 90 percent.
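
As a rough illustration of the kind of page the abstract describes, the sketch below scans a publication index page for links to PDF or PS files and pairs each link with its anchor text. It is not the authors' technique, whose details are not given in this record; the class name `PublicationLinkParser`, the function `extract_publication_links`, the extension list, and the example URLs are all assumptions made for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: these file extensions mark a link as a publication.
# The list is illustrative and does not come from the paper.
PUBLICATION_EXTENSIONS = (".pdf", ".ps", ".ps.gz")


class PublicationLinkParser(HTMLParser):
    """Collect (url, anchor text) pairs for links that look like papers."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # accepted (absolute URL, anchor text) pairs
        self._href = None    # absolute URL of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the index page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Simple heuristic: keep the link if its path ends in a
            # known publication extension (case-insensitive).
            path = urlparse(self._href).path.lower()
            if path.endswith(PUBLICATION_EXTENSIONS):
                text = " ".join("".join(self._text).split())
                self.links.append((self._href, text))
            self._href = None


def extract_publication_links(html, base_url):
    """Return publication-like links found in an HTML index page."""
    parser = PublicationLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

A filter based on file extensions alone would miss papers served through scripts or redirects and would accept non-scholarly PDFs, which is presumably why the abstract describes combining lexical and syntactic analyses with heuristics rather than relying on a single rule.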

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Fong, Dr Alvis Cheuk Min
Authors: Fong, A.C.M., Hui, S.C., and Vu, H.L.
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Online Information Review
Publisher: Emerald
ISSN: 1468-4527
ISSN (Online): 1468-4535
