Extending weighting models with a term quality measure

Lioma, C. and Ounis, I. (2007) Extending weighting models with a term quality measure. Lecture Notes in Computer Science, 4726, pp. 205-216. (doi:10.1007/978-3-540-75530-2_19)

Publisher's URL: http://dx.doi.org/10.1007/978-3-540-75530-2_19

Abstract

Weighting models use lexical statistics, such as term frequencies, to derive term weights, which are used to estimate the relevance of a document to a query. Apart from the removal of stopwords, the quality of the words being ‘weighted’ is not otherwise considered. It is often assumed that term frequency is a good indicator of how relevant a document is to a query. Our intuition is that raw term frequency could be enhanced to better discriminate between terms. To do so, we propose using non-lexical features to predict the ‘quality’ of words before they are weighted for retrieval. Specifically, we show how parts of speech (e.g. nouns, verbs) can help estimate how informative a word generally is, regardless of its relevance to a query or document. Experimental results with two standard TREC collections show that integrating the proposed term quality into two established weighting models consistently enhances retrieval performance over a baseline that uses the original weighting models.
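
The abstract does not say how term quality is computed or how it enters the weighting models, so the following is only a rough sketch of the general idea, not the authors' formulation. It assumes a hypothetical table of per-part-of-speech quality scores (`POS_QUALITY`, with made-up values) and uses plain TF-IDF as a stand-in for an established weighting model. Each query term's frequency is scaled by its quality score before weighting.

```python
import math

# Hypothetical quality scores per part-of-speech tag.
# The values are illustrative assumptions, not taken from the paper.
POS_QUALITY = {"NOUN": 1.0, "VERB": 0.7, "ADJ": 0.6, "ADV": 0.4}
DEFAULT_QUALITY = 0.2  # assumed score for any tag not listed above


def term_quality(pos_tag):
    """Return the assumed quality score for a part-of-speech tag."""
    return POS_QUALITY.get(pos_tag, DEFAULT_QUALITY)


def tf_idf(tf, df, n_docs):
    """Plain TF-IDF weight, standing in for an established weighting model."""
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(n_docs / df)


def quality_weighted_score(query, doc_tf, doc_freq, n_docs):
    """Score a document for a query, scaling each term frequency by term quality.

    query    : list of (term, pos_tag) pairs
    doc_tf   : dict mapping term -> frequency in the document
    doc_freq : dict mapping term -> number of documents containing the term
    n_docs   : total number of documents in the collection
    """
    score = 0.0
    for term, pos in query:
        adjusted_tf = doc_tf.get(term, 0) * term_quality(pos)
        score += tf_idf(adjusted_tf, doc_freq.get(term, 0), n_docs)
    return score
```

With equal raw frequencies and document frequencies, a noun in the query contributes more to the score than an adverb under these assumed scores, which is the kind of discrimination between terms that the abstract describes.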

Item Type: Articles
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Ounis, Professor Iadh
Authors: Lioma, C., and Ounis, I.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
Journal Name: Lecture Notes in Computer Science
Publisher: Springer
ISSN: 1611-3349
Copyright Holders: Copyright © 2007 Springer
First Published: First published in Lecture Notes in Computer Science 4726:205-216
Publisher Policy: Reproduced in accordance with the copyright policy of the publisher.
