Lioma, C. and Ounis, I. (2007) Extending weighting models with a term quality measure. Lecture Notes in Computer Science, 4726, pp. 205-216. (doi: 10.1007/978-3-540-75530-2_19)
Publisher's URL: http://dx.doi.org/10.1007/978-3-540-75530-2_19
Abstract
Weighting models use lexical statistics, such as term frequencies, to derive term weights, which are used to estimate the relevance of a document to a query. Apart from the removal of stopwords, no other consideration is given to the quality of the words being ‘weighted’. It is often assumed that term frequency is a good indicator of how relevant a document is to a query. Our intuition is that raw term frequency could be enhanced to better discriminate between terms. To do so, we propose using non-lexical features to predict the ‘quality’ of words before they are weighted for retrieval. Specifically, we show how parts of speech (e.g. nouns, verbs) can help estimate how informative a word generally is, regardless of its relevance to a query or document. Experimental results on two standard TREC collections show that integrating the proposed term quality into two established weighting models enhances retrieval performance in all cases, compared with a baseline that uses the original weighting models.
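To illustrate the general idea of scaling a lexical term weight by a part-of-speech-based "quality" prior, here is a minimal sketch. It is not the paper's method: the abstract does not state which two weighting models were used or how term quality is computed and integrated. BM25 (a common variant) is swapped in as the weighting model, and the per-POS scores, the helper names (`term_quality`, `quality_weighted`) and the multiplicative combination are all hypothetical choices for illustration.

```python
import math

# Hypothetical POS-based quality scores (illustrative values only,
# NOT taken from the paper): nouns are assumed to be most informative.
POS_QUALITY = {"NOUN": 1.0, "VERB": 0.7, "ADJ": 0.6}
DEFAULT_QUALITY = 0.3  # assumed score for any other part of speech


def term_quality(pos_counts):
    """Average POS quality of a term, weighted by how often the term
    occurs with each tag in a (hypothetical) POS-tagged corpus."""
    total = sum(pos_counts.values())
    if total == 0:
        return DEFAULT_QUALITY
    return sum(POS_QUALITY.get(tag, DEFAULT_QUALITY) * n
               for tag, n in pos_counts.items()) / total


def bm25_weight(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 weight of one term in one document (IDF with +1 inside the log)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm_tf = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * norm_tf


def quality_weighted(tf, df, num_docs, doc_len, avg_doc_len, pos_counts):
    """Scale the BM25 weight by the term-quality prior (assumed combination)."""
    return (bm25_weight(tf, df, num_docs, doc_len, avg_doc_len)
            * term_quality(pos_counts))


# Two terms with identical lexical statistics but different POS profiles.
w_noun = quality_weighted(3, 10, 1000, 120, 100, {"NOUN": 9, "VERB": 1})
w_adverb = quality_weighted(3, 10, 1000, 120, 100, {"ADV": 10})
print(w_noun > w_adverb)
```

With identical term and document frequencies, the mostly-noun term receives the higher weight because its assumed quality (0.97) exceeds that of the adverb (0.3). This shows how a non-lexical feature can separate terms that raw frequency statistics treat identically.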
Field | Value |
---|---|
Item Type: | Articles |
Status: | Published |
Refereed: | Yes |
Glasgow Author(s) Enlighten ID: | Ounis, Professor Iadh |
Authors: | Lioma, C., and Ounis, I. |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
College/School: | College of Science and Engineering > School of Computing Science |
Journal Name: | Lecture Notes in Computer Science |
Publisher: | Springer |
ISSN: | 1611-3349 |
Copyright Holders: | Copyright © 2007 Springer |
First Published: | First published in Lecture Notes in Computer Science 4726:205-216 |
Publisher Policy: | Reproduced in accordance with the copyright policy of the publisher. |