Effectiveness beyond the first crawl tier

Santos, R.L.T., Macdonald, C. and Ounis, I. (2011) Effectiveness beyond the first crawl tier. In: 20th ACM Conference on Information and Knowledge Management (CIKM 2011), Glasgow, UK, 24-28 October 2011. (doi: 10.1145/2063576.2063859)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1145/2063576.2063859

Abstract

Modern Web crawlers seek to visit quality documents first, and re-visit them more frequently than other documents. As a result, the first-tier crawl of a Web corpus is typically of higher quality than subsequent crawls. In this paper, we investigate the impact of first-tier documents on adhoc retrieval performance. In particular, we analyse the retrieval performance of runs submitted to the adhoc task of the TREC 2009 Web track in terms of how they rank first-tier documents and how these documents contribute to the performance of each run. Our results show that the performance of these runs is heavily dependent on their ability to rank first-tier documents. Moreover, we show that, unlike leading Web search engines, their attempts to go beyond the first tier almost always result in decreased performance. Finally, we show that selectively removing spam from different tiers can be a direction for fully exploiting documents beyond the first tier.

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Ounis, Professor Iadh
Authors: Santos, R.L.T., Macdonald, C., and Ounis, I.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science