The TREC Blogs06 Collection: Creating and Analysing a Blog Test Collection

Macdonald, C. and Ounis, I. (2006) The TREC Blogs06 Collection: Creating and Analysing a Blog Test Collection. Technical Report. Dept of Computing Science, University of Glasgow.

Abstract

The explosion of blogs on the Web in recent years has fostered research interest, in the Information Retrieval (IR) community and others, in the properties of the so-called 'blogosphere'. However, without a standard test collection, research has been restricted to unshared collections gathered by individual research groups. With the advent of the Blog Track at TREC 2006, there was a need for a test collection of blog data that could be shared among participants and form the backbone of the experiments. Such a collection should be a realistic snapshot of the blogosphere, containing enough blogs to exhibit recognisable properties of the blogosphere, and spanning a long enough time period that events are recognisable. In addition, the collection should exhibit other properties of the blogosphere, such as splogs and comment spam. This paper describes the creation of the Blogs06 collection by the University of Glasgow and reports statistics of the collected data. Moreover, we demonstrate how some characteristics of the collection vary across its spam and non-spam components.

Item Type: Research Reports or Papers (Technical Report)
Additional Information: Technical Report No.: TR-2006-224
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Macdonald, Dr Craig and Ounis, Professor Iadh
Authors: Macdonald, C., and Ounis, I.
College/School: College of Science and Engineering > School of Computing Science
Publisher: Dept of Computing Science, University of Glasgow
