Key blog distillation

Macdonald, C. and Ounis, I. (2008) Key blog distillation. In: 17th ACM Conference on Information and Knowledge Management, Napa Valley, California, USA, 26-30 Oct 2008, pp. 1043-1052. (doi: 10.1145/1458082.1458221)

Full text not currently available from Enlighten.

Searchers on the blogosphere often need to identify other key bloggers with interests similar to their own. The main difference between this blog distillation task and ordinary ad hoc or Web document retrieval is that each blog can be seen as an aggregate of its constituent posts. We show that the task is in this respect similar to the expert search task, where a person's expertise is derived from the aggregate of their publications or emails. In this paper, we investigate several aspects of blog retrieval. Firstly, we examine whether a blog should be represented as a single unit or by treating each of its posts as an indicator of its relevance, showing that expert search techniques can be adapted for blog search. Secondly, we examine whether indexing only the XML feed provided by each blog (which is often incomplete) is sufficient, or whether the full text of each blog post should be downloaded. Lastly, we use approaches that detect the central or recurring interests of each blog to increase the retrieval effectiveness of the system. Using the TREC 2007 Blog dataset, the results show that our proposed expert search paradigm is indeed useful in identifying key bloggers, achieving high retrieval effectiveness.
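
The idea of scoring a blog through its posts, as expert search scores a person through their documents, can be illustrated with a minimal sketch. The abstract does not say which aggregation function the paper uses, so the sum-of-scores rule below is an assumption, and the function name `rank_blogs`, the post and blog identifiers, and the scores are all hypothetical.

```python
from collections import defaultdict


def rank_blogs(post_scores, post_to_blog):
    """Aggregate per-post retrieval scores into a ranking of blogs.

    post_scores:  dict mapping post id -> relevance score for one query
    post_to_blog: dict mapping post id -> id of the blog it belongs to

    Assumption: each retrieved post contributes its score to its blog
    (a simple sum). The paper may use a different aggregation.
    """
    blog_scores = defaultdict(float)
    for post_id, score in post_scores.items():
        blog_scores[post_to_blog[post_id]] += score
    # Highest aggregate score first; ties broken by blog id.
    return sorted(blog_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores returned by a post-level retrieval run.
post_scores = {"p1": 2.0, "p2": 1.5, "p3": 3.0, "p4": 0.5}
post_to_blog = {"p1": "blogA", "p2": "blogA", "p3": "blogB", "p4": "blogC"}

ranking = rank_blogs(post_scores, post_to_blog)
for blog_id, score in ranking:
    print(blog_id, score)
```

In this toy data, blogA outranks blogB even though blogB holds the single highest-scoring post, because two moderately relevant posts together outweigh one strong post. That is the sense in which a blog with a recurring interest in a topic can be preferred over one that mentions it once.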

Item Type: Conference Proceedings
Glasgow Author(s) Enlighten ID: Macdonald, Professor Craig and Ounis, Professor Iadh
Authors: Macdonald, C., and Ounis, I.
Subjects: Z Bibliography. Library Science. Information Resources > Z665 Library Science. Information Science
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
