Text segmentation via topic modeling: an analytical study

Misra, H., Yvon, F., Jose, J. and Cappe, O. (2009) Text segmentation via topic modeling: an analytical study. In: 18th ACM Conference on Information and Knowledge Management, Hong Kong, 2-6 Nov 2009, pp. 1553-1556. ISBN 9781605585123 (doi: 10.1145/1645953.1646170)

Full text not currently available from Enlighten.

Publisher's URL: http://dx.doi.org/10.1145/1645953.1646170

Abstract

In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of the latent Dirichlet allocation (LDA) topic model to segment a text into semantically coherent segments. A major benefit of the proposed approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. The new approach outperforms a standard baseline method and yields significantly better performance than most of the available unsupervised methods on a benchmark dataset.
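
The abstract does not describe the segmentation procedure, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes each sentence already has a topic distribution (for example, inferred by a trained LDA model), places a boundary wherever the Jensen-Shannon divergence between adjacent sentences exceeds a threshold, and reports one averaged topic distribution per segment. The function names, the threshold of 0.3, and the toy data are all hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mean_distribution(rows):
    """Average several topic distributions component by component."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def segment(sentence_topics, threshold=0.3):
    """Return (boundaries, segment_topics).

    sentence_topics: one topic distribution per sentence (assumed given).
    boundaries: indices i such that a new segment starts at sentence i.
    segment_topics: the mean topic distribution of each segment.
    """
    boundaries = [
        i
        for i in range(1, len(sentence_topics))
        if js_divergence(sentence_topics[i - 1], sentence_topics[i]) > threshold
    ]
    edges = [0] + boundaries + [len(sentence_topics)]
    segment_topics = [
        mean_distribution(sentence_topics[a:b]) for a, b in zip(edges, edges[1:])
    ]
    return boundaries, segment_topics


# Hypothetical two-topic example: the first two sentences lean towards
# topic 0 and the last two towards topic 1.
toy = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
boundaries, segment_topics = segment(toy)
```

On the toy input, the only adjacent pair whose divergence exceeds the threshold is the second and third sentence, so the text is split into two segments, each with its own averaged topic distribution. A local threshold is the simplest possible choice here; it is a stand-in for whatever search over segmentations the paper actually uses.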

Item Type: Conference Proceedings
Status: Published
Refereed: Yes
Glasgow Author(s) Enlighten ID: Jose, Professor Joemon and Misra, Dr Hemant
Authors: Misra, H., Yvon, F., Jose, J., and Cappe, O.
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
College/School: College of Science and Engineering > School of Computing Science
ISBN: 9781605585123
