Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
Xiaoming Lu, Lei Xie, Cheung-Chi Leung, Bin Ma and Haizhou Li
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
We present an efficient approach for broadcast news story segmentation using a manifold learning algorithm on latent topic distributions. The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block. We employ Laplacian Eigenmaps (LE) to project the latent topic distributions into low-dimensional semantic representations while preserving the intrinsic local geometric structure. Finally, dynamic programming is applied for story boundary detection. We evaluate two approaches that employ LDA and Probabilistic Latent Semantic Analysis (PLSA) distributions, respectively, and study how each behaves with different amounts of training data. Experimental results show that our proposed LDA-based approach achieves a higher F1-measure than the corresponding PLSA-based approach when there is a sufficient amount of training data. Our approach achieves its best performance at an F1-measure of 0.7860.
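The final step, dynamic programming over per-block representations, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state the objective function or how the number of stories is chosen, so this sketch assumes a within-segment sum of squared distances to the segment mean and a number of segments fixed in advance. The function names `segment_cost` and `segment` are hypothetical, and the input vectors stand in for the low-dimensional representations produced by LE.

```python
from typing import List, Sequence


def segment_cost(vectors: Sequence[Sequence[float]], start: int, end: int) -> float:
    """Sum of squared distances to the mean for blocks start..end-1.

    Assumed objective for illustration; the abstract does not specify one.
    """
    n = end - start
    dim = len(vectors[0])
    mean = [sum(vectors[i][d] for i in range(start, end)) / n for d in range(dim)]
    return sum(
        (vectors[i][d] - mean[d]) ** 2
        for i in range(start, end)
        for d in range(dim)
    )


def segment(vectors: Sequence[Sequence[float]], num_segments: int) -> List[int]:
    """Return the start index of every segment after the first.

    Assumes num_segments is known and 1 <= num_segments <= len(vectors).
    """
    n = len(vectors)
    inf = float("inf")
    # best[k][j]: minimum cost of splitting the first j blocks into k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    # back[k][j]: start index of the k-th segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + segment_cost(vectors, i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i
    # Trace back from the full sequence to recover segment starts.
    starts = []
    j = n
    for k in range(num_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts[1:]  # drop the leading 0, which is not a boundary


# Toy input: six blocks in a two-dimensional space, three obvious groups.
blocks = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.5, 0.5]]
print(segment(blocks, 3))  # [2, 4]
```

The sketch recomputes each segment cost from scratch for readability, so it does more work than necessary; caching costs or using prefix sums would reduce that if the number of blocks is large.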