Building Comparable Corpora based on Bilingual LDA Model
Zede Zhu, Miao Li, Lei Chen and Zhenxin Yang
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Comparable corpora are important basic resources in cross-language information processing. However, existing methods for building comparable corpora, which rely on mutually translated words and related features, cannot evaluate the topical relation between document pairs. This paper uses the bilingual LDA model to predict the topical structures of documents and proposes three algorithms for measuring the similarity of documents in different languages. Experiments show that the new method can obtain similar documents with consistent topics, and that it performs better in terms of adaptability and stability.
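The abstract does not spell out the three similarity algorithms. The sketch below only illustrates the general idea. Once a bilingual LDA model has inferred a per-document distribution over a topic set shared by both languages, documents in different languages become directly comparable vectors. It assumes those distributions are already available as lists of K probabilities. The function names `cosine_similarity`, `js_divergence`, and `best_match` are hypothetical. Cosine similarity and Jensen-Shannon divergence are generic stand-ins, not necessarily the measures used in the paper.

```python
import math
from typing import List, Sequence


def cosine_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Cosine of the angle between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log), 0 for identical distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def best_match(source_theta: Sequence[float],
               target_thetas: List[Sequence[float]]) -> int:
    """Index of the target-language document whose topics are most similar
    (by cosine) to the source-language document. Hypothetical helper."""
    scores = [cosine_similarity(source_theta, t) for t in target_thetas]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical topic distributions over K = 3 shared bilingual topics.
    zh_doc = [0.7, 0.2, 0.1]
    en_docs = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]
    print(best_match(zh_doc, en_docs))  # prints 1
```

Because both languages share the same topic dimensions, no translation of individual words is needed at comparison time. Pairing documents by topic similarity in this way is one plausible route to a comparable corpus under the stated assumptions.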