Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information
Tsutomu Hirao, Tomoharu Iwata and Masaaki Nagata
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Unsupervised object matching (UOM) is a promising approach to cross-language natural language processing such as bilingual lexicon acquisition, parallel corpus construction, and cross-language text categorization, because it does not require labor-intensive linguistic resources. However, UOM only finds one-to-one correspondences from data sets with the same number of instances in source and target domains, and this prevents us from applying UOM to real-world cross-language natural language processing tasks. To alleviate these limitations, we proposes latent semantic matching, which embeds objects in both source and target language domains into a shared latent topic space. We demonstrate the effectiveness of our method on cross-language text categorization. The results show that our method outperforms conventional unsupervised object matching methods.
Conference Manager (V2.61.0 - Rev. 2792M)