Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013
Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained from unsupervised selected examples for building a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
Conference Manager (V2.61.0 - Rev. 2792M)