Measuring semantic content in distributional vectors
Aurelie Herbelot and Mohan Ganesalingam
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Some words are more contentful than others: for instance, 'make' is intuitively more general than 'produce' and 'two' is more precise than 'a crowd'. In this paper, we propose to measure the semantic content of lexical items, as modelled by distributional representations. We investigate the hypothesis that semantic content can be computed using the Kullback-Leibler (KL) divergence, an information-theoretic measure of the relative entropy of two distributions. In a task focusing on retrieving the correct ordering of hyponym-hypernym pairs, the KL divergence achieves close to 80% precision but does not outperform a simpler (linguistically unmotivated) frequency measure. We suggest that this result illustrates the rather 'intensional' aspect of distributions.
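As an illustration of the idea, the following minimal Python sketch scores a word by the KL divergence between its context distribution and a corpus-wide context distribution, then guesses that the lower-scoring member of a pair is the hypernym. The abstract does not state which two distributions are compared, so that choice is an assumption here. The co-occurrence counts are invented toy data, and the names `kl_divergence`, `normalise` and `predict_hypernym` are hypothetical helpers, not the authors' implementation.

```python
import math


def normalise(counts):
    """Turn raw co-occurrence counts into a probability distribution."""
    total = sum(counts.values())
    return {context: n / total for context, n in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; assumes q is non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / q[c]) for c, pv in p.items() if pv > 0)


def predict_hypernym(word_a, word_b, content):
    """Assumption: the less contentful word of a pair is the hypernym."""
    return word_a if content[word_a] <= content[word_b] else word_b


# Invented toy counts: contexts seen across the corpus and with each word.
corpus_counts = {"the": 50, "car": 20, "factory": 10, "cake": 20}
word_counts = {
    "make": {"the": 10, "car": 4, "factory": 2, "cake": 4},
    "produce": {"the": 2, "car": 6, "factory": 10, "cake": 2},
}

prior = normalise(corpus_counts)
content = {
    word: kl_divergence(normalise(counts), prior)
    for word, counts in word_counts.items()
}

for word, score in content.items():
    print(f"{word}: {score:.3f} bits")
print("predicted hypernym:", predict_hypernym("make", "produce", content))
```

In this toy setup 'make' is distributed exactly like the corpus as a whole, so its divergence is zero, while 'produce' is skewed towards particular contexts and scores higher. The frequency baseline mentioned in the abstract would instead compare the raw totals of the two count tables.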