ACL 2013 - The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

Kashyap Popat, Balamurali A.R, Pushpak Bhattacharyya and Gholamreza Haffari

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013

Abstract

Expensive feature engineering based on WordNet senses has been shown to be useful for document-level sentiment classification. A plausible reason for this performance improvement is the reduction in data sparsity. However, such a reduction can be achieved with less effort through syntagma-based word clustering. In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed through clustering. Experiments show that cluster-based data sparsity reduction outperforms sense-based classification for document-level sentiment analysis. A similar idea is applied to Cross-Lingual Sentiment Analysis (CLSA), and it is shown that reducing data sparsity (after translation or bilingual mapping) yields higher accuracy than both Machine Translation-based CLSA and sense-based CLSA.
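As a rough illustration of why clustering can reduce data sparsity, the following minimal Python sketch replaces words with cluster identifiers before counting features. It is not the authors' implementation: the abstract does not give the clustering procedure, so the `WORD_TO_CLUSTER` map, the cluster labels, and the helper names are hypothetical and hand-made for the example.

```python
from collections import Counter

# Hypothetical, hand-made word-to-cluster map (an assumption for illustration;
# in practice the clusters would be induced from an unlabelled corpus).
WORD_TO_CLUSTER = {
    "good": "C1", "great": "C1", "excellent": "C1",
    "bad": "C2", "awful": "C2", "terrible": "C2",
    "movie": "C3", "film": "C3",
}


def word_features(tokens):
    """Bag-of-words features: one dimension per distinct surface word."""
    return Counter(tokens)


def cluster_features(tokens, mapping=WORD_TO_CLUSTER):
    """Bag-of-clusters features: words in the same cluster share a dimension.

    Words missing from the map keep their surface form.
    """
    return Counter(mapping.get(tok, tok) for tok in tokens)


# A training document and a test document with no words in common.
train_doc = "great movie".split()
test_doc = "excellent film".split()

# Features shared by the two documents under each representation.
word_overlap = set(word_features(train_doc)) & set(word_features(test_doc))
cluster_overlap = set(cluster_features(train_doc)) & set(cluster_features(test_doc))
```

With word features the two documents share nothing, so a classifier trained on the first has no evidence about the second. With cluster features both documents map to the same two dimensions, which is the sparsity reduction the abstract refers to.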
