START Conference Manager    

Linguistic Models for Analyzing and Detecting Biased Language

Marta Recasens, Cristian Danescu-Niculescu-Mizil and Dan Jurafsky

The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013


Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is nonetheless ubiquitous, making it crucial to understand its nature and linguistic realization and, in turn, to detect it automatically. To this end, we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: "framing bias", such as praising or perspective-specific words, which we link to the literature on subjectivity; and "epistemological bias", related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically informed model performs almost as well as humans tested on the same task.
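As a rough illustration of the prediction task, the sketch below flags words in a sentence that match one of the four cue classes named above. It is a minimal sketch under stated assumptions, not the model described in the paper: the word lists are tiny hypothetical examples rather than the lexicons the authors used, the function names are invented for illustration, and matching is on exact lowercase tokens with no lemmatization, so inflected forms such as "revealed" are missed.

```python
import re

# Tiny illustrative word lists (assumptions for this sketch, not the paper's lexicons).
CUE_LEXICONS = {
    "factive": {"realize", "reveal", "know", "regret"},
    "implicative": {"manage", "fail", "forget", "bother"},
    "hedge": {"may", "might", "possibly", "apparently"},
    "intensifier": {"extremely", "very", "truly", "remarkably"},
}


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cue_features(token):
    """Return one boolean feature per cue class for a single token."""
    return {name: token in words for name, words in CUE_LEXICONS.items()}


def candidate_bias_words(sentence):
    """List (token, matched cue classes) pairs for tokens that hit any lexicon."""
    candidates = []
    for token in tokenize(sentence):
        cues = [name for name, hit in cue_features(token).items() if hit]
        if cues:
            candidates.append((token, cues))
    return candidates


if __name__ == "__main__":
    for word, cues in candidate_bias_words("He might reveal the truly shocking results."):
        print(word, cues)
```

A lexicon lookup like this only proposes candidates. A trained model would combine such features with others and rank the words in the sentence, which this sketch does not attempt.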
