START Conference Manager    

Reducing Annotation Effort for Quality Estimation via Active Learning

Daniel Beck, Lucia Specia and Trevor Cohn

The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013


Abstract

Quality estimation models provide feedback on the quality of automatically generated texts to end-users. Machine translation quality estimation models are usually trained on human-annotated datasets using a variety of quality labels. We investigate active learning techniques to reduce the size of these datasets and thus their annotation effort. Experiments on a number of datasets show that with as little as 25% of the training instances it is possible to obtain similar or superior performance compared to that of the complete datasets. In other words, our active learning query selection strategies not only minimise annotation effort but also filter datasets, yielding better quality predictors.
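
To illustrate the general idea of pool-based active learning mentioned in the abstract, here is a minimal sketch. It is not the authors' method: the abstract does not specify the model or the query selection strategy, so this sketch assumes a one-feature linear regressor and a bootstrap committee whose disagreement (prediction variance) decides which instance to annotate next. All names (fit_line, committee_variance, active_learning, oracle) are hypothetical, and the oracle stands in for a human annotator.

```python
import random
import statistics


def fit_line(points):
    """Least-squares fit of y = a*x + b on a list of (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a constant predictor.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx


def committee_variance(labelled, x, rng, n_models=10):
    """Disagreement of a bootstrap committee on x (assumed query criterion)."""
    preds = []
    for _ in range(n_models):
        # Resample the labelled set with replacement and refit.
        sample = [rng.choice(labelled) for _ in labelled]
        a, b = fit_line(sample)
        preds.append(a * x + b)
    return statistics.pvariance(preds)


def active_learning(pool, oracle, budget, seed_size=2, seed=0):
    """Label at most `budget` instances from `pool`, querying `oracle` for labels."""
    rng = random.Random(seed)
    pool = list(pool)
    rng.shuffle(pool)
    # Start from a small randomly chosen labelled seed set.
    labelled = [(x, oracle(x)) for x in pool[:seed_size]]
    unlabelled = pool[seed_size:]
    while len(labelled) < budget and unlabelled:
        # Query the instance on which the committee disagrees most.
        query = max(unlabelled, key=lambda x: committee_variance(labelled, x, rng))
        unlabelled.remove(query)
        labelled.append((query, oracle(query)))
    return labelled


if __name__ == "__main__":
    pool = [i / 10 for i in range(40)]
    # Hypothetical noise-free annotator, used only to exercise the loop.
    labelled = active_learning(pool, lambda x: 2 * x + 1, budget=10)
    print(len(labelled), fit_line(labelled))
```

In this toy setup the budget of 10 corresponds to a quarter of the 40-instance pool. A real quality estimation system would replace the single feature and linear fit with its own feature set and regressor, and the oracle with human quality labels.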

