Reducing Annotation Effort for Quality Estimation via Active Learning
Daniel Beck, Lucia Specia and Trevor Cohn
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Quality estimation models provide feedback on the quality of automatically generated texts to end-users. Machine translation quality estimation models are usually trained on human-annotated datasets using a variety of quality labels. We investigate active learning techniques to reduce the size of these datasets and thus the annotation effort they require. Experiments on a number of datasets show that with as little as 25% of the training instances it is possible to obtain performance similar or superior to that obtained with the complete datasets. In other words, our active learning query selection strategies not only minimise annotation effort but also filter datasets to yield better quality predictors.