Fast and Adaptive Online Training of Feature-Rich Translation Models
Spence Green, Sida Wang, Daniel Cer and Christopher D. Manning
The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013
We present a fast and scalable online method for tuning statistical machine translation models with large feature sets. The standard tuning algorithm---MERT---only scales to tens of features. Recent discriminative algorithms that accommodate sparse features have produced smaller than expected translation quality gains in large systems. Our method, which is based on stochastic gradient descent with an adaptive learning rate, scales to millions of features and tuning sets with tens of thousands of sentences, while still converging after only a few epochs. Large-scale experiments on Arabic-English and Chinese-English show that our method produces significant translation quality gains by exploiting sparse features. Equally important is our analysis, which suggests techniques for mitigating overfitting and domain mismatch, and applies to other recent discriminative methods for machine translation.
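The abstract describes stochastic gradient descent with an adaptive learning rate. A minimal sketch of one common form of such a scheme, an AdaGrad-style per-feature step size, is shown below; the function name `adagrad_sgd` and the interface are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def adagrad_sgd(grad_fn, w0, data, lr=1.0, eps=1e-8, epochs=3):
    """SGD with an AdaGrad-style per-feature adaptive learning rate.

    grad_fn(w, x) returns the gradient of the per-example loss at w;
    each feature's step size shrinks with its accumulated squared
    gradient, so frequent features take small steps and rare (sparse)
    features take large ones. Illustrative sketch, not the paper's code.
    """
    w = w0.copy()
    G = np.zeros_like(w)  # running sum of squared gradients, per feature
    for _ in range(epochs):
        for x in data:
            g = grad_fn(w, x)
            G += g * g
            w -= lr * g / (np.sqrt(G) + eps)  # per-coordinate step size
    return w
```

For example, with the squared-error gradient `lambda w, x: w - x` and repeated copies of a target vector as the tuning data, the weights converge to that target within a few epochs, mirroring the fast convergence the abstract claims.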