Ensemble Reranking with Linguistic and Semantic Features for Arabic Character Recognition
Nadi Tomeh, Nizar Habash, Ryan Roth, Noura Farra, Pradeep Dasigi and Mona Diab
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Extant optical character recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency. In this paper, we incorporate linguistically and semantically motivated features into an existing OCR system. To do so, we follow an n-best list reranking approach that exploits recent advances in learning-to-rank techniques.
We achieve 10.1% and 11.4% reductions in recognition word error rate (WER) relative to the baseline system on typewritten and handwritten Arabic, respectively.
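For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a reduction relative to a baseline are commonly computed. It assumes the standard definition (word-level edit distance divided by the number of reference words) and that the reductions above are relative rather than absolute; the abstract does not spell out either point. The function names and the toy strings are hypothetical and are not taken from the paper or its data.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """WER under the assumed standard definition: edits / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Fraction of the baseline error removed (assumed meaning of 'reduction')."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy illustration only: one substitution and one deletion over four words.
toy_wer = wer("a b c d", "a x c")
```

Under these assumptions, a system that lowers WER from 0.20 to 0.18 would show a relative reduction of about 10%; these two figures are illustrative and are not results from the paper.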