Understanding Tables in Context using Standard NLP Toolkits
Vidhya Govindaraju, Ce Zhang and Christopher Ré
The 51st Annual Meeting of the Association for Computational Linguistics - Short Papers (ACL Short Papers 2013)
Sofia, Bulgaria, August 4-9, 2013
Tables in text documents contain a wealth of information, making them a natural candidate for information extraction. Many cues buried in both a table and its surrounding text help us understand the meaning of the table's data. We study how natural-language tools, such as part-of-speech tagging, dependency paths, and named-entity recognition, can be used to improve the quality of data extraction from tables.
In three domains, we show that (1) a model that performs joint probabilistic inference across tabular and natural-language features achieves an F1 score twice as high as that of either a pure-table or pure-text system, and (2) using only shallower features or non-joint inference results in lower quality.