Studies on Efficient Parsing and Logic-based Inference based on Combinatory Categorial Grammar

吉川将司


Combinatory Categorial Grammar (CCG) is a grammatical formalism that has been used in theoretical linguistics to explain various linguistic phenomena. It has also attracted attention in Natural Language Processing as a basis for systems that solve natural language inference tasks by drawing on these linguistic insights.

The first contribution of this presentation is the development of an accurate and efficient CCG parser. Our method exploits characteristics of the grammar and models the probability of a tree with a strongly factored model, which allows the most probable CCG tree to be computed with a very efficient A* parsing algorithm. We conduct experiments on both Japanese and English CCG treebanks and observe that our parser leads to improved performance in both settings.
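To illustrate why a factored model suits A* search, the following is a minimal sketch, not the parser described above. It assumes the simplest possible factorization, in which the log-probability of a tree is the sum of per-word supertag log-probabilities; the names `TAG_PROBS`, `tree_log_prob`, `outside_bound`, and `priority` and the toy probabilities are hypothetical. Under that assumption, the best supertag score of each word outside a span gives an upper bound on what the rest of the sentence can contribute, which is the kind of admissible heuristic A* needs.

```python
import math

# Hypothetical supertag distributions for a three-word sentence
# (illustrative numbers only, not taken from any treebank).
TAG_PROBS = [
    {"NP": 0.9, "N": 0.1},
    {"(S\\NP)/NP": 0.8, "S\\NP": 0.2},
    {"NP": 0.7, "N": 0.3},
]


def tree_log_prob(tags, tag_probs=TAG_PROBS):
    """Log-probability of a tree under the assumed fully factored model:
    the sum of the log-probabilities of the supertag chosen for each word."""
    return sum(math.log(dist[tag]) for dist, tag in zip(tag_probs, tags))


def outside_bound(start, end, tag_probs=TAG_PROBS):
    """Upper bound on the score of the words outside the span [start, end):
    each outside word contributes the log-probability of its best supertag."""
    best = [math.log(max(dist.values())) for dist in tag_probs]
    return sum(best[:start]) + sum(best[end:])


def priority(inside_log_prob, start, end, tag_probs=TAG_PROBS):
    """A* priority of a chart item: its inside score plus the outside bound.
    It never underestimates the score of a full tree containing the item."""
    return inside_log_prob + outside_bound(start, end, tag_probs)
```

Because each factor can be bounded independently, the agenda can be ordered by `priority` and the first complete tree popped from it is the most probable one under this simplified model.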

Second, we address domain adaptation for CCG parsing, since we are interested in applying CCG-based inference systems to various domains such as scientific papers and spoken conversation. We propose a new domain adaptation method based on automatically generating CCG corpora from cheaper dependency-based resources. We conduct parsing experiments on four different domains: biomedical texts, question sentences, spoken conversation, and math problems. In the latter two we observe large performance gains.

Our last contribution concerns logic-based reasoning systems. We tackle the issue of adding external knowledge to these systems: there is a tension between extending their knowledge base and keeping them efficient at solving a problem. We show that the processing speed of a state-of-the-art logic-based Recognizing Textual Entailment (RTE) system can be significantly improved by using Knowledge Base Completion techniques. Additionally, we integrate this mechanism into a Coq plugin that provides a proof automation tactic for natural language inference.