Semantic Parsing of Ambiguous Input through Paraphrasing and Verification

Philip Arthur (1351210)


The central goal of natural language understanding is to let computers understand human language and interpret it correctly. However, the ambiguity inherent in human language makes this goal elusive. For example, the famous sentence "I saw a girl with a telescope" has two different interpretations: either the speaker used a telescope to see the girl, or the girl had the telescope. Such ambiguity is only the first of the obstacles computers face in understanding human language.

With the rise of search engines such as Google and Yahoo, a different form of language has emerged: the "search query." When we write search queries, we usually do not enter a full sentence expressing the question we want to ask. For example, it is more convenient to enter only "java hash table" than the full question "What is a Hash Table in the Java programming language?" This underspecified input introduces further ambiguity and makes it harder for the computer to understand what the user means.

This thesis describes a novel method for understanding ambiguous and ungrammatical input, such as search queries. We build on an existing framework for semantic parsing that uses synchronous context-free grammars (SCFGs) to jointly model the input sentence and the output meaning. We generalize this SCFG framework to allow not one, but multiple outputs. Using this formalism, we construct a grammar that takes an ambiguous input string and jointly maps it to both a meaning representation and a natural language paraphrase that is less ambiguous than the original input. The paraphrase can then be used to disambiguate the meaning representation through verification with a language model that calculates the probability of each paraphrase. In addition, because the framework outputs several candidate paraphrases for a given search query, a human can select the best paraphrase among them. Experiments show that this framework, with the addition of human help, achieves better accuracy than a baseline on the task of parsing keyword queries.
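
The verification step described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis. It assumes the parser has already produced candidate pairs of meaning representation and paraphrase, each with a log score, and it assumes the parser score and the language model score are simply added with a weight. The toy bigram model with add-one smoothing, the names `train_bigram_lm` and `rerank`, the meaning representation strings, and the scores are all hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Train a toy add-one-smoothed bigram language model (illustrative only)."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])                  # counts of bigram contexts
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab_size = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))
            for a, b in zip(tokens[:-1], tokens[1:])
        )

    return log_prob


def rerank(candidates, lm_log_prob, lm_weight=1.0):
    """Sort (meaning, paraphrase, parser_log_score) triples, best first.

    Assumption: the combined score is the parser log score plus a weighted
    language model log probability of the paraphrase.
    """
    return sorted(
        candidates,
        key=lambda c: c[2] + lm_weight * lm_log_prob(c[1]),
        reverse=True,
    )


# Hypothetical corpus of well-formed questions for the language model.
lm = train_bigram_lm([
    "what is a hash table in java",
    "what is a linked list in java",
    "what is a hash table in python",
])

# Hypothetical parser output for the keyword query "java hash table".
candidates = [
    ("definition(java, hash_table)", "what is java in a hash table", -1.0),
    ("definition(hash_table, java)", "what is a hash table in java", -1.2),
]

ranked = rerank(candidates, lm)
print(ranked[0][0])
```

In this toy setting the parser alone slightly prefers the first candidate, but its paraphrase is unlikely under the language model, so the second candidate is ranked first after verification. The ranked list could equally be shown to a human, who picks the paraphrase that matches the intended question.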