Japanese-Spanish Thesaurus Construction Using English as a Pivot

Jessica Claribel Ramirez Vidal (0551142)

In this thesis, we present research aimed at automatically creating a multilingual thesaurus from the freely available resources Wikipedia and WordNet.

Our goal is to increase the resources available for natural language processing tasks, such as machine translation, that target the Japanese-Spanish language pair. Given the scarcity of resources for this pair, we use existing English resources as a pivot, creating a trilingual Japanese-Spanish-English thesaurus. Our thesaurus is aligned at the word sense level and contains an ontology linked to and supplementing the relations in WordNet.

Construction of our thesaurus takes place in several automated steps. First, translation tuples are extracted from Wikipedia by making use of its hyperlinks to articles in different languages. The word senses of these tuples are then aligned with those of WordNet using two kinds of evidence: cosine similarity between vector representations of Wikipedia article texts and WordNet glosses, and heuristics comparing the Wikipedia categories of a word with its hypernyms in WordNet.
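
The gloss-based part of the sense alignment can be illustrated with a minimal Python sketch. It is not the implementation used in this thesis. It assumes plain bag-of-words term counts as the vectors, a simple lowercase ASCII tokenizer, and hypothetical sense identifiers and glosses written for illustration only. The function names (bag_of_words, cosine_similarity, best_sense) are likewise assumptions, and the category-hypernym heuristics are not covered.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Turn a text into a term-count vector (assumed tokenizer: lowercase a-z runs)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def best_sense(article_text, glosses):
    """Return the sense whose gloss is most similar to the article, plus all scores."""
    article_vec = bag_of_words(article_text)
    scores = {
        sense: cosine_similarity(article_vec, bag_of_words(gloss))
        for sense, gloss in glosses.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical article text and glosses, for illustration only.
article = "A bank is a financial institution that accepts deposits and makes loans."
glosses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits "
                 "and channels the money into lending activities",
}
sense, scores = best_sense(article, glosses)
print(sense, scores)
```

In this toy input the financial gloss shares far more terms with the article than the riverbank gloss does, so it receives the higher score. A practical system would likely weight terms (for example with TF-IDF) and remove stop words, so that function words such as "a" do not contribute to the score.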