Our goal is to increase the resources available for natural language processing tasks, such as machine translation, that target the Japanese-Spanish language pair. Because resources for this pair are scarce, we use existing English resources as a pivot to create a trilingual Japanese-Spanish-English thesaurus. Our thesaurus is aligned at the word sense level. It also contains an ontology that links to, and supplements, the relations in WordNet.
Our thesaurus is constructed in several automated steps. First, translation tuples are extracted from Wikipedia using its hyperlinks to articles in other languages. The word senses of these tuples are then aligned with those of WordNet by two means. The first is cosine similarity between vector representations of the Wikipedia article texts and the WordNet glosses. The second is a set of heuristics that compare the Wikipedia categories of a word with its hypernyms in WordNet.
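The sketch below illustrates the first two steps under simplifying assumptions. The `langlinks` mapping, the sense identifiers, and the plain bag-of-words representation are hypothetical stand-ins, not the exact data structures or weighting used in this work. The category and hypernym heuristic is not shown. A Japanese-Spanish-English tuple is formed when an English article links to both a Japanese and a Spanish article. A sense is then chosen by taking the WordNet gloss whose word-count vector has the highest cosine similarity to the article text.

```python
import math
import re
from collections import Counter


def translation_tuple(en_title, langlinks):
    """Build a (ja, es, en) tuple from hypothetical interlanguage-link data.

    `langlinks` is an assumed mapping {english_title: {"ja": title, "es": title}}.
    Returns None unless the English article links to both languages.
    """
    links = langlinks.get(en_title, {})
    if "ja" in links and "es" in links:
        return (links["ja"], links["es"], en_title)
    return None


def bag_of_words(text):
    """Lowercased word counts; a simple stand-in for the real text vectors."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_sense(article_text, glosses):
    """Pick the sense id whose gloss is most similar to the article text.

    `glosses` is an assumed mapping {sense_id: gloss_text}.
    """
    article_vec = bag_of_words(article_text)
    return max(glosses, key=lambda s: cosine(article_vec, bag_of_words(glosses[s])))


# Illustrative data only (hypothetical, not taken from the thesaurus).
langlinks = {"Bank": {"ja": "銀行", "es": "Banco"}}
glosses = {
    "bank.n.01": "sloping land beside a body of water",
    "bank.n.02": "a financial institution that accepts deposits "
                 "and channels the money into lending activities",
}
article = ("A bank is a financial institution that accepts deposits "
           "from the public and creates credit.")

print(translation_tuple("Bank", langlinks))  # ('銀行', 'Banco', 'Bank')
print(best_sense(article, glosses))          # bank.n.02
```

In the example, the article about the financial institution shares many more words with the second gloss than with the first, so that sense is selected. A real system would likely add stop-word removal or term weighting so that frequent function words do not dominate the score.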