Relation Extraction: Perspective from Weakly Supervised Methods

PHI VAN THUY


Relation extraction is the task of recognizing and extracting semantic relations between entities expressed in text. Existing supervised relation extraction systems require large amounts of labeled, relation-specific data. In practice, however, most relation extraction tasks have no supervised training data available.

In this dissertation, we focus on two main weakly supervised approaches, namely bootstrapping and distantly supervised relation extraction, which reduce the cost of obtaining labeled examples for supervised learning. The first part of the dissertation addresses the subtasks of automatic seed selection for bootstrapping relation extraction and noise reduction for distantly supervised relation extraction. Ours is the first work that formulates both subtasks as ranking problems and proposes methods applicable to both. Experiments show that our proposed methods outperform the baseline systems on these subtasks.
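
As a rough illustration only, the ranking view shared by both subtasks can be sketched as scoring a pool of candidates and keeping the highest-ranked ones; the candidate structure, scoring interface, and top-k selection below are illustrative assumptions, not the dissertation's actual methods.

    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    @dataclass
    class Candidate:
        """A candidate seed pair (bootstrapping) or a distantly labeled instance."""
        entity_pair: Tuple[str, str]
        contexts: List[str] = field(default_factory=list)
        score: float = 0.0

    def rank_candidates(candidates: List[Candidate],
                        score_fn: Callable[[Candidate], float]) -> List[Candidate]:
        """Score each candidate with a confidence function and sort from
        most to least reliable."""
        for c in candidates:
            c.score = score_fn(c)
        return sorted(candidates, key=lambda c: c.score, reverse=True)

    def select_top_k(candidates: List[Candidate],
                     score_fn: Callable[[Candidate], float],
                     k: int) -> List[Candidate]:
        """Keep only the k highest-ranked candidates: seeds for the next
        bootstrapping iteration, or lower-noise training examples for
        distantly supervised relation extraction."""
        return rank_candidates(candidates, score_fn)[:k]

Seed selection and noise reduction then differ only in how candidates are generated and in the choice of scoring function plugged into this interface.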

The second part of the dissertation investigates distant supervision, a weakly supervised approach that automatically generates training examples by aligning free text with a knowledge base. We propose a novel neural model that combines a bidirectional gated recurrent unit model with a form of hierarchical attention better suited to relation extraction. We demonstrate that an additional attention mechanism, called piecewise attention, which builds upon segment-level representations, significantly improves distantly supervised relation extraction. In addition, we propose a contextual inference method that can infer the most likely positive examples in bags with very limited contextual information. Experimental results show that our proposed methods outperform state-of-the-art baselines on benchmark datasets.
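
A minimal sketch of such an encoder is given below, assuming a PyTorch implementation; the layer sizes, the mean-pooling of entity-delimited segments, and the exact form of the word-level and piecewise attention are illustrative assumptions rather than the dissertation's actual architecture.

    import torch
    import torch.nn as nn

    class BiGRUPiecewiseAttention(nn.Module):
        """Sketch: a bidirectional GRU encoder with (i) word-level attention
        over hidden states and (ii) a simple piecewise attention over three
        segment representations (before, between, and after the two entities)."""

        def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, num_relations=53):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
            self.word_attn = nn.Linear(2 * hidden_dim, 1)    # word-level attention scores
            self.piece_attn = nn.Linear(2 * hidden_dim, 1)   # attention over the 3 segments
            self.classifier = nn.Linear(4 * hidden_dim, num_relations)

        def forward(self, tokens, e1_pos, e2_pos):
            # tokens: (batch, seq_len); e1_pos, e2_pos: entity indices with e1_pos < e2_pos
            h, _ = self.gru(self.embed(tokens))               # (batch, seq_len, 2*hidden)

            # Word-level attention: weighted sum of all hidden states.
            w = torch.softmax(self.word_attn(h).squeeze(-1), dim=1)
            sent_rep = torch.bmm(w.unsqueeze(1), h).squeeze(1)

            # Piecewise attention: mean-pool the three entity-delimited segments,
            # then attend over the resulting segment-level representations.
            pieces = []
            for b in range(tokens.size(0)):
                p1, p2 = int(e1_pos[b]), int(e2_pos[b])
                segs = [h[b, : p1 + 1], h[b, p1 : p2 + 1], h[b, p2 :]]
                pieces.append(torch.stack([s.mean(dim=0) for s in segs]))
            pieces = torch.stack(pieces)                      # (batch, 3, 2*hidden)
            a = torch.softmax(self.piece_attn(pieces).squeeze(-1), dim=1)
            piece_rep = torch.bmm(a.unsqueeze(1), pieces).squeeze(1)

            # Concatenate sentence-level and segment-level views for classification.
            return self.classifier(torch.cat([sent_rep, piece_rep], dim=-1))

In a distantly supervised setting, sentence representations produced this way would typically be aggregated over each bag of sentences sharing an entity pair before the relation is predicted.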