Bootstrapping algorithms, however, suffer from a problem known as semantic drift: as iteration proceeds, they tend to select instances unrelated to the seed instances. Previous work has tried to reduce semantic drift by providing the algorithm with a stop list of instances likely to trigger drift and excluding those instances from subsequent training. This list is usually created by human experts. Drift can also be reduced by carefully selecting seeds, but selecting good seeds likewise requires expert knowledge.
In this thesis, we present unsupervised graph-based methods to alleviate semantic drift in Espresso, a state-of-the-art bootstrapping algorithm. Our approach builds on the concept of hubs in Kleinberg's HITS algorithm, originally proposed for evaluating the importance of web pages. We model the instance extraction process of Espresso as a graph, and we select seeds and create a stop list based on the HITS ranking of instances in this graph. We demonstrate the efficacy of our approach on word sense disambiguation (WSD) tasks. Experimental results show that the proposed methods find seed sets that outperform randomly chosen seed sets, and that the stop list created by our method reduces semantic drift and improves the accuracy of harvested instances.
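To make the HITS-based ranking concrete, the following is a minimal sketch, not the method of this thesis. It assumes the instance-pattern graph is given as a nonnegative co-occurrence matrix, treats instances as hubs and patterns as authorities, runs standard HITS power iteration, and takes the top-k instances as stop list candidates. The matrix layout, the iteration count, and the helper names `hits_instance_scores` and `build_stop_list` are hypothetical. The thesis's actual graph construction and its seed and stop-list selection criteria may differ.

```python
import math


def hits_instance_scores(matrix, iterations=50):
    """HITS power iteration on a bipartite instance-pattern graph.

    Assumption: matrix[i][p] is a nonnegative co-occurrence weight between
    instance i and pattern p. Instances act as hubs, patterns as authorities.
    Returns (instance_scores, pattern_scores), each L2-normalized.
    """
    n_inst = len(matrix)
    n_pat = len(matrix[0]) if matrix else 0
    inst = [1.0] * n_inst
    pat = [0.0] * n_pat
    for _ in range(iterations):
        # Authority update: a pattern scores highly if high-scoring instances use it.
        pat = [sum(matrix[i][p] * inst[i] for i in range(n_inst)) for p in range(n_pat)]
        norm = math.sqrt(sum(x * x for x in pat)) or 1.0
        pat = [x / norm for x in pat]
        # Hub update: an instance scores highly if it co-occurs with high-scoring patterns.
        inst = [sum(matrix[i][p] * pat[p] for p in range(n_pat)) for i in range(n_inst)]
        norm = math.sqrt(sum(x * x for x in inst)) or 1.0
        inst = [x / norm for x in inst]
    return inst, pat


def build_stop_list(instances, matrix, k):
    """Hypothetical helper: return the k instances with the highest hub scores.

    Generic instances that co-occur with many patterns rise to the top,
    which is the kind of instance assumed here to trigger semantic drift.
    """
    scores, _ = hits_instance_scores(matrix)
    ranked = sorted(range(len(instances)), key=lambda i: scores[i], reverse=True)
    return [instances[i] for i in ranked[:k]]


if __name__ == "__main__":
    # Toy graph: "thing" co-occurs with every pattern, the others with one each.
    instances = ["thing", "money", "river"]
    matrix = [
        [1, 1, 1],
        [1, 0, 0],
        [0, 0, 1],
    ]
    print(build_stop_list(instances, matrix, 1))
```

In this toy graph the generic instance "thing" receives the largest hub score because it co-occurs with every pattern, so it is the single stop list candidate. A real system would construct the matrix from Espresso's extraction statistics rather than from hand-written counts.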