Reducing Hubs for Improving Nearest Neighbor Method

Yutaro Shigeto (1461004)


Zero-shot learning is an active research topic in machine learning. The goal of zero-shot learning is to learn a classifier that predicts the unseen labels of test objects from a training dataset that contains no objects associated with those unseen labels.

To predict unseen labels, zero-shot learning assumes that labels are embedded in a vector space. Under this assumption, zero-shot learning can be formulated as a least squares problem that minimizes the distance between each object and its label in the training set. Once we obtain such a common vector representation, we simply conduct nearest neighbor search in that space, i.e., we find the label closest to the test object.
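As a rough sketch of this formulation (the notation here is ours and purely illustrative), let $x_i$ denote the vector of the $i$-th training object and $y_i$ the vector of its label. A linear map $M$ can be learned by (possibly regularized) least squares,
\[
\hat{M} = \operatorname*{arg\,min}_{M} \sum_{i=1}^{n} \| M x_i - y_i \|_2^2 + \lambda \| M \|_F^2 ,
\]
where $\lambda \ge 0$ is a regularization parameter ($\lambda = 0$ gives the unregularized case). The label of a test object $x$ is then predicted by nearest neighbor search over the candidate label vectors $\mathcal{Y}$,
\[
\hat{y} = \operatorname*{arg\,min}_{y \in \mathcal{Y}} \| \hat{M} x - y \|_2 .
\]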

The hubness phenomenon refers to the observation that a small number of objects in a dataset, called \emph{hubs}, may appear as the nearest neighbor of many other objects. The presence of these hubs diminishes the utility of nearest neighbor methods, because the lists of nearest neighbors frequently contain the same hub objects regardless of the query. Since nearest neighbor search is one of the steps of zero-shot learning, its result is adversely affected by the hubness phenomenon.
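One common way to quantify how strongly hubs are present (a standard measure in the hubness literature, not necessarily the one used in this work) is the $k$-occurrence count
\[
N_k(x) = \bigl| \{\, x' : x \text{ is among the } k \text{ nearest neighbors of } x' \,\} \bigr| ,
\]
i.e., the number of objects for which $x$ appears in their $k$-nearest-neighbor list. Hubs are objects with unusually large $N_k$, and the skewness of the distribution of $N_k$ over the dataset is often reported as a summary statistic of hubness.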

In this presentation, we tackle the hubness phenomenon to improve the performance of zero-shot learning. Concretely, we first investigate the influence of hubs in a practical application of zero-shot learning, namely bilingual lexicon extraction, and extend existing hubness reduction methods to this task. Next, we show why hubs emerge in zero-shot learning and quantify the degree of bias in the dataset. This analysis suggests that, to reduce hubness, we have to decrease the variance of the data. Following this suggestion, we propose an approach that decreases the variance of the data and, indeed, reduces the emergence of hubs. We also empirically show that the proposed approach outperforms the baseline methods. Moreover, the proposed approach can be applied to the ordinary classification setting. Our experiments show that it improves classification accuracy while training faster than existing distance metric learning methods.
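To illustrate why the variance of the data matters (this is our own simplified illustration, not necessarily the analysis given in the thesis), consider a fixed candidate label vector $y$ and a random query $q$ with mean $\mu$ and covariance $\Sigma$. Then
\[
\mathbb{E}\,\| q - y \|^2 = \| y - \mu \|^2 + \operatorname{tr}(\Sigma) ,
\]
so candidates lying close to the query mean $\mu$ are, on average, closer to every query and are thus prone to becoming hubs. Shrinking the variance of the candidate vectors narrows the gap between such central candidates and the rest, which is consistent with the suggestion above that reducing variance mitigates hub formation.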