Roles of Pre-training in Deep Neural Networks from Information Theoretical Perspective

Yasutaka Furusho (1551096)


Although deep neural networks achieve high performance in pattern recognition and machine learning, the reasons for this success remain unclear. To address this problem, we computed information-theoretic measures of the representations in the hidden layers and analyzed their relationship to performance. We found that these measures are related to the generalization error. This suggests that they may serve as an alternative to accuracy-based criteria for determining the number of layers in a deep neural network, since they do not require fine-tuning, which incurs a high computational cost. We also found that, as the number of hidden layers increased, the conditional entropy of the representations given the classes decreased faster than the mutual information between the representations and the classes. This means that, as layers were added, the deep neural network discarded information in the representations that is unrelated to the classes while trying to retain the class information. This tendency was stronger in networks trained with pre-training than in networks trained without it. Furthermore, we found that a pre-training algorithm that increased the mutual information led fine-tuning to achieve high mutual information, and that the mutual information was negatively correlated with the prediction error. Therefore, increasing the mutual information is a good strategy for pre-training when the pre-training algorithm cannot use class information; the autoencoder follows this strategy.
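To make this interpretation concrete (a minimal sketch using T for a hidden-layer representation and Y for the class label, notation introduced here rather than taken from the abstract), the standard decomposition of entropy

    H(T) = I(T; Y) + H(T | Y)

implies that if H(T | Y) decreases faster than I(T; Y) as depth grows, the entropy of the representation shrinks mainly by discarding the class-irrelevant component, while the class-relevant component is comparatively preserved.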