Ensemble Convolutional Neural Network for Human Pose Estimation

川名雄樹 (1451039)


Human pose estimation is a highly challenging task because of the wide variety of poses. To cope with the complexity of human body configurations, a divide-and-conquer approach that uses heterogeneous expert models, each trained on a limited variety of poses, has been shown to be effective. However, how to combine the inference of each expert model into a unified inference remains an open issue. Because the interdependency among the multiple inferences is highly complex, informative features and effective ensembling schemes must be chosen carefully. To address this issue, we propose a Convolutional Neural Network (ConvNet) architecture that merges the pose inferences of multiple heterogeneous human pose estimation models, each specialized to infer a certain configuration of poses. A ConvNet offers a natural mechanism for learning the interdependency among multiple input sources through a feature learning process that preserves spatial context. We show that, on the FLIC dataset, this architecture outperforms the best existing state-of-the-art approach by 1.9% in PCP evaluation and achieves a significant improvement in the high-precision region of up to 8.7% in PDJ evaluation.
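
For readers unfamiliar with the PDJ (percentage of detected joints) measure mentioned above, the following is a minimal sketch of how such a score is commonly computed: a joint counts as detected when its predicted location lies within a given fraction of the torso diameter from the ground truth. The function name `pdj`, the toy coordinates, and the torso diameter value are illustrative assumptions and are not taken from this work; the exact normalization used in the evaluation here is not stated in the abstract.

```python
import math


def pdj(predicted, ground_truth, torso_diameter, threshold):
    """Return the fraction of joints detected at a given normalized threshold.

    A joint is counted as detected when the Euclidean distance between its
    predicted and ground-truth positions is at most threshold * torso_diameter.
    (Hypothetical helper for illustration only.)
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    limit = threshold * torso_diameter
    detected = sum(
        1
        for (px, py), (gx, gy) in zip(predicted, ground_truth)
        if math.hypot(px - gx, py - gy) <= limit
    )
    return detected / len(predicted)


# Toy example with made-up pixel coordinates (assumed, not real data).
ground_truth = [(10.0, 10.0), (20.0, 10.0), (30.0, 40.0), (15.0, 40.0)]
predicted = [(11.0, 10.0), (20.0, 14.0), (30.0, 41.0), (25.0, 40.0)]
torso_diameter = 50.0  # assumed value for illustration

# Sweeping the threshold traces a PDJ curve; small thresholds correspond
# to the high-precision region.
curve = {t: pdj(predicted, ground_truth, torso_diameter, t) for t in (0.05, 0.1, 0.25)}
```

Small thresholds are the strictest part of such a curve, which is why gains in the high-precision region are reported separately from the overall score.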