Simultaneous Estimation of Hand Pose Parameters Using Unscented Kalman Filter

Albert Causo (0451205)

Estimating the complete hand pose is an important step towards more natural Human-Computer Interaction (HCI) systems. Applications such as computer-aided modeling and design, advanced virtual reality systems, and human-robot interaction would improve considerably if the hand could be used as an accurate input device. A vision-based, model-based approach, which uses features from 2D camera images to derive a 3D parametric model of the hand, is one way to realize this possibility. A 3D hand model can give the most accurate details about hand pose, but current research on this topic has produced varied results because of the complexity and high dimensionality of modeling the hand. Some methods track only the global pose or only the local pose, or a little of both, while others track both but restrict the hand to a particular movement. Occlusion is also a very common problem. This thesis proposes a new approach to model-based hand-pose estimation that aims to track all hand-pose parameters simultaneously. A system previously developed by Ueda \emph{et al.} estimates hand pose using silhouette images taken from multiple cameras. The silhouette images are converted to a voxel model, which is fitted against a hand surface model to determine the hand pose. Although this minimizes the problem of occlusion and allows the hand to move freely, it cannot estimate the global and local pose simultaneously. This thesis presents a simulation study showing the feasibility of using an Unscented Kalman Filter (UKF) with multiple-view silhouette images to estimate the global and local hand pose simultaneously. The UKF is the predictive filter of choice because it can handle non-linear systems such as the complete hand dynamics. Also, compared to other Bayesian predictive filters, it is relatively inexpensive to compute. Simulation experiments with hand models of up to 19 degrees of freedom (DOF) show that the filter tracks the hand pose parameters accurately.
The only major disadvantage of the system is the long calculation time the filter needs to track the hand. Through the use of predictive filtering, simultaneous estimation of global and local hand pose parameters can be realized, as demonstrated in this thesis.