Haiying Guan's Home Page

 

Multi-view Appearance-based 3D Hand Pose Estimation

 

Haiying Guan, Jae Sik Chang, Longbin Chen, Rogerio S. Feris, and Matthew Turk

 

Jan. 2005 - July 2007

 

CS Department, University of California, Santa Barbara

 

Overview

We describe a novel approach to appearance-based hand pose estimation which relies on multiple cameras to improve accuracy and resolve ambiguities caused by self-occlusions. Rather than estimating 3D geometry, our approach uses multiple views to extend current exemplar-based methods for estimating hand pose by matching a probe image with a large discrete set of labeled hand pose images. We formulate the problem in a MAP framework, where the information from multiple cameras is fused to provide reliable hand pose estimation. Our quantitative experimental results show that the correct estimation rate is much higher using our multi-view approach than using a single-view approach.

Details

1.1 Motivation

Occlusion is a major problem for appearance-based systems: the resulting ambiguity is hard to resolve from a single side-view image. Fig. 1 shows an extreme example. Given only the side view of each gesture (second row), it is hard to recognize the actual posture. We therefore propose a multi-view hand pose estimation algorithm.

 

 

Figure 1. The hand occlusion problem in appearance-based posture estimation

 

1.2 MAP framework using multi-view images

To make effective use of the information from multiple cameras, we propose a maximum a posteriori (MAP) framework based on Bayes' rule. Assuming the conditional independence shown in Fig. 2, the posterior probability of the hand state given the hand images is given by:

Figure 2. Conditional independence of the hand images

 

The probability of a hand image given the hand state is

The likelihood of a real hand image given a hand model image can be estimated by

Maximizing the posterior probability then reduces to minimizing the following energy function:
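The fusion idea of this section can be illustrated with a minimal Python sketch. It is not the project's actual energy function. It only assumes that each camera yields a matching cost (a negative log-likelihood) for every candidate hand state, so that under conditional independence the per-view costs simply add. The names `fuse_views` and `map_estimate` and the toy numbers are hypothetical.

```python
import numpy as np

def fuse_views(costs_per_view, log_prior=None):
    """Fused energy per candidate hand state.

    costs_per_view: one sequence of costs per camera, where each cost is
    assumed to be a negative log-likelihood of that view's image given
    the candidate state. Conditional independence of the views makes the
    joint negative log-likelihood the sum of the per-view costs.
    """
    energy = np.sum(np.asarray(costs_per_view, dtype=float), axis=0)
    if log_prior is not None:
        # A log-prior over states enters the energy with a minus sign.
        energy = energy - np.asarray(log_prior, dtype=float)
    return energy

def map_estimate(costs_per_view, log_prior=None):
    """Index of the candidate state with the lowest fused energy."""
    return int(np.argmin(fuse_views(costs_per_view, log_prior)))

# Toy example (made-up numbers): 4 candidate states, 2 cameras.
cam1 = [0.9, 1.0, 2.5, 3.0]   # states 0 and 1 are nearly tied in view 1
cam2 = [2.0, 0.4, 2.2, 0.5]   # view 2 separates them clearly

single_view_best = map_estimate([cam1])
two_view_best = map_estimate([cam1, cam2])
print(single_view_best, two_view_best)
```

In this toy case the first camera alone slightly prefers state 0, while the fused energy selects state 1, which is the kind of ambiguity a second view is meant to resolve.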

 

1.3 Homogeneous transformations

Fig. 3 illustrates the system setup. The cameras can be placed at arbitrary locations surrounding the hand and pointing toward it.

 

 

Figure 3. The system setup

 

The transformation from camera 2 to camera 1 is given by

 

 

The transformation from hand frame to camera frame is given by
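How such transformations compose can be sketched with 4x4 homogeneous matrices in NumPy. This is only an illustration under assumed conventions: `T_c1_h` denotes a matrix that maps hand-frame coordinates to camera-1 coordinates, and all rotation angles and translations below are made-up values, not the calibration of the actual system.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def invert_transform(T):
    """Invert a rigid transform using R^T and -R^T t (no general matrix inverse)."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical poses of the hand frame as seen from each camera.
T_c1_h = make_transform(rot_z(np.pi / 6), [0.0, 0.0, 1.0])
T_c2_h = make_transform(rot_z(-np.pi / 3), [0.2, 0.0, 1.2])

# Camera 2 -> camera 1: go from camera 2 back to the hand frame, then to camera 1.
T_c1_c2 = T_c1_h @ invert_transform(T_c2_h)

p_hand = np.array([0.05, 0.02, 0.0, 1.0])   # a point on the hand, homogeneous
p_c2 = T_c2_h @ p_hand                      # the same point in camera-2 coordinates
p_c1_via_c2 = T_c1_c2 @ p_c2                # mapped into camera-1 coordinates
print(p_c1_via_c2)
```

Mapping the point through camera 2 gives the same result as applying `T_c1_h` directly, which is a convenient consistency check when chaining frames.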


1.4 Hand Contour Extraction

We build an active contour model based on level set methods.

In our model, the common boundary is given by  and the motion equation (region competition, illustrated in Fig. 5) is given by

The level set equation is given by

 

 

Figure 5. Active contour model

 

Fig. 6 shows the experimental results for hand contour extraction and hand region detection.

 

Figure 6. Hand contour extraction and hand region detection results
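The general idea of two-region competition in a level set formulation can be sketched as follows. This is a deliberately simplified stand-in, not the model used in this project: it assumes a piecewise-constant intensity model for each region (in the style of Chan and Vese), omits any curvature or smoothing term, and replaces the gradient-magnitude factor with a smoothed delta function. The function name `evolve_level_set` and the synthetic image are hypothetical.

```python
import numpy as np

def evolve_level_set(image, phi, n_iter=100, dt=1.0, eps=1.0):
    """Evolve phi so that {phi > 0} competes with {phi <= 0} for each pixel.

    A pixel is pushed toward the region whose mean intensity it matches
    better. With equal-variance Gaussian region models this force is
    proportional to the log-likelihood ratio of the two regions.
    """
    phi = phi.astype(float).copy()
    for _ in range(n_iter):
        inside = phi > 0
        if not inside.any() or inside.all():
            break  # one region vanished; region means are undefined
        c_in = image[inside].mean()
        c_out = image[~inside].mean()
        # Positive where the pixel fits the inside model better.
        force = (image - c_out) ** 2 - (image - c_in) ** 2
        # Smoothed Dirac delta: strongest update near the zero level set.
        delta = eps / (np.pi * (eps ** 2 + phi ** 2))
        phi += dt * delta * force
    return phi

# Synthetic test image: a bright square ("hand") on a dark background.
image = np.full((32, 32), 0.2)
truth = np.zeros((32, 32), dtype=bool)
truth[10:22, 10:22] = True
image[truth] = 0.8

# Initial contour: a small box inside the object.
phi0 = -np.ones((32, 32))
phi0[14:18, 14:18] = 1.0

phi = evolve_level_set(image, phi0)
region = phi > 0
print(region.sum(), truth.sum())
```

On this noise-free toy image the positive region grows from the seed box until it coincides with the bright square. Real images would need a smoothness term and richer region statistics.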

 

1.5 Experimental Results

 

1.5.1 Synthetic dataset

Our synthetic dataset contains 15 gestures, each captured from 448 viewpoints evenly sampled on a 3D view sphere. The synthetic dataset contains 20160 images in total.

 

1.5.2 Real image test dataset

Our test dataset contains 7 hand states for each gesture, for a total of 254 test cases and 508 hand images (two images per case).

Table 1 compares the retrieval rates obtained using the hand images captured by Camera 1 only, by Camera 2 only, and by both cameras.

Cam \ K       50      100     150     200     250     300
Cam 1 only    19.69   30.71   36.61   39.76   44.88   49.61
Cam 2 only    14.57   23.23   29.13   32.68   37.40   40.55
Both          48.03   61.42   69.69   74.80   78.74   81.50

Table 1. Comparison of retrieval rates (%) for different values of K

Figure 7 shows the comparison curves. Retrieval performance with two viewpoints is much better than with a single viewpoint: at K = 300, the retrieval rate rises from 49.61% (Camera 1 only) and 40.55% (Camera 2 only) to 81.50% with both cameras.

 

Figure 7. Performance comparison of the single-view and two-view algorithms
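A retrieval rate of this kind can be computed with a short Python sketch. It assumes that K is the number of top-ranked database candidates examined per test case and that a case counts as correct when the true label appears among them. That reading of K, the function name `retrieval_rate_at_k`, and the toy rankings are assumptions made for illustration.

```python
def retrieval_rate_at_k(ranked_labels, true_labels, k):
    """Percentage of test cases whose true label is among the top-k candidates.

    ranked_labels: one list per test case, best match first.
    true_labels:   the ground-truth label of each test case.
    """
    hits = sum(
        1 for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return 100.0 * hits / len(true_labels)

# Toy example (made-up rankings): 4 test cases, 3 candidates each.
ranked = [[3, 1, 2], [0, 2, 1], [2, 0, 3], [1, 3, 0]]
truth = [1, 1, 2, 0]

rates = [retrieval_rate_at_k(ranked, truth, k) for k in (1, 2, 3)]
print(rates)

# With 254 test cases, 50 correct cases give a rate that rounds to 19.69%.
rate_50_of_254 = round(100.0 * 50 / 254, 2)
print(rate_50_of_254)
```

The last line is only an arithmetic observation: the Camera 1 entry at K = 50 in Table 1 is consistent with 50 correct cases out of 254.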

 

1.6 Conclusions

In this project, we proposed a MAP framework to resolve the ambiguities caused by occlusion, which is the main problem of appearance-based pose estimation algorithms. The experimental results show that the multi-view algorithm based on the MAP framework performs much better than the single-view algorithm. An active contour model based on a level set algorithm is implemented for hand contour extraction and hand region detection. The algorithm works well against complex backgrounds.

 

Publications

 

[1] Haiying Guan, Longbin Chen, and Matthew Turk, “Multi-view Hand Pose and Posture Manifold Representation using the Hierarchical Isometric Self-Organizing Map”, submitted to Image and Vision Computing, Aug. 2, 2008.

 

[2] Haiying Guan, Jae Sik Chang, Longbin Chen, Rogerio S. Feris, and Matthew Turk, “Multi-view appearance-based 3D hand pose estimation”, In IEEE Workshop on Vision for Human Computer Interaction, Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'06), pp. 154-159, 2006.