Research Across Disciplines: Human Body Tracking as Interface Device for Interactive Installations
George Legrady, Matthew Turk, Haiying Guan, and Andi
June 2005 - Sept. 2005
University of California, Santa Barbara
Overview
This project is a collaboration with the Experimental Visualization Lab, Media Arts and Technology. It is realized as an interactive art installation whose emphasis is on aesthetic research through the implementation of computer vision technologies for new forms of content, narrative, experience, and analysis. The system interactively changes the size, shape, and rotation of a graphic cylinder on the display by detecting and tracking the locations of the user’s face and hands, using skin color detection and mean-shift tracking. The work addresses the poetics of presence and interaction.
Details
Detecting and tracking the face and hands is important for gesture recognition and human-computer interaction. In this project, we first model skin color with a Gaussian in the HS color space and obtain the initial model parameter (the mean of the Gaussian) for skin color. We then model the color histogram of the current frame with a Gaussian Mixture Model (GMM), using the Restricted Expectation-Maximization (EM) algorithm to adaptively learn the model parameters (mean and variance) in the current frame. Skin color pixels are then detected from the resulting Bayesian segmentation. Second, the face and hands are localized by the k-means algorithm. Finally, they are tracked by the mean-shift algorithm.
1. Skin Color Segmentation
1.1 Skin Color Modeling - Single Gaussian in H-S Space
We use the dataset of skin and non-skin pixels from the Cambridge Research Lab, Compaq [1], to model skin color in the HS space. The dataset contains 80,306,243 skin pixels and 861,142,189 non-skin pixels. Instead of modeling skin color in the RGB color space, we convert the pixels to the HSV color space and then project along the V direction onto the HS space to obtain the HS histogram of skin color.

The figure shows that the skin color distribution clusters mainly in a small area of the chromatic color space. In the experiments, we approximate skin color by a single Gaussian in the HS space and use the mean of that Gaussian to initialize the later EM learning.
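As a minimal sketch of this step, the snippet below converts RGB pixels to HSV, drops V, and fits a single Gaussian (mean and 2x2 covariance) to the resulting (H, S) pairs. The function names and the four sample pixels are hypothetical; the actual model was estimated from the labeled dataset [1]. Hue is treated here as a plain number in [0, 1), which ignores its wrap-around at red.

```python
import colorsys

def rgb_to_hs(pixels):
    """Convert (r, g, b) tuples in 0..255 to (h, s) pairs in 0..1, dropping V."""
    hs = []
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hs.append((h, s))
    return hs

def fit_gaussian_2d(points):
    """Return the sample mean and 2x2 covariance of a list of 2-D points."""
    n = len(points)
    mean_h = sum(p[0] for p in points) / n
    mean_s = sum(p[1] for p in points) / n
    var_h = sum((p[0] - mean_h) ** 2 for p in points) / n
    var_s = sum((p[1] - mean_s) ** 2 for p in points) / n
    cov_hs = sum((p[0] - mean_h) * (p[1] - mean_s) for p in points) / n
    return (mean_h, mean_s), ((var_h, cov_hs), (cov_hs, var_s))

# Hypothetical skin-tone samples, for illustration only.
skin_rgb = [(224, 172, 105), (198, 134, 66), (241, 194, 125), (141, 85, 36)]
skin_mean, skin_cov = fit_gaussian_2d(rgb_to_hs(skin_rgb))
```

Only `skin_mean` is needed to initialize the later EM step; the covariance is computed here for completeness.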
1.2 Color Histogram Modeling by Gaussian Mixture Model
We model the color histogram of the input image by a GMM.
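For reference, a GMM over the HS color vector has the standard form below. The notation is an assumption introduced here, not taken from the original page: x = (H, S) is a pixel's color, K is the number of components, and π_k, μ_k, Σ_k are the weight, mean, and covariance of component k.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```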

1.3 Restricted EM Algorithm for Parameter Estimation
Zhu et al. [2] present a Restricted Expectation-Maximization algorithm that models skin color with a Gaussian Mixture Model, given a location prior probability for the skin color. The main idea is to fix the mean of the Gaussian that models the skin color during EM training. In our project, instead of using the prior P(Hand|x,y) to obtain the initial mean of that Gaussian, we use the skin color model to estimate the mean of the skin color, without any assumption about the hands' locations or the size of the skin region.
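The idea of holding one mean fixed can be sketched as follows. This is a simplified illustration, not the authors' implementation: it works on a single scalar feature (for example hue) rather than the 2-D HS vector, and the function and parameter names are hypothetical. Component `fixed` plays the role of the skin Gaussian; its weight and variance are still updated, but its mean is never touched.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def restricted_em(data, means, variances, weights, fixed=0, iters=50, min_var=1e-6):
    """EM for a 1-D Gaussian mixture in which the mean of one component stays fixed."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])
        # M-step: update weights and variances; skip the mean of the fixed component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj <= 0:
                continue
            weights[j] = nj / len(data)
            if j != fixed:
                means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)  # floor keeps the density finite
    return means, variances, weights
```

A call such as `restricted_em(values, [skin_mean, 0.5], [0.01, 0.01], [0.5, 0.5])` would keep the first mean at `skin_mean` while the second component adapts to the background of the current frame.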
1.4 Experimental Results
Using Bayes' rule, we segment skin color from the background. We tested the segmentation in different situations, and the results show that the algorithm successfully detects the skin-color pixels of different people with different skin colors.
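A minimal sketch of the Bayesian decision, again on a single scalar feature for brevity: each pixel is labeled skin when the posterior probability of the skin component exceeds a threshold. The 0.5 threshold, the function names, and the choice of component 0 as skin are assumptions made for illustration.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def skin_posterior(x, means, variances, weights, skin=0):
    """P(skin component | x) by Bayes' rule over all mixture components."""
    joint = [w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    total = sum(joint)
    return joint[skin] / total if total > 0 else 0.0

def segment(values, means, variances, weights, skin=0, threshold=0.5):
    """Return one boolean per pixel value: True where the pixel is classified as skin."""
    return [skin_posterior(x, means, variances, weights, skin) > threshold
            for x in values]
```

The mixture parameters passed in would be the ones learned for the current frame, so the decision adapts to lighting and to the person in view.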


2. Face and Hand Localization and Tracking by Mean-Shift Algorithm
We assume that the detected face area is larger than each hand. After pixel-based segmentation and morphological operations, the mean-shift algorithm is applied to find the face area. The remaining pixels should then contain the skin pixels of the two hands, so we cluster the remaining skin-color pixels into two clusters with the k-means algorithm, localize the two hands, and track them using the mean-shift algorithm. The experimental results are shown below.
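The hand localization and tracking steps can be sketched as follows, assuming the skin pixels left after removing the face are available as a list of (x, y) coordinates. The function names, the flat circular window, and the left/right initialization of k-means are illustrative assumptions rather than details from the project.

```python
import math

def centroid(points):
    """Mean position of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans_two(points, iters=20):
    """Split 2-D points into two clusters and return the two centroids."""
    # Start from the leftmost and rightmost points (suits two hands side by side).
    centers = [min(points), max(points)]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d0 = (p[0] - centers[0][0]) ** 2 + (p[1] - centers[0][1]) ** 2
            d1 = (p[0] - centers[1][0]) ** 2 + (p[1] - centers[1][1]) ** 2
            groups[0 if d0 <= d1 else 1].append(p)
        centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers

def mean_shift(points, start, radius, iters=30, tol=1e-3):
    """Shift a circular window to the local centroid of the points until it settles."""
    cx, cy = start
    for _ in range(iters):
        inside = [p for p in points
                  if (p[0] - cx) ** 2 + (p[1] - cy) ** 2 <= radius ** 2]
        if not inside:
            break  # no skin pixels in the window; keep the last position
        nx, ny = centroid(inside)
        moved = math.hypot(nx - cx, ny - cy)
        cx, cy = nx, ny
        if moved < tol:
            break
    return (cx, cy)
```

For tracking, `mean_shift` would be called on each new frame's skin pixels with `start` set to the hand position found in the previous frame, so the window follows the hand as long as it moves less than roughly one radius between frames.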




References
[1] M. Jones and J. M. Rehg, “Statistical Color Models with Application to Skin Detection”, International Journal of Computer Vision, Vol. 46, Issue 1, pp. 81-96, 2002.
[2] X. Zhu, J. Yang, and A. Waibel, “Segmenting Hands of Arbitrary Color”, in Proc. of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 446-453, 2000.