Computer Vision Lab



3D human body tracking using motion prediction


Context:

Researchers and designers have put considerable effort into creating realistic-looking human models using mesh modelling, laser scanners or physical models. Two main techniques are used to obtain realistic motion: key-framing and motion capture. In key-framing, the designer specifies several "key" postures by hand and the system interpolates between them to obtain the intermediate postures. Motion capture systems, which record real motion, can be very effective, but they force the performer to wear optical (e.g. Vicon) or magnetic markers attached to the body.

Markerless motion capture has been studied in depth, but no universal solution has been found. Monocular motion capture has more applications than multiview approaches because it requires only one camera, but depth ambiguities make its results insufficiently accurate for some applications. Multiview motion tracking addresses this problem by using the additional information obtained from multiple cameras.

Previous work developed in our lab [3] uses a deterministic framework for tracking 3D human movements. In contrast to other methods, a sophisticated 3D model is used not only for animation but also for tracking. This model is based on soft objects (implicit surfaces), which provide an accurate shape description with few parameters. Stereo and silhouette data are used as observations for the tracking.



Deterministic frameworks, such as least-squares fitting, suffer from problems such as local minima and the inability to recover once tracking is lost. These problems can be addressed by using a probabilistic framework such as CONDENSATION [1] or Bayesian estimation [2].

In this project, a CONDENSATION framework for a simple case is provided. You will have to extend it to 3D body tracking using observations such as stereo and silhouettes. The CONDENSATION algorithm can be summarised as follows: cast a set of particles (samples near the most likely prediction), apply a temporal model to them, and compute a new likelihood for each particle from the similarity between its prediction and the image observations (stereo, silhouettes, ...). New particles are then cast based on this likelihood. The likelihood-weighted mean of the particles gives the tracking result.
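The loop described above can be sketched for a toy 1D state. This is only an illustrative sketch, not the framework supplied with the project: the function names (condensation_step, track), the random-walk temporal model, the Gaussian likelihood and all noise values are assumptions chosen for the example. In the 3D body tracking case the state would be the body pose parameters and the likelihood would compare the model with stereo and silhouette data.

```python
import math
import random


def condensation_step(particles, weights, observation, rng,
                      process_noise=0.5, obs_noise=1.0):
    """One CONDENSATION iteration for a 1D state (hypothetical toy setup)."""
    n = len(particles)
    # 1. Cast new particles: resample in proportion to the previous weights.
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Apply the temporal model (assumed here: a random walk).
    predicted = [p + rng.gauss(0.0, process_noise) for p in resampled]
    # 3. Likelihood: similarity between each prediction and the observation
    #    (assumed here: a Gaussian on the 1D distance).
    likelihoods = [math.exp(-0.5 * ((observation - p) / obs_noise) ** 2)
                   for p in predicted]
    total = sum(likelihoods)
    if total == 0.0:
        # Every particle is far from the observation: fall back to uniform.
        new_weights = [1.0 / n] * n
    else:
        new_weights = [lik / total for lik in likelihoods]
    # 4. Tracking result: likelihood-weighted mean of the particles.
    estimate = sum(p * w for p, w in zip(predicted, new_weights))
    return predicted, new_weights, estimate


def track(observations, n_particles=500, seed=0):
    """Run the filter over a sequence of scalar observations."""
    rng = random.Random(seed)
    particles = [rng.uniform(-10.0, 10.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for z in observations:
        particles, weights, est = condensation_step(particles, weights, z, rng)
        estimates.append(est)
    return estimates


if __name__ == "__main__":
    # Synthetic observations of a slowly drifting target.
    obs = [0.1 * t for t in range(50)]
    for z, est in list(zip(obs, track(obs)))[::10]:
        print(f"observed {z:.2f}, estimated {est:.2f}")
```

Because the particle set keeps several hypotheses alive at once, a poor estimate in one frame does not necessarily prevent recovery in later frames, which is the property that motivates replacing least-squares fitting.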


Reading:

[1] M. Isard, A. Blake, "CONDENSATION - conditional density propagation for visual tracking", International Journal of Computer Vision, 29(1), pp. 5-28, 1998.

[2] H. Sidenbladh, M. J. Black, D. J. Fleet, "Stochastic Tracking of 3D Human Figures Using 2D Image Motion", European Conference on Computer Vision, D. Vernon (Ed.), Springer Verlag, LNCS 1843, Dublin, Ireland, pp. 702-718, June 2000.

[3] R. Plaenkers, P. Fua, "Tracking and Modeling People in Video Sequences", Computer Vision and Image Understanding, 81(3), March 2001.

Person in charge: Raquel Urtasun

Email: raquel.urtasun@epfl.ch