3D Human Body Tracking
using Temporal Models
In recent years, much work has been
devoted to increasing the robustness of people tracking
algorithms by introducing motion models. Most approaches
rely on probabilistic methods, such as the popular CONDENSATION
algorithm, to perform the tracking. While effective,
such probabilistic approaches require exponentially
large amounts of computation as the number of degrees
of freedom in the model increases, and can easily become
trapped in local minima unless great care is taken
to avoid them.
By contrast, we use temporal
motion models based on Principal Component
Analysis (PCA) to formulate
the tracking problem as one of minimizing differentiable
objective functions. Our experiments show that the differential
structure of these objective functions is rich enough
to take advantage of standard deterministic optimization
methods, whose computational requirements are much smaller
than those of probabilistic ones and which can nevertheless
yield very good results even in difficult situations.
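As a rough illustration of what a PCA-based temporal motion
model looks like, the sketch below builds a toy database of
flattened motion cycles, extracts the principal directions,
and reconstructs a motion from a few coefficients. The
synthetic sine-wave cycles, the function names and the array
layout are assumptions made for this example only; they are
not taken from the system described here.

```python
import numpy as np

def fit_pca_motion_model(motions, n_components):
    """motions: (n_examples, n_samples * n_joints), one flattened cycle per row."""
    mean = motions.mean(axis=0)
    # SVD of the centered data; the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(motions - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(mean, basis, motion):
    """Coefficients of a motion in the PCA basis."""
    return (motion - mean) @ basis.T

def synthesize(mean, basis, alphas):
    """Motion vector generated by the mean plus a weighted sum of components."""
    return mean + alphas @ basis

# Toy database (assumption): cycles that differ only in amplitude.
n_samples, n_joints = 20, 3
phase = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)

def toy_cycle(amplitude):
    angles = amplitude * np.stack(
        [np.sin(phase + k) for k in range(n_joints)], axis=1
    )  # shape (n_samples, n_joints)
    return angles.ravel()

database = np.stack([toy_cycle(a) for a in np.linspace(0.5, 1.5, 8)])
mean, basis = fit_pca_motion_model(database, n_components=2)

target = toy_cycle(1.2)
alphas = project(mean, basis, target)
reconstruction = synthesize(mean, basis, alphas)
```

Because the motion is a linear, hence differentiable, function
of the coefficients, any objective that is differentiable in
the pose is also differentiable in the coefficients, which is
what makes gradient-based minimization applicable.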
We use stereo data acquired with a
Digiclops operating at a 640x480 resolution and a 14 Hz
frame rate, which is relatively slow for
capturing a running motion. The quality of the data
is poor for several reasons:
- First, to avoid motion blur, we
had to use a high shutter speed that reduces exposure.
- Second, because the camera is fixed
and the subject must remain within the capture volume,
she appears very small at the beginning of the
sequence.
As a result, the data of Figure 1 is
very noisy and lacks both resolution and depth.
Figure 1: Input stereo
data. Top row: First image of a synchronized trinocular
video sequence at three different times. The 3-D points
computed by the Digiclops system are reprojected onto
the images. Bottom row: Side views of these 3-D points.
Note that they are very noisy and lack depth.
Tracking requires only a small amount of manual
interaction. The global position in the first
frame is initialized manually, as are the virtual time
positions of the first and last frames; those of the
other frames are obtained by interpolating at constant
speed. The global motion is then computed recursively
for every frame: the optimized values for frame t
serve as the initialization values for frame t+1.
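The initialization and frame-to-frame propagation described
above can be sketched as follows. This is a minimal example
under stated assumptions: the function names, the 1-D "global
position" and the toy gradient-descent optimizer are all
hypothetical stand-ins for the real objective and optimizer.

```python
def interpolate_virtual_time(mu_first, mu_last, n_frames):
    """Virtual time of every frame, assuming constant speed between
    the two manually chosen end values."""
    if n_frames == 1:
        return [mu_first]
    step = (mu_last - mu_first) / (n_frames - 1)
    return [mu_first + i * step for i in range(n_frames)]

def track_recursively(first_guess, n_frames, optimize_frame):
    """Optimize frame by frame; the result for frame t seeds frame t + 1."""
    estimates = []
    guess = first_guess
    for t in range(n_frames):
        guess = optimize_frame(t, guess)
        estimates.append(guess)
    return estimates

# Toy data (assumption): a 1-D position observed in four frames.
observed = [0.0, 0.4, 0.8, 1.2]

def toy_optimize(t, guess, steps=50, rate=0.3):
    """Gradient descent on the squared error (x - observed[t]) ** 2."""
    x = guess
    for _ in range(steps):
        x -= rate * 2.0 * (x - observed[t])
    return x

virtual_times = interpolate_virtual_time(0.0, 1.0, 5)
positions = track_recursively(0.1, len(observed), toy_optimize)
```

Seeding each frame with the previous result keeps the starting
point close to the solution, which matters for a deterministic
optimizer that only searches locally.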
Once the global motion has been recovered,
one of two algorithms is applied, depending
on the type of motion to track.
1. Tracking steady motion
The first algorithm assumes that
the motion is steady and optimizes
a single set of PCA parameters
for the whole sequence: if the motion does not
vary, one set of parameters is enough to describe
it. Figure 2 shows very satisfactory results
for a walking sequence.
Figure 2: Tracking a steady walking motion.
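To make the idea of a single parameter set for the whole
sequence concrete, the sketch below fits one coefficient
vector to noisy poses from all frames at once. It assumes a
toy setting in which poses are observed directly and depend
linearly on the coefficients, so that linear least squares
can stand in for the real objective; the random "model" and
all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_joints, n_components = 12, 4, 2

# Hypothetical motion model: a mean pose and basis poses for every frame.
mean = rng.normal(size=(n_frames, n_joints))
basis = rng.normal(size=(n_components, n_frames, n_joints))

# Steady motion: the same coefficients generate every frame.
true_alphas = np.array([0.7, -0.3])
clean = mean + np.tensordot(true_alphas, basis, axes=1)
observed = clean + 0.01 * rng.normal(size=clean.shape)

def fit_steady(observed, mean, basis):
    """Least-squares fit of one coefficient vector shared by all frames."""
    # Stack all frames into a single system: one column per component.
    design = basis.reshape(basis.shape[0], -1).T
    target = (observed - mean).ravel()
    alphas, *_ = np.linalg.lstsq(design, target, rcond=None)
    return alphas

estimated_alphas = fit_steady(observed, mean, basis)
```

Pooling all frames gives many equations for very few unknowns,
which is why this formulation tolerates noisy data well as long
as the steadiness assumption holds.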
2. Tracking variable motion
When the style, or even the activity, changes,
the steady-motion tracker is not flexible enough to give
good results for the whole sequence. To solve this, we
implemented a second tracker that uses a separate set
of PCA parameters for each frame, allowing the system
to evolve automatically from one activity to another.
This is shown in Figure 5, where the subject starts
walking, performs the transition over a couple of
frames, and then runs. Results for non-steady running
are shown in Figure 3, and results for variable
walking in Figure 4.
Figure 3: Tracking a
running motion while allowing the style to vary. The
legs are correctly positioned in the whole sequence.
Figure 4: Tracking a
walking motion while allowing the style to vary.
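The difference between a shared parameter set and one set per
frame can be illustrated with a small sketch. As before, it
assumes a toy, noise-free setting where poses are observed
directly and depend linearly on the coefficients; the random
"model", the drifting coefficients and the function names are
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames, n_joints, n_components = 10, 6, 2

# Hypothetical motion model: a mean pose and basis poses for every frame.
mean = rng.normal(size=(n_frames, n_joints))
basis = rng.normal(size=(n_components, n_frames, n_joints))

# Variable motion: the coefficients drift from one style to another.
blend = np.linspace(0.0, 1.0, n_frames)
true_alphas = np.stack([1.0 - blend, blend], axis=1)  # (n_frames, n_components)
observed = mean + np.einsum("tc,ctj->tj", true_alphas, basis)

def fit_per_frame(observed, mean, basis):
    """Independent least-squares fit of one coefficient vector per frame."""
    alphas = []
    for t in range(observed.shape[0]):
        design = basis[:, t, :].T  # (n_joints, n_components)
        solution, *_ = np.linalg.lstsq(design, observed[t] - mean[t], rcond=None)
        alphas.append(solution)
    return np.array(alphas)

def fit_shared(observed, mean, basis):
    """Least-squares fit of a single coefficient vector for all frames."""
    design = basis.reshape(basis.shape[0], -1).T
    solution, *_ = np.linalg.lstsq(design, (observed - mean).ravel(), rcond=None)
    return solution

def residual(alphas):
    """Norm of the pose error for shared or per-frame coefficients."""
    alphas = np.broadcast_to(alphas, (n_frames, n_components))
    predicted = mean + np.einsum("tc,ctj->tj", alphas, basis)
    return float(np.linalg.norm(predicted - observed))

per_frame = fit_per_frame(observed, mean, basis)
shared = fit_shared(observed, mean, basis)
```

In this toy case the per-frame fit follows the drifting
coefficients exactly, while the shared fit can only settle on
a compromise and leaves a clearly non-zero residual.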
We can partially overcome one of the
major limitations of approaches that rely on motion-models,
namely that they limit the algorithms to the particular
class of motion from which the models have been created.
This is achieved by performing PCA on motion databases
that contain multiple classes of motions as opposed
to a single one, which yields a decomposition in which
the first few components can be used to classify the
motion and can evolve during tracking to model the transition
from one kind of motion to another.
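The following sketch illustrates, on synthetic data, how PCA
on a database that mixes two motion classes can yield leading
components that separate the classes. The random "walk" and
"run" vectors, the nearest-centroid rule and all names are
assumptions chosen for the example, not a description of the
actual database or classifier.

```python
import numpy as np

rng = np.random.default_rng(2)
dim = 30  # length of a flattened motion vector (assumption)

# Two synthetic motion classes scattered around distinct prototypes.
base_walk = rng.normal(size=dim)
base_run = base_walk + 3.0 * rng.normal(size=dim)
walk = base_walk + 0.1 * rng.normal(size=(10, dim))
run = base_run + 0.1 * rng.normal(size=(10, dim))

database = np.vstack([walk, run])
labels = np.array(["walk"] * 10 + ["run"] * 10)

# PCA over the joint database; keep the first two components.
mean = database.mean(axis=0)
_, _, vt = np.linalg.svd(database - mean, full_matrices=False)
basis = vt[:2]

def coefficients(motion):
    """Coordinates of a motion vector in the leading PCA components."""
    return (motion - mean) @ basis.T

centroids = {
    name: coefficients(database[labels == name]).mean(axis=0)
    for name in ("walk", "run")
}

def classify(motion):
    """Label of the nearest class centroid in coefficient space."""
    c = coefficients(motion)
    return min(centroids, key=lambda name: np.linalg.norm(c - centroids[name]))

new_walk = base_walk + 0.1 * rng.normal(size=dim)
new_run = base_run + 0.1 * rng.normal(size=dim)
halfway = 0.5 * (base_walk + base_run)  # crude stand-in for a transition pose
```

Here the first coefficient of the halfway vector falls between
the two class centroids, which hints at how coefficients that
are allowed to change over time could describe a transition.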
Figure 5: Tracking the
transition between walking and running. In the first
four frames the subject is walking. The transition occurs
in the following three frames and the sequence ends
We show the effectiveness of the proposed approach by
using it to fit full-body models to stereo data of people
walking and running, data whose quality is too low to
yield satisfactory results without models. This stereo
data simply provides us with a convenient way to show
that the approach performs well on real data. However,
any motion tracking algorithm that relies on minimizing
an objective function is amenable to the treatment we
propose.
R. Urtasun, P. Fua
Human Body Tracking using Deterministic Motion Models
In European Conference on Computer Vision, Prague, Czech
Republic, May 2004