Gesture Follower


Real-time following and recognition of time profiles



Basic Example

The Gesture Follower is a system for real-time following and recognition of time profiles. In this example the Gesture Follower learns three gestures, i.e. drawings made with the mouse, while voice data is recorded simultaneously.

During the "performance", the Gesture Follower recognizes which gesture is being performed and plays the corresponding sound, time-stretched or time-compressed according to the pacing of the gesture.
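The link between gesture pacing and time stretching can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that the follower reports, at every frame, an estimated position (in seconds) inside the recorded gesture, and that the sound player accepts a playback rate. The class name `RateEstimator` and its parameters `frame_period` and `smoothing` are hypothetical and are not part of the Gesture Follower itself.

```python
class RateEstimator:
    """Turns successive followed positions into a smoothed playback rate.

    Hypothetical helper: a rate of 1.0 means the gesture is performed at the
    recorded speed, 0.5 at half speed, 2.0 at double speed.
    """

    def __init__(self, frame_period, smoothing=0.8):
        self.frame_period = frame_period  # seconds between two follower updates
        self.smoothing = smoothing        # 0 = no smoothing, close to 1 = heavy smoothing
        self.rate = 1.0
        self.last_position = None

    def update(self, position):
        """position: estimated time (seconds) inside the recorded gesture."""
        if self.last_position is not None:
            # Recorded time covered during one frame of performance time.
            raw = (position - self.last_position) / self.frame_period
            raw = max(raw, 0.0)  # this sketch never plays the sound backwards
            self.rate = self.smoothing * self.rate + (1.0 - self.smoothing) * raw
        self.last_position = position
        return self.rate


# A gesture performed at half the recorded speed: every 10 ms frame
# advances only 5 ms inside the recorded gesture.
estimator = RateEstimator(frame_period=0.01)
for frame in range(300):
    rate = estimator.update(frame * 0.005)
```

The exponential smoothing is a design choice of this sketch: it trades a little latency for a steadier playback rate when the position estimate jitters.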

Synchronizing dance and videos

The Gesture Follower is used to select and synchronize prerecorded videos by following the dancer's gestures. It uses data from inertial sensors worn on the dancer's wrists.

The Gesture Follower is used in this example for real-time recognition of five pre-recorded gestures. As soon as one of these gestures is recognized, one of the following gesture-driven audio engines is enabled:

Grainstick: based on the orientation of the phone, this audio engine selects the corresponding segment of a recorded rainstick sample

Percussions: percussion samples are triggered anytime a 'hit' is detected

Shaking: at a fixed tempo, this engine triggers the portion of the loaded sound whose intensity best corresponds to the energy of the gesture at that instant.

Circle: a sound loop is time-stretched based on the speed of the gesture. The speed is detected automatically by the Gesture Follower, which compares the performance with the pre-recorded gesture.

Static positions: an audio loop is played whenever the phone holds a static position for a while.

The Grainstick and Shaking audio engines take full advantage of the concatenative synthesis tools contained in the MuBu MaxMSP bundle.
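As a rough illustration of the kind of selection the Shaking engine performs, the Python sketch below picks the segment of a sound whose intensity is closest to the current gesture energy. It does not use MuBu and is not the engine's implementation: the function names, the use of RMS as the intensity measure, and the assumption that gesture energy and sound intensity share the same scale are all hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def segment_intensities(signal, segment_length):
    """Cut the signal into consecutive segments and measure each one.

    A trailing segment shorter than segment_length is ignored.
    """
    last_start = len(signal) - segment_length
    return [rms(signal[start:start + segment_length])
            for start in range(0, last_start + 1, segment_length)]


def select_segment(intensities, gesture_energy):
    """Index of the segment whose intensity is nearest to the gesture energy."""
    return min(range(len(intensities)),
               key=lambda i: abs(intensities[i] - gesture_energy))


# Toy signal made of a quiet, a medium and a loud segment.
signal = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
intensities = segment_intensities(signal, segment_length=4)
chosen = select_segment(intensities, gesture_energy=0.55)  # index of the medium segment
```

In a real setting the intensities would be analysed once, when the sound is loaded, and only `select_segment` would run at each beat of the fixed tempo.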

Grainstick and Percussions sound samples by Pierre Jodlowski.


The development of the Gesture Follower pursues the general goal of comparing, in real time, a performed gesture with a set of prerecorded examples, using machine learning techniques.

In most standard gesture recognition systems, gestures are considered as units that must be recognized once completed. Therefore, these systems output results at discrete time events, typically at the end of each gesture.

We work with a different paradigm for online gesture analysis, motivated by applications in expressive visual and sound control: the recognition system outputs "continuously" (i.e. on a fine temporal grain) parameters characterizing the performed gesture. These parameters are obtained by online comparison with temporal shapes stored in a database.

More precisely, two types of information are continuously updated. These are probabilistic estimations of

1) the similarity of the performed gesture to prerecorded gestures (likelihood) and

2) the time progression of the performed gesture.

The first type of information allows for the selection of the likeliest gesture at any moment, and the second allows for the estimation of the current temporal index inside the gesture, referred to here as "gesture following".

These continuous output data are especially well suited for both selecting and synchronizing various continuous visual or sound processes to gestures.
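The two continuously updated estimates can be illustrated with a minimal Python sketch. It assumes a left-to-right hidden Markov model with one state per recorded sample, updated with the forward algorithm at every incoming sample. This is an illustrative assumption, not a description of the gf external's actual implementation. The names `TemplateFollower` and `follow`, the noise parameter `sigma` and the transition probabilities are all hypothetical.

```python
import math


class TemplateFollower:
    """Follows one recorded gesture with a left-to-right HMM (assumed model)."""

    def __init__(self, template, sigma=0.2, transitions=(0.25, 0.5, 0.25)):
        self.template = [tuple(p) for p in template]
        self.sigma = sigma              # assumed observation noise
        self.transitions = transitions  # P(stay), P(advance by 1), P(advance by 2)
        self.alpha = None               # state distribution, None before the first sample
        self.log_likelihood = 0.0

    def _emission(self, state, observation):
        # Unnormalized Gaussian; comparable across templates that share
        # the same sigma and the same number of dimensions.
        d2 = sum((o - t) ** 2 for o, t in zip(observation, self.template[state]))
        return math.exp(-d2 / (2.0 * self.sigma ** 2))

    def step(self, observation):
        """One forward-algorithm update; returns the progress in [0, 1]."""
        n = len(self.template)
        if self.alpha is None:
            predicted = [1.0] + [0.0] * (n - 1)  # a gesture starts at its first sample
        else:
            predicted = [0.0] * n
            for i, a in enumerate(self.alpha):
                for jump, p in enumerate(self.transitions):
                    predicted[min(i + jump, n - 1)] += a * p
        weighted = [p * self._emission(i, observation)
                    for i, p in enumerate(predicted)]
        total = sum(weighted)
        if total > 0.0:
            self.alpha = [w / total for w in weighted]
            self.log_likelihood += math.log(total)
        else:  # numerical underflow: keep the prediction, penalize heavily
            self.alpha = predicted
            self.log_likelihood += math.log(1e-300)
        return self.progress()

    def progress(self):
        """Expected temporal index, normalized by the template length."""
        n = len(self.template)
        expected_index = sum(i * a for i, a in enumerate(self.alpha))
        return expected_index / max(n - 1, 1)


def follow(followers, observation):
    """Update every follower; return (likeliest gesture name, its progress)."""
    for follower in followers.values():
        follower.step(observation)
    best = max(followers, key=lambda name: followers[name].log_likelihood)
    return best, followers[best].progress()


# Two recorded gestures of 20 samples each, then a performance of the
# diagonal gesture at half speed (40 samples).
followers = {
    "diagonal": TemplateFollower([(t / 19, t / 19) for t in range(20)]),
    "horizontal": TemplateFollower([(t / 19, 0.0) for t in range(20)]),
}
for t in range(40):
    x = t / 39
    name, progress = follow(followers, (x, x))
```

Because both outputs are refreshed at every sample, `name` can drive the selection of a process and `progress` its synchronization, without waiting for the gesture to end.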


The gf external is now part of the "MuBu for Max" distribution, which you can access directly here:

Please post any comment, bug report, feature request to the


