CataRT Documentation


Documentation for the cataRT data-driven concatenative sound synthesis system in real time based on unit selection from large databases for Max/MSP/FTM

copyright 2005-2008 Diemo Schwarz

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

See file GPL-license.pat for further information on licensing terms.



The concatenative real-time sound synthesis system cataRT plays grains (also called units) from a large corpus of segmented and descriptor-analysed sounds according to proximity to a target position in the descriptor space, controlled by the mouse or by external controllers. This can be seen as a content-based extension to granular synthesis providing direct access to specific sound characteristics.

cataRT is implemented in Max/MSP using the FTM and Gabor libraries. Segmentation and MPEG-7 descriptors are loaded from text or SDIF files, or analysed on-the-fly.

Startup and Initialisation

  • Double-click on cataRT-0.9.4 to start CataRT.
  • In the main window that opens after a while, click on the red init button on the left.
  • Switch DSP on by clicking on the loudspeaker icon, then raise the volume with the sliders (the left slider moves both) in the lower left corner of the main window.
  • You can switch the audio driver used for output by double-clicking on the dac~ box.

Importing Sounds

  • In the upper part of the main window, choose the segmentation mode and parameters (see #Segmentation Modes below), or use the default chop mode.
  • To try out large sound files first, you can limit the number of seconds that are imported by dragging up and down in the import limit number box.
  • The Directory to sound set option means that the imported sounds are added to a sound set by their directory name. (See the SoundSet descriptor.)
  • Click the grey import (soundfile) button above catart.import to choose a sound file to analyse in the file dialog that opens. Alternatively, you can drag one or more sound files onto either stack of modules (corpus1 or corpus2).
    • you can drop any number of files, or whole directories whose contents will be imported (but not recursively)
    • You will see the progress of analysis as a green area advancing underneath each stack, and the number and duration of imported units will be displayed underneath. There should be more than one unit per sound file; otherwise there is a problem with segmentation (choose chop, and avoid very short sound files).
  • If you have opened the corresponding catart.lcd, you can see the file's units coming in during analysis
  • After import, reselect the x/y descriptors to refresh the ranges in catart.lcd.

Segmentation Modes

  • none imports the sound file as a whole
  • chop segments into equal-sized units every given ms (change the grain size by dragging up and down in the number box)
    • this is the recommended mode to start with
  • split segments into a given number of equal-sized units
    • this is useful to import drum loops, if you know their number of beats
  • import ASCII allows you to import a segmentation and labels from a text file according to extension.
    • The following extensions are recognised, with these columns:
    time [s], label as written by Audacity
    starttime [s], endtime [s], label
  • import labels forces labels text format as written by Audacity
  • import SDIF allows you to import a segmentation from [SDIF] as written by AudioSculpt
  • yin note segmentation segments per change of pitch
  • silence segmentation splits according to an amplitude threshold (which is very hard to guess at the moment, lacking a graphical interface)
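
The chop and split modes described above amount to simple arithmetic on the sound file duration. The following Python sketch illustrates that arithmetic only; it is not CataRT's implementation (which is a Max/MSP patch), the function names chop and split are borrowed from the mode names, and keeping a shorter final unit in chop mode is an assumption.

```python
import math


def chop(duration_ms, grain_ms):
    """Cut a file into units of grain_ms each, as in chop mode.

    Assumption: a shorter remainder at the end is kept as a final unit.
    Returns a list of (start_ms, end_ms) pairs.
    """
    if grain_ms <= 0:
        raise ValueError("grain_ms must be positive")
    n_units = math.ceil(duration_ms / grain_ms)
    return [
        (i * grain_ms, min((i + 1) * grain_ms, duration_ms))
        for i in range(n_units)
    ]


def split(duration_ms, n_units):
    """Cut a file into n_units equal-sized units, as in split mode."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    size = duration_ms / n_units
    return [(i * size, (i + 1) * size) for i in range(n_units)]
```

For example, a two-second drum loop with four beats would be imported with split(2000, 4), giving one 500 ms unit per beat.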


  • open catart.lcd for your corpus by double-clicking (see #Controlling the Play Window (catart.lcd) below)
  • choose descriptors for the x- and y-axis from the grey pop-up menus labeled X-Axis and Y-Axis, or use one of the presets to the right.
    • To start, using SpectralCentroid and Pitch or Loudness usually gives good results.
    • Descriptors 0-11 describe the position of each unit in the sound file.
    • Descriptors 12-23 describe the mean acoustic characteristics of each unit.
    • Note that for some descriptors, all units might have the same value, so that they will all be aligned to the left or upper border of the display.
  • move the mouse around in the lcd; the units closest to the mouse are played
  • drag right/left to increase the random selection radius
  • click to freeze position, ctrl-click to unfreeze (watch the checkbox labeled position freeze to the lower right, or click in it)
    • this is useful to change parameters without losing the position when units play regularly e.g. in beat mode
  • experiment with different descriptors on x-/y-/colour-axis
  • zoom in by changing the min/max next to the descriptor menu by dragging up/down in the number box. (Dragging left of the decimal point changes the integer value, dragging right of it changes in steps of 1/100.) You can reset the view by re-selecting a descriptor.
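
Selecting units by proximity to the mouse, with a random selection radius, can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: unit positions and the target are taken to be (x, y) pairs already scaled to the displayed descriptor ranges, the distance is Euclidean, and the name select_unit and the fallback to the single closest unit when nothing lies within the radius are hypothetical choices, not taken from the CataRT patch.

```python
import math
import random


def select_unit(positions, target, radius=0.0, rng=random):
    """Return the index of the unit to play for a given target position.

    positions: list of (x, y) descriptor values, one pair per unit
    target:    (x, y) position of the mouse in the same coordinates
    radius:    random selection radius (0 means always the closest unit)
    """
    if not positions:
        raise ValueError("the corpus contains no units")
    distances = [math.dist(p, target) for p in positions]
    # Units inside the radius are all eligible; pick one at random.
    candidates = [i for i, d in enumerate(distances) if d <= radius]
    if candidates:
        return rng.choice(candidates)
    # Assumed fallback: nothing inside the radius, so take the closest unit.
    return min(range(len(positions)), key=distances.__getitem__)
```

With a radius of 0 the same unit is returned for as long as the mouse stays nearest to it; increasing the radius lets neighbouring units be chosen as well.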

Trigger Modes

The trigger method controls when the selected grains are played. Try out the different trigger methods (except quant and seq, which don't work in the standalone version):

  • bow triggers closest unit each time you move the mouse
  • fence plays a unit whenever a different unit becomes the closest one (named in homage to clattering a stick along a garden fence)
  • beat mode triggers units via a metronome (speed is controlled by grain rate and rate std)
  • chain mode triggers a new unit whenever the previous unit has finished playing
  • quant is a quantized metronome, but is non-functional for the moment
  • seq is for external triggering by a sequencer
  • cont continues playing grains in order
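
The difference between the bow and fence modes can be shown with a small Python sketch. It takes the sequence of closest-unit indices produced by successive mouse movements and yields the units that would be triggered. The function names are hypothetical, and playing the very first unit in fence mode is an assumption.

```python
def bow_triggers(closest_per_move):
    """bow: every mouse movement triggers the currently closest unit."""
    yield from closest_per_move


def fence_triggers(closest_per_move):
    """fence: trigger only when a different unit becomes the closest one."""
    previous = None
    for unit in closest_per_move:
        if unit != previous:
            yield unit
            previous = unit
```

For the movement sequence 3, 3, 5, 5, 5, 3, bow would trigger six units, whereas fence would trigger only 3, 5, 3.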

Grain Playback Parameters

Use the common granular synthesis parameters to the left:

  • working set: don't use this!
  • radius: size of the random selection radius, also controlled by dragging right/left
  • trigger method: choose trigger mode (see above)
  • grain rate: trigger speed for beat mode in ms
  • rate std: amount of random deviation of grain rate in ms
  • xfade: fade-in and fade-out time of each grain in ms
  • attack, release: separate fade-in and fade-out time of each grain in ms
    (xfade and attack/release alternatively control the grain envelope)
  • grain size: force length of played grain in ms, 0 means natural size
  • size std: amount of random variation around grain size
  • onset std: amount of random variation of grain start
  • transposition: pitch change in half-tones
  • std: amount of random variation of transposition
  • gain: volume change of grain in dB
  • std: amount of random variation of gain
  • reverse prob: chance of playing grain reversed, 0-100%
  • pan: stereo position 0 (left) - 100 (right)
  • std: amount of random variation of panning per grain
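
To make the units of these parameters concrete, the Python sketch below converts a transposition in half-tones to a playback rate ratio and a gain in dB to a linear amplitude, and draws one set of per-grain values. The two conversions are the standard formulas. Treating each std value as the standard deviation of a Gaussian, clamping the pan to 0-100, and the names draw_grain and params are assumptions made for illustration, not a description of the CataRT patch.

```python
import random


def semitones_to_ratio(semitones):
    """Playback rate ratio for a transposition in half-tones."""
    return 2.0 ** (semitones / 12.0)


def db_to_amplitude(db):
    """Linear amplitude factor for a gain in dB."""
    return 10.0 ** (db / 20.0)


def draw_grain(params, rng):
    """Draw the randomised playback values for one grain.

    Assumption: each std is the standard deviation of a Gaussian
    centred on the corresponding parameter value.
    """
    transposition = rng.gauss(params["transposition"], params["transposition_std"])
    gain_db = rng.gauss(params["gain"], params["gain_std"])
    pan = rng.gauss(params["pan"], params["pan_std"])
    return {
        "rate": semitones_to_ratio(transposition),
        "amplitude": db_to_amplitude(gain_db),
        "pan": min(100.0, max(0.0, pan)),  # keep within 0 (left) - 100 (right)
        "reverse": rng.random() * 100.0 < params["reverse_prob"],
    }
```

A transposition of 12 half-tones doubles the playback rate, and a gain of -6 dB roughly halves the amplitude.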

Controlling the Play Window (catart.lcd)

The following image shows the different controls in the catart.lcd play window.


Saving/Loading Corpora

Whole corpora (list of sound files, descriptors, and unit data) can be saved and reloaded, or merged with the current corpus in memory, by clicking on the grey import, export, merge buttons above, which open a file dialog.

A corpus is saved as four text files (.ds.txt, .sf.txt, .sy.txt, .ud.txt). In the file dialog, you can choose any of these or just the base name (keep it short, since Max's file dialogs are still limited to 31-character names).
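
Since any of the four files or the bare base name identifies the same corpus, the full set of file names can be derived from whichever one was chosen. A minimal Python sketch of that derivation follows; the helper name corpus_files is hypothetical, and only the four extensions come from this documentation.

```python
CORPUS_SUFFIXES = (".ds.txt", ".sf.txt", ".sy.txt", ".ud.txt")


def corpus_files(chosen):
    """Return the four corpus file names for a chosen file or base name."""
    base = chosen
    for suffix in CORPUS_SUFFIXES:
        if chosen.endswith(suffix):
            base = chosen[: -len(suffix)]  # strip the known suffix
            break
    return [base + suffix for suffix in CORPUS_SUFFIXES]
```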

Sound files in the corpus are searched for in four different places, one after the other:

  1. the full path given at import; if the file is not found there,
  2. try the Max search path with the base name,
  3. then try to load the flac-compressed version <basename>.flac in the full path,
  4. then try to load the flac-compressed version in the Max search path.
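
The search order above can be sketched in Python as follows. This is an illustration rather than the code CataRT runs: search_dirs is a hypothetical stand-in for the Max search path, and <basename>.flac is assumed to mean the file name with its extension replaced by .flac.

```python
from pathlib import Path


def candidate_paths(import_path, search_dirs):
    """List the locations to try for one sound file, in search order."""
    original = Path(import_path)
    flac_name = original.stem + ".flac"  # assumed meaning of <basename>.flac
    candidates = [original]                                       # 1. full path
    candidates += [Path(d) / original.name for d in search_dirs]  # 2. search path
    candidates.append(original.with_name(flac_name))              # 3. flac, full path
    candidates += [Path(d) / flac_name for d in search_dirs]      # 4. flac, search path
    return candidates


def locate(import_path, search_dirs):
    """Return the first candidate that exists on disk, or None."""
    for candidate in candidate_paths(import_path, search_dirs):
        if candidate.is_file():
            return candidate
    return None
```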

Recording CataRT's output

To record what is played by CataRT to a sound file on disk, choose a file to write by clicking on the button open in the lower part of the main window, and then click the check box left of that to start recording, then uncheck it to stop recording.


  • workingset, despite its name, doesn't work at the moment
  • In the segmentation parameters, Max Length for yin or amplitude segmentation has no effect

