CataRT App Documentation
Documentation for the standalone CataRT application, a system for data-driven concatenative sound synthesis in real time, based on unit selection from large databases, for Max/MSP.
copyright 2005-2010 Diemo Schwarz, IMTR Team, Ircam - Centre Pompidou
The concatenative real-time sound synthesis system CataRT plays grains (also called units) from a large corpus of segmented and descriptor-analysed sounds according to proximity to a target position in the descriptor space, controlled by the mouse or by external controllers. This can be seen as a content-based extension to granular synthesis providing direct access to specific sound characteristics.
cataRT is implemented in MaxMSP using FTM&Co by Norbert Schnell and collaborators.
Startup and Initialisation
- Double-click on cataRT-1.1.0.app to start the CataRT standalone application.
- At the first use, check that your audio card is configured correctly by clicking "setup" in the audio panel. This opens the DSP Settings window where you can choose and configure your sound card (see below for working settings).
- Settings are saved between sessions of the CataRT app in the file Max 5 Preferences.
- Switch DSP on by clicking on the green button in the audio panel in the lower-left corner of the main window, shown below.
- In the upper part of the main window, choose the segmentation mode and parameters (see #Segmentation Modes below), or use the default chop mode.
- To try out large sound files first, you can limit the number of seconds that are imported by dragging up and down in the import limit number box.
- The Directory to sound set option means that the imported sounds are added to a sound set by their directory name. (See the SoundSet descriptor.)
- Click the red Import Audio button to choose a sound file to analyse in the file dialog that opens. Alternatively, you can drag one or more sound files anywhere onto the application window.
- You can drop any number of files, or whole directories, whose contents will be imported recursively.
- You will see the progress of analysis as a green bar advancing, and the number and duration of imported units will be displayed underneath. There should be more than one unit per sound file; otherwise there is a problem with segmentation (choose chop, and avoid very short sound files).
- After import, you might want to reselect the x/y descriptors to refresh the ranges in the 2D controller.
Segmentation Modes
- none imports the sound file as a whole
- chop segments into equal-sized units every given ms (change the grain size by dragging up and down in the number box)
- this is the recommended mode to start with
- split segments into a given number of equal-sized units
- this is useful to import drum loops, if you know their number of beats
- import ASCII allows you to import a segmentation and labels from a text file according to extension.
- The following extensions are recognised, with these columns:
- time [s], label as written by Audacity
- starttime [s], endtime [s], label
- import labels forces labels text format as written by Audacity
- import SDIF allows you to import a segmentation from [SDIF] as written by AudioSculpt
- yin note segmentation segments per change of pitch
- silence segmentation splits according to an amplitude threshold (which is very hard to guess at the moment, lacking a graphical interface)
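The chop and split modes above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not CataRT's own code: the function names are hypothetical, and the handling of a shorter last unit in chop is an assumption.

```python
import math


def chop(duration_ms, grain_ms):
    """Return (start, end) pairs in ms, one unit every grain_ms.

    Assumption: a remainder shorter than grain_ms becomes a last, shorter unit.
    """
    if grain_ms <= 0:
        raise ValueError("grain_ms must be positive")
    count = math.ceil(duration_ms / grain_ms)
    return [(i * grain_ms, min((i + 1) * grain_ms, duration_ms))
            for i in range(count)]


def split(duration_ms, count):
    """Return `count` equal-sized (start, end) pairs in ms."""
    if count < 1:
        raise ValueError("count must be at least 1")
    size = duration_ms / count
    return [(i * size, (i + 1) * size) for i in range(count)]
```

For a drum loop of 2000 ms with 4 beats, split(2000, 4) gives one unit per beat of 500 ms each, which is why split suits loops with a known number of beats.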
Playing with the Mouse
- grains are played in the 2D controller shown above
- choose descriptors for the x- and y-axis from the grey pop-up menus labeled X-Axis and Y-Axis, or use one of the Layout Presets.
- To start, using Brilliance / Noisiness (for SpectralCentroid and Periodicity) or Loudness usually gives good results.
- Descriptors 0-12 describe the position of each unit in the sound file.
- Descriptors 13-24 describe the mean acoustic characteristics of each unit.
- Note that for some descriptors, all units might have the same value, so that they will all be aligned to the left or upper border of the display.
- move around in the 2D controller, units closest to mouse are played
- drag right/left to increase the random selection radius
- click to freeze position, double-click to unfreeze (watch the checkbox labeled position freeze to the lower left, or click in it)
- this is useful to change parameters without losing the position when units play regularly e.g. in beat mode
- the current descriptor position is displayed as shown below:
- experiment with different descriptors on x-/y-/colour-axis
- zoom in/out by pressing '=' / '-', '0' to zoom out fully, or by changing the min/max next to the descriptor menu by dragging up/down in the number box. (Dragging left of the decimal point changes the integer value, dragging right of it changes in steps of 1/100.) You can reset the view by re-selecting a descriptor (or clicking the text left of the descriptor menu).
- move the view with the cursor keys
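The selection of units near the mouse can be pictured as a nearest-neighbour search in the two chosen descriptors. The sketch below is a minimal illustration under assumptions: the function name and data layout are hypothetical, distances are plain Euclidean distances in display coordinates, and the k parameter mirrors the knn setting described under Grain Playback Parameters.

```python
import math
import random


def select_unit(units, target, radius=0.0, k=0, rng=random):
    """Pick a unit id near `target`.

    units:  dict mapping unit id -> (x, y) descriptor position
    radius: if > 0, pick randomly among units within this distance
    k:      if > 0, pick randomly among the k nearest units (overrides radius)
    """
    if not units:
        raise ValueError("corpus is empty")
    ranked = sorted(units, key=lambda uid: math.dist(units[uid], target))
    if k > 0:
        return rng.choice(ranked[:k])
    if radius > 0:
        inside = [uid for uid in ranked
                  if math.dist(units[uid], target) <= radius]
        if inside:
            return rng.choice(inside)
    # Default: the single closest unit.
    return ranked[0]
```

With radius and k both at 0, the closest unit is always returned; increasing either one adds random variation around the target position.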
Descriptors
The descriptors analysed by CataRT are briefly explained below. You can skip this section at first and just try them out, since some are very easily understood by exploring them interactively. Keep in mind that some descriptors make little sense for certain types of sound, e.g. Pitch for unpitched, noisy sounds.
The first 13 descriptors (0-12) are mainly for bookkeeping and describe the segments themselves:
- unique index
- relative index per sound file
- type of the unit (unused)
- index of the sound file (starting from 0)
- index of the sound set (group of sounds, e.g. given by the directory)
- group of sounds (unused)
- is unit selectable? (internal use)
- starting sample index (internal use)
- number of samples (internal use)
- start time in sound file in ms
- duration in ms
- start time normalised to 0..1
- UnitID of previous unit, wrapping around end of sound file (internal use)
The following 12 sound descriptors (13-24) describe the audio content of the segments. Each is the segment's average of the instantaneous descriptor values, which are calculated every 40 ms.
- fundamental frequency in Hertz
- fundamental frequency in (fractional) MIDI notation
- volume in decibel (dB). These numbers are negative, since 0 dB is the "full scale" reference level.
- a measure between harmonic and noisy sounds
- another measure of the noisiness/sinusoidality of the spectrum
- The spectral centroid (measured in Hertz) is defined as the center of gravity of the FFT magnitude spectrum. It is closely related to the perception of brightness.
- energy in the upper band of the spectrum
- energy in the middle band of the spectrum
- relative energy in the upper band of the spectrum
- first-order autocorrelation coefficient, expressing spectral tilt, which is correlated with the relative perceived intensity of the sound in vowels or instruments
- linear average energy measure
- index of a label, e.g. imported during segmentation from a text, wav or aiff file
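Three of the definitions above are easy to write down as formulas. The Python sketch below is illustrative only and does not reproduce CataRT's analysis code: the function names are hypothetical, and the reference tuning of A4 = 440 Hz for the MIDI conversion is an assumption.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Fundamental frequency in Hz -> fractional MIDI note (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def spectral_centroid(freqs_hz, magnitudes):
    """Center of gravity of a magnitude spectrum, in Hz."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: centroid is undefined, return 0 by convention
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


def unit_mean(frame_values):
    """Unit descriptor value: mean of the per-frame values of that unit."""
    return sum(frame_values) / len(frame_values)
```

For example, a unit whose frames have centroids of 800, 1000 and 1200 Hz gets a SpectralCentroid value of 1000 Hz.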
Grain Playback Parameters
Use the common granular synthesis parameters to the left: (drag up/down in the number box; dragging left of the decimal point changes the integer value, dragging right of it changes in steps of 1/100).
- trigger mode: choose trigger method (see below)
- grain rate: trigger speed for beat mode in ms
- rate rand: amount of random deviation of grain rate in ms
- quant step: quantised trigger speed level for quant mode
- chainlen: number of units repeated in chain mode
- radius: size of the random selection radius, also controlled by dragging right/left
- knn: alternatively, play a random unit of the k-nearest neighbours (this disables radius when > 0)
- xfade: choose envelope type and parameters of each grain in ms:
- xfade: linear fade-in and fade-out time
- chant: glottal pulse window with attack and release time
- sine: sinusoidal window (no parameter)
- hann: Hann window (no parameter)
- hanning: Hann window with offset (no parameter)
- blackman: Blackman window (no parameter)
- blackman-harris: Blackman-Harris window (no parameter)
- grain size: force length of played grain in ms, 0 means natural size from segmentation
- size rand: amount of random variation around grain size
- onset rand: amount of random variation of grain start
- transposition: pitch change in half-tones
- transp. rand: amount of random variation of transposition
- gain: volume change of grain in dB
- gain rand: amount of random variation of gain
- reverse prob: chance of playing grain reversed, 0-100%
- pan: stereo position 0 (left) - 100 (right)
- pan rand: amount of random variation of panning per grain
The Reset Params button resets all play parameters except the trigger mode to their default, neutral, values.
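The units of some of these parameters can be made concrete with the usual conversions. This is a sketch of standard audio arithmetic, not of CataRT's internals: the function names are hypothetical, and the equal-power panning law in particular is an assumption, since the documentation does not say which law is used.

```python
import math


def transposition_ratio(half_tones):
    """Playback speed ratio for a transposition in half-tones."""
    return 2.0 ** (half_tones / 12.0)


def gain_factor(gain_db):
    """Linear amplitude factor for a gain in dB."""
    return 10.0 ** (gain_db / 20.0)


def pan_gains(pan):
    """(left, right) gains for pan in 0 (left) .. 100 (right).

    Assumption: equal-power (sine/cosine) panning.
    """
    angle = (pan / 100.0) * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

A transposition of 12 half-tones doubles the playback speed, and a gain of -20 dB scales the amplitude by 0.1.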
Try out different trigger methods (except quant and seq, which don't work in the standalone version). The trigger method controls when the selected grains are played. Each trigger mode can be selected by the key given in parentheses:
- bow (W) triggers closest unit each time you move the mouse
- fence (F) plays a unit whenever a different unit becomes the closest one (named in homage to clattering a stick along a garden fence)
- beat (B) mode triggers units via a metronome (speed is controlled by grain rate and rate rand)
- chain (L like loop) mode triggers a new unit whenever the previous unit has finished playing
- quant (Q) is a quantized metronome, controllable by MIDI only
- seq (S) is for external triggering by a sequencer
- cont (C) continues playing grains in order
- click (K) plays a grain when you click with the mouse
- grab (G) grabs a grain when you click with the mouse, then behaves like bow
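The difference between bow and fence can be shown with a small state machine. The class below is a hypothetical sketch, assuming the closest unit is recomputed on every mouse move; it is not taken from the CataRT patch.

```python
class Trigger:
    """Decide whether a mouse move should play the closest unit."""

    def __init__(self, mode="fence"):
        if mode not in ("bow", "fence"):
            raise ValueError("this sketch only covers bow and fence")
        self.mode = mode
        self.last_unit = None

    def on_move(self, closest_unit):
        """Return True if the unit should be played for this mouse move."""
        changed = closest_unit != self.last_unit
        self.last_unit = closest_unit
        if self.mode == "bow":
            return True      # bow: every move triggers the closest unit
        return changed       # fence: only when a different unit is closest
```

Moving the mouse over the units 3, 3, 5, 5, 3 plays five grains in bow mode but only three in fence mode.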
Playback Parameter Presets
All the above parameters can be saved in presets in the above preset panel. Set a number to save to in the green number box and press store, then enter a name in the dialog box. Alternatively, press quickstore to save to the next free slot. Press delete to remove the given slot.
Recall a preset by selecting it in the grey pop-up menu. If the interpolation checkbox is on, the settings will be interpolated over the given interpolation time.
Use read, write, writeagain to load and save all presets to a file, and view all and current to see tables of settings.
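Preset interpolation can be thought of as moving every parameter gradually from its current value to the recalled one. The sketch below assumes a linear ramp over the interpolation time; the actual curve used by CataRT is not documented here, and the function and parameter names are hypothetical.

```python
def interpolate_presets(start, target, elapsed_ms, interpolation_ms):
    """Return parameter values `elapsed_ms` after recalling `target`.

    start, target: dicts mapping parameter name -> numeric value
    Assumption: linear interpolation, clamped to the interpolation time.
    """
    if interpolation_ms <= 0:
        fraction = 1.0  # no interpolation: jump straight to the target
    else:
        fraction = min(max(elapsed_ms / interpolation_ms, 0.0), 1.0)
    return {name: start[name] + (target[name] - start[name]) * fraction
            for name in target if name in start}
```

Halfway through a 1000 ms interpolation from a gain of 0 dB to -6 dB, the gain is at -3 dB.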
The layout presets offer useful combinations of X/Y axis descriptors with more human-readable names (see also #Descriptors above for more explanations about the meanings of the various descriptors):
- Brilliance stands for the SpectralCentroid descriptor
- Noisiness for Periodicity
- Note for MIDI Note Number
- Time for StartTime of a unit in the sound file
- Index for UnitID, the global number of each unit
Tile mode splits the 2D pane into areas according to groups of sound files or sound sets. Each tile has the same descriptors as its X- and Y-axis. This is useful to compare sounds or sound sets side by side, either over time (first figure below) or by timbre (second figure). The MIDI playback can traverse these tiles, giving something similar to velocity layers.
Editing a corpus consists of removing unwanted units and sound files, and adapting their location in 2D.
Removing Units and Sound Files
Removal of units takes place in two steps:
- Disable unwanted units: the currently selected unit (with the red circle around it), or all units in the green selection radius can be disabled by pressing the <delete> key. They will be greyed out and unselectable. All units of a loaded sound file can be disabled/reenabled via the sound file menu and the checkbox in front.
- The disabling can be undone with the "Reactivate" button, that reenables all units.
- Delete units: this will delete all disabled units permanently and remove them from the corpus. The corpus can then be saved to make the changes persistent.
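The two-step removal above separates a reversible step (disable) from a permanent one (delete). A minimal model of that behaviour, with a hypothetical class and method names that are not part of CataRT, could look like this:

```python
class CorpusUnits:
    """Track which units are enabled, disabled, or permanently deleted."""

    def __init__(self, unit_ids):
        self.enabled = {uid: True for uid in unit_ids}

    def disable(self, unit_ids):
        """Step 1: grey out units; they stay in the corpus but are unselectable."""
        for uid in unit_ids:
            if uid in self.enabled:
                self.enabled[uid] = False

    def reactivate(self):
        """Undo step 1 for all units still in the corpus."""
        for uid in self.enabled:
            self.enabled[uid] = True

    def delete_disabled(self):
        """Step 2: permanently remove all disabled units; return their ids."""
        removed = [uid for uid, on in self.enabled.items() if not on]
        for uid in removed:
            del self.enabled[uid]
        return removed

    def selectable(self):
        return [uid for uid, on in self.enabled.items() if on]
```

Once delete_disabled has run, reactivate can no longer bring the removed units back, which matches the permanent nature of the second step.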
Given a choice of descriptor axes, when activating the checkbox "distribute points evenly", the units will be distributed uniformly in the 2D display by a mass–spring trellis algorithm described in Lallemand and Schwarz 2011. The algorithm tries to preserve the relationships between units as well as possible, i.e. points that were close before will remain neighbours. The algorithm may run for a few seconds when the corpus is large. When the algorithm is finished, the checkbox unchecks itself, but it can also be unchecked by the user to interrupt the algorithm.
The distributed point layout is displayed immediately for convenience, but is actually stored in a pair of new descriptors DistX/DistY that can be selected from the axis descriptor menus. These new point coordinates will be saved with the corpus.
Whole corpora (list of sound files, descriptors, and unit data) can be saved and reloaded, or merged with the current corpus in memory, by clicking on the Load, Save, Merge buttons, which open a file dialog.
A corpus is saved as four text files (.ds.txt, .sf.txt, .sy.txt, .ud.txt); you can choose any of these or just the base name. A corpus can also be exported as a .mubu SDIF file, to be loaded by the patch version of CataRT based on Mubu for Max (catart-mubu-live within the Mubu distribution).
Sound files in the corpus are searched for in four places, one after the other:
- first, the full path given at import,
- then a flac-compressed version <basename>.flac at the full path,
- then the Max search path, using the base name,
- then a flac-compressed version in the Max search path.
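The search order can be written out as a list of candidate paths that are tried in turn. The sketch below is an interpretation, not CataRT's code: the function names are hypothetical, the Max search path is modelled as a plain list of directories, and <basename>.flac is assumed to mean the file name with its extension replaced by .flac.

```python
import posixpath  # macOS-style paths, since the standalone is a .app


def candidate_paths(full_path, search_dirs):
    """Return the paths to try, in the documented order."""
    base = posixpath.basename(full_path)
    stem = posixpath.splitext(base)[0]
    candidates = [
        full_path,                                  # 1. full path given at import
        posixpath.splitext(full_path)[0] + ".flac",  # 2. flac version at that path
    ]
    # 3. base name in each search directory
    candidates += [posixpath.join(d, base) for d in search_dirs]
    # 4. flac version in each search directory
    candidates += [posixpath.join(d, stem + ".flac") for d in search_dirs]
    return candidates


def locate(full_path, search_dirs, exists=posixpath.exists):
    """Return the first candidate that exists, or None."""
    for path in candidate_paths(full_path, search_dirs):
        if exists(path):
            return path
    return None
```

This is why a corpus can still be loaded after its sound files have been moved into the Max search path or compressed to flac.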
The Sample Inspector Window
This window shows the detailed numerical and textual values of the descriptors of the unit (grain) selected in the play window, or entered in the number box unit ID. The Value Cooked column shows the raw floating point numerical descriptor value interpreted as boolean or symbolic value (for labels, sound file, sound set).
Input and Output
Recording CataRT's Output
To record what is played by CataRT to a sound file on disk, choose a file to write by clicking on the button open in the "Record" panel in the lower part of the main window, and then click the red button left of that to start recording, then uncheck it to stop recording.
Set the recording level using the volume slider to the left and the Level meters to its right.
Playing with a MIDI Keyboard
Incoming MIDI Note Numbers in the key range set to the left are mapped to the X-axis position. This lets you traverse and play the grains in the corpus. Note: use the bow trigger mode to ensure a grain is played for each received Note On message. The x-axis can be set to the NoteNumber (pitch) descriptor and its limits forced to the key range by clicking on the to x-axis button.
The y-axis, rate and gain are controlled by velocity and modulation wheel as determined by the MIDI mapping below.
Pitch Bend interpolates between parameter presets 1, 2, and 3, if they have been assigned (see #Playback Parameter Presets).
The MIDI input panel lets you choose which MIDI input port is used to control CataRT. By default, all inputs are active.
- The ports to cataRT-app-1.6.0 1 and to cataRT-app-1.6.0 2 are internal MIDI ports that can be used to send MIDI to cataRT from other software, e.g. sequencers.
- The list in the pop-up menu is updated when you click the Init button.
The MIDI Mapping choice shown below allows two choices for the y-axis mapping: straight or crossed. Change by clicking on the parallel or crossed lines.
- in straight mapping, the Note velocity controls the Y-axis position, and the modulation wheel the grain rate
- in crossed mapping, the Note velocity controls the gain, and the modulation wheel the y-axis position
Note: Use the beat trigger mode for straight mapping and rate control by the modulation wheel.
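The two mappings can be summarised as a small routing function. This is a sketch under several assumptions: the function name is hypothetical, all outputs are normalised to 0..1, the note is mapped linearly across the key range, and notes outside the key range are simply ignored.

```python
def map_midi(note, velocity, mod_wheel, key_range=(36, 96), mapping="straight"):
    """Route note number, velocity and modulation wheel to control values.

    Returns a dict of normalised values (0..1), or None if the note is
    outside the key range (assumed to be ignored).
    """
    low, high = key_range
    if low >= high:
        raise ValueError("key_range must be (low, high) with low < high")
    if not low <= note <= high:
        return None
    controls = {"x": (note - low) / (high - low)}
    vel = velocity / 127.0
    mod = mod_wheel / 127.0
    if mapping == "straight":
        controls["y"] = vel       # velocity -> Y-axis position
        controls["rate"] = mod    # modulation wheel -> grain rate
    elif mapping == "crossed":
        controls["gain"] = vel    # velocity -> gain
        controls["y"] = mod       # modulation wheel -> Y-axis position
    else:
        raise ValueError("mapping must be 'straight' or 'crossed'")
    return controls
```

In crossed mapping the keyboard behaves more like a conventional instrument, with velocity acting on loudness, while straight mapping uses velocity to move through the corpus vertically.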
MIDI Implementation Chart
The following incoming MIDI messages are interpreted by cataRT:
- Note Number: X-axis
- Note Velocity: Y-axis or gain, depending on MIDI mapping (see #Playing with a MIDI Keyboard)
- Modulation Wheel: rate or Y-axis, depending on MIDI mapping (see #Playing with a MIDI Keyboard)
- Pitch Bend: interpolate between presets 1, 2, and 3
MIDI control changes are mapped to the playback parameters (see #Grain Playback Parameters) as given in the following table. The controller values 0-127 are mapped linearly to the range given by <min parameter> - <max parameter>.
|Parameter||MIDI Controller Number||Min Parameter||Max Parameter|
|quant direct value||24||0||15|
|grain size rand||12||0.||500.|
|preset recall interpolation time||23||0.||2000.|
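The linear mapping from controller values to parameter ranges amounts to one line of arithmetic. The function name below is hypothetical; the ranges in the usage note are the ones listed in the table above.

```python
def cc_to_param(value, min_param, max_param):
    """Map a MIDI controller value (0-127) linearly to [min_param, max_param]."""
    if not 0 <= value <= 127:
        raise ValueError("MIDI controller values range from 0 to 127")
    return min_param + (max_param - min_param) * value / 127
```

For controller 12 (grain size rand, range 0 to 500), a value of 127 gives 500, and a value of 64 gives roughly 252.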
The CataRT standalone app can be controlled by sending OSC messages to the control port, which can be set in the "IO Settings" window (8480 by default).
OSC messages of the form /corpus1/<parameter> <value> will be dispatched to the parameter. See the presets database ("view all" or "current") for all accessible parameter names (be sure to remove the "synth::" prefix).
|Parameter||OSC Message|
|quant direct value||/corpus1/quantsel|
|env-ar||/corpus1/env-ar <attack time [ms]> <release time [ms]>|
|grain size rand||/corpus1/duration_std|
|x/y position||/catart/select <x> <y>|
The OSC message /catart/select <x> <y> controls the target position in the lcd display by coordinates between 0 and 1. The OSC message /catart/playunit <n> unconditionally triggers unit n.
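To try these messages from a script, an OSC packet can be built by hand with the Python standard library. The sketch below follows the OSC 1.0 encoding (null-padded strings, big-endian 32-bit arguments) and sends it over UDP; the helper names are hypothetical, the host address is an assumption (CataRT running on the same machine), and whether CataRT expects float or integer arguments for a given parameter is not specified here.

```python
import socket
import struct


def _osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Build an OSC message with float ('f') and int ('i') arguments."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)
        elif isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)
        else:
            raise TypeError("only int and float arguments are supported")
    return _osc_string(address) + _osc_string(tags) + payload


def send(message, host="127.0.0.1", port=8480):
    """Send one OSC message over UDP (8480 is the documented default port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

For example, send(osc_message("/catart/select", 0.5, 0.25)) moves the target position, and send(osc_message("/catart/playunit", 7)) triggers unit 7.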
TUIO Multi-Touch Input
TUIO, a standard protocol for tangible user interfaces based on OSC sent by many multi-touch devices or apps, is received by CataRT on the OSC port set in the "IO Settings" window below, and on the standard TUIO port 3333.
The touch messages (add finger, move finger, remove finger) are used for multi-point play in bow or fence mode. (In all other modes, only the last moved finger is taken into account, i.e. only one unit will play/loop.)
Roli Blocks Lightpad Control
When the Roli Lightpad block is connected, its activation and layout can be set in IO Settings. The mapping from pressure to gain in dB can also be edited here.
Known Bugs and Errors
- If there is no sound, make sure the sound card is configured (see #Startup and Initialisation) and then disable and re-enable audio with the green button.
- Some actions might cause errors that are reported in the Max Window (open it from the window menu or with Command-M)
- Importing sound files won't work if
- Audio is not configured (the green light bottom left won't light up), or
- Any part of the full path of the sound file contains either a slash '/' or a colon ':'
- (as always, the errors are reported in the console, visible with Command-M)
- In the segmentation parameters, Max Length for yin or amplitude segmentation has no effect, and the amplitude segmentation threshold setting is very hard to get right, making it almost unusable
- If chain (loop) or cont mode won't play, switch back to fence, move a bit in the 2D display, and try again.
- If the application doesn't start, you might want to remove the file Max 5 Preferences where the catart standalone stores its persistent settings.
- cataRT page: http://imtr.ircam.fr/imtr/CataRT
- mailinglist (low volume and spam-free): http://listes.ircam.fr/wws/info/concat
- cataRT extended article: http://recherche.ircam.fr/anasyn/schwarz/publications/jim2008