Audio Oracle
Audio Oracle is a sequential learning automaton, defined on a Music Information Geometry, that learns the repeating structure of an audio stream incrementally and in realtime.
This page presents examples of Audio Oracle results on various musical styles and instrumentations, and serves as a support page for an "in-review" article.
Pros and Cons of Audio Oracle
Audio Oracle comes in two flavors. The first, Data-AO, finds states with few assumptions about the data and is suitable for short environmental sounds; the second, Model-AO, builds an automaton that detects long-term dependencies and repetitions over long audio streams.
Briefly, Audio Oracle:
- Provides a unified approach to segmentation and structure analysis of audio signals
- Has a well-grounded geometric motivation thanks to Music Information Geometry
- Links symbolic string matching techniques (Factor Oracles) with probabilistic / information-theoretic approaches
- Provides robust and fast (realtime) structure discovery, without the exhaustive computations common in general Music Information Retrieval
- Provides direct and fast access to sub-structures of interest in audio.
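The link to Factor Oracles mentioned in the list above can be illustrated on plain symbol strings. The following is a minimal sketch of the standard online factor oracle construction, not the Audio Oracle implementation itself; the function names `build_factor_oracle` and `accepts` are chosen here for illustration only.

```python
def build_factor_oracle(seq):
    """Build a factor oracle for `seq` one symbol at a time.

    Returns (trans, sfx): trans[s] maps a symbol to a target state,
    sfx[s] is the suffix link of state s (-1 for the initial state 0).
    """
    n = len(seq)
    trans = [dict() for _ in range(n + 1)]
    sfx = [-1] * (n + 1)
    for i, sym in enumerate(seq, start=1):
        trans[i - 1][sym] = i              # transition to the new state i
        k = sfx[i - 1]
        # Follow suffix links, adding forward transitions where sym is missing.
        while k > -1 and sym not in trans[k]:
            trans[k][sym] = i
            k = sfx[k]
        sfx[i] = 0 if k == -1 else trans[k][sym]
    return trans, sfx


def accepts(trans, word):
    """Return True if `word` can be read from the initial state."""
    state = 0
    for sym in word:
        if sym not in trans[state]:
            return False
        state = trans[state][sym]
    return True


trans, sfx = build_factor_oracle("abab")
print(sfx)  # suffix links for "abab": [-1, 0, 0, 1, 2]
```

Each suffix link points back to an earlier state that shares a repeated suffix with the current one, which is the mechanism the oracle uses to expose repetitions without comparing every pair of positions.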
Audio Oracle can be used efficiently as a front-end for many music applications. Ongoing applications include (but are not limited to):
- Query by example over databases (Guidage)
- Applications to concatenative synthesis using user queries (Guidage)
- Applications to Automatic Improvisation using audio streams (OMax)
Sample Results
Natural Bird Utterance (Data-AO)
Audio Description: Natural bird utterance.
Source: "SoundIdeas" Database.
Description: Data Audio Oracle (one state per analysis frame) capturing natural repetition through suffix links.
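The idea of one state per analysis frame with suffix links marking repetition can be sketched as follows. This is a simplified illustration under stated assumptions: frames are plain feature vectors, and symbol equality is replaced by a Euclidean distance below a hypothetical `threshold`. The actual Audio Oracle is defined on a Music Information Geometry, which this sketch does not implement.

```python
import math


def distance(x, y):
    """Euclidean distance between two feature vectors (an assumed stand-in)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_audio_oracle(frames, threshold):
    """One state per frame; state t (t >= 1) corresponds to frames[t - 1].

    Returns (trans, sfx): trans[s] lists forward target states,
    sfx[s] is the suffix link of state s (-1 for the initial state 0).
    """
    n = len(frames)
    trans = [[] for _ in range(n + 1)]
    sfx = [-1] * (n + 1)
    for i in range(1, n + 1):
        new = frames[i - 1]
        trans[i - 1].append(i)
        k = sfx[i - 1]
        match = None
        while k > -1:
            # Forward targets of k whose frame is close enough to the new frame.
            close = [t for t in trans[k] if distance(frames[t - 1], new) <= threshold]
            if close:
                match = min(close, key=lambda t: distance(frames[t - 1], new))
                break
            trans[k].append(i)
            k = sfx[k]
        sfx[i] = 0 if match is None else match
    return trans, sfx


# Toy one-dimensional "frames" alternating between two sounds.
frames = [(0.0,), (1.0,), (0.05,), (1.02,)]
trans, sfx = build_audio_oracle(frames, threshold=0.1)
print(sfx)  # [-1, 0, 0, 1, 2]
```

In this toy input the third and fourth frames link back to the first and second, mirroring how a repeated utterance would be captured by suffix links.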
All the examples that follow (on music) use Model-AO learning:
Beethoven: Piano Sonata Nr.1, Mvt. 3
Audio Description: Beethoven's 1st Piano Sonata, Movement 3, performed by Friedrich Gulda (1950s) (same piece, different performance on YouTube).
Style: Classical.
Instrumentation: Piano
Number of Analysis Frames: 9500
Number of states in Audio Oracle: 440
Notes: The similarity structure learned by AO (left) resembles the classical similarity matrix (right), except that it is much sparser, is computed in realtime, and gives direct access to structural information in the audio.
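For comparison, a classical similarity matrix compares every analysis frame with every other frame, so its size grows quadratically with the number of frames. The sketch below assumes cosine similarity over generic feature vectors; the page does not state which features or measure were used for the matrices shown, so both are illustrative choices.

```python
import math


def cosine(x, y):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def self_similarity(frames):
    """Dense n x n matrix of pairwise similarities between frames."""
    return [[cosine(x, y) for y in frames] for x in frames]


frames = [(1.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
matrix = self_similarity(frames)
print(len(matrix) * len(matrix[0]))  # number of entries: 9
```

With the 9500 analysis frames reported above, such a matrix would hold 9500 x 9500 entries, whereas the oracle for the same recording has 440 states.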
Beatles: "Love me do"
Audio Description: Beatles "Love me do" (1964 edition).
Style: Pop music.
Instrumentation: Voice, drums and "electronics"
Number of Analysis Frames: 2181
Number of states in Audio Oracle: 331
Philip Glass: Knee Play 3
Audio Description: Philip Glass, "Knee Play 3" from "Einstein on the Beach" (hear excerpts on YouTube).
Style: New York Minimalism
Instrumentation: Vocals
Number of Analysis Frames: 2340
Number of states in Audio Oracle: 219
Notes: This example shows Audio Oracle on vocal music. The piece was chosen for its highly repetitive nature. Repetitions are clearly visible in both the similarity matrix and the oracle structure.
Couperin: "Les Baricades Mistérieuses"
Autechre: Theft (excerpt)
Further Readings
Arshia Cont. Modeling Musical Anticipation: From the time of music to the music of time. PhD thesis in Acoustics, Signal Processing, and Computer Science Applied to Music (ATIAM), University of Paris 6 (UPMC) and University of California San Diego (UCSD), 2008 (Chapters 4 and 5).
Arshia Cont, Shlomo Dubnov, and Gérard Assayag. On the Information Geometry of Audio Streams with Applications to Similarity Computing. IEEE Transactions on Audio, Speech and Language Processing, 2010 (to appear).