Audio Oracle

From IMTR

Audio Oracle is a sequential learning automaton, defined on a Music Information Geometry, that learns the repeating structure of an audio stream incrementally and in realtime.

This page presents examples of Audio Oracle results on various musical styles and instrumentations, and serves as a support page for an article currently in review.

Pros and Cons of Audio Oracle

Audio Oracle comes in two flavors. The first, Data-AO, finds states with few assumptions about the data and is suitable for short environmental sounds; the second, Model-AO, builds an automaton that detects long-term dependencies and repetitions over long audio streams.

Briefly, Audio Oracle:

  • Provides a unified approach to segmentation and structure analysis of audio signals
  • Has well grounded geometric motivation thanks to Music Information Geometry
  • Links symbolic string matching techniques (Factor Oracles) with probabilistic / information theoretic approaches
  • Provides robust and fast (realtime) structure discovery, without the exhaustive computations that are common in general Music Information Retrieval
  • Provides direct and fast access to sub-structures of interest in audio.
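
For readers unfamiliar with Factor Oracles, the following is a minimal sketch of the standard incremental factor-oracle construction on a symbolic sequence. It is not the Audio Oracle itself, which works on audio frames rather than symbols; the function names `build_factor_oracle` and `accepts` are chosen here for illustration only.

```python
def build_factor_oracle(sequence):
    """Incrementally build a factor oracle over a symbolic sequence.

    Returns (transitions, suffix), where transitions[s][symbol] is the
    target state and suffix[s] is the suffix link of state s
    (-1 for the initial state 0).
    """
    transitions = [{}]  # state 0 has no outgoing transitions yet
    suffix = [-1]       # the initial state has no suffix link
    for i, symbol in enumerate(sequence, start=1):
        transitions.append({})
        # Every new state is reached from the previous one.
        transitions[i - 1][symbol] = i
        # Follow suffix links backwards, adding forward transitions
        # until a state that already has a transition on this symbol.
        k = suffix[i - 1]
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i
            k = suffix[k]
        suffix.append(0 if k == -1 else transitions[k][symbol])
    return transitions, suffix


def accepts(transitions, word):
    """Return True if the oracle has a path from state 0 spelling word."""
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True


transitions, suffix = build_factor_oracle("abbbaab")
print(suffix)  # [-1, 0, 0, 2, 3, 1, 1, 2]
```

Each symbol is processed once as it arrives, which is what makes the structure attractive for incremental learning. The suffix links point back to earlier states that share a repeated context.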

Audio Oracle can be used efficiently as a front-end for many music applications. Ongoing applications include (but are not limited to):

  • Query by example over databases (Guidage)
  • Applications to concatenative synthesis using user queries (Guidage)
  • Applications to Automatic Improvisation using audio streams (OMax)

Sample Results

Natural Bird Utterance (Data-AO)

Natural bird utterance waveform
Data Audio Oracle on natural bird utterance
Audio Description: Natural bird utterance.
Source: "SoundIdeas" Database.
Description: Data Audio Oracle (one state per analysis frame) capturing (natural) repetition through suffix links.
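
To illustrate the idea of one state per analysis frame with suffix links, here is a simplified sketch that adapts the factor-oracle construction to feature vectors by replacing symbol equality with a distance threshold. The Euclidean distance, the `threshold` parameter, and the name `build_frame_oracle` are assumptions made for this example; they stand in for the information-geometric similarity that Audio Oracle actually relies on and are not taken from the method itself.

```python
import math


def euclidean(a, b):
    """Placeholder distance between two feature vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_frame_oracle(frames, threshold, distance=euclidean):
    """One state per frame; state j (j >= 1) holds frames[j - 1].

    Returns (forward, suffix): forward[s] lists the states reachable
    from s, and suffix[s] is the suffix link of s (-1 for state 0).
    """
    forward = [[]]
    suffix = [-1]
    for i, frame in enumerate(frames, start=1):
        forward.append([])
        forward[i - 1].append(i)
        k = suffix[i - 1]
        match = None
        while k > -1:
            # Targets of k whose frame is close enough to the new frame.
            close = [j for j in forward[k]
                     if distance(frames[j - 1], frame) <= threshold]
            if close:
                match = min(close,
                            key=lambda j: distance(frames[j - 1], frame))
                break
            forward[k].append(i)
            k = suffix[k]
        suffix.append(0 if match is None else match)
    return forward, suffix


# Toy one-dimensional "frames": two noisy clusters near 0.0 and 1.0.
frames = [(0.0,), (1.0,), (1.02,), (0.98,), (0.01,), (0.03,), (1.01,)]
forward, suffix = build_frame_oracle(frames, threshold=0.1)
print(suffix)  # [-1, 0, 0, 2, 3, 1, 1, 2]
```

A suffix link from a state to an earlier one marks a frame whose recent context resembles something heard before, which is how repetition shows up in the oracle. The threshold controls how tolerant the matching is: too small and nothing repeats, too large and everything does.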

All the music examples that follow use Model-AO learning:

Beethoven: Piano Sonata Nr.1, Mvt. 3

Audio Oracle state-space structure on Beethoven's 1st sonata (3rd mvt). Middle: Recall arrows over time (suffixes) - Bottom: Repetition length
Realtime Similarity Matrix of Audio Oracle (size: 440x440)
Classical Similarity Matrix (size: 9500x9500 , non-realtime)

Audio Description: Beethoven's 1st Piano Sonata, Movement 3, performed by Friedrich Gulda (1950s) (a different performance of the same piece is on YouTube).
Style: Classical.
Instrumentation: Piano
Number of Analysis Frames: 9500
Number of states in Audio Oracle: 440
Notes: The similarity structure learned by AO (left) is similar to the classical similarity matrix (right), except that it is much sparser, is calculated in realtime, and gives direct access to structural information in the audio.
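
To give a sense of scale, the short calculation below compares the number of entries in the two matrices, using the frame and state counts listed above. The helper `matrix_entries` is a name introduced here for illustration. Because the AO matrix is also sparse, its entry count is an upper bound on what actually has to be stored.

```python
def matrix_entries(size):
    """Number of entries in a square size x size similarity matrix."""
    return size * size


frames, states = 9500, 440  # Beethoven example figures from this page

classical = matrix_entries(frames)  # 90,250,000 entries
oracle = matrix_entries(states)     # 193,600 entries
reduction = classical / oracle      # roughly 466 times fewer entries
frames_per_state = frames / states  # roughly 21.6 frames per state

print(f"classical: {classical:,} entries")
print(f"oracle:    {oracle:,} entries")
print(f"reduction: {reduction:.0f}x, {frames_per_state:.1f} frames per state")
```

The same calculation can be repeated for the other examples on this page by substituting their frame and state counts.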

Beatles: "Love me do"

Model-AO structure on Beatles
AO Similarity Matrix on Beatles (size: 331x331)
Classical Similarity Matrix on Beatles (size 2181x2181)


Audio Description: Beatles "Love me do" (1964 edition).
Style: Pop music.
Instrumentation: Voice, drums and "electronics"
Number of Analysis Frames: 2181
Number of states in Audio Oracle: 331

Philip Glass: Knee Play 3

Audio Oracle state-space structure on Philip Glass' Knee Play 3. Middle: Recall arrows over time (suffixes) - Bottom: Repetition length
Realtime Similarity Matrix of Audio Oracle (size: 219x219)
Classical Similarity Matrix (size: 2340x2340 , non-realtime)


Audio Description: Philip Glass, "Knee Play 3" from "Einstein on the Beach" (hear excerpts on YouTube).
Style: New York Minimalism
Instrumentation: Vocals
Number of Analysis Frames: 2340
Number of states in Audio Oracle: 219
Notes: This example shows Audio Oracle on vocal music. The music has been chosen due to its highly repetitive nature. Repetitions are quite obvious in the similarity matrix and the oracle structure.


Couperin: "Les Baricades Mistérieuses"

Audio Description: François Couperin, "Les Baricades Mistérieuses", as performed by Scott Ross (on YouTube).
Style: Baroque.
Instrumentation: Harpsichord
Number of Analysis Frames: 9911
Number of states in Audio Oracle: 555
Notes: The Mysterious Barricades has a theme that returns throughout the piece (indicated by A and A' above). However, other parts can also be reconstructed, to some extent, by recombining other chunks, as suggested by Audio Oracle.

Autechre: Theft (excerpt)

Audio Description: Autechre's "Theft" (first 2 minutes) (hear other excerpts on YouTube to get an idea).
Style: Electronica.
Instrumentation: Electronics
Number of Analysis Frames: 3759
Number of states in Audio Oracle: 67
Notes: A repetitive electronic structure, leading to fewer states, with the majority of suffix links pointing back to the very beginning (a characteristic of the music).

Further Readings

Arshia Cont. Modeling Musical Anticipation: From the time of music to the music of time. PhD thesis in Acoustics, Signal Proc., and Computer Sci. Applied to Music (ATIAM). University of Paris 6 (UPMC), and University of California San Diego (UCSD), 2008. (Chapters 4 and 5)

Arshia Cont, Shlomo Dubnov and Gérard Assayag. On the Information Geometry of Audio Streams with Applications to Similarity Computing, IEEE Transactions on Audio, Speech and Language Processing, 2010 (to appear).
