Short-time Viterbi demo


Realtime phoneme transcription from audio, in Max/MSP

This prototype patch implements ideas described in my 2008 ICASSP publication. The trained acoustic models and the test sentences are in French. The main interest of the underlying decoding algorithm is its realtime adaptation of the Viterbi algorithm. Some errors are due to the lack of a language model (i.e. the model only knows how French sounds, but does not know any French words); one could be added.


This patch relies heavily on the FTM library for Max/MSP.


Matlab view:

This shows the basic idea of the algorithm. The yellow rectangle is the current decoding window. Its right edge advances by one step for each new observation frame, so it moves linearly with time. Its left edge moves only when a fusion point is detected among all possible local paths (thin black curves), at which point a decision is output. Notice how the offline Viterbi path matches the locally fusing paths.
