My research focuses on describing a live audio source as a sequence of labelled objects: that is, building a computational system able to sing the notes while the musician is playing them, as a well-trained ear would. The source is typically a solo instrument, and the extracted objects range from onsets and offsets to pitches, notes and harmonics.

So far, we have focused on onset detection and pitch estimation. These analysis modules are built on techniques developed here at Queen Mary University by Samer Abdallah, Juan Pablo Bello and Chris Duxbury. The algorithms have been adapted to work in a real-time context, with a latency of 15 ms or less. Together, they allow us to extract note objects within the same latency.
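
As a rough illustration of how onsets and pitch estimates can be combined into note objects, here is a minimal offline sketch in Python. It is not the real-time algorithm described above. The names `Note` and `extract_notes` are hypothetical, and the sketch assumes that each note ends where the next onset begins and that unvoiced frames are marked with `None`.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional, Sequence


@dataclass
class Note:
    onset: float     # note start, in seconds
    offset: float    # note end, in seconds (assumed: the next onset)
    pitch_hz: float  # median of the voiced pitch estimates in the note


def extract_notes(
    onsets: Sequence[float],
    frame_times: Sequence[float],
    frame_pitches: Sequence[Optional[float]],
    end_time: float,
) -> List[Note]:
    """Group frame-wise pitch estimates into notes delimited by onsets."""
    notes = []
    bounds = list(onsets) + [end_time]
    for start, stop in zip(bounds[:-1], bounds[1:]):
        # Keep only the voiced frames that fall inside this segment.
        voiced = [
            p
            for t, p in zip(frame_times, frame_pitches)
            if start <= t < stop and p is not None
        ]
        if voiced:  # segments with no voiced frame produce no note
            notes.append(Note(start, stop, median(voiced)))
    return notes
```

Taking the median rather than the mean is a design choice made here so that a few spurious pitch estimates inside a segment do not shift the note's pitch.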

Having constructed a semantic description of the audio source, we can send it to a synthesiser to provide a virtual accompaniment for the musician. Typically, the extracted notes are sent to a MIDI synthesiser. Other synthesis approaches, such as physical modelling, are also being investigated.
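
To give an idea of what sending an extracted note to a MIDI synthesiser involves, the sketch below converts a pitch in Hz to the nearest MIDI note number and builds the raw note-on and note-off messages. The function names are hypothetical, the reference tuning of A4 = 440 Hz is an assumption, and actually delivering the bytes to a device is left out.

```python
import math


def hz_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency, assuming A4 = 440 Hz = note 69."""
    note = int(round(69 + 12 * math.log2(freq_hz / 440.0)))
    return max(0, min(127, note))  # clamp to the valid MIDI note range


def note_on(note: int, velocity: int = 64, channel: int = 0) -> bytes:
    """Three-byte MIDI note-on message: status 0x9n, note number, velocity."""
    return bytes([0x90 | channel, note, velocity])


def note_off(note: int, channel: int = 0) -> bytes:
    """Three-byte MIDI note-off message: status 0x8n, note number, velocity 0."""
    return bytes([0x80 | channel, note, 0])
```

For example, `note_on(hz_to_midi(440.0))` yields the bytes for a note-on on A4 at a medium velocity on the first channel.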

This project is closely related to MPEG-4 Structured Audio, a subset of the MPEG-4 standard that includes the MIDI and SoundFont standards. The extracted features will also feed some of the MPEG-7 descriptors of the SIMAC project.


