Queen Mary, University of London
Department of Electronic Engineering

Unsupervised analysis of polyphonic music using sparse coding

This page contains supporting material for the article:

S. A. Abdallah and M. D. Plumbley. Unsupervised analysis of polyphonic music using sparse coding. IEEE Transactions on Neural Networks, 17(1), 179-196, January 2006.

Synthetic harpsichord recording

These results were generated by analysing a recording of a computer-generated MIDI performance that used a wavetable-synthesised harpsichord sound. The resulting 11.025 kHz mono signal is available as an uncompressed WAV (13 MB) or an MP3 (3 MB). It is also available as seven separate (uncompressed) movements [Movement 1, Movement 2, Movement 3, Movement 4, Movement 5, Movement 6, Movement 7].

Sparse coding resulted in the dictionary shown above, which is available for download as a MAT file (105 kB) or a compressed text file (97 kB). Many of the dictionary elements have approximately harmonic spectra, and when sonified (by using the spectra to filter Gaussian white noise) give rise to a clear pitch percept. The sonified dictionary elements are available as an uncompressed WAV (368 kB) or an MP3 (48 kB). The ordering corresponds to that in the above figure.
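The sonification step (using a spectrum to filter Gaussian white noise) can be sketched as follows. This is only an illustration under stated assumptions, not the code used for the files above: the function name `sonify_spectrum`, the frame count, the Hann window and the 50% overlap are all hypothetical choices, and each dictionary element is assumed to be a magnitude spectrum with n_fft/2 + 1 bins.

```python
import numpy as np

def sonify_spectrum(magnitude, n_frames=40, seed=0):
    """Shape Gaussian white noise with a fixed magnitude spectrum.

    magnitude: non-negative values, one per bin of a real FFT
               (length n_fft // 2 + 1); assumed to be one dictionary element.
    Returns a 1-D signal built by overlap-adding filtered noise frames.
    """
    magnitude = np.asarray(magnitude, dtype=float)
    n_fft = 2 * (len(magnitude) - 1)
    hop = n_fft // 2                      # 50% overlap (an assumption)
    window = np.hanning(n_fft)
    rng = np.random.default_rng(seed)
    out = np.zeros(hop * (n_frames + 1))
    for i in range(n_frames):
        noise = rng.standard_normal(n_fft)
        # Filter the noise by multiplying its spectrum by the target spectrum.
        shaped = np.fft.irfft(np.fft.rfft(noise) * magnitude, n=n_fft)
        out[i * hop : i * hop + n_fft] += window * shaped
    return out
```

An element with an approximately harmonic spectrum would then produce noise concentrated around its partials, which is what gives rise to the pitch percept.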

The sparse encoding of the audio spectrum using the above dictionary results in a rather piano-roll-like representation, which, because of the almost one-to-one correspondence between dictionary elements and notes, can be used to generate a MIDI encoding of the music. We used a simple threshold-crossing detector to trigger MIDI events; some of the resulting MIDI files are available here [MIDI file 0, MIDI file 1, MIDI file 2, MIDI file 3]. Note that these MIDI files use the piano patch rather than the harpsichord patch, as the harpsichord patches on most consumer systems (and some would say harpsichords in general) are rather painful to listen to.
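A threshold-crossing detector of the kind mentioned above might look like the following minimal sketch. The threshold value, the frame period and the function name `threshold_events` are assumptions made for illustration; the page does not give the actual parameters, nor how dictionary elements were mapped to MIDI note numbers.

```python
def threshold_events(activations, threshold, frame_period):
    """Turn one dictionary element's activation track into note events.

    activations:  non-negative floats, one per analysis frame.
    threshold:    an upward crossing starts a note, a downward crossing ends it.
    frame_period: seconds between successive frames.
    Returns a list of (onset_time, offset_time) pairs in seconds.
    """
    events = []
    onset = None
    for i, value in enumerate(activations):
        active = value > threshold
        if active and onset is None:
            onset = i * frame_period              # upward crossing: note on
        elif not active and onset is not None:
            events.append((onset, i * frame_period))  # downward crossing: note off
            onset = None
    if onset is not None:
        # Close a note that is still sounding at the end of the track.
        events.append((onset, len(activations) * frame_period))
    return events
```

Running this once per dictionary element, with each element assigned a pitch, would give the note-on and note-off times needed to write a MIDI file.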

Real piano recording

The results in this section were generated by analysing real piano recordings from two commercially available CDs (Jeno Jando playing Bach's Well Tempered Clavier, Naxos 855097071, and Andras Schiff playing Bach's Two and Three Part Inventions). The stereo signals were down-sampled to 11 kHz and the left and right channels summed before the analysis.

Sparse coding resulted in the above dictionary. The two following audio files differ in the way the sonified dictionary elements were normalised: in this MP3 (239 kB), the overall power relationships between the different elements are preserved (and hence some of them are much quieter than others), while in this MP3 (239 kB), each element was individually scaled to have the same energy so that they all have similar loudness.
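The second normalisation, in which each sonified element is scaled to the same energy, can be illustrated with a short sketch. The target energy of 1 and the name `scale_to_unit_energy` are assumptions; any common target would serve the same purpose.

```python
import math

def scale_to_unit_energy(signal):
    """Scale a sampled signal so that its sum of squares is 1.

    A silent (all-zero) signal is returned unchanged.
    """
    energy = sum(x * x for x in signal)
    if energy == 0:
        return list(signal)
    gain = 1.0 / math.sqrt(energy)
    return [gain * x for x in signal]
```

Applying the same function to every sonified element removes the power differences between elements that the first file preserves.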

The above figure shows the sparse decomposition of the beginning of the Three Part Invention, No. 9, while the figure below traces the total activity summed across similarly pitched dictionary elements.

As an example of the processing which the sparse decomposition enables, these audio files show what happens if a signal (in this case, the Fugue No. 14 from the Well Tempered Clavier, Book 1) is resynthesised after masking out some of the components. In the first example (541 kB), the dictionary elements associated with low notes were masked out before synthesising the signal from the sparse decomposition (by Wiener filtering the original signal). In the second example (554 kB), the high notes were masked out, though the results are less satisfactory this time, since many of the bass notes are also weakened.
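One way to picture resynthesis by masking and Wiener filtering is the per-frame gain below. It is a sketch under strong assumptions rather than the method actually used: it supposes that the dictionary elements, weighted by their activations, add up to a model of the frame's spectrum, and that the gain at each frequency is the share of that model contributed by the elements that are kept. The name `wiener_gain` and the small constant `eps` are hypothetical.

```python
def wiener_gain(dictionary, activations, keep, eps=1e-12):
    """Per-frequency gain in [0, 1] for one analysis frame.

    dictionary:  list of K spectra, each a list of F non-negative values.
    activations: list of K non-negative coefficients for this frame.
    keep:        set of element indices to retain (all others are masked out).
    eps:         guards against division by zero in empty bins.
    """
    n_bins = len(dictionary[0])
    gains = []
    for f in range(n_bins):
        # Model spectrum from all elements, and from the kept elements only.
        total = sum(s * atom[f] for s, atom in zip(activations, dictionary))
        kept = sum(s * dictionary[k][f]
                   for k, s in enumerate(activations) if k in keep)
        gains.append(kept / (total + eps))
    return gains
```

Multiplying each bin of the original signal's short-time spectrum by the corresponding gain, then inverting the transform, would give the masked signal. Where a kept and a masked element overlap in frequency the gain falls between 0 and 1, so kept notes can be attenuated along with the masked ones.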


This work was funded by EPSRC grant Automatic Music Transcription using ICA.

© Queen Mary, University of London 2005
Electronic Engineering, Queen Mary University of London, Mile End Road, London E1 4NS, UK Tel: +44 (0)20 7882 5346, Fax: +44 (0)20 7882 7997