Constant-Q Transform Toolbox for Music Processing

Introduction

This web page accompanies the paper

Schoerkhuber, C. and Klapuri, A., "Constant-Q transform toolbox for music processing," submitted to the 7th Sound and Music Computing Conference, Barcelona, Spain.
Abstract:
This paper proposes a computationally efficient method for computing the constant-Q transform (CQT) of a time-domain signal. CQT refers to a time-frequency representation where the frequency bins are geometrically spaced and the Q-factors (ratios of the center frequencies to bandwidths) of all bins are equal. An inverse transform is proposed which enables a reasonable-quality (around 55dB signal-to-noise ratio) reconstruction of the original signal from its CQT coefficients. Here CQTs with high Q-factors, equivalent to 12--96 bins per octave, are of particular interest. The proposed method is flexible with regard to the number of bins per octave, the applied window function, and the Q-factor, and is particularly suitable for the analysis of music signals. A reference implementation of the proposed methods is published as a Matlab toolbox. The toolbox includes user-interface tools that facilitate spectral data visualization and the indexing and working with the data structure produced by the CQT.
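The geometric bin spacing and the constant Q-factor described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the Matlab toolbox. The function names, the 55 Hz starting frequency, and the convention that the bandwidth of a bin equals the distance to the next bin (giving Q = 1 / (2^(1/B) - 1)) are assumptions made for illustration; the toolbox may scale Q differently.

```python
def cqt_center_frequencies(f_min, bins_per_octave, n_bins):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def q_factor(bins_per_octave):
    """Q = f_k / bandwidth, assuming bandwidth = f_(k+1) - f_k (an assumption)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


# Hypothetical example: two octaves above 55 Hz at 12 bins per octave.
freqs = cqt_center_frequencies(55.0, 12, 25)
for b in (12, 24, 48, 96):
    print(f"{b:2d} bins/octave -> Q = {q_factor(b):.1f}")
```

Under this convention the ratio between adjacent center frequencies is the same for every bin, which is what makes the Q-factor identical across the whole frequency range.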

Download the toolbox

A reference implementation of the proposed methods is available as a Matlab toolbox here.

Audio examples with the corresponding constant-Q transforms

The redundancy factor R in all cases below is 5.6. The redundancy factor is defined as R=2c/s, where c is the number of CQT coefficients in the representation, s is the number of samples in the input signal, and the factor 2 is due to the fact that the CQT coefficients are complex-valued.
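The definition R = 2c/s can be written as a short Python sketch. The function name and the example figures (a 10-second signal at 44.1 kHz) are hypothetical and chosen only so that the result matches the redundancy of 5.6 quoted above.

```python
def redundancy_factor(n_coefficients, n_samples):
    """R = 2c / s: the factor 2 counts the real and imaginary part of each
    complex-valued CQT coefficient as two real numbers."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 2.0 * n_coefficients / n_samples


# Hypothetical example: 10 s of audio at 44.1 kHz.
s = 10 * 44100          # number of input samples
c = 1234800             # number of complex CQT coefficients (made-up figure)
R = redundancy_factor(c, s)
print(f"R = {R:.1f}")
```

A value of R = 1 would mean the representation holds exactly as many real numbers as the input signal; R = 5.6 means it holds 5.6 times as many.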

Guitar + vocals: original

12 bins/octave CQT reconstructed SNR: 51 dB
24 bins/octave CQT reconstructed SNR: 58 dB
48 bins/octave CQT reconstructed SNR: 59 dB
96 bins/octave CQT reconstructed SNR: 60 dB

String ensemble: original

12 bins/octave CQT reconstructed SNR: 49 dB
24 bins/octave CQT reconstructed SNR: 57 dB
48 bins/octave CQT reconstructed SNR: 59 dB
96 bins/octave CQT reconstructed SNR: 61 dB

Drums + percussion: original

12 bins/octave CQT reconstructed SNR: 51 dB
24 bins/octave CQT reconstructed SNR: 59 dB
48 bins/octave CQT reconstructed SNR: 61 dB
96 bins/octave CQT reconstructed SNR: 63 dB

Piano: original

12 bins/octave CQT reconstructed SNR: 52 dB
24 bins/octave CQT reconstructed SNR: 57 dB
48 bins/octave CQT reconstructed SNR: 61 dB
96 bins/octave CQT reconstructed SNR: 64 dB
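This page does not state how the SNR values above are computed. A common definition, assumed here, is the ratio of the energy of the original signal to the energy of the reconstruction error, expressed in decibels. The following Python sketch uses that definition; the function name and the test signal are illustrative only.

```python
import math


def reconstruction_snr_db(original, reconstructed):
    """SNR in dB, assuming SNR = 10 * log10(sum(x^2) / sum((x - x_hat)^2))."""
    if len(original) != len(reconstructed):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    error_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if error_energy == 0.0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / error_energy)


# Illustrative signal: a reconstruction with a 0.1 % amplitude error.
x = [1.0, -1.0, 1.0, -1.0]
x_hat = [0.999 * v for v in x]
snr = reconstruction_snr_db(x, x_hat)
print(f"SNR = {snr:.1f} dB")
```

Under this definition, an SNR of about 60 dB corresponds to an error amplitude roughly one thousandth of the signal amplitude.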

You may have noticed white spots in regions of rapid temporal change in the plots above. These spots arise because two temporal impulses that are too close together to be displayed as separate events in the CQT spectrum (depending on the time resolution) appear in the spectrogram as a single event exhibiting an interference pattern. This effect is not specific to the CQT; it can also be observed in DFT spectrograms. The following plot shows the DFT spectrum of three pairs of closely spaced temporal impulses with varying spacings. The interference patterns in this plot depend on the spacing of the impulses as well as on the FFT frame size.
STFT spectrogram
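The interference pattern described above can be reproduced with a small Python sketch. When two unit impulses spaced d samples apart fall into the same N-point frame, the DFT magnitude is 2|cos(πkd/N)|, which is zero at the bins where the two contributions cancel. The frame size, impulse positions, and function names below are assumptions chosen for illustration, not the settings used for the plot.

```python
import cmath
import math


def two_impulse_magnitude(frame_size, first, spacing):
    """Magnitude DFT (bins 0..N/2) of a frame that contains unit impulses
    at sample `first` and at sample `first + spacing`."""
    mags = []
    for k in range(frame_size // 2 + 1):
        w = -2j * math.pi * k / frame_size
        value = cmath.exp(w * first) + cmath.exp(w * (first + spacing))
        mags.append(abs(value))
    return mags


def null_bins(frame_size, spacing, tol=1e-9):
    """Bins where the two impulses cancel (the 'white spots')."""
    mags = two_impulse_magnitude(frame_size, 10, spacing)
    return [k for k, m in enumerate(mags) if m < tol]


# Hypothetical frame size of 64 samples and three impulse spacings.
for d in (2, 4, 8):
    print(f"spacing {d}: cancellation at bins {null_bins(64, d)}")
```

The cancellation bins move closer together as the spacing grows, and their indices scale with the frame size, which is consistent with the dependence on both quantities noted above.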

The following plots illustrate the time/frequency trade-off for the constant-Q transform. The plots depict the CQT spectra for several steady sinusoids spaced a perfect fifth apart. Note that the time resolution worsens from high to low frequency bins as the frequency resolution increases.
12 bins/octave, 24 bins/octave, 48 bins/octave, 96 bins/octave
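The trade-off shown in these plots can be estimated numerically with a rough Python sketch. It assumes that the analysis window of a bin spans about Q periods of its center frequency, with Q = 1 / (2^(1/B) - 1) for B bins per octave. This is a common textbook convention rather than a statement about the toolbox, and the two example frequencies are arbitrary.

```python
def window_duration_seconds(center_freq_hz, bins_per_octave):
    """Approximate window length in seconds, assuming the window covers
    Q periods of the center frequency (duration = Q / f_k)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    return q / center_freq_hz


# Arbitrary low and high center frequencies, four octaves apart.
low_hz, high_hz = 110.0, 1760.0
for b in (12, 24, 48, 96):
    low_ms = 1000.0 * window_duration_seconds(low_hz, b)
    high_ms = 1000.0 * window_duration_seconds(high_hz, b)
    print(f"{b:2d} bins/octave: {low_ms:7.1f} ms at {low_hz:.0f} Hz, "
          f"{high_ms:6.1f} ms at {high_hz:.0f} Hz")
```

Under these assumptions the window at the low frequency is 16 times longer than at the high frequency for any B, and doubling the number of bins per octave roughly doubles the window length at every frequency.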