Multi-band multi-centroid clustering based permutation alignment for frequency-domain blind speech separation

Abstract

This paper investigates the permutation ambiguity problem in frequency-domain blind source separation and proposes a robust permutation alignment algorithm based on inter-frequency dependency, which is measured by the correlation coefficient between the time activity sequences of separated signals. To calculate a global reference for permutation alignment, a multi-band multi-centroid clustering algorithm is proposed, in which the permutations within each subband are first aligned by multi-centroid clustering and the permutations among subbands are then aligned sequentially. The multi-band scheme reduces the dynamic range of the activity sequence and improves the efficiency of clustering, while the multi-centroid clustering scheme improves the precision of the reference and reduces the risk of wrong permutations among subbands. The combination of the two techniques makes it possible to capture the variation of the time-frequency activity of a speech signal precisely, which promises robust permutation alignment. Extensive experiments are carried out in different testing scenarios (reverberation times up to 700 ms and mixtures up to 4x4) to investigate the influence of two parameters, the number of subbands and the number of clustering centroids, on the performance of the proposed algorithm. Comparison with existing permutation alignment algorithms shows that the proposed algorithm improves robustness in challenging scenarios and effectively reduces block permutation errors.
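The correlation-based alignment described in the abstract can be illustrated for a single frequency bin. The following is a minimal sketch, not the authors' implementation: it assumes the activity sequences and the reference centroids are already available as arrays of shape (sources, frames), and it uses an exhaustive search over permutations, which is only practical for a small number of sources such as the 2 to 4 used here. The function names `correlation_matrix` and `best_permutation` are hypothetical.

```python
import itertools
import numpy as np

def correlation_matrix(activity, centroids):
    """Pearson correlation between every activity row and every centroid row."""
    a = activity - activity.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    # Normalise each row to unit length (small constant avoids division by zero).
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-12)
    return a @ c.T  # entry [i, k]: output i of this bin vs. reference k

def best_permutation(activity, centroids):
    """Return perm such that output perm[k] of this bin is assigned to reference k.

    Assumption: the permutation is chosen by maximising the summed correlation
    with the reference, searched exhaustively over all N! candidates.
    """
    corr = correlation_matrix(activity, centroids)
    n = corr.shape[0]
    return max(
        itertools.permutations(range(n)),
        key=lambda p: sum(corr[p[k], k] for k in range(n)),
    )

# Toy data (synthetic, for illustration only): three reference sequences and
# one bin whose outputs are a shuffled, noisy copy of them.
rng = np.random.default_rng(0)
centroids = rng.random((3, 200))
activity = centroids[[2, 0, 1]] + 0.05 * rng.standard_normal((3, 200))

perm = best_permutation(activity, centroids)
aligned = activity[list(perm)]  # rows now follow the order of the references
print(perm)
```

In this toy case the shuffle is `[2, 0, 1]`, so the search should recover `(1, 2, 0)`. In the multi-band scheme the same comparison would be applied within each subband against several centroids per source and then between subbands; those steps are not shown here.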

Experiment results

Experiment 1: BSS in low reverberation

  • 4 microphones, spaced 4 cm apart
  • Reverberation time T60 = 130 ms
  • Sampling frequency fs = 8 kHz
  • The data is downloaded from http://www.kecl.ntt.co.jp/icl/signal/sawada/demo/bss2to4/index.html
Algorithms \ SIR_{out} [dB]   2x2 mix         3x3 mix         4x4 mix
Benchmark                     18.80           12.87           9.81
Proposed (+local)             17.10 (18.95)   12.71 (12.74)   8.89 (9.53)
Centroid-1 (+local)           13.25 (13.01)   11.17 (12.74)   2.30 (8.92)
Centroid-M (+local)           14.49 (18.95)   12.51 (12.74)   4.13 (8.29)
RG                            16.95           11.30           6.91
Murata                        11.75           11.21           4.40
IVA                           12.46           10.98           2.99

Experiment 2: BSS in high reverberation

  • 4 microphones, spaced 6 cm apart
  • 2 examples from the 210 4x4 testing files
  • Reverberation time T60 = 450 ms
  • Sampling frequency fs = 16 kHz
Algorithms \ SIR_{out} [dB]   4x4 test1 mix   4x4 test2 mix
Benchmark                     9.46            8.35
Proposed (+local)             7.93 (8.17)     8.32 (8.60)
Centroid-1 (+local)           6.06 (7.30)     3.31 (4.36)
Centroid-M (+local)           7.33 (8.01)     3.25 (4.44)
RG                            7.82            0.14
Murata                        3.21            2.72
IVA                           1.45            0.95
This page is maintained by Lin Wang
Created: 02/13/2015