Pseudo-determined blind source separation for ad-hoc microphone networks

Abstract

We propose a pseudo-determined blind source separation framework that exploits the information from a large number of microphones in an ad-hoc network to extract and enhance sound sources in a reverberant scenario. After compensating for the time offsets and sampling rate mismatch between the asynchronous signals, we interpret the over-determined M × N mixture as a determined M × M mixture, where M is the number of microphones, N is the number of sources, and M > N. Next, we propose a pseudo-determined mixture model that allows an M × M independent component analysis (ICA) to be applied directly to the M-channel recordings. Moreover, we propose a reference-based permutation alignment scheme that aligns the permutation of the ICA outputs and classifies them into target channels, which contain the N sources, and nontarget channels, which contain reverberation residuals. Finally, using the signals from the nontarget channels, we estimate the power spectral density of the noise component in each target channel and suppress it with a spectral postfilter. Interestingly, we also obtain late-reverberation suppression as a byproduct. Experiments show that each processing block incrementally improves source separation and that the performance of the proposed pseudo-determined separation improves as the number of microphones increases.
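
The final step of the abstract (suppressing an estimated noise component in a target channel with a spectral postfilter) can be illustrated with a generic Wiener-type gain. This is only a minimal sketch under stated assumptions, not the postfilter of the paper: the function name `wiener_postfilter`, the power-subtraction estimate of the clean-signal PSD, and the `gain_floor` parameter are all hypothetical choices made for illustration, and the noise PSD is assumed to be already available (for example, derived from the nontarget channels).

```python
import numpy as np

def wiener_postfilter(target_stft, noise_psd, gain_floor=0.1):
    """Apply a Wiener-type spectral gain to one target channel.

    target_stft: complex array (frames, bins), STFT of one ICA target output.
    noise_psd:   real array (frames, bins), estimated noise PSD
                 (assumed given, e.g. derived from the nontarget channels).
    gain_floor:  hypothetical lower bound on the gain, used here to limit
                 musical noise; not a value taken from the paper.
    """
    target_psd = np.abs(target_stft) ** 2
    # Clean-signal PSD by power subtraction, clipped at zero.
    clean_psd = np.maximum(target_psd - noise_psd, 0.0)
    # Wiener-type gain; the small constant avoids division by zero.
    gain = clean_psd / (clean_psd + noise_psd + 1e-12)
    # Never attenuate below the floor.
    gain = np.maximum(gain, gain_floor)
    return gain * target_stft
```

With a noise PSD of zero the gain is close to one and the channel passes through unchanged; where the noise PSD dominates, the gain falls to `gain_floor`.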

References

L. Wang and A. Cavallaro (2018): Pseudo-determined blind source separation for ad-hoc microphone networks. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(5): 981-994.
L. Wang and S. Doclo (2016): Correlation maximization based sampling rate offset estimation for distributed microphone arrays. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(3): 571-582.
L. Wang (2014): Multi-band multi-centroid clustering based permutation alignment for frequency-domain blind speech separation. Elsevier Digital Signal Processing, 31(1): 79-92.

This page shows audio demos comparing the following algorithms.

Applied directly to asynchronous signals: AsyBSS
Determined source separation: DBSS
Over-determined source separation: BFBSS, SSBSS, MOBSS, ROBSS
Post-filtering: POST, UMMSE, BENCHMARK

The experimental conditions are as follows.

8 independent microphones in an ad-hoc network
Sampling rate: 16 kHz
Signal duration: 20 s
Reverberation time: 800 ms
The audio data was downloaded from http://sisec.inria.fr/sisec-2015/2015-asynchronous-recordings-of-speech-mixtures/
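
Because the recordings are asynchronous, the time offsets between channels have to be compensated before separation, as noted in the abstract. The sketch below shows one common way to estimate a coarse offset between two channels, namely picking the lag that maximizes their cross-correlation. It is an illustration only and is not claimed to be the method used in the paper; the function name `estimate_offset` and the `max_lag` search range are hypothetical, and sampling rate mismatch is not handled here.

```python
import numpy as np

def estimate_offset(ref, sig, max_lag):
    """Estimate the delay (in samples) of `sig` relative to `ref`.

    A positive result means `sig` lags behind `ref`. The search is limited
    to lags in [-max_lag, max_lag] (hypothetical parameter).
    """
    n = min(len(ref), len(sig))
    # Remove the mean so a DC offset does not bias the correlation.
    ref = np.asarray(ref[:n], dtype=float) - np.mean(ref[:n])
    sig = np.asarray(sig[:n], dtype=float) - np.mean(sig[:n])
    lags = np.arange(-max_lag, max_lag + 1)
    scores = np.empty(len(lags))
    for i, lag in enumerate(lags):
        if lag >= 0:
            # Compare ref[t] with sig[t + lag].
            a, b = ref[:n - lag], sig[lag:]
        else:
            # Compare ref[t - lag] with sig[t].
            a, b = ref[-lag:], sig[:n + lag]
        scores[i] = np.dot(a, b)
    # Return the lag with the largest cross-correlation.
    return int(lags[np.argmax(scores)])
```

At a sampling rate of 16 kHz, a search range of one second corresponds to `max_lag = 16000`. The direct loop above is slow for 20 s signals over such a range, so an FFT-based correlation would be the more practical choice; the loop is kept here for readability.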