Speech separation and extraction by combining superdirective beamforming and blind source separation

Abstract

Emulating human auditory systems, a two-stage target speech extraction method is proposed which combines fixed beamforming and blind source separation. With the target speaker remaining in the vicinity of a fixed location, several beams from a microphone array point at an area containing the target, then the beamformed output is fed to a blind source separation scheme to get the target signal. The fixed beamforming preprocessing enhances the robustness to time-varying environments and makes the target signal dominant in the beamformed output and hence easier to extract.

Reference

L. Wang, H. Ding, and F. Yin (2011): Target speech extraction in cocktail party by combining beamforming and blind source separation. Acoustics Australia, 39(2): 64-68.
L. Wang, H. Ding, and F. Yin (2011): A region-growing permutation alignment approach in frequency-domain blind source separation of speech mixtures. IEEE Transactions on Audio, Speech, and Language Processing 19(3): 549-557.