Microphone-array ego-noise reduction algorithms
for auditory micro aerial vehicles
Abstract
When a micro aerial vehicle (MAV) captures sounds emitted by a ground or aerial source, its motors and propellers are much closer to the microphone(s) than the sound source, leading to extremely low signal-to-noise ratios (SNR), e.g., -15 dB. While microphone-array techniques have been investigated intensively, their application to MAV-based ego-noise reduction has rarely been reported in the literature. To fill this gap, we implement and compare three types of microphone-array algorithms to enhance the target sound captured by an MAV: a recently emerged technique, time-frequency spatial filtering, and two well-known techniques, beamforming and blind source separation. In particular, based on the observation that the target sound and the ego-noise usually concentrate their energy at sparsely isolated time-frequency bins, we propose to use the time-frequency processing approach, which formulates a spatial filter that enhances a target direction based on local direction-of-arrival (DOA) estimates at individual time-frequency bins. By exploiting the time-frequency sparsity of the acoustic signal, this spatial filter works robustly for sound enhancement in the presence of strong ego-noise. We analyze the three techniques in detail and conduct a comparative evaluation with real-recorded MAV sounds. Experimental results show the superiority of blind source separation and time-frequency filtering in low-SNR scenarios.
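To make the time-frequency spatial filtering idea concrete, the sketch below estimates a local DOA at every time-frequency bin from the inter-microphone phase difference and keeps only the bins whose estimate points at the target direction. It is a minimal two-microphone, free-field illustration with hypothetical parameters (microphone spacing, angular tolerance), not the exact formulation of the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def tf_spatial_filter(x1, x2, fs=8000, mic_dist=0.05, target_doa_deg=0.0,
                      tol_deg=15.0, c=343.0, nfft=512):
    """Minimal two-microphone time-frequency spatial filter (illustrative).

    Keeps only the time-frequency bins whose local DOA estimate, derived
    from the inter-microphone phase difference, lies within `tol_deg` of
    the target direction; all other bins are suppressed.
    """
    f, _, X1 = stft(x1, fs=fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs=fs, nperseg=nfft)

    # Inter-channel phase difference at each TF bin.
    phase_diff = np.angle(X2 * np.conj(X1))

    # Free-field model: phase_diff = 2*pi*f * (mic_dist/c) * sin(theta).
    # (With 8 kHz sampling and 5 cm spacing, bins above ~3.4 kHz are
    # spatially aliased; a real implementation would handle this.)
    with np.errstate(divide='ignore', invalid='ignore'):
        sin_theta = phase_diff * c / (2 * np.pi * f[:, None] * mic_dist)
    sin_theta = np.clip(np.nan_to_num(sin_theta), -1.0, 1.0)
    local_doa = np.degrees(np.arcsin(sin_theta))

    # Binary mask: keep bins whose local DOA matches the target direction.
    mask = (np.abs(local_doa - target_doa_deg) < tol_deg).astype(float)

    _, y = istft(mask * X1, fs=fs, nperseg=nfft)
    return y
```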
Reference
- L. Wang and A. Cavallaro (2017): Microphone-array ego-noise reduction for auditory micro aerial vehicles. IEEE Sensors Journal, 17(8): 2447-2455.
A. Experiment conditions
- Number of microphones: 8
- Sampling rate: 8 kHz
- Signal duration: 10 s
- Reverberation time: 200 ms
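For scripting the experiments, the conditions above map to a small parameter set; a minimal sketch whose variable names are our own:

```python
# Experiment conditions from Section A (variable names are illustrative).
CONDITIONS = {
    "num_mics": 8,     # number of microphones
    "fs_hz": 8000,     # sampling rate
    "duration_s": 10,  # signal duration
    "rt60_s": 0.2,     # reverberation time
}
num_samples = CONDITIONS["fs_hz"] * CONDITIONS["duration_s"]  # 80000 samples
```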
B. Algorithms for comparison
- Adaptive beamforming: Benchmark, ABF-VAD, ABF-Inc, ABF-Identity, FBF
- Blind source separation: BSS, BSS-np
- Time-frequency spatial filtering: TF
- Note: Benchmark (which assumes a perfectly estimated noise correlation matrix), ABF-VAD (which assumes perfect voice activity detection, VAD), and BSS-np (which assumes perfect permutation alignment) only provide a reference on the achievable noise reduction performance: these three algorithms rely on oracle information and are not implementable in practice. A minimal sketch of the adaptive beamforming idea follows below.
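As a reference point for the adaptive beamforming entries above, here is a minimal MVDR-style sketch, assuming the noise spatial covariance matrix is given: the Benchmark oracle corresponds to knowing this matrix exactly, while ABF-VAD would estimate it from noise-only frames flagged by a (here, oracle) VAD. Function names and the diagonal-loading constant are illustrative, not taken from the paper.

```python
import numpy as np

def mvdr_weights(steering, noise_cov, diag_load=1e-6):
    """MVDR beamformer weights for one frequency bin (illustrative).

    steering : (M,) complex steering vector toward the target direction.
    noise_cov: (M, M) noise spatial covariance matrix; the 'Benchmark'
               setting corresponds to knowing this matrix exactly.
    """
    M = noise_cov.shape[0]
    # Diagonal loading for numerical robustness.
    R = noise_cov + diag_load * np.trace(noise_cov).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, steering)
    # w = R^{-1} d / (d^H R^{-1} d)
    return Rinv_d / (steering.conj() @ Rinv_d)

def apply_beamformer(X, weights):
    """Per-bin enhancement y(f, t) = w(f)^H x(f, t).

    X: (M, F, T) multichannel STFT; weights: (F, M) per-frequency weights.
    """
    return np.einsum('fm,mft->ft', weights.conj(), X)
```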
C. Experiment results
Real-recorded ego-noise + Simulated target sound
Input SNR [dB] | Benchmark | ABF-VAD | BSS-np | TF | BSS | ABF-Inc | ABF-Identity | FBF |
---|---|---|---|---|---|---|---|---|
-30 | -7.1 | -18.0 | -10.1 | -10.6 | -15.4 | -18.0 | -31.7 | -27.1 |
-25 | -2.1 | -12.8 | -5.5 | -3.2 | -14.0 | -15.8 | -26.5 | -22.1 |
-20 | -3.4 | -8.3 | -1.9 | 3.8 | -8.7 | -14.4 | -21.2 | -17.1 |
-15 | 9.0 | -2.4 | 7.8 | 9.7 | 6.3 | -12.8 | -15.7 | -12.1 |
-10 | 14.5 | 5.5 | 13.2 | 13.9 | 12.1 | -10.1 | -10.2 | -7.1 |
-5 | 19.8 | 10.9 | 19.4 | 17.1 | 17.9 | -6.1 | -4.9 | -2.1 |
0 | 25.0 | 14.3 | 20.6 | 19.9 | 18.9 | -1.6 | 0.5 | 2.9 |
5 | 30.0 | 16.0 | 24.2 | 22.1 | 23.3 | 2.7 | 5.9 | 7.9 |
Real-recorded ego-noise + Real-recorded target sound
Input SNR [dB] | Benchmark | ABF-VAD | BSS-np | TF | BSS | ABF-Inc | ABF-Identity | FBF |
---|---|---|---|---|---|---|---|---|
-30 | -8.8 | -17.7 | -11.6 | -10.9 | -19.5 | -23.6 | -32.6 | -28.0 |
-25 | -3.4 | -11.4 | -5.5 | -5.5 | -14.7 | -19.5 | -27.3 | -23.0 |
-20 | 2.3 | -5.0 | 0.2 | 0.5 | -5.4 | -15.4 | -21.8 | -18.0 |
-15 | 8.0 | 2.0 | 4.4 | 6.7 | -1.5 | -11.3 | -16.1 | -13.0 |
-10 | 13.5 | 9.6 | 12.4 | 11.6 | 8.3 | -7.4 | -10.4 | -8.0 |
-5 | 18.8 | 16.4 | 18.2 | 14.9 | 17.8 | -3.1 | -4.8 | -3.0 |
0 | 24.0 | 22.0 | 22.6 | 17.3 | 21.4 | 1.6 | 0.5 | 2.0 |
5 | 29.1 | 27.1 | 26.1 | 20.2 | 24.3 | 6.2 | 5.8 | 7.0 |
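Reading the tables, the gain of each algorithm at a given operating point is the difference between its table entry and the input SNR (both in dB), assuming the entries are output scores in dB. A trivial sketch, using the TF column of the first table:

```python
# Gain [dB] = output score - input SNR, read off the first table (TF column).
inputs = [-30, -25, -20, -15, -10, -5, 0, 5]
tf_out = [-10.6, -3.2, 3.8, 9.7, 13.9, 17.1, 19.9, 22.1]
gains = [o - i for o, i in zip(tf_out, inputs)]
print(gains)  # e.g. 19.4 dB gain at -30 dB input, 17.1 dB at 5 dB input
```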