Microphone-array ego-noise reduction algorithms
for auditory micro aerial vehicles

Abstract

When a micro aerial vehicle (MAV) captures sounds emitted by a ground or aerial source, its motors and propellers are much closer to the microphone(s) than the sound source is, leading to extremely low signal-to-noise ratios (SNR), e.g., -15 dB. While microphone-array techniques have been investigated intensively, their application to MAV-based ego-noise reduction has rarely been reported in the literature. To fill this gap, we implement and compare three types of microphone-array algorithms to enhance the target sound captured by an MAV. These algorithms include a recently emerged technique, time-frequency spatial filtering, and two well-known techniques, beamforming and blind source separation. In particular, based on the observation that the target sound and the ego-noise usually concentrate their energy in sparsely isolated time-frequency bins, we propose to use the time-frequency processing approach, which formulates a spatial filter that can enhance a target direction based on local direction-of-arrival (DOA) estimates at individual time-frequency bins. By exploiting the time-frequency sparsity of the acoustic signal, this spatial filter works robustly for sound enhancement in the presence of strong ego-noise. We analyze the three techniques in detail and conduct a comparative evaluation with real-recorded MAV sounds. Experimental results show the superiority of blind source separation and time-frequency filtering in low-SNR scenarios.
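The time-frequency spatial filtering idea above can be sketched for the simplest two-microphone case: estimate a local DOA at every time-frequency bin from the inter-channel phase difference, then retain only the bins whose local DOA lies near the target direction. The sketch below is illustrative only; the binary-mask form, window length, and all parameter names are assumptions, not the exact filter evaluated on this page.

```python
import numpy as np
from scipy.signal import stft, istft


def tf_spatial_filter(x1, x2, fs, mic_dist, target_doa_deg,
                      tol_deg=20.0, floor=0.1, c=343.0):
    """Two-microphone sketch of time-frequency spatial filtering.

    Per TF bin: estimate a local DOA from the inter-channel phase
    difference, then keep bins near target_doa_deg and attenuate the
    rest (binary mask with a small floor). All tuning values are
    illustrative assumptions.
    """
    f, _, X1 = stft(x1, fs, nperseg=512)
    _, _, X2 = stft(x2, fs, nperseg=512)

    # Inter-channel phase difference at each time-frequency bin.
    phase = np.angle(X1 * np.conj(X2))

    # Guard against division by zero at the DC bin.
    f_safe = np.where(f[:, None] > 0, f[:, None], np.inf)

    # Far-field model: tau = phase / (2*pi*f), sin(theta) = tau*c/d.
    sin_theta = np.clip(phase * c / (2 * np.pi * f_safe * mic_dist), -1.0, 1.0)
    doa = np.degrees(np.arcsin(sin_theta))

    # Pass bins whose local DOA is within tol_deg of the target.
    mask = np.where(np.abs(doa - target_doa_deg) < tol_deg, 1.0, floor)

    _, y = istft(mask * X1, fs, nperseg=512)
    return y
```

Because the mask is built per bin, bins dominated by ego-noise arriving from other directions are suppressed even when the noise is much stronger than the target overall, which is the sparsity argument made in the abstract.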

Reference

A. Experimental conditions

  • Number of microphones: 8
  • Sampling rate: 8 kHz
  • Signal duration: 10 s
  • Reverberation time: 200 ms

B. Algorithms for comparison

  • Adaptive beamforming: Benchmark, ABF-VAD, ABF-Inc, ABF-Identity, FBF
  • Blind source separation: BSS, BSS-np
  • Time-frequency spatial filtering: TF
  • Note: Benchmark (assuming a perfectly estimated noise correlation matrix), ABF-VAD (assuming perfect voice activity detection information) and BSS-np (assuming perfect permutation alignment) only provide a reference on the achievable noise reduction performance. In practice, these three algorithms rely on oracle information and are not implementable.
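As a reference for how the fixed beamformer (FBF) baseline typically operates, a minimal delay-and-sum beamformer in the STFT domain might look as follows. The far-field linear-array geometry and all variable names are assumptions; the actual baseline used on this page may differ.

```python
import numpy as np


def delay_and_sum(X, freqs, mic_pos, doa_deg, c=343.0):
    """Sketch of a fixed delay-and-sum beamformer (FBF) in the STFT domain.

    X       : (n_mics, n_freqs, n_frames) STFT of the array signals
    freqs   : (n_freqs,) frequency axis in Hz
    mic_pos : (n_mics,) positions in metres along a linear array axis
    doa_deg : assumed target direction (broadside = 0 deg)
    """
    # Per-microphone propagation delays for a far-field plane wave.
    tau = mic_pos * np.sin(np.radians(doa_deg)) / c              # (n_mics,)

    # Steering vector: phase shifts that align the target direction.
    steer = np.exp(2j * np.pi * freqs[None, :] * tau[:, None])   # (n_mics, n_freqs)

    # Undo the target's phase differences, then average across mics.
    return np.mean(np.conj(steer)[:, :, None] * X, axis=0)
```

Signals from the steered direction add coherently while noise from other directions adds incoherently, which is why such a data-independent beamformer gives only modest gains against strong, nearby ego-noise compared with the adaptive and separation-based methods above.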

C. Experimental results

Real-recorded ego-noise + Simulated target sound

Input SNR [dB]   Benchmark   ABF-VAD   BSS-np      TF     BSS   ABF-Inc   ABF-Identity     FBF
           -30        -7.1     -18.0    -10.1   -10.6   -15.4     -18.0          -31.7   -27.1
           -25        -2.1     -12.8     -5.5    -3.2   -14.0     -15.8          -26.5   -22.1
           -20        -3.4      -8.3     -1.9     3.8    -8.7     -14.4          -21.2   -17.1
           -15         9.0      -2.4      7.8     9.7     6.3     -12.8          -15.7   -12.1
           -10        14.5       5.5     13.2    13.9    12.1     -10.1          -10.2    -7.1
            -5        19.8      10.9     19.4    17.1    17.9      -6.1           -4.9    -2.1
             0        25.0      14.3     20.6    19.9    18.9      -1.6            0.5     2.9
             5        30.0      16.0     24.2    22.1    23.3       2.7            5.9     7.9


Real-recorded ego-noise + Real-recorded target sound

Input SNR [dB]   Benchmark   ABF-VAD   BSS-np      TF     BSS   ABF-Inc   ABF-Identity     FBF
           -30        -8.8     -17.7    -11.6   -10.9   -19.5     -23.6          -32.6   -28.0
           -25        -3.4     -11.4     -5.5    -5.5   -14.7     -19.5          -27.3   -23.0
           -20         2.3      -5.0      0.2     0.5    -5.4     -15.4          -21.8   -18.0
           -15         8.0       2.0      4.4     6.7    -1.5     -11.3          -16.1   -13.0
           -10        13.5       9.6     12.4    11.6     8.3      -7.4          -10.4    -8.0
            -5        18.8      16.4     18.2    14.9    17.8      -3.1           -4.8    -3.0
             0        24.0      22.0     22.6    17.3    21.4       1.6            0.5     2.0
             5        29.1      27.1     26.1    20.2    24.3       6.2            5.8     7.0
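To prepare input-SNR conditions such as those swept in the tables, the ego-noise can be scaled so that the mixture reaches a prescribed SNR before any processing. The sketch below shows one common recipe; the exact mixing and scoring procedure used for these tables is an assumption.

```python
import numpy as np


def mix_at_snr(target, noise, snr_db_in):
    """Scale noise so that 10*log10(P_target / P_noise) == snr_db_in,
    then return (mixture, scaled_noise). A common way to prepare
    fixed-SNR test conditions; the recipe here is an assumption."""
    p_t = np.mean(target ** 2)
    p_n = np.mean(noise ** 2)
    scale = np.sqrt(p_t / (p_n * 10 ** (snr_db_in / 10)))
    return target + scale * noise, scale * noise


def snr_db(target, residual):
    """SNR in dB between a target component and a residual-noise component."""
    return 10 * np.log10(np.mean(target ** 2) / np.mean(residual ** 2))
```

With such helpers, the gain of an algorithm at each operating point is simply the output SNR minus the input SNR, which makes the sweep from -30 dB to 5 dB above straightforward to script.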
This page is maintained by Lin Wang
Last modification: 10/21/2016 19:50:17