Microphone-array ego-noise reduction algorithms
for auditory micro aerial vehicles
Abstract
When a micro aerial vehicle (MAV) captures sounds emitted by a ground or aerial source, its motors and propellers are much closer to the microphone(s) than the sound source, leading to extremely low signal-to-noise ratios (SNR), e.g., -15 dB. While microphone-array techniques have been investigated intensively, their application to MAV-based ego-noise reduction has rarely been reported in the literature. To fill this gap, we implement and compare three types of microphone-array algorithms to enhance the target sound captured by an MAV: a recently emerged technique, time-frequency spatial filtering, and two well-known techniques, beamforming and blind source separation. In particular, based on the observation that the target sound and the ego-noise usually concentrate their energy at sparsely isolated time-frequency bins, we propose to use the time-frequency processing approach, which formulates a spatial filter that enhances a target direction based on local direction-of-arrival (DOA) estimates at individual time-frequency bins. By exploiting the time-frequency sparsity of the acoustic signal, this spatial filter works robustly for sound enhancement in the presence of strong ego-noise. We analyze the three techniques in detail and conduct a comparative evaluation with real-recorded MAV sounds. Experimental results show the superiority of blind source separation and time-frequency filtering in low-SNR scenarios.
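To make the time-frequency spatial filtering idea concrete, the sketch below estimates a local DOA at every time-frequency bin from the inter-microphone phase difference and keeps only the bins whose estimate points at the target direction. It is a minimal two-microphone, free-field illustration with hypothetical parameters (microphone spacing, angular tolerance), not the exact formulation of the paper.

```python
import numpy as np
from scipy.signal import stft, istft

def tf_spatial_filter(x1, x2, fs=8000, mic_dist=0.05, target_doa_deg=0.0,
                      tol_deg=15.0, c=343.0, nfft=512):
    """Minimal two-microphone time-frequency spatial filter (illustrative).

    Keeps only the time-frequency bins whose local DOA estimate, derived
    from the inter-microphone phase difference, lies within `tol_deg` of
    the target direction; all other bins are suppressed.
    """
    f, _, X1 = stft(x1, fs=fs, nperseg=nfft)
    _, _, X2 = stft(x2, fs=fs, nperseg=nfft)

    # Inter-channel phase difference at each TF bin.
    phase_diff = np.angle(X2 * np.conj(X1))

    # Free-field model: phase_diff = 2*pi*f * (mic_dist/c) * sin(theta).
    # (With 8 kHz sampling and 5 cm spacing, bins above ~3.4 kHz are
    # spatially aliased; a real implementation would handle this.)
    with np.errstate(divide='ignore', invalid='ignore'):
        sin_theta = phase_diff * c / (2 * np.pi * f[:, None] * mic_dist)
    sin_theta = np.clip(np.nan_to_num(sin_theta), -1.0, 1.0)
    local_doa = np.degrees(np.arcsin(sin_theta))

    # Binary mask: keep bins whose local DOA matches the target direction.
    mask = (np.abs(local_doa - target_doa_deg) < tol_deg).astype(float)

    _, y = istft(mask * X1, fs=fs, nperseg=nfft)
    return y
```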
Reference
- L. Wang and A. Cavallaro (2017): Microphone-array ego-noise reduction for auditory micro aerial vehicles. IEEE Sensors Journal, 17(8): 2447-2455.
A. Experiment conditions
- Number of microphones: 8
- Sampling rate: 8 kHz
- Signal duration: 10 s
- Reverberation time: 200 ms
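For scripting the experiments, the conditions above map to a small parameter set; a minimal sketch whose variable names are our own:

```python
# Experiment conditions from Section A (variable names are illustrative).
CONDITIONS = {
    "num_mics": 8,     # number of microphones
    "fs_hz": 8000,     # sampling rate
    "duration_s": 10,  # signal duration
    "rt60_s": 0.2,     # reverberation time
}
num_samples = CONDITIONS["fs_hz"] * CONDITIONS["duration_s"]  # 80000 samples
```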
B. Algorithms for comparison
- Adaptive beamforming: Benchmark, ABF-VAD, ABF-Inc, ABF-Identity, FBF
- Blind source separation: BSS, BSS-np
- Time-frequency spatial filtering: TF
- Note: Benchmark (which assumes a perfectly estimated noise correlation matrix), ABF-VAD (which assumes perfect voice activity detection, VAD), and BSS-np (which assumes perfect permutation alignment) only provide a reference on the achievable noise reduction performance: these three algorithms rely on oracle information and are not implementable in practice. A minimal sketch of the adaptive beamforming idea follows below.
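As a reference point for the adaptive beamforming entries above, here is a minimal MVDR-style sketch, assuming the noise spatial covariance matrix is given: the Benchmark oracle corresponds to knowing this matrix exactly, while ABF-VAD would estimate it from noise-only frames flagged by a (here, oracle) VAD. Function names and the diagonal-loading constant are illustrative, not taken from the paper.

```python
import numpy as np

def mvdr_weights(steering, noise_cov, diag_load=1e-6):
    """MVDR beamformer weights for one frequency bin (illustrative).

    steering : (M,) complex steering vector toward the target direction.
    noise_cov: (M, M) noise spatial covariance matrix; the 'Benchmark'
               setting corresponds to knowing this matrix exactly.
    """
    M = noise_cov.shape[0]
    # Diagonal loading for numerical robustness.
    R = noise_cov + diag_load * np.trace(noise_cov).real / M * np.eye(M)
    Rinv_d = np.linalg.solve(R, steering)
    # w = R^{-1} d / (d^H R^{-1} d)
    return Rinv_d / (steering.conj() @ Rinv_d)

def apply_beamformer(X, weights):
    """Per-bin enhancement y(f, t) = w(f)^H x(f, t).

    X: (M, F, T) multichannel STFT; weights: (F, M) per-frequency weights.
    """
    return np.einsum('fm,mft->ft', weights.conj(), X)
```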
C. Experiment results
Real-recorded ego-noise + Simulated target sound
Input SNR [dB] | Benchmark | ABF-VAD | BSS-np | TF | BSS | ABF-Inc | ABF-Identity | FBF |
---|---|---|---|---|---|---|---|---|
-30 | -7.1 | -18.0 | -10.1 | -10.6 | -15.4 | -18.0 | -31.7 | -27.1 |
-25 | -2.1 | -12.8 | -5.5 | -3.2 | -14.0 | -15.8 | -26.5 | -22.1 |
-20 | -3.4 | -8.3 | -1.9 | 3.8 | -8.7 | -14.4 | -21.2 | -17.1 |
-15 | 9.0 | -2.4 | 7.8 | 9.7 | 6.3 | -12.8 | -15.7 | -12.1 |
-10 | 14.5 | 5.5 | 13.2 | 13.9 | 12.1 | -10.1 | -10.2 | -7.1 |
-5 | 19.8 | 10.9 | 19.4 | 17.1 | 17.9 | -6.1 | -4.9 | -2.1 |
0 | 25.0 | 14.3 | 20.6 | 19.9 | 18.9 | -1.6 | 0.5 | 2.9 |
5 | 30.0 | 16.0 | 24.2 | 22.1 | 23.3 | 2.7 | 5.9 | 7.9 |
Real-recorded ego-noise + Real-recorded target sound
Input SNR [dB] | Benchmark | ABF-VAD | BSS-np | TF | BSS | ABF-Inc | ABF-Identity | FBF |
---|---|---|---|---|---|---|---|---|
-30 | -8.8 | -17.7 | -11.6 | -10.9 | -19.5 | -23.6 | -32.6 | -28.0 |
-25 | -3.4 | -11.4 | -5.5 | -5.5 | -14.7 | -19.5 | -27.3 | -23.0 |
-20 | 2.3 | -5.0 | 0.2 | 0.5 | -5.4 | -15.4 | -21.8 | -18.0 |
-15 | 8.0 | 2.0 | 4.4 | 6.7 | -1.5 | -11.3 | -16.1 | -13.0 |
-10 | 13.5 | 9.6 | 12.4 | 11.6 | 8.3 | -7.4 | -10.4 | -8.0 |
-5 | 18.8 | 16.4 | 18.2 | 14.9 | 17.8 | -3.1 | -4.8 | -3.0 |
0 | 24.0 | 22.0 | 22.6 | 17.3 | 21.4 | 1.6 | 0.5 | 2.0 |
5 | 29.1 | 27.1 | 26.1 | 20.2 | 24.3 | 6.2 | 5.8 | 7.0 |
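Reading the tables, the gain of each algorithm at a given operating point is the difference between its table entry and the input SNR (both in dB), assuming the entries are output scores in dB. A trivial sketch, using the TF column of the first table:

```python
# Gain [dB] = output score - input SNR, read off the first table (TF column).
inputs = [-30, -25, -20, -15, -10, -5, 0, 5]
tf_out = [-10.6, -3.2, 3.8, 9.7, 13.9, 17.1, 19.9, 22.1]
gains = [o - i for o, i in zip(tf_out, inputs)]
print(gains)  # e.g. 19.4 dB gain at -30 dB input, 17.1 dB at 5 dB input
```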