at.ofai.music.beatroot

Class AudioProcessor

public class AudioProcessor extends Object

Audio processing class (adapted from PerformanceMatcher).
Field Summary
protected String audioFileName
Source of input data.
protected AudioFormat audioFormat
Format of the audio data in pcmInputStream
protected SourceDataLine audioOut
Line for audio output (not used, since output is done by AudioPlayer)
static boolean batchMode
Flag for batch mode.
protected int cbIndex
The index of the next position to write in the circular buffer.
protected int channels
Number of channels of audio in audioFormat
protected double[] circBuffer
Audio data is scaled to the range [0,1] and averaged to one channel and stored in a circular buffer for reuse (if hopTime < fftTime).
static boolean debug
Flag for enabling or disabling debugging output
static boolean doOnsetPlot
Flag for plotting onset detection function.
protected double[] energy
The RMS energy of all frames.
static int energyOversampleFactor
Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size
protected int fftSize
The size of an FFT frame in samples (see fftTime)
protected double fftTime
The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime.
protected int frameCount
The number of overlapping frames of audio data which have been read.
protected double frameRMS
RMS amplitude of the current frame.
protected double[][] frames
The magnitude spectra of all frames, used for plotting the spectrogram.
protected int[] freqMap
A mapping function for mapping FFT bins to final frequency bins.
protected int freqMapSize
The number of entries in freqMap.
protected int hopSize
Spacing of audio frames in samples (see hopTime)
protected double hopTime
Spacing of audio frames (determines the amount of overlap or skip between frames).
protected double[] imBuffer
The imaginary part of the data for the in-place FFT computation.
protected byte[] inputBuffer
Audio data is initially read in PCM format into this buffer.
static int liveInputBufferSize
Audio buffer for live input.
protected double ltAverage
Long term average frame energy (in frequency domain representation).
static int MAX_LENGTH
Maximum file length in seconds.
protected double[] newFrame
The magnitude spectrum of the current frame.
static int normaliseMode
Determines method of normalisation.
protected EventList onsetList
The estimated onset times and their saliences.
protected double[] onsets
The estimated onset times from peak-picking the onset detection function(s).
protected AudioInputStream pcmInputStream
Uncompressed version of rawInputStream.
protected double[] phaseDeviation
Phase deviation onset detection function, indexed by frame.
Plot plot
Object for plotting output (for debugging / development).
protected double[] prevFrame
The magnitude spectrum of the most recent frame.
protected double[] prevPhase
Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
protected double[] prevPrevPhase
Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
protected ProgressIndicator progressCallback
GUI component which shows progress of audio processing.
static double rangeThreshold
For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
protected AudioInputStream rawInputStream
Input data stream for this performance (possibly in compressed format)
protected double[] reBuffer
The real part of the data for the in-place FFT computation.
protected float sampleRate
Sample rate of audio in audioFormat
static double silenceThreshold
RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
protected static boolean silent
Flag for suppressing all standard output messages except results.
protected double[] spectralFlux
Spectral flux onset detection function, indexed by frame.
BufferedReader stdIn
Standard input for interactive prompts (for debugging).
protected int totalFrames
Total number of audio frames if known, or -1 for live or compressed input.
protected double[] window
The window function for the STFT, currently a Hamming window.
Constructor Summary
AudioProcessor()
Constructor: note that streams are not opened until the input file is set (see setInputFile()).
Method Summary
void closeStreams()
Closes the input stream(s) associated with this object.
static double[] getFeatures(String fileName)
Reads a text file containing a list of whitespace-separated feature values.
boolean getFrame()
Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
protected void init()
Allocates memory for arrays, based on parameter settings
protected void makeFreqMap(int fftSize, float sampleRate)
Creates a map of FFT frequency bins to comparison bins.
void print()
For debugging, outputs information about the AudioProcessor to standard error.
void processFeatures(String fileName, double hopTime)
Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
void processFile()
Processes a complete file of audio data.
protected void processFrame()
Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear part-logarithmic array, then computing the spectral flux then (optionally) normalising and calculating onsets.
String readLine()
For interactive pause - wait for user to hit Enter
void setDisplay(BeatTrackDisplay btd)
Copies output of audio processing to the display panel.
void setInputFile(String fileName)
Sets up the streams and buffers for audio file input.
void setLiveInput()
Sets up the streams and buffers for live audio input (CD quality).
void setProgressCallback(ProgressIndicator c)
Adds a link to the GUI component which shows the progress of audio processing.
String toString()
Gives some basic information about the audio being processed.
protected void weightedPhaseDeviation()
Calculates the weighted phase deviation onset detection function.

Field Detail

audioFileName

protected String audioFileName
Source of input data. Could be extended to include live input from the sound card.

audioFormat

protected AudioFormat audioFormat
Format of the audio data in pcmInputStream

audioOut

protected SourceDataLine audioOut
Line for audio output (not used, since output is done by AudioPlayer)

batchMode

public static boolean batchMode
Flag for batch mode.

cbIndex

protected int cbIndex
The index of the next position to write in the circular buffer.

channels

protected int channels
Number of channels of audio in audioFormat

circBuffer

protected double[] circBuffer
Audio data is scaled to the range [0,1] and averaged to one channel and stored in a circular buffer for reuse (if hopTime < fftTime).
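
The reuse of samples between overlapping frames can be illustrated with a small stand-alone sketch. It is not taken from the BeatRoot source: the class CircularFrameBuffer and its methods write and frame are hypothetical names, and the buffer is assumed to hold exactly one FFT frame.

```java
/** Hypothetical illustration of a circular buffer holding one FFT frame. */
final class CircularFrameBuffer {
    private final double[] circBuffer; // the most recent fftSize samples
    private int cbIndex = 0;           // next position to write

    CircularFrameBuffer(int fftSize) {
        circBuffer = new double[fftSize];
    }

    /** Writes one hop of new samples, overwriting the oldest ones. */
    void write(double[] hop) {
        for (double sample : hop) {
            circBuffer[cbIndex] = sample;
            cbIndex = (cbIndex + 1) % circBuffer.length;
        }
    }

    /** Copies the buffer into a new array, oldest sample first. */
    double[] frame() {
        double[] out = new double[circBuffer.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = circBuffer[(cbIndex + i) % circBuffer.length];
        }
        return out;
    }
}
```

With a frame of 4 samples and a hop of 2, each call to write replaces only the older half of the buffer, so half of every frame is shared with the previous one.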

debug

public static boolean debug
Flag for enabling or disabling debugging output

doOnsetPlot

public static boolean doOnsetPlot
Flag for plotting onset detection function.

energy

protected double[] energy
The RMS energy of all frames.

energyOversampleFactor

public static int energyOversampleFactor
Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size

fftSize

protected int fftSize
The size of an FFT frame in samples (see fftTime)

fftTime

protected double fftTime
The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime. (Default = 0.04644s). The value is adjusted so that fftSize is always a power of 2.
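
One way such an adjustment could be made is sketched below. The rounding rule (nearest power of 2 on a logarithmic scale) and the names FrameSizeSketch, fftSizeFor and adjustedFftTime are assumptions for illustration only, not the documented behaviour of this class.

```java
/** Hypothetical helper: derives a power-of-2 frame size from a frame length in seconds. */
final class FrameSizeSketch {
    /** Rounds fftTime * sampleRate to the nearest power of 2 (in the log domain). */
    static int fftSizeFor(double fftTime, float sampleRate) {
        double exponent = Math.log(fftTime * sampleRate) / Math.log(2);
        return 1 << (int) Math.round(exponent);
    }

    /** The frame length in seconds that corresponds exactly to the rounded size. */
    static double adjustedFftTime(int fftSize, float sampleRate) {
        return fftSize / (double) sampleRate;
    }
}
```

For 44.1kHz audio, the default of 0.04644s corresponds to 2048 samples under this rule.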

frameCount

protected int frameCount
The number of overlapping frames of audio data which have been read.

frameRMS

protected double frameRMS
RMS amplitude of the current frame.

frames

protected double[][] frames
The magnitude spectra of all frames, used for plotting the spectrogram.

freqMap

protected int[] freqMap
A mapping function for mapping FFT bins to final frequency bins. The mapping is linear (1-1) until the resolution reaches 2 points per semitone, then logarithmic with a semitone resolution. For example, for a 44.1kHz sampling rate and an fftSize of 2048 (46ms), the bin spacing is 21.5Hz; bins 0-34 (0 to 732Hz) are mapped linearly, and the remaining FFT bins are mapped logarithmically (MIDI notes 79 to 127, bins 35 to 83), with all energy above note 127 mapped into the final bin.
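
A minimal sketch of such a mapping follows, written from the description above rather than taken from the BeatRoot source. The class name FreqMapSketch, the helper midiNote and the exact crossover rule (the last bin whose distance to its upper neighbour is at least half a semitone) are assumptions.

```java
/** Hypothetical part-linear, part-logarithmic mapping of FFT bins. */
final class FreqMapSketch {
    static int[] makeFreqMap(int fftSize, float sampleRate) {
        int[] freqMap = new int[fftSize / 2 + 1];
        double binWidth = sampleRate / (double) fftSize;
        // Bins k and k+1 are at least half a semitone apart while
        // (k+1)/k >= 2^(1/24), i.e. k <= 1 / (2^(1/24) - 1).
        int crossoverBin = (int) (1 / (Math.pow(2, 1 / 24.0) - 1));
        int crossoverMidi = midiNote(crossoverBin * binWidth);
        for (int i = 0; i < freqMap.length; i++) {
            if (i <= crossoverBin) {
                freqMap[i] = i; // linear (1-1) region
            } else {
                // Semitone resolution; everything above note 127 shares one bin.
                int midi = Math.min(127, midiNote(i * binWidth));
                freqMap[i] = crossoverBin + midi - crossoverMidi;
            }
        }
        return freqMap;
    }

    /** Nearest MIDI note number for a frequency in Hz (A4 = 440Hz = note 69). */
    static int midiNote(double hz) {
        return (int) Math.round(69 + 12 * Math.log(hz / 440.0) / Math.log(2));
    }
}
```

With fftSize = 2048 and sampleRate = 44100, this sketch keeps bins 0 to 34 one-to-one and assigns index 83 to the topmost FFT bin, in line with the example above.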

freqMapSize

protected int freqMapSize
The number of entries in freqMap. Note that the array itself is longer, because the number of entries is not known when the array is created.

hopSize

protected int hopSize
Spacing of audio frames in samples (see hopTime)

hopTime

protected double hopTime
Spacing of audio frames (determines the amount of overlap or skip between frames). This value is expressed in seconds and can be set by the command line option -h hopTime. (Default = 0.020s)

imBuffer

protected double[] imBuffer
The imaginary part of the data for the in-place FFT computation. Since input data is real, this initially contains zeros.

inputBuffer

protected byte[] inputBuffer
Audio data is initially read in PCM format into this buffer.

liveInputBufferSize

public static final int liveInputBufferSize
Audio buffer for live input. (Not used yet)

ltAverage

protected double ltAverage
Long term average frame energy (in frequency domain representation).

MAX_LENGTH

public static final int MAX_LENGTH
Maximum file length in seconds. Used for static allocation of arrays.

newFrame

protected double[] newFrame
The magnitude spectrum of the current frame.

normaliseMode

public static int normaliseMode
Determines method of normalisation. Values can be:

onsetList

protected EventList onsetList
The estimated onset times and their saliences.

onsets

protected double[] onsets
The estimated onset times from peak-picking the onset detection function(s).

pcmInputStream

protected AudioInputStream pcmInputStream
Uncompressed version of rawInputStream. In the (normal) case where the input is already PCM data, rawInputStream == pcmInputStream

phaseDeviation

protected double[] phaseDeviation
Phase deviation onset detection function, indexed by frame.

plot

Plot plot
Object for plotting output (for debugging / development).

prevFrame

protected double[] prevFrame
The magnitude spectrum of the most recent frame. Used for calculating the spectral flux.

prevPhase

protected double[] prevPhase
Phase of the previous frame, for calculating an onset function based on spectral phase deviation.

prevPrevPhase

protected double[] prevPrevPhase
Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.

progressCallback

protected ProgressIndicator progressCallback
GUI component which shows progress of audio processing.

rangeThreshold

public static double rangeThreshold
For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
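
The compression step described here can be sketched as follows. The use of the natural logarithm and the names RangeCompressionSketch and compress are assumptions; the source does not state the log base.

```java
/** Hypothetical illustration of threshold-based dynamic range compression. */
final class RangeCompressionSketch {
    static double[] compress(double[] magnitudes, double rangeThreshold) {
        double[] out = new double[magnitudes.length];
        for (int i = 0; i < magnitudes.length; i++) {
            // Shift the log magnitude up by the threshold, then clip at zero.
            // A magnitude of 0 gives -Infinity, which is also clipped to 0.
            double shifted = Math.log(magnitudes[i]) + rangeThreshold;
            out[i] = Math.max(0.0, shifted);
        }
        return out;
    }
}
```

Bins whose log magnitude lies more than rangeThreshold below zero are discarded, so the threshold effectively sets the floor of the usable dynamic range.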

rawInputStream

protected AudioInputStream rawInputStream
Input data stream for this performance (possibly in compressed format)

reBuffer

protected double[] reBuffer
The real part of the data for the in-place FFT computation. Since input data is real, this initially contains the input data.

sampleRate

protected float sampleRate
Sample rate of audio in audioFormat

silenceThreshold

public static double silenceThreshold
RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
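
A sketch of this kind of silence gate is given below. It assumes that the RMS value is measured on the time-domain samples of the frame and that the data being zeroed is the frame's spectrum; both points, and the names SilenceGateSketch, rms and gate, are assumptions.

```java
import java.util.Arrays;

/** Hypothetical illustration of zeroing frames that fall below a silence threshold. */
final class SilenceGateSketch {
    /** Root mean square of the samples. */
    static double rms(double[] samples) {
        double sumSquares = 0;
        for (double s : samples) {
            sumSquares += s * s;
        }
        return Math.sqrt(sumSquares / samples.length);
    }

    /** Sets the spectrum to zero when the frame is quieter than the threshold. */
    static void gate(double[] samples, double[] spectrum, double silenceThreshold) {
        if (rms(samples) < silenceThreshold) {
            Arrays.fill(spectrum, 0.0);
        }
    }
}
```

Zeroing such frames prevents a later normalisation step from scaling background noise up to the level of real signal.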

silent

protected static boolean silent
Flag for suppressing all standard output messages except results.

spectralFlux

protected double[] spectralFlux
Spectral flux onset detection function, indexed by frame.

stdIn

BufferedReader stdIn
Standard input for interactive prompts (for debugging).

totalFrames

protected int totalFrames
Total number of audio frames if known, or -1 for live or compressed input.

window

protected double[] window
The window function for the STFT, currently a Hamming window.
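
For reference, the textbook symmetric Hamming window can be computed as in the sketch below. Any additional scaling applied by this class is not documented here, so none is shown; the names HammingWindowSketch and hamming are hypothetical.

```java
/** Hypothetical illustration: textbook Hamming window coefficients. */
final class HammingWindowSketch {
    /** Returns a symmetric Hamming window; size must be at least 2. */
    static double[] hamming(int size) {
        double[] w = new double[size];
        for (int i = 0; i < size; i++) {
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (size - 1));
        }
        return w;
    }
}
```

Each sample of a frame is multiplied by the matching coefficient before the FFT, which tapers the frame edges and reduces spectral leakage.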

Constructor Detail

AudioProcessor

public AudioProcessor()
Constructor: note that streams are not opened until the input file is set (see setInputFile()).

Method Detail

closeStreams

public void closeStreams()
Closes the input stream(s) associated with this object.

getFeatures

public static double[] getFeatures(String fileName)
Reads a text file containing a list of whitespace-separated feature values. Created for paper submitted to ICASSP'07.

Parameters: fileName File containing the data

Returns: An array containing the feature values
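
A file in this format could be read along the following lines. This is a sketch and not the method's actual implementation: the names FeatureReaderSketch, parseFeatures and readFeatures are hypothetical, and UTF-8 encoding is assumed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

/** Hypothetical reader for whitespace-separated feature values. */
final class FeatureReaderSketch {
    /** Parses every whitespace-separated token of the text as a double. */
    static double[] parseFeatures(String text) {
        List<Double> values = new ArrayList<>();
        try (Scanner in = new Scanner(text)) {
            while (in.hasNext()) {
                values.add(Double.parseDouble(in.next()));
            }
        }
        return values.stream().mapToDouble(Double::doubleValue).toArray();
    }

    /** Reads the whole file (assumed UTF-8) and parses its contents. */
    static double[] readFeatures(String fileName) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(fileName));
            return parseFeatures(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Spaces, tabs and line breaks are all treated as separators, so the values may be laid out one per line or several per line.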

getFrame

public boolean getFrame()
Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.

Returns: true if a frame (or part of a frame, if it is the final frame) is read. If a complete frame cannot be read, the InputStream is set to null.
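
The channel averaging and scaling can be sketched for one common case. The sketch assumes 16-bit signed little-endian PCM with interleaved channels; the actual formats handled by getFrame() are not listed here, and the names MonoDownmixSketch and toMono are hypothetical.

```java
/** Hypothetical illustration: 16-bit little-endian PCM bytes to scaled mono samples. */
final class MonoDownmixSketch {
    static double[] toMono(byte[] pcm, int channels) {
        int frames = pcm.length / (2 * channels);
        double[] mono = new double[frames];
        for (int f = 0; f < frames; f++) {
            long sum = 0;
            for (int c = 0; c < channels; c++) {
                int i = 2 * (f * channels + c);
                // High byte keeps its sign; low byte is treated as unsigned.
                sum += (pcm[i + 1] << 8) | (pcm[i] & 0xff);
            }
            // Average the channels and scale so that |value| <= 1.
            mono[f] = sum / (32768.0 * channels);
        }
        return mono;
    }
}
```

Dividing by 32768 maps the most negative 16-bit sample to exactly -1, so no output value can exceed 1 in absolute value.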

init

protected void init()
Allocates memory for arrays, based on parameter settings

makeFreqMap

protected void makeFreqMap(int fftSize, float sampleRate)
Creates a map of FFT frequency bins to comparison bins. Where the spacing of FFT bins is greater than 0.5 semitones (at low frequencies), the mapping is one to one. Where the spacing is less than 0.5 semitones, the FFT energy is mapped into semitone-wide bins. No scaling is performed; that is, the energy is simply summed into the comparison bins. See also processFrame().

print

public void print()
For debugging, outputs information about the AudioProcessor to standard error.

processFeatures

public void processFeatures(String fileName, double hopTime)
Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.

Parameters: fileName The file of feature values; hopTime The spacing of feature values in time
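
The peak-finding step can be illustrated with a deliberately simple rule: a value counts as a peak if it exceeds a fixed threshold and its immediate neighbours. The rule used by this class is not described here, so the criterion, the threshold parameter and the names PeakPickSketch and pickOnsets are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: local-maximum peak picking on an onset detection function. */
final class PeakPickSketch {
    /** Returns the times (index * hopTime) of local maxima above the threshold. */
    static double[] pickOnsets(double[] odf, double hopTime, double threshold) {
        List<Double> times = new ArrayList<>();
        for (int i = 1; i < odf.length - 1; i++) {
            boolean aboveThreshold = odf[i] > threshold;
            boolean localMax = odf[i] > odf[i - 1] && odf[i] >= odf[i + 1];
            if (aboveThreshold && localMax) {
                times.add(i * hopTime);
            }
        }
        return times.stream().mapToDouble(Double::doubleValue).toArray();
    }
}
```

Multiplying the index of each peak by hopTime converts the position of the feature value into an onset time.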

processFile

public void processFile()
Processes a complete file of audio data.

processFrame

protected void processFrame()
Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear part-logarithmic array, then computing the spectral flux then (optionally) normalising and calculating onsets.
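
The spectral flux step can be sketched using the common half-wave rectified definition, in which only increases in magnitude between consecutive frames are summed. Whether this class uses exactly this variant is an assumption, as are the names SpectralFluxSketch and spectralFlux.

```java
/** Hypothetical illustration: half-wave rectified spectral flux between two frames. */
final class SpectralFluxSketch {
    static double spectralFlux(double[] prevFrame, double[] newFrame) {
        double flux = 0;
        for (int i = 0; i < newFrame.length; i++) {
            double diff = newFrame[i] - prevFrame[i];
            if (diff > 0) {
                flux += diff; // count only bins whose magnitude has risen
            }
        }
        return flux;
    }
}
```

Ignoring decreases makes the function respond to note onsets (energy appearing) rather than to offsets (energy decaying).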

readLine

public String readLine()
For interactive pause - wait for user to hit Enter

setDisplay

public void setDisplay(BeatTrackDisplay btd)
Copies output of audio processing to the display panel.

setInputFile

public void setInputFile(String fileName)
Sets up the streams and buffers for audio file input. If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.

Parameters: fileName The path name of the input audio file.

setLiveInput

public void setLiveInput()
Sets up the streams and buffers for live audio input (CD quality). If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null, indicating that the method did not complete successfully.

setProgressCallback

public void setProgressCallback(ProgressIndicator c)
Adds a link to the GUI component which shows the progress of audio processing.

Parameters: c The GUI component (a ProgressIndicator) which shows the progress of audio processing

toString

public String toString()
Gives some basic information about the audio being processed.

weightedPhaseDeviation

protected void weightedPhaseDeviation()
Calculates the weighted phase deviation onset detection function. Not used. TODO: Test the change to WPD fn