at.ofai.music.beatroot
Class AudioProcessor

java.lang.Object
  extended by at.ofai.music.beatroot.AudioProcessor

public class AudioProcessor
extends java.lang.Object

Audio processing class (adapted from PerformanceMatcher).


Field Summary
protected  java.lang.String audioFileName
          Source of input data.
protected  javax.sound.sampled.AudioFormat audioFormat
          Format of the audio data in pcmInputStream
protected  javax.sound.sampled.SourceDataLine audioOut
          Line for audio output (not used, since output is done by AudioPlayer)
static boolean batchMode
          Flag for batch mode.
protected  int cbIndex
          The index of the next position to write in the circular buffer.
protected  int channels
          Number of channels of audio in audioFormat
protected  double[] circBuffer
          Audio data is scaled to the range [-1,1], averaged to one channel, and stored in a circular buffer for reuse (if hopTime < fftTime).
static boolean debug
          Flag for enabling or disabling debugging output
static boolean doOnsetPlot
          Flag for plotting onset detection function.
protected  double[] energy
          The RMS energy of all frames.
static int energyOversampleFactor
          Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size
protected  int fftSize
          The size of an FFT frame in samples (see fftTime)
protected  double fftTime
          The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime.
protected  int frameCount
          The number of overlapping frames of audio data which have been read.
protected  double frameRMS
          RMS amplitude of the current frame.
protected  double[][] frames
          The magnitude spectra of all frames, used for plotting the spectrogram.
protected  int[] freqMap
          A mapping function for mapping FFT bins to final frequency bins.
protected  int freqMapSize
          The number of entries in freqMap.
protected  int hopSize
          Spacing of audio frames in samples (see hopTime)
protected  double hopTime
          Spacing of audio frames (determines the amount of overlap or skip between frames).
protected  double[] imBuffer
          The imaginary part of the data for the in-place FFT computation.
protected  byte[] inputBuffer
          Audio data is initially read in PCM format into this buffer.
static int liveInputBufferSize
          Size of the audio buffer for live input.
protected  double ltAverage
          Long term average frame energy (in frequency domain representation).
static int MAX_LENGTH
          Maximum file length in seconds.
protected  double[] newFrame
          The magnitude spectrum of the current frame.
static int normaliseMode
          Determines method of normalisation.
protected  at.ofai.music.util.EventList onsetList
          The estimated onset times and their saliences.
protected  double[] onsets
          The estimated onset times from peak-picking the onset detection function(s).
protected  javax.sound.sampled.AudioInputStream pcmInputStream
          Uncompressed version of rawInputStream.
protected  double[] phaseDeviation
          Phase deviation onset detection function, indexed by frame.
(package private)  at.ofai.music.worm.Plot plot
          Object for plotting output (for debugging / development).
protected  double[] prevFrame
          The magnitude spectrum of the previous frame.
protected  double[] prevPhase
          Phase of the previous frame, for calculating an onset function based on spectral phase deviation.
protected  double[] prevPrevPhase
          Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.
protected  ProgressIndicator progressCallback
          GUI component which shows progress of audio processing.
static double rangeThreshold
          For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
protected  javax.sound.sampled.AudioInputStream rawInputStream
          Input data stream for this performance (possibly in compressed format)
protected  double[] reBuffer
          The real part of the data for the in-place FFT computation.
protected  float sampleRate
          Sample rate of audio in audioFormat
static double silenceThreshold
          RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.
protected static boolean silent
          Flag for suppressing all standard output messages except results.
protected  double[] spectralFlux
          Spectral flux onset detection function, indexed by frame.
(package private)  java.io.BufferedReader stdIn
          Standard input for interactive prompts (for debugging).
protected  int totalFrames
          Total number of audio frames if known, or -1 for live or compressed input.
protected  double[] window
          The window function for the STFT, currently a Hamming window.
 
Constructor Summary
AudioProcessor()
          Constructor: note that streams are not opened until the input file is set (see setInputFile()).
 
Method Summary
 void closeStreams()
          Closes the input stream(s) associated with this object.
static double[] getFeatures(java.lang.String fileName)
          Reads a text file containing a list of whitespace-separated feature values.
 boolean getFrame()
          Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.
protected  void init()
          Allocates memory for arrays, based on parameter settings
protected  void makeFreqMap(int fftSize, float sampleRate)
          Creates a map of FFT frequency bins to comparison bins.
 void print()
          For debugging, outputs information about the AudioProcessor to standard error.
 void processFeatures(java.lang.String fileName, double hopTime)
          Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.
 void processFile()
          Processes a complete file of audio data.
protected  void processFrame()
          Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear, part-logarithmic array, then computing the spectral flux, and finally (optionally) normalising and calculating onsets.
 java.lang.String readLine()
          For an interactive pause: waits for the user to press Enter.
 void setDisplay(BeatTrackDisplay btd)
          Copies output of audio processing to the display panel.
 void setInputFile(java.lang.String fileName)
          Sets up the streams and buffers for audio file input.
 void setLiveInput()
          Sets up the streams and buffers for live audio input (CD quality).
 void setProgressCallback(ProgressIndicator c)
          Adds a link to the GUI component which shows the progress of audio processing.
 java.lang.String toString()
          Gives some basic information about the audio being processed.
protected  void weightedPhaseDeviation()
          Calculates the weighted phase deviation onset detection function.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

rawInputStream

protected javax.sound.sampled.AudioInputStream rawInputStream
Input data stream for this performance (possibly in compressed format)


pcmInputStream

protected javax.sound.sampled.AudioInputStream pcmInputStream
Uncompressed version of rawInputStream. In the (normal) case where the input is already PCM data, rawInputStream == pcmInputStream


audioOut

protected javax.sound.sampled.SourceDataLine audioOut
Line for audio output (not used, since output is done by AudioPlayer)


audioFormat

protected javax.sound.sampled.AudioFormat audioFormat
Format of the audio data in pcmInputStream


channels

protected int channels
Number of channels of audio in audioFormat


sampleRate

protected float sampleRate
Sample rate of audio in audioFormat


audioFileName

protected java.lang.String audioFileName
Source of input data. Could be extended to include live input from the sound card.


hopTime

protected double hopTime
Spacing of audio frames (determines the amount of overlap or skip between frames). This value is expressed in seconds and can be set by the command line option -h hopTime. (Default = 0.020s)


fftTime

protected double fftTime
The approximate size of an FFT frame in seconds, as set by the command line option -f FFTTime. (Default = 0.04644s). The value is adjusted so that fftSize is always a power of 2.


hopSize

protected int hopSize
Spacing of audio frames in samples (see hopTime)


fftSize

protected int fftSize
The size of an FFT frame in samples (see fftTime)
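
The relationship between the time-based settings (hopTime, fftTime) and the sample-based fields (hopSize, fftSize) can be illustrated with a minimal sketch. The class and method names below are hypothetical, and the rounding rule (nearest power of 2 on a logarithmic scale) is an assumption rather than a statement of how AudioProcessor computes these values.

```java
// Hypothetical helper, not part of the BeatRoot API.
class FrameSizeSketch {
    /** Hop size in samples: the hop time rounded to the nearest whole sample. */
    static int hopSize(double hopTime, float sampleRate) {
        return (int) Math.round(sampleRate * hopTime);
    }

    /** FFT size in samples: the power of 2 nearest (on a log scale) to fftTime. */
    static int fftSize(double fftTime, float sampleRate) {
        long exponent = Math.round(Math.log(fftTime * sampleRate) / Math.log(2));
        return 1 << (int) exponent;
    }

    public static void main(String[] args) {
        // Documented defaults at a 44.1 kHz sample rate.
        System.out.println(hopSize(0.020, 44100f));   // prints 882
        System.out.println(fftSize(0.04644, 44100f)); // prints 2048
    }
}
```

With the documented defaults, a hop of 882 samples against a 2048-sample frame means consecutive frames overlap by more than half.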


frameCount

protected int frameCount
The number of overlapping frames of audio data which have been read.


frameRMS

protected double frameRMS
RMS amplitude of the current frame.


ltAverage

protected double ltAverage
Long term average frame energy (in frequency domain representation).


inputBuffer

protected byte[] inputBuffer
Audio data is initially read in PCM format into this buffer.


circBuffer

protected double[] circBuffer
Audio data is scaled to the range [-1,1], averaged to one channel, and stored in a circular buffer for reuse (if hopTime < fftTime).


cbIndex

protected int cbIndex
The index of the next position to write in the circular buffer.


window

protected double[] window
The window function for the STFT, currently a Hamming window.
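
For reference, a Hamming window can be generated as in the sketch below. It uses the textbook coefficients 0.54 and 0.46 and the symmetric form; the class name is hypothetical, and any extra scaling that AudioProcessor may apply to its window is not shown here.

```java
// Hypothetical helper, not part of the BeatRoot API.
class HammingSketch {
    /** Returns a symmetric Hamming window of the given size (size must be at least 2). */
    static double[] hamming(int size) {
        double[] w = new double[size];
        for (int i = 0; i < size; i++) {
            // Raised cosine: 0.08 at both ends, 1.0 at the centre of an odd-length window.
            w[i] = 0.54 - 0.46 * Math.cos(2 * Math.PI * i / (size - 1));
        }
        return w;
    }
}
```

Each frame of samples would be multiplied element by element with this array before the FFT.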


reBuffer

protected double[] reBuffer
The real part of the data for the in-place FFT computation. Since input data is real, this initially contains the input data.


imBuffer

protected double[] imBuffer
The imaginary part of the data for the in-place FFT computation. Since input data is real, this initially contains zeros.


prevPhase

protected double[] prevPhase
Phase of the previous frame, for calculating an onset function based on spectral phase deviation.


prevPrevPhase

protected double[] prevPrevPhase
Phase of the frame before the previous frame, for calculating an onset function based on spectral phase deviation.


phaseDeviation

protected double[] phaseDeviation
Phase deviation onset detection function, indexed by frame.
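
The role of prevPhase and prevPrevPhase can be illustrated with a common formulation of phase deviation: the second difference of the phase in each bin, wrapped to [-pi, pi] and averaged over bins. This is a sketch under that assumption, with hypothetical names; it is not taken from the AudioProcessor source.

```java
// Hypothetical helper, not part of the BeatRoot API.
class PhaseDeviationSketch {
    /** Mean absolute second difference of phase across bins, wrapped to [-pi, pi]. */
    static double phaseDeviation(double[] phase, double[] prevPhase, double[] prevPrevPhase) {
        double sum = 0;
        for (int i = 0; i < phase.length; i++) {
            // A steady sinusoid advances its phase by a constant amount per frame,
            // so its second difference is (close to) zero.
            double d = phase[i] - 2 * prevPhase[i] + prevPrevPhase[i];
            sum += Math.abs(Math.IEEEremainder(d, 2 * Math.PI));
        }
        return sum / phase.length;
    }
}
```

A large value suggests that the phase in many bins departed from its steady progression, which is the kind of change an onset tends to produce.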


spectralFlux

protected double[] spectralFlux
Spectral flux onset detection function, indexed by frame.
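
As an illustration, spectral flux is commonly computed as the sum of the increases in magnitude between the previous and the current frame (half-wave rectified difference). The sketch below assumes that definition; the class name is hypothetical and the parameter names simply mirror the newFrame and prevFrame fields.

```java
// Hypothetical helper, not part of the BeatRoot API.
class SpectralFluxSketch {
    /** Sums the bin-wise increases in magnitude from prevFrame to newFrame. */
    static double spectralFlux(double[] newFrame, double[] prevFrame) {
        double flux = 0;
        for (int i = 0; i < newFrame.length; i++) {
            double rise = newFrame[i] - prevFrame[i];
            if (rise > 0) {
                flux += rise; // decreases in magnitude are ignored
            }
        }
        return flux;
    }
}
```

For example, with prevFrame = {1, 2, 3} and newFrame = {2, 1, 5}, the flux is 1 + 0 + 2 = 3.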


freqMap

protected int[] freqMap
A mapping function for mapping FFT bins to final frequency bins. The mapping is linear (1-1) until the resolution reaches 2 points per semitone, then logarithmic with a semitone resolution. For example, at a 44.1 kHz sampling rate with an fftSize of 2048 (46 ms), the bin spacing is 21.5 Hz; bins 0-34 (0 to 732 Hz) are mapped linearly and the remaining bins logarithmically (MIDI notes 79 to 127, output bins 35 to 83), with all energy above note 127 mapped into the final bin.


freqMapSize

protected int freqMapSize
The number of entries in freqMap. Note that the length of the array is greater, because its size is not known at creation time.


prevFrame

protected double[] prevFrame
The magnitude spectrum of the previous frame. Used for calculating the spectral flux.


newFrame

protected double[] newFrame
The magnitude spectrum of the current frame.


frames

protected double[][] frames
The magnitude spectra of all frames, used for plotting the spectrogram.


energy

protected double[] energy
The RMS energy of all frames.


onsets

protected double[] onsets
The estimated onset times from peak-picking the onset detection function(s).


onsetList

protected at.ofai.music.util.EventList onsetList
The estimated onset times and their saliences.


progressCallback

protected ProgressIndicator progressCallback
GUI component which shows progress of audio processing.


totalFrames

protected int totalFrames
Total number of audio frames if known, or -1 for live or compressed input.


stdIn

java.io.BufferedReader stdIn
Standard input for interactive prompts (for debugging).


plot

at.ofai.music.worm.Plot plot
Object for plotting output (for debugging / development).


debug

public static boolean debug
Flag for enabling or disabling debugging output


doOnsetPlot

public static boolean doOnsetPlot
Flag for plotting onset detection function.


silent

protected static boolean silent
Flag for suppressing all standard output messages except results.


batchMode

public static boolean batchMode
Flag for batch mode.


silenceThreshold

public static double silenceThreshold
RMS frame energy below this value results in the frame being set to zero, so that normalisation does not have undesired side-effects.


rangeThreshold

public static double rangeThreshold
For dynamic range compression, this value is added to the log magnitude in each frequency bin and any remaining negative values are then set to zero.
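
The compression step described above can be sketched as follows. The class and method names are hypothetical, and the use of the natural logarithm is an assumption; the description does not state the base.

```java
// Hypothetical helper, not part of the BeatRoot API.
class RangeCompressionSketch {
    /** Replaces each magnitude by max(0, log(magnitude) + rangeThreshold), in place. */
    static void compress(double[] frame, double rangeThreshold) {
        for (int i = 0; i < frame.length; i++) {
            // Math.log(0) is -Infinity, which is clamped to zero like any other negative value.
            double shifted = Math.log(frame[i]) + rangeThreshold;
            frame[i] = Math.max(shifted, 0);
        }
    }
}
```

With a threshold of 10, a magnitude of 1.0 becomes 10, while a magnitude of exp(-20) falls below the threshold and becomes 0.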


normaliseMode

public static int normaliseMode
Determines method of normalisation. Values can be:


energyOversampleFactor

public static int energyOversampleFactor
Ratio between rate of sampling the signal energy (for the amplitude envelope) and the hop size


liveInputBufferSize

public static final int liveInputBufferSize
Size of the audio buffer for live input. (Not used yet)

See Also:
Constant Field Values

MAX_LENGTH

public static final int MAX_LENGTH
Maximum file length in seconds. Used for static allocation of arrays.

See Also:
Constant Field Values
Constructor Detail

AudioProcessor

public AudioProcessor()
Constructor: note that streams are not opened until the input file is set (see setInputFile()).

Method Detail

print

public void print()
For debugging, outputs information about the AudioProcessor to standard error.


readLine

public java.lang.String readLine()
For an interactive pause: waits for the user to press Enter.


toString

public java.lang.String toString()
Gives some basic information about the audio being processed.

Overrides:
toString in class java.lang.Object

setProgressCallback

public void setProgressCallback(ProgressIndicator c)
Adds a link to the GUI component which shows the progress of audio processing.

Parameters:
c - the ProgressIndicator (GUI component) to be updated as processing progresses

setLiveInput

public void setLiveInput()
Sets up the streams and buffers for live audio input (CD quality). If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null to indicate that the method did not complete successfully.


setInputFile

public void setInputFile(java.lang.String fileName)
Sets up the streams and buffers for audio file input. If any Exception is thrown within this method, it is caught, any opened streams are closed, and pcmInputStream is set to null to indicate that the method did not complete successfully.

Parameters:
fileName - The path name of the input audio file.

init

protected void init()
Allocates memory for arrays, based on parameter settings


closeStreams

public void closeStreams()
Closes the input stream(s) associated with this object.


makeFreqMap

protected void makeFreqMap(int fftSize,
                           float sampleRate)
Creates a map of FFT frequency bins to comparison bins. Where the spacing of FFT bins is greater than 0.5 semitones (at low frequencies), the mapping is one to one. Where the spacing is less than 0.5 semitones, the FFT energy is mapped into semitone-wide bins. No scaling is performed; that is, the energy is summed into the comparison bins. See also processFrame().
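
A minimal sketch of such a part-linear, part-logarithmic map is given below. The class name, the helper midiOf, and the exact crossover rule are assumptions made for illustration; the sketch is not copied from the AudioProcessor source. Low bins map one to one, and bins above the crossover are assigned to semitone-wide bins by MIDI note number, with everything above note 127 collapsed into the final bin.

```java
// Hypothetical helper, not part of the BeatRoot API.
class FreqMapSketch {
    /** Converts a frequency in Hz to a (fractional) MIDI note number, with A4 = 440 Hz = note 69. */
    static double midiOf(double hz) {
        return Math.log(hz / 440) / Math.log(2) * 12 + 69;
    }

    /** Returns map[fftBin] = output bin; the number of output bins is map[map.length - 1] + 1. */
    static int[] makeFreqMap(int fftSize, float sampleRate) {
        int[] map = new int[fftSize / 2 + 1];
        double binWidth = sampleRate / fftSize;
        // Assumed crossover: the bin at which FFT resolution is about 2 bins per semitone.
        int crossoverBin = (int) (2 / (Math.pow(2, 1 / 12.0) - 1));
        int crossoverMidi = (int) Math.round(midiOf(crossoverBin * binWidth));
        for (int i = 0; i < map.length; i++) {
            if (i <= crossoverBin) {
                map[i] = i; // linear region: one FFT bin per output bin
            } else {
                double midi = Math.min(127, midiOf(i * binWidth));
                map[i] = crossoverBin + (int) Math.round(midi) - crossoverMidi;
            }
        }
        return map;
    }
}
```

For fftSize = 2048 and a 44.1 kHz sample rate, this sketch maps the top FFT bin to output bin 83, giving 84 output bins in total. Energy would then be accumulated with a loop of the form out[map[i]] += magnitude[i].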


weightedPhaseDeviation

protected void weightedPhaseDeviation()
Calculates the weighted phase deviation (WPD) onset detection function. Not used. TODO: Test the change to the WPD function.


getFrame

public boolean getFrame()
Reads a frame of input data, averages the channels to mono, scales to a maximum possible absolute value of 1, and stores the audio data in a circular input buffer.

Returns:
true if a frame (or part of a frame, if it is the final frame) is read. If a complete frame cannot be read, the InputStream is set to null.
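
The conversion performed on each block of input can be sketched as follows, assuming 16-bit signed little-endian PCM with interleaved channels. The class name and method signature are hypothetical, and other sample formats are not handled.

```java
// Hypothetical helper, not part of the BeatRoot API.
class MonoFrameSketch {
    /**
     * Averages interleaved 16-bit little-endian PCM to mono, scales it to [-1, 1),
     * and writes it into the circular buffer. Returns the next write index.
     */
    static int pcmToCircBuffer(byte[] pcm, int channels, double[] circBuffer, int cbIndex) {
        int bytesPerFrame = 2 * channels;
        for (int i = 0; i + bytesPerFrame <= pcm.length; i += bytesPerFrame) {
            double sum = 0;
            for (int c = 0; c < channels; c++) {
                int lo = pcm[i + 2 * c] & 0xff; // low byte, treated as unsigned
                int hi = pcm[i + 2 * c + 1];    // high byte, keeps the sign
                sum += (hi << 8) | lo;
            }
            circBuffer[cbIndex] = sum / (channels * 32768.0);
            cbIndex = (cbIndex + 1) % circBuffer.length; // wrap around
        }
        return cbIndex;
    }
}
```

Because the write index wraps around, samples from the previous frame remain available for the overlapping part of the next FFT frame when hopTime < fftTime.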

processFrame

protected void processFrame()
Processes a frame of audio data by first computing the STFT with a Hamming window, then mapping the frequency bins into a part-linear, part-logarithmic array, then computing the spectral flux, and finally (optionally) normalising and calculating onsets.


processFile

public void processFile()
Processes a complete file of audio data.


getFeatures

public static double[] getFeatures(java.lang.String fileName)
Reads a text file containing a list of whitespace-separated feature values. Created for paper submitted to ICASSP'07.

Parameters:
fileName - File containing the data
Returns:
An array containing the feature values
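
Reading a whitespace-separated list of values can be sketched as below. The class and its parse helper are hypothetical, and the error handling (wrapping IOException in an unchecked exception) and UTF-8 encoding are assumptions rather than the documented behaviour of getFeatures.

```java
// Hypothetical helper, not part of the BeatRoot API.
class FeatureFileSketch {
    /** Parses whitespace-separated numbers; returns an empty array for blank input. */
    static double[] parse(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new double[0];
        }
        String[] tokens = trimmed.split("\\s+");
        double[] values = new double[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            values[i] = Double.parseDouble(tokens[i]);
        }
        return values;
    }

    /** Reads the whole file as text and parses it. */
    static double[] getFeatures(String fileName) {
        try {
            byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(fileName));
            return parse(new String(bytes, java.nio.charset.StandardCharsets.UTF_8));
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }
}
```

Spaces, tabs, and line breaks are all treated as separators, so one value per line and several values per line both work.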

processFeatures

public void processFeatures(java.lang.String fileName,
                            double hopTime)
Reads a file of feature values, treated as an onset detection function, and finds peaks, which are stored in onsetList and onsets.

Parameters:
fileName - The file of feature values
hopTime - The spacing of feature values in time
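
To show how feature values spaced hopTime apart turn into onset times, here is a deliberately simple peak picker: a value counts as a peak if it is a local maximum above a fixed threshold. The class name, the threshold parameter, and the peak criterion are all assumptions; the peak-picking used by processFeatures may differ.

```java
// Hypothetical helper, not part of the BeatRoot API.
class PeakPickSketch {
    /** Returns the times (in seconds) of local maxima of odf that exceed threshold. */
    static double[] pickOnsets(double[] odf, double hopTime, double threshold) {
        double[] times = new double[odf.length];
        int count = 0;
        for (int i = 1; i < odf.length - 1; i++) {
            boolean localMax = odf[i] > odf[i - 1] && odf[i] >= odf[i + 1];
            if (localMax && odf[i] > threshold) {
                times[count++] = i * hopTime; // frame index to seconds
            }
        }
        return java.util.Arrays.copyOf(times, count);
    }
}
```

The resulting times correspond to what the onsets field holds; the peak heights could serve as the saliences stored alongside them in onsetList.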

setDisplay

public void setDisplay(BeatTrackDisplay btd)
Copies output of audio processing to the display panel.