This page distributes two datasets used in the paper "Audio-Visual Events for Multi-Camera Synchronization" (under review).
Abstract. We present a method for the automatic synchronization of
audio-visual recordings captured with a set of independent cameras. The
proposed method jointly processes data from the audio and video channels
to estimate the inter-camera delays that are then used to temporally
align the recordings. Our approach is based on three main steps. First,
we extract temporally sharp audio-visual events from each recording.
Audio-visual events have a short duration and are defined by an audio
onset occurring jointly with local movement in the field of view. Then, we estimate inter-camera
delays by assessing the co-occurrence of events in the various
recordings. Finally, we use a cross-validation procedure to combine the
results for all camera pairs and to align the recordings on a global
timeline. An important feature of the proposed method is the estimation
of a confidence level for the results, which allows us to automatically
reject recordings that are not reliable enough for the alignment.
Results show that our method outperforms state-of-the-art approaches
based on audio-only or video-only analysis with both fixed and hand-held
moving cameras.
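The delay-estimation step can be illustrated with a minimal sketch. Assuming each recording has been reduced to a binary per-frame event train (1 where an audio-visual event was detected, 0 elsewhere), the inter-camera delay can be taken as the lag that maximizes event co-occurrence between two trains. The function name `estimate_delay` and the peak-ratio confidence score below are illustrative assumptions, not the paper's exact scoring procedure.

```python
import numpy as np

def estimate_delay(events_a, events_b):
    """Estimate how many frames events_b lags events_a.

    Cross-correlates the two binary event trains over all lags and
    returns the lag with the highest co-occurrence count, plus a crude
    confidence score (peak mass over total mass). Illustrative only.
    """
    # corr[k] = sum_n events_b[n + lag] * events_a[n], so the peak lag is
    # positive when events in b occur later than the matching events in a.
    corr = np.correlate(events_b, events_a, mode="full")
    lags = np.arange(-(len(events_a) - 1), len(events_b))
    best = int(np.argmax(corr))
    confidence = float(corr[best] / max(corr.sum(), 1.0))
    return int(lags[best]), confidence

# Synthetic example: camera B sees the same three events 3 frames later.
a = np.zeros(20)
a[[2, 7, 12]] = 1.0
b = np.zeros(20)
b[[5, 10, 15]] = 1.0
delay, conf = estimate_delay(a, b)  # delay == 3
```

In a real multi-camera setting, this pairwise estimate would be computed for every camera pair and then reconciled on a global timeline, with low-confidence pairs discarded, as the abstract describes.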
Download the datasets