Multisource audio-visual production from user-generated content (MAVIP)

The pervasiveness of amateur media recorders, whether embedded in smartphones or used as stand-alone devices, is revolutionizing the way events are captured and reported. The aim of this project is to devise intelligent editing and production algorithms, based on new signal processing techniques, for multi-view user-generated content.

The explosion of shared video content offers the opportunity for new ways of not only analysing but also reporting stories in a timely manner, ranging from disaster scenes and protests to music concerts and sports events. However, the ever-larger amount of available data and its varying quality make the timely selection and editing of appropriate multimedia items very difficult, strongly limiting the opportunity to harvest this data for security, cultural and entertainment applications. There is an urgent need to investigate and develop new ways to support or replace the traditional role of a producer/director in this rapidly changing landscape. In particular, there is a need to automate production tasks and to generate new, high-quality content from multiple views.

The key aspect of the project is the integration of audio and visual inputs, which support each other in reaching objectives that would be impossible using either modality alone. We will focus on a set of relevant event types: sports, music shows and crowd scenes. We will devise novel multisource processing techniques to improve audio-visual production and to enable the synchronisation of independently captured recordings. This will in turn allow the generation of novel, higher-quality audio-visual renderings of captured events. A minimal sketch of one such task follows.
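As a concrete illustration of one production task, consider temporally aligning two user-generated recordings of the same event from their audio tracks. The Python sketch below estimates their relative time offset with a plain cross-correlation. It is a minimal illustration under simplifying assumptions (mono signals, a common sample rate, overlapping content), not the project's actual method, and the function name estimate_offset is hypothetical.

import numpy as np
from scipy.signal import correlate

def estimate_offset(audio_a, audio_b, sample_rate):
    a = np.asarray(audio_a, dtype=float)
    b = np.asarray(audio_b, dtype=float)
    # Remove the mean so loudness/DC differences between devices do not
    # dominate the correlation.
    a -= a.mean()
    b -= b.mean()
    # The peak of the full cross-correlation marks the best alignment;
    # index len(b) - 1 corresponds to zero lag.
    xcorr = correlate(a, b, mode="full")
    lag = int(np.argmax(np.abs(xcorr))) - (len(b) - 1)
    # A positive result means the shared content starts later in
    # audio_a, i.e. device A began recording earlier.
    return lag / sample_rate

In practice raw correlation is fragile under noise and clock drift between devices; the related papers below study more robust formulations, such as audio fingerprinting (paper 4) and audio-visual event matching (paper 8).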

Related journal papers

1. Wang L, Hon T, Reiss J, Cavallaro A. (2016). Self-Localization of Ad-Hoc Arrays Using Time Difference of Arrivals. IEEE Transactions on Signal Processing, 64 (4), pp. 1018-1033 (see the sketch after this list)

2. Wang L, Reiss J, Cavallaro A. (2016). Over-Determined Source Separation and Localization Using Distributed Microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (9), pp. 1573-1588

3. Wang L, Hon T, Reiss J, Cavallaro A. (2016). An Iterative Approach to Source Counting and Localization Using Two Distant Microphones. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (6), pp. 1079-1093

4. Hon T, Wang L, Reiss J, Cavallaro A. (2015). Audio Fingerprinting for Multi-Device Self-Localization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23 (10), pp. 1623-1636

5. Bano S, Cavallaro A. (2015). ViComp: composition of user-generated videos. Multimedia Tools and Applications, 75 (12), pp. 7187-7210

6. Bano S, Cavallaro A. (2015). Discovery and organization of multi-camera user-generated videos of the same event. Information Sciences, pp. 108-121

7. Chen F, De Vleeschouwer C, Cavallaro A. (2014). Resource Allocation for Personalized Video Summarization. IEEE Transactions on Multimedia, 16 (2), pp. 455-469

8. Llagostera Casanovas A, Cavallaro A. (2014). Audio-visual events for multi-camera synchronization. Multimedia Tools and Applications, 74 (4), pp. 1317-1340

9. Zini L, Odone F, Cavallaro A. (2014). Multiview Matching of Articulated Objects. IEEE Transactions on Circuits and Systems for Video Technology, 24 (11), pp. 1920-1934
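Papers 1 and 3 above recover geometry from time differences of arrival (TDOA). As a toy far-field sketch of the underlying constraint (not the papers' algorithms; the two-microphone setup and function name are illustrative assumptions), a delay tau measured between two microphones a distance d apart fixes the source bearing theta through cos(theta) = c*tau/d, with c the speed of sound:

import math

def tdoa_to_bearing(tdoa_s, mic_distance_m, speed_of_sound=343.0):
    # Far-field (plane-wave) assumption; the bearing is measured from the
    # microphone axis, so a zero delay maps to 90 degrees (broadside).
    cos_theta = speed_of_sound * tdoa_s / mic_distance_m
    # Clamp: measurement noise can push |cos(theta)| slightly above 1.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Example: a 0.5 ms delay across a 30 cm baseline gives a bearing of
# about 55 degrees: tdoa_to_bearing(0.0005, 0.3).

With more microphone pairs, intersecting such constraints yields 2-D or 3-D positions, which is where the ad-hoc array self-localization of papers 1 and 4 comes in.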

Full list of publications: http://www.eecs.qmul.ac.uk/~andrea/publications.html

(C) Queen Mary, University of London