Sebastian Ewert

Lecturer/Researcher in Signal Processing

My research focuses on Informed Machine Listening and Semantic Music Processing. Together with my PhD students, I develop novel methods that integrate concepts from machine learning, signal processing, statistical modelling, numerical optimization and other fields. We apply these methods to audio and music data to identify hidden, semantically meaningful structure in the raw data, which enables the development of intelligent, efficient and intuitive ways to search, re-use, explore or process audio.

My background is in Computer Science and Mathematics. I did a PhD in Computer Science at the University of Bonn, Germany, supervised at the Max Planck Institute for Informatics, Saarbrücken, Germany. Currently, I am a lecturer in signal processing in the School of Electronic Engineering and Computer Science at Queen Mary University of London, conducting research within the Centre for Digital Music (C4DM) in the programme Fusing Audio and Semantic Technologies (FAST). Further, I am one of the academic leaders of the Machine Listening Lab at QMUL.

Contact: s.ewert [at] qmul.ac.uk

PhD Students

Siying Wang: Improving Accuracy and Robustness of Music Alignment Against Expression Variation
Summary: The goal of music alignment is to establish links between different versions of a piece of music by mapping each position in one version to a corresponding position in another. Although alignment methods have improved considerably in accuracy in recent years, the task remains challenging. In particular, musicians interpret a musical score in a variety of ways, leading to complex differences on a musical level between individual performances. Additionally, the wide range of possible acoustic conditions adds another layer of complexity to the task. Thus, even state-of-the-art methods fail to identify a correct alignment if such differences are substantial. A first goal of this PhD is to increase robustness in these cases by developing novel sequence models and alignment methods that can exploit specific information available in music synchronization scenarios. A first strategy is to exploit the fact that in many scenarios not only two but multiple versions need to be aligned. By processing these jointly, we can supply the alignment process with additional examples of how a section might be interpreted or which acoustic conditions may arise.
This way, we can use alignment information between two versions transitively to stabilize the alignment with a third version. Another general strategy is to rethink assumptions made in previous methods and how these might affect the alignment result. In particular, to increase overall robustness, current methods typically assume that notes occurring simultaneously in the score are played concurrently in a performance. Musical voices such as the melody, however, are often played asynchronously to other voices, which can lead to significant local alignment errors. Therefore, this PhD develops novel methods that handle asynchronies between the melody and the accompaniment by treating the voices as separate timelines in a multi-dimensional variant of dynamic time warping (DTW). By constraining the alignment with information obtained via classical DTW, these methods measurably improve the alignment accuracy for pieces with asynchronous voices and preserve the accuracy otherwise.
Besides improving alignment methods, a second goal of this PhD is to use the improved alignment accuracy to enable a finer-grained analysis of musical expression. In particular, in collaboration with Yamaha and the School of Music at the University of Minnesota, the PhD will investigate advanced ways to analyse audio and MIDI performances recorded in the context of the international Piano-e-Competition contest.
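The alignment work above builds on dynamic time warping. As a rough illustration only, the sketch below aligns two feature sequences (frames in rows, e.g. chroma vectors) with classic two-dimensional DTW; the cosine distance, the step sizes and all names are illustrative assumptions, not the multi-dimensional, voice-aware variant developed in this PhD.

```python
import numpy as np

def dtw(X, Y):
    """Align two feature sequences with classic DTW.

    X, Y: arrays of shape (n_frames, n_features).
    Returns the optimal alignment path as a list of (i, j) pairs.
    """
    n, m = len(X), len(Y)
    # Pairwise cosine distances between frames (an illustrative choice).
    C = np.array([[1.0 - np.dot(x, y) /
                   (np.linalg.norm(x) * np.linalg.norm(y) + 1e-9)
                   for y in Y] for x in X])
    # Accumulated cost with step sizes (1,1), (1,0), (0,1).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Aligning a sequence with itself yields the diagonal path; aligning a version in which a section is stretched (frames repeated) yields a path that lingers on the stretched frames, which is exactly the mapping between corresponding positions described above.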

Delia Fano Yela: Signal Processing and Machine Learning Methods for Noise and Interference Reduction in Studio and Live Recordings
Summary: In recent years, source separation has been a central research topic in music signal processing, with applications in stereo-to-surround up-mixing, remixing tools for DJs or producers, instrument-wise equalizing, karaoke systems, and pre-processing in music analysis tasks. This PhD focuses in particular on removing interfering sound sources from studio and live recordings. In this context, most previous methods specialize in so-called stationary and semi-stationary interferences, such as simple broadband noise, feedback or reverberation. In this work, we focus on more complex, non-stationary interferences often found in recording scenarios, such as coughs, door slams or traffic noise. General-purpose methods applicable in this context often employ techniques based on non-negative matrix factorization (NMF). Such methods use a dictionary of spectral templates computed from available training data for each interference class.
A major problem here is that the training material often differs substantially in spectral and temporal properties from the noise found in a given recording, so such methods often fail to properly model the sound source and therefore fail to produce separation results of high, or even acceptable, quality. A major goal of this PhD is to explore and develop conceptually novel interference reduction methods that go beyond dictionary-based state-of-the-art methods and yield high-quality results even in difficult scenarios.
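To make the dictionary-based baseline concrete, here is a minimal sketch of interference reduction with KL-NMF and multiplicative updates: the noise dictionary is pre-trained and held fixed, free templates absorb the music, and a soft mask retains the music part. All names, ranks and iteration counts are illustrative assumptions, not the methods developed in this PhD.

```python
import numpy as np

def separate_nmf(V, W_noise, r_music=8, n_iter=100, rng=None):
    """Estimate the music part of a magnitude spectrogram V (freq x time)
    given a fixed, pre-trained noise dictionary W_noise (freq x r_noise)."""
    rng = np.random.default_rng(rng)
    eps = 1e-9
    F, T = V.shape
    r_noise = W_noise.shape[1]
    W_m = rng.random((F, r_music)) + 1e-3       # free music templates
    W = np.hstack([W_noise, W_m])
    H = rng.random((W.shape[1], T)) + 1e-3      # activations
    for _ in range(n_iter):
        # Multiplicative updates for the KL divergence.
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
        WH = W @ H + eps
        G = ((V / WH) @ H.T) / (np.ones_like(V) @ H.T + eps)
        W_m *= G[:, r_noise:]                   # noise dictionary stays fixed
        W = np.hstack([W_noise, W_m])
    # Soft (Wiener-like) mask keeping the music component.
    V_music = W_m @ H[r_noise:] + eps
    mask = V_music / (W @ H + eps)
    return mask * V
```

The failure mode described above shows up directly here: if the templates in W_noise do not match the interference actually present, the free music templates absorb the noise instead, and the mask passes it through.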

If you are interested in doing a PhD, please contact me (informally).