Sebastian Ewert

Lecturer/Researcher in Signal Processing

I am a lecturer (≈assistant professor) in Signal Processing at Queen Mary University of London.

In our research, my PhD students and I focus on Machine Listening and Semantic Music Processing. That means we develop novel methods combining machine learning, signal processing, statistical modelling, numerical optimization and other fields. We apply our methods to audio, music and multimedia data to identify hidden, semantically meaningful structure in the raw data; this enables the development of intelligent, efficient and intuitive ways to search, re-use, explore and process audio.

My background is in Computer Science and Mathematics. I did a PhD in Computer Science at the University of Bonn, Germany, which was supervised at the Max Planck Institute for Informatics, Saarbrücken, Germany. Currently, I am a lecturer in signal processing in the School of Electronic Engineering and Computer Science at Queen Mary University of London, conducting research within the Centre for Digital Music (C4DM) in the programme Fusing Audio and Semantic Technologies (FAST). Further, I am one of the academic leaders of the Machine Listening Lab at QMUL.

Contact: s.ewert [at] qmul.ac.uk

PhD Students

Daniel Stoller: Machine Listening with Limited Annotations
Summary: How can we build machine listening systems that learn concepts based on examples with few, weak, or differently structured labels? Recent deep learning methods typically require large, precisely annotated datasets to generalise well. Models for smaller datasets are often regularised by capacity constraints or problem-specific assumptions, both of which limit the potential performance. However, humans learn complex concepts based on only a few explicitly labelled examples, arguably through structures acquired during extensive unsupervised exposure to the environment. Therefore, we investigate semi-supervised generative models, which allow us to flexibly incorporate prior assumptions about the data generation process to aid generalisation. Classification can then be expressed as inferring the latent structure given the data. To leverage more of the available data, we also apply multi-task learning to integrate information from related annotations. We will demonstrate these techniques in the field of music information retrieval: a combined singing voice separation and detection model will be developed that exploits the dependencies between these tasks and benefits from prior information at test time. We deal with the problem of weak labels in the context of a lyrics alignment system, integrating annotations at the phoneme, word, and phrase level. Finally, we combine the above systems into a unified model of the singing voice that performs detection, separation, and transcription flexibly depending on the available data.
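To make the idea of classification as latent inference concrete, here is a minimal sketch in Python/NumPy: a semi-supervised Gaussian mixture trained with EM, where labelled examples fix their latent class and unlabelled examples contribute through their posterior. All names and modelling choices here are illustrative assumptions, not code from the thesis, which works with far richer (deep) generative models of audio.

```python
import numpy as np

def fit_semi_supervised_gmm(X_lab, y_lab, X_unl, n_classes, n_iter=50, eps=1e-9):
    """EM for a Gaussian mixture in which the class is a latent variable.
    Labelled rows have fixed responsibilities; unlabelled rows are inferred."""
    X = np.vstack([X_lab, X_unl])
    n_lab, n = len(y_lab), len(X)
    # responsibilities: one-hot for labelled rows, uniform for unlabelled rows
    R = np.full((n, n_classes), 1.0 / n_classes)
    R[:n_lab] = np.eye(n_classes)[y_lab]
    for _ in range(n_iter):
        # M-step: mixture weights, means and diagonal variances per class
        Nk = R.sum(axis=0) + eps
        pi = Nk / n
        mu = (R.T @ X) / Nk[:, None]
        var = np.maximum((R.T @ X**2) / Nk[:, None] - mu**2, eps)
        # E-step: posterior over the latent class (only unlabelled rows move)
        log_p = np.log(pi)[None] - 0.5 * (
            (X[:, None, :] - mu[None])**2 / var[None]
            + np.log(2 * np.pi * var[None])).sum(axis=2)
        post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
        R[n_lab:] = post[n_lab:]
    return pi, mu, var

def classify(x, pi, mu, var):
    # classification = inferring the latent class given the observation
    log_p = np.log(pi) - 0.5 * ((x - mu)**2 / var
                                + np.log(2 * np.pi * var)).sum(axis=1)
    return int(np.argmax(log_p))
```

The structural point carries over regardless of the model family: the same generative model absorbs both labelled and unlabelled data, and prediction is posterior inference rather than a separately trained discriminative mapping.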

Delia Fano Yela: Signal Processing and Machine Learning Methods for Source Separation in Music Production
Summary: In recent years, source separation has been a central research topic in music signal processing, with applications in stereo-to-surround up-mixing, remixing tools for DJs or producers, instrument-wise equalizing, karaoke systems, and pre-processing in music analysis tasks. This PhD focuses on various applications of source separation techniques in the music production process, from removing interfering sound sources from studio and live recordings to tools for modifying the singing voice. In this context, most previous methods specialize in so-called stationary and semi-stationary interferences, such as simple broadband noise, feedback or reverberation. In practice, however, one often faces a variety of complex, non-stationary interferences, such as coughs, door slams or traffic noise. General-purpose methods applicable in this context often employ techniques based on non-negative matrix factorization (NMF). Such methods use a dictionary of spectral templates that is computed from available training data for each interference class. A major problem here is that the training material often differs substantially in its spectral and temporal properties from the noise found in a given recording, and thus such methods often fail to properly model the sound source and therefore fail to produce separation results of high or even acceptable quality. A major goal of this PhD will be to explore and develop conceptually novel source separation methods that go beyond dictionary-based state-of-the-art methods and yield results of high quality even in difficult scenarios.
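As a concrete reference point, the following is a minimal sketch of such a dictionary-based NMF baseline (Python/NumPy; the function and variable names are assumptions for illustration, not code from the thesis). Noise templates learned beforehand from training data are held fixed, free templates adapt to the target source, and the target estimate is obtained via a soft mask:

```python
import numpy as np

def nmf_separate(V, W_noise, n_target=30, n_iter=100, eps=1e-9):
    """V: magnitude spectrogram (freq x time); W_noise: pretrained spectral
    templates (freq x k) for the interference class. Returns the masked
    estimate of the target magnitude spectrogram."""
    F, T = V.shape
    k = W_noise.shape[1]
    rng = np.random.default_rng(0)
    W = np.hstack([W_noise, rng.random((F, n_target)) + eps])
    H = rng.random((W.shape[1], T)) + eps
    for _ in range(n_iter):
        # multiplicative updates for the Euclidean NMF cost ||V - WH||^2
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)
        upd = (V @ H.T) / ((W @ H) @ H.T + eps)
        # keep the pretrained noise templates fixed, adapt only the target part
        W[:, k:] *= upd[:, k:]
    # soft (Wiener-like) mask built from the target components
    V_target = W[:, k:] @ H[k:]
    return V * (V_target / (W @ H + eps))
```

The failure mode described above is visible directly in this sketch: if W_noise does not match the spectral and temporal character of the interference actually present in the recording, the free target templates absorb the residual and the mask leaks interference into the output.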

Siying Wang: Computational Methods for the Alignment and Score-Informed Transcription of Piano Music
Summary: The goal of music alignment is to establish links between different versions of a piece of music by mapping each position in one version to a corresponding position in another. Although alignment methods have improved considerably in accuracy in recent years, the task remains challenging. In particular, musicians interpret a musical score in a variety of ways, leading to complex differences on a musical level between individual performances. Additionally, the wide range of possible acoustic conditions adds another layer of complexity to the task. Thus, even state-of-the-art methods fail to identify a correct alignment if such differences are substantial. A first goal of this PhD is to increase the robustness in these cases by developing novel sequence models and alignment methods that can make use of specific information available in music synchronization scenarios. A first strategy is to exploit the fact that in many scenarios not only two but multiple versions need to be aligned. By processing these jointly, we can supply the alignment process with additional examples of how a section might be interpreted or which acoustic conditions may arise. This way, we can use alignment information between two versions transitively to stabilize the alignment with a third version. Another general strategy is to rethink assumptions made in previous methods and how these might affect the alignment result. In particular, to increase the overall robustness, current methods typically assume that notes occurring simultaneously in the score are played concurrently in a performance. Musical voices such as the melody, however, are often played asynchronously to other voices, which can lead to significant local alignment errors. Therefore, this PhD develops novel methods that handle asynchronies between the melody and the accompaniment by treating the voices as separate timelines in a multi-dimensional variant of dynamic time warping (DTW). Constraining the alignment with information obtained via classical DTW, these methods measurably improve the alignment accuracy for pieces with asynchronous voices and preserve it otherwise.
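For reference, here is a minimal sketch of the classical DTW baseline that the multi-dimensional variant extends (Python/NumPy; illustrative only, not the thesis implementation). It aligns two feature sequences, e.g. chroma vectors extracted from two versions of a piece, using a cosine local cost:

```python
import numpy as np

def dtw(X, Y):
    """X: (n, d) and Y: (m, d) feature sequences; returns the optimal
    alignment path as a list of index pairs (i, j)."""
    n, m = len(X), len(Y)
    # cosine distance as the local cost matrix
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-9)
    C = 1.0 - Xn @ Yn.T
    # accumulated cost with step sizes (1,1), (1,0), (0,1)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i-1, j-1] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    # backtrack from the end of both sequences to the start
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]
```

Classical DTW of this kind warps a single shared timeline, which is exactly the assumption that breaks when the melody and accompaniment drift apart; the multi-dimensional variant developed in this PhD gives each voice its own timeline instead.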
Once an accurate alignment between a score and an audio recording is available, we can exploit the score information as prior knowledge in automatic music transcription (AMT), for scenarios such as music tutoring where a score is available. We use a score-informed dictionary learning technique to learn, for each pitch, spectral patterns describing the energy distribution of the associated notes in the recording. More precisely, we constrain the dictionary learning process in non-negative matrix factorization (NMF) using the aligned score. This way, by adapting the dictionary to a given recording, we achieve improved accuracy compared to the state of the art.
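A minimal sketch of this score-informed constraint (Python/NumPy; shapes and names are assumptions for illustration, not the thesis code): because multiplicative NMF updates can never revive a zero entry, initialising the activations to zero wherever the aligned score marks a pitch as inactive restricts the factorization to score-plausible solutions, while the dictionary templates adapt freely to the recording.

```python
import numpy as np

def score_informed_nmf(V, S, n_iter=100, eps=1e-9):
    """V: magnitude spectrogram (freq x time); S: binary pitch-activity
    matrix (pitches x time) from the aligned score, in practice dilated
    in time to tolerate small alignment errors. Returns templates W
    (one per pitch) and activations H."""
    F, T = V.shape
    P = S.shape[0]
    rng = np.random.default_rng(0)
    W = rng.random((F, P)) + eps
    # score constraint: entries zeroed here stay zero under the updates below
    H = (rng.random((P, T)) + eps) * S
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ (W @ H) + eps)
        W *= (V @ H.T) / ((W @ H) @ H.T + eps)
    return W, H
```

After convergence, each column of W is a spectral pattern learned for one pitch from this specific recording, and the refined activations in H indicate where the corresponding notes actually sound.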

If you are interested in doing a PhD, please contact me (informally).