For lists of possible undergraduate, postgraduate and PhD research topics, see my teaching page.
Grants / Funding
- CroDeCo (Cross-Lingual Analysis for Detection of Cognitive Impairment in Less-Resourced Languages) - ARIS J6-60109 2025-27, Principal Investigator
- AdSoLve (Addressing Socio-Technical Limitations of LLMs for Medical and Social Computing) - Responsible AI UK (EPSRC EP/Y009800/1) KP0016 2024-28, Co-Investigator
- ARCIDUCA (Annotating Reference and Coreference In Dialogue Using Conversational Agents in games) - EPSRC EP/W001632/1 2022-25, Co-Investigator
- Sodestream (Streamlining Social Decision-Making for Enhanced Internet Standards) - EPSRC EP/S033564/1 2020-24, Principal Investigator
- RobaCOFI (Robust and Adaptable Comment Filtering) - AI4Media 2022-23, Co-Investigator
- EMBEDDIA (Cross-Lingual Embeddings for Less-Represented Languages in European News Media) - EU H2020 825153 2019-22, Local Principal Investigator
- SEVERE (SEVErity Detection in REal Care Comments) - Care Quality Commission 2018, Principal Investigator
- Institute of Coding - HEFCE 2018-20, Co-Investigator
- TEMPO (Twitter Exploration for Mining Patient Opinions) - InnovateUK 2016, Principal Investigator
- SLaDe (Sensing Language for Dementia Diagnosis) - QM Intelligent Sensing Fund 2015, Principal Investigator
- ConCreTe (Concept Creation Technology) - EU FP7 611733 2013-16, Co-Investigator
- CMSI (Cultural Mobility through Social Intelligence) - CreativeWorksLondon/AHRC 2013, Academic Partner
- AOTD (Analysing Online Therapy Dialogue) - QM Innovation Fund 2013, Principal Investigator
- Reel Reviews: A Social Movie Recommendation App - QM Innovation Fund 2013-14, Principal Investigator
- Centre for Digital Music - EPSRC Platform Grant EP/K009559/1 2013-18, Co-Investigator
- RISER (Robust Incremental Semantic Resources for Dialogue) - EPSRC EP/J010383/1 2012-13, Principal Investigator
- Chatterbox Development of Prototype - TSB SMART 720149 2012-13
- PPAT (Predicting Patient Adherence to Treatment from Dialogue Transcripts) - EPSRC EP/J501360/1 2012, Principal Investigator
- Prototyping Social Clothing - ImpactQM 2012, Principal Investigator
- Chatterbox - QM Proof of Concept Fund 2012
- Chatterbox Proof of Market - TSB SMART 700081 2011-12
- DynDial (The Dynamics of Conversational Dialogue) - ESRC ES/F027117/1 2008-11, Named Researcher
Research Students & Post-docs
Current
- Pakawat Nakwijit - PhD, social use of misspelling in Thai
- Iacopo Ghinassi - PhD, multimodal broadcast segmentation and understanding
- Peyman Hosseini - PhD, sentiment analysis for longer texts
- Zahraa Al Sahili - PhD, bias in multimodal machine learning
- Nikola Ivačič - PhD, text clustering and tracking over time
- Xiangyan Chen - PhD, improving factuality in dialogue systems
- Jaya Caporusso - PhD, language and expressions of self
- Zicen Liao - PhD, clarification and adaptation in dialogue systems
Previous
- Vanja M Karan (→ Vienna) - post-doc RA, Sodestream project
- Yujian Gan (→ UCL) - post-doc RA, ARCIDUCA project (previously PhD, text-to-SQL parsing)
- George Wright (→ Birkbeck) - PhD, creative narrative generation
- Shamila Nasreen (→ MUST) - PhD, NLP for dementia diagnosis
- Ravi Shekhar (→ Essex) - post-doc RA, EMBEDDIA and Sodestream projects
- Prashant Khare (→ UK8) - post-doc RA, Sodestream project
- Morteza Rohanian (→ Zurich) - PhD, multimodal NLP and mental health
- Jorge del Bosque Trevino (→ Nanu) - PhD, explanations in dialogue systems
- Carlos Armendariz - RA, EMBEDDIA project & MPhil, meaning fluidity in dialogue
- Tanmoy Mukherjee (→ VUB) - PhD, cross-modal representation learning
- Mariano Mora McGinity - PhD, language learning and cooperativity
- Max Droog-Hayes (→ action.ai, ieso) - PhD, semantic summarisation
- Nanda Khaorapapong (→ qConsult) - PhD, wearable computing for social interaction
- Christine Farion (→ York) - PhD, ubiquitous computing to combat forgetfulness
- Jeni Maleshkova - PhD, virtual interactive environments
- Stephen McGregor (→ Goldsmiths, action.ai) - PhD, context-dependent distributional semantics and non-literal language
- Sasha Scott (→ European Broadcasting Union) - PhD, public mourning behaviour on social media
- Dmitrijs Milajevs (→ NIST) - PhD, distributional semantics for words and sentences
- Shauna Concannon (→ Newcastle) - RA, CMSI/TEMPO projects & PhD, argumentation in online discussion
- Sascha Griffiths (→ Hamburg) - post-doc RA, creativity and conceptual semantics (ConCreTe project)
- Niall Gunter (→ OpenBet) - RA, SLaDe project
- Julian Hough (→ Bielefeld) - RA, RISER project & PhD, incremental self-repair processing in dialogue
- Christine Howes (→ Gothenburg) - post-doc RA, dialogue mining for clinical consultations (PPAT and AOTD projects)
- Henrietta Eyre (→ Black Swan Data) - post-doc RA, creativity and conceptual semantics (ConCreTe project)
- Arash Eshghi (→ Heriot-Watt) - post-doc RA, induction for incremental semantic grammars (RISER project)
Research Topics
Incrementality in Dialogue
Dialogue is incremental -- people don't speak (or listen) in complete, stand-alone sentences, but build up meaning bit by bit in an interactive process. We can interrupt each other, continue each other's utterances, and engage in a continuous process of feedback and repair. I'm involved in the DynDial project, investigating how this happens (through corpus and experimental work) and how we can model it (in grammatical frameworks and dialogue systems). As part of this we've built various parsers and dialogue systems using Dynamic Syntax; see here.
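To give a concrete flavour of what word-by-word processing involves, here is a minimal toy sketch in Python -- emphatically not Dynamic Syntax itself (whose parsers are linked above), and with an invented repair convention -- showing an interpretation state that grows with each word, survives a speaker change, and rolls back on self-repair:

```python
# Toy incremental interpreter: meaning is built word by word, either
# speaker can extend the emerging structure, and a repair marker rolls
# it back. Illustrative only -- not Dynamic Syntax.

class IncrementalInterpreter:
    def __init__(self):
        self.partial = []   # the interpretation built so far
        self.history = []   # snapshots, so repairs can roll back

    def add_word(self, speaker, word):
        """Extend (or repair) the shared structure one word at a time."""
        self.history.append(list(self.partial))
        if word in {"uh", "um"}:              # filled pause: adds no content
            return self.partial
        if word == "no,":                     # (invented) self-repair marker:
            self.partial = self.history[-2]   # roll back the last word
        else:
            self.partial.append((speaker, word))
        return self.partial

interp = IncrementalInterpreter()
# A repaired, jointly produced utterance:
# A: "pass the pepper no, salt"  B: "please"
for speaker, word in [("A", "pass"), ("A", "the"), ("A", "pepper"),
                      ("A", "no,"), ("A", "salt"), ("B", "please")]:
    print(interp.add_word(speaker, word))  # usable partial state at every word
```

The point of the sketch is that there is a well-defined, usable interpretation at every word boundary -- which is exactly what lets hearers interrupt, complete, and repair mid-utterance.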
Open-Domain Dialogue Understanding
Understanding dialogue is a genuinely hard problem; we humans are good at it, partly because we have a pretty good idea of what might make sense in a given context. Most computer dialogue systems make use of the same insight: as they work in a restricted domain, they can map from words to meaning representations based on what's sensible or possible in that domain; and as they're involved in the dialogue themselves, they can always ask for clarification if they need it. When the problem is to understand what people are saying as an overhearer, without knowing much about the domain, it's harder; but this is exactly the problem we encounter if we want to build something like an automatic meeting assistant (like the system we developed at Stanford on the CALO project for detecting decisions and action items -- see here). I'm interested in developing robust techniques for detecting high-level topic structure, low-level aspects like addressing, and important conversational structures like decision-making and action item assignment.
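As a toy illustration of the shallow end of this kind of detection (not the CALO system itself; the labels and utterances below are invented), one can treat it as utterance-level classification and read conversational structure off the predicted tags:

```python
# Minimal sketch: tag each meeting utterance as decision / action_item /
# other, then read structure off the tags. Training data are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utts = [
    "so we've agreed to go with the second design",
    "let's fix the release date as the tenth",
    "John, can you draft the report by Friday?",
    "I'll set up the user study next week",
    "the weather was terrible on the way in",
    "could you repeat that?",
]
train_tags = ["decision", "decision", "action_item",
              "action_item", "other", "other"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_utts, train_tags)

for utt in ["okay, we'll go with option B then",
            "can you email the slides to everyone?"]:
    print(utt, "->", clf.predict([utt])[0])
```

In practice, of course, decisions and action items emerge over several utterances and speakers, so single-utterance tagging like this is only a starting point.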
Multi-Device Dialogue Systems
I am also working on an in-car spoken dialogue system project, to help people interact with the increasingly complex set of devices in their car (stereo, phone, navigation & information systems) without having to divert their eyes or hands from the more critical job of driving. This raises a couple of interesting questions: firstly, how do we know which device is being addressed at any given time (especially given the perennial problem of noisy speech recognition); and secondly, how do we know whether the system is being addressed at all, rather than a passenger? Amongst other things, we're approaching this by combining deep & shallow information (e.g. parse structures with topic classifiers) for increased robustness, while working on intelligent clarification and confirmation strategies.
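As a rough sketch of what combining evidence sources might look like (the keyword lists, feature extractors, and weights here are invented placeholders, not the project's actual models), one can late-fuse a shallow topic/keyword score with a stand-in for parse-derived evidence:

```python
# Illustrative sketch: fuse shallow (keyword/topic) and "deep"
# (parse-style) evidence to guess which device, if any, is addressed.
# All features and weights are invented for the example.

DEVICES = ["stereo", "phone", "navigation", "none"]  # "none" = passenger talk

def shallow_scores(utterance):
    """Keyword spotting: cheap, and fairly robust to noisy ASR."""
    keywords = {"stereo": ["play", "song", "volume"],
                "phone": ["call", "dial", "text"],
                "navigation": ["route", "turn", "exit"]}
    scores = {d: 0.0 for d in DEVICES}
    for device, words in keywords.items():
        scores[device] = float(sum(w in utterance.lower() for w in words))
    return scores

def deep_scores(utterance):
    """Stand-in for parse-derived evidence: an imperative-like opening
    suggests the system, not a passenger, is being addressed."""
    scores = {d: 0.0 for d in DEVICES}
    first = utterance.split()[0].lower()
    if first in {"play", "call", "dial", "show"}:
        scores["none"] -= 1.0   # imperative: probably not passenger talk
    else:
        scores["none"] += 0.5   # declarative-ish: maybe passenger talk
    return scores

def addressee(utterance, w_shallow=1.0, w_deep=1.0):
    s, d = shallow_scores(utterance), deep_scores(utterance)
    combined = {dev: w_shallow * s[dev] + w_deep * d[dev] for dev in DEVICES}
    return max(combined, key=combined.get)

print(addressee("play that song again"))        # -> stereo
print(addressee("the weather is awful today"))  # -> none (passenger talk)
```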
Clarification Requests
While at King's College London I worked on the ROSSINI project, and my PhD thesis investigated clarification questions: what types people use when, how they should be interpreted, how they can be treated or used by a dialogue system, and what they tell us about semantics in general. I'm still working in this area, particularly on suitable semantic representations, in collaboration with Jonathan Ginzburg. As part of my thesis I built a prototype dialogue system, CLARIE, designed to be able to (a) interpret users' clarification questions and respond suitably, and (b) ask clarification questions in order to learn new words and phrases. One of the things I'm currently working on (with Raquel Fernández) is extending it to incorporate an element of machine learning: using classifiers to determine the optimal method of fragment resolution. In the meantime, you can try the basic (rule-based) thesis version here, but be warned that the grammar is very limited -- it might be worth getting in touch with me first.
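A minimal sketch of that machine-learning extension (with invented classes, features, and training examples -- not CLARIE's actual design) might classify each clarification fragment in context and map the predicted class onto a resolution method:

```python
# Sketch: predict which resolution method suits a clarification
# fragment, given its antecedent utterance. Classes and data invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each example is "antecedent utterance || clarification fragment".
train = [
    "put it on the shelf || the shelf?",           # echo: confirm the words
    "give it to Bo || Bo?",                        # echo of a name
    "hand me the doohickey || the what?",          # unknown word
    "pass me the spanner || a what?",
    "meet me at the usual place || which place?",  # referent unclear
    "bring the red one || which one?",
]
labels = ["confirm", "confirm", "define", "define", "identify", "identify"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train, labels)

strategy = {"confirm": "re-assert the clarified constituent",
            "define": "paraphrase or gloss the unknown word",
            "identify": "supply the intended referent"}

frag = "take the left fork || the what?"
method = clf.predict([frag])[0]
print(method, "->", strategy[method])  # likely 'define' here
```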