Multi-Party Discourse Understanding

Most of my time at the moment is spent working on the CALO project, in particular on automatic understanding of human-human meetings. This is a hard problem: it involves multiple people interacting with each other, communicating in several modalities simultaneously (e.g. physical and facial gestures and whiteboard writing as well as speech), and using uncontrolled speech about pretty much any topic they like. We're approaching this in several ways, both top-down (e.g. trying to spot general topics and the switches between them) and bottom-up (building broad-coverage open-domain parsing tools and using them to interpret individual utterances where possible). One of our most active interests at the moment is trying to spot and understand action items: not only are these useful things to understand and report, but they are also somewhere we can combine lexical, semantic and basic discourse structure information to get some reasonably robust results.

Multi-Device Dialogue Systems

I am also working on an in-car spoken dialogue system project, to help people interact with the multiple, increasingly complex devices in their car (stereo, phone, navigation and information systems) without having to divert their eyes or hands from the more critical job of driving. This brings a couple of interesting questions into the equation: firstly, how do we know which device is being addressed at any time (especially given the perennial problem of noisy speech recognition); and secondly, how do we even know whether the system is being addressed at all, rather than a passenger? Amongst other things, we're approaching this by combining deep and shallow information (e.g. parse structures with topic classifiers) for increased robustness, while working on intelligent clarification and confirmation strategies.
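
As a rough illustration of what combining a deep and a shallow signal can look like, here is a toy sketch. It is not the project's actual implementation, and everything in it is an assumption made for the example: the keyword lists (standing in for a trained topic classifier), the `deep_device` argument (standing in for a device identified from a successful parse), the weight and the threshold. When the parse is unavailable the decision rests on the shallow scores alone, and when no device is a clear winner the sketch asks for clarification rather than acting.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keyword lists standing in for a trained topic classifier.
TOPIC_KEYWORDS = {
    "stereo": {"play", "song", "volume", "radio"},
    "phone": {"call", "dial", "text", "voicemail"},
    "navigation": {"route", "directions", "nearest", "drive"},
}


def shallow_scores(utterance: str) -> Dict[str, float]:
    """Score each device by keyword overlap, normalised to sum to 1."""
    words = set(utterance.lower().split())
    counts = {dev: len(words & kws) for dev, kws in TOPIC_KEYWORDS.items()}
    total = sum(counts.values())
    if total == 0:
        # No evidence at all: fall back to a uniform distribution.
        return {dev: 1.0 / len(TOPIC_KEYWORDS) for dev in TOPIC_KEYWORDS}
    return {dev: c / total for dev, c in counts.items()}


def combine(shallow: Dict[str, float],
            deep_device: Optional[str],
            deep_weight: float = 0.6) -> Dict[str, float]:
    """Interpolate shallow scores with a one-hot vote from the parser.

    deep_device is None when parsing failed (e.g. noisy recognition),
    in which case only the shallow scores are used.
    """
    if deep_device is None:
        return dict(shallow)
    return {
        dev: (1 - deep_weight) * score
        + deep_weight * (1.0 if dev == deep_device else 0.0)
        for dev, score in shallow.items()
    }


def decide(scores: Dict[str, float], threshold: float = 0.6) -> Tuple[str, str]:
    """Act on the best device if confident enough, otherwise clarify."""
    best = max(scores, key=scores.get)
    action = "act" if scores[best] >= threshold else "clarify"
    return action, best


if __name__ == "__main__":
    utterance = "call the nearest garage"
    # Shallow evidence alone is split between phone and navigation.
    print(decide(combine(shallow_scores(utterance), None)))
    # A (hypothetical) parse pointing at navigation resolves the tie.
    print(decide(combine(shallow_scores(utterance), "navigation")))
```

The linear interpolation is only one simple way of combining the two sources; a real system would more plausibly learn the combination from data.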

Clarification Requests

While at King's College London I worked on the ROSSINI project, and my PhD thesis investigated clarification questions: what types people use when, how they should be interpreted, how they can be treated or used by a dialogue system, and what they tell us about semantics in general. I'm still working in this area, particularly on suitable semantic representations, in collaboration with Jonathan Ginzburg. As part of my thesis I built a prototype dialogue system, CLARIE, designed to be able to (a) interpret users' clarification questions and respond suitably, and (b) ask clarification questions in order to learn new words and phrases. One of the things I'm currently working on (with Raquel Fernández) is extending it to incorporate an element of machine learning: using classifiers to determine the best methods of fragment resolution. In the meantime, you can try the basic (rule-based) thesis version here, but be warned that the grammar is very limited - it might be worth getting in touch with me first.

Dynamic Syntax

I am also interested in Dynamic Syntax, a grammar formalism based on word-by-word incremental parsing, and particularly in its potential application to dialogue. Some time ago I built a prototype Prolog implementation, which you can try here: it includes a parser and a generator. I'm currently working in collaboration with Ruth Kempson and Ronnie Cann on developing a context-dependent version, which seems to give neat accounts of many dialogue phenomena (various kinds of anaphora and ellipsis, people's ability to continue other people's utterances in mid-sentence, and people's tendency to mirror each other's structures).