Putting Linguistics into Speech Recognition

High-performance spoken dialogue interfaces typically use a spoken command grammar, which defines what the user can say when talking to the system. For complex applications, implementation and maintenance of this grammar is a major task requiring specialized expertise in computational linguistics and software engineering. The resulting grammars are difficult to maintain, and do not port easily to new domains.

Regulus is an Open Source toolkit for constructing spoken command grammars. It has been developed since 2001 by a consortium whose main partners have been NASA Ames, the University of Geneva, and Fluency Voice Technology. Grammar development with Regulus is carried out using example-based methods and reusable grammar resources, which reduces the level of expertise needed and automates more of the process. The Regulus approach is effective for building command grammars even at the initial stages of a project, when there may be little or no domain data available. Regulus has been used to build command grammars for several major projects. Among these are NASA's Clarissa, which in 2005 became the first spoken dialogue system to be deployed in space, and MedSLT, an Open Source medical speech translator developed at Geneva University.

This book presents a complete description of both the practical and theoretical aspects of Regulus, including several example applications which can be downloaded from the companion website.

Manny Rayner is a research scientist at NASA Ames Research Center, California, and Geneva University, Switzerland. Beth Ann Hockey is a research scientist at Ames Research Center. Pierrette Bouillon is project lead on the MedSLT medical spoken language translation project at Geneva University.

Foreword xi

1 Introduction 1

1.1 What this book is about 1
1.2 Speech recognition and language models 5
1.3 What Regulus does 13
1.4 Clarissa and MedSLT 15
1.5 Related work 20
1.6 Plan of the book 20
1.7 Summary 21

I Using Regulus 23

2 Getting started 25

2.1 Getting set up 25
2.2 A toy grammar in GSL 28
2.3 Rewriting Toy0 in Regulus 32
2.4 Regulus configuration files 37
2.5 Using Regulus 39
2.6 Summary 40

3 Simple applications 43

3.1 Introduction 43
3.2 The Regulus Speech Server 44
3.3 A toy dialogue system in Prolog 46
3.4 A toy speech translation system in Prolog 50
3.5 A toy dialogue system in Java 53
3.6 Summary 62

4 Developing grammars 65

4.1 Introduction 65
4.2 Using the Regulus development environment 65
4.3 The Toy1 example grammar 67
4.4 Unification 77
4.5 Macros 81
4.6 Compiling the Toy1 recogniser 85
4.7 Systematic testing of recognisers 87
4.8 Summary 89

5 A spoken dialogue system 93

5.1 Introduction 93
5.2 The Toy1 spoken dialogue system 95
5.3 The input manager 102
5.4 The dialogue manager 104
5.5 The output manager 108
5.6 Integrating dialogue management with recognition 108
5.7 Dealing with ellipsis and corrections 112
5.8 Summary 117

6 A speech translation system 119

6.1 Introduction 119
6.2 Transfer-based systems 120
6.3 Developing translation applications 127
6.4 Translation through interlingua 132
6.5 Translation of ellipsis 134
6.6 Systematic development 138
6.7 Integrating translation with recognition 142
6.8 Summary 145

7 Using grammar specialisation 149

7.1 Overview 149
7.2 Using the general English grammar 150
7.3 The training corpus 154
7.4 Adding lexical entries 156
7.5 General grammar semantics 166
7.6 Multiple top-level specialised grammars 169
7.7 Including lexicon entries directly 169
7.8 Dealing with ambiguity 171
7.9 Making compilation more efficient 171
7.10 Using probabilistic tuning 172
7.11 Summary 173

II How Regulus Works 175

8 Compiling feature grammars into CFG 177

8.1 Introduction 177
8.2 Exhaustive expansion 178
8.3 Filtering 179
8.4 Efficient filtering of CFGs 182
8.5 Interleaving expansion and filtering 186
8.6 Pre-processing of feature grammars 195
8.7 Transforming the output CFG 199
8.8 Semantics 203
8.9 Summary 203

9 A general English feature grammar for speech 205

9.1 Introduction 205
9.2 What makes speech grammars special 206
9.3 English grammar: basic intuitions 206
9.4 Compositional semantics 209
9.5 Noun phrases 211
9.6 Verb phrases and basic clauses 214
9.7 Adjuncts 228
9.8 Coordination 229
9.9 Feature defaults 230
9.10 Summary 231

10 Grammar specialisation using Explanation Based Learning 233

10.1 Explanation Based Learning 233
10.2 Defining cutting-up criteria 244
10.3 Different kinds of cutting-up criteria 246
10.4 Summary 251

11 Performance of grammar-based recognisers 255

11.1 Introduction 255
11.2 Varying vocabulary size 256
11.3 Varying linguistic coverage
11.4 Varying the feature set 261
11.5 Varying the cutting-up criteria 263
11.6 Comparing CFG and PCFG language models 266
11.7 Deriving recognisers from general grammars 267
11.8 Summary 268

12 Comparison of rule-based and robust approaches 271

12.1 Introduction 271
12.2 Methodological issues 272
12.3 Experiments on MedSLT 279
12.4 Experiments on Clarissa 281
12.5 Discussion 282
12.6 Summary 286

13 Summary and future directions 289

13.1 Summary 289
13.2 Future directions 291

Appendix: Online Documentation 293
References 295
Index 301

4/1/2006

ISBN (Paperback): 1575865262 (9781575865263)
ISBN (Cloth): 1575865254 (9781575865256)
ISBN (Electronic): 1575868555 (9781575868554)
