Verbmobil: A Translation System for Face-to-Face Dialog

Verbmobil is a portable simultaneous interpreter. Carry it to a meeting with speakers of other languages and it will translate your spoken words for them. Their Verbmobils, if they have them, will allow you to understand what they are saying.

So far, Verbmobil exists only as a research program of the Bundesministerium für Forschung und Technologie, Germany's Federal Ministry of Research and Technology. If the program's goals are met, the first experimental prototypes, with restricted capabilities, will exist at CSLI to assess the realistic chances of success for the Verbmobil program.

The authors give an overview of the new discipline of speech-based machine translation. They survey the state of the art in the separate fields of machine translation and speech recognition and evaluate the major obstacles to further progress in both fields. A chapter is devoted to the special problems of integrating speech recognition and natural language systems within the context of machine translation. Their appraisal of the Verbmobil project and their recommendations for it are required reading for computer scientists and linguists.

Martin Kay is a professor of linguistics at Stanford University, a research fellow at Xerox Palo Alto Research Center, and the permanent chair of the International Committee on Computational Linguistics. Jean Mark Gawron is a research linguist at SRI International. Peter Norvig is a senior computer scientist for Sun Microsystems Labs.

Preface

1 Introduction

2 Machine Translation

2.1 Why is Machine Translation Hard?

2.1.1 Situated Language

2.1.1.1 Feynman's Safe Look
2.1.1.2 “Open”
2.1.1.3 Bus Tickets
2.1.1.4 An Example from Computer Programming
2.1.1.5 Discussion

2.1.2 Translation Mismatches

2.1.2.1 The Semantic Grid
2.1.2.2 Function Words and Affixes

2.1.3 What Counts as a Translation
2.1.4 Ambiguity

2.1.4.1 The Lexicon
2.1.4.2 Lexical Fields
2.1.4.3 Selectional Restrictions
2.1.4.4 Collocations
2.1.4.5 Reference
2.1.4.6 Syntax

2.2 Machine Translation Systems

2.2.1 Translation Quality
2.2.2 Quality and the Users of Translation
2.2.3 Human Intervention

2.2.3.1 Monolingual Human

2.2.4 Translator's Assistant: Machine-Assisted Human Translation

2.3 The Historical Perspective
2.4 Linguistic Issues

2.4.1 Linguistic Levels of Analysis
2.4.2 Morphology and Phonology
2.4.3 Syntax

2.4.3.1 Phrase Structure and Dependency
2.4.3.2 Procedural and Declarative Grammars

2.5 Translation Strategy

2.5.1 Nonlinguistic Information

2.5.1.1 Analogical Approaches
2.5.1.2 Knowledge-Based and Inference-Based Approaches
2.5.1.3 Connectionism

2.5.2 Direct, Interlingual and Transfer Methods

2.5.2.1 The Module-Counting Argument
2.5.2.2 Language-Pair Independence
2.5.2.3 The Naive Interlingual Scheme
2.5.2.4 Translation by Negotiation
2.5.2.5 Semantic Representations: What's in the Interlingua

2.6 Current Machine Translation Systems

2.6.1 Japan

2.6.1.1 Commercial Systems
2.6.1.2 Government Funded
2.6.1.3 Research and Academic

2.6.2 North America

2.6.2.1 Commercial
2.6.2.2 Non-Profit
2.6.2.3 Academic and Research

2.6.3 Europe

3 Speech Recognition

3.1 Why Use Speech?

3.1.1 Why Speech is Attractive
3.1.2 Practical Use of Speech

3.2 The Difficulties of Speech Recognition

3.2.1 Constraining the Task to Make Recognition Easier

3.2.1.1 Kind of Speech
3.2.1.2 Speaker-Dependence
3.2.1.3 Signal Quality and Noise
3.2.1.4 Vocabulary Size
3.2.1.5 Task and Language Constraints

3.3 The Technology of Speech Recognition

3.3.1 Signal Processing
3.3.2 Properties of Speech
3.3.3 The Acoustic Model
3.3.4 The Language Model

3.4 History and Taxonomy of Approaches

3.4.1 Template-Based Approaches

3.4.1.1 Feature Extraction
3.4.1.2 Template Similarity Measurement
3.4.1.3 Decision Making

3.4.2 Knowledge-Based Approaches
3.4.3 Stochastic-Based Approaches
3.4.4 Connectionist Approaches

3.5 Outstanding Problems
3.6 Speech Synthesis
3.7 Prosody

3.7.1 Interface
3.7.2 Suprasentential Prosody
3.7.3 Evaluation of Synthesis
3.7.4 Recommendation

3.8 Voice Conversion

3.8.1 Recommendations Regarding Voice Conversion

3.9 Current Speech Recognition Systems

3.9.1 Japan
3.9.2 North America
3.9.3 Europe
3.9.4 Comparison of Systems

3.10 Conclusions and Recommendations

4 Recommendations

4.1 Introduction

4.1.1 Verbmobil for Face-to-Face Conversations
4.1.2 Secondary Communication Channels
4.1.3 Overlapping
4.1.4 Repairs
4.1.5 Different Speakers
4.1.6 Background Noise
4.1.7 Monitoring
4.1.8 Psychological Factors

4.2 Overall Recommendations
4.3 Product One

4.3.1 The Domain Restriction

4.3.1.1 Design Motivation
4.3.1.2 User Motivation

4.3.2 Purposefulness and Evaluability

4.3.2.1 Cooperation
4.3.2.2 Manipulability

4.3.3 Possible Domains

4.3.3.1 The Map Task
4.3.3.2 Trucking
4.3.3.3 Journeyman-Apprentice Tasks
4.3.3.4 Conference Registration
4.3.3.5 Contract Negotiations
4.3.3.6 Design Negotiations

4.4 Product Two
4.5 The Experimental Paradigm

4.5.1 Data Collection
4.5.2 Processing Data
4.5.3 The Breadboard

4.6 Variants of Verbmobil

4.6.1 The Electronic Blackboard
4.6.2 Assistants

4.7 System Design

4.7.1 Programming

4.7.1.1 Nondeterminism

4.7.2 Modularity
4.7.3 Formalism

4.8 Translation

4.8.1 Translation and Interpreting
4.8.2 Translation as Compromise

4.9 Analysis and Generation

4.9.1 Learning from Experience

4.10 Speech
4.11 Dialog

Bibliography

1/1/94

ISBN (Paperback): 0937073954 (9780937073957)
ISBN (Cloth): 0937073962 (9780937073964)
