18 April 1997

A Probabilistic Approach to Lexical-Functional Analysis

Ron Kaplan

Xerox PARC/Stanford

Rens Bod

University of Amsterdam

A linguistic theory is usually charged with assigning appropriate linguistic representations to each and every sentence of a language. It thus gives a formal definition of what constitutes a well-formed linguistic representation, and it also provides rules, derivational mechanisms, and other specifications that determine how representations are assigned to particular sentences. The descriptive devices of the theory also carry a burden of scientific explanation, so they are evaluated by how simple the individual rules are, how well they express independent linguistic generalizations, and how freely they interact to cover the whole language. These devices are also typically the elements to which probabilities or other scores are attached in order to model preference or graded acceptability.

The Data-Oriented approach of Bod, Scha, and Sima'an suggests a different view of linguistic analysis. On this view, a linguistic theory offers only a characterization of well-formed representations. It does not provide for any rules or other descriptive devices, and such formal mechanisms play no explanatory role. The assignment of appropriate representations to novel sentences is instead accomplished by probabilistic generalizations from a given corpus of correctly annotated sentences. Interestingly, the probabilistically significant units of analysis may be larger and more complex than conventional rules and lexical entries, reflecting for example the special properties of idioms and other collocations. In this talk we will outline this general approach and show how it can apply to the representations of Lexical-Functional Grammar. We will also discuss some of the conceptual issues that this data-oriented approach brings to light.
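To make "probabilistic generalizations from a corpus" and "units larger than conventional rules" concrete, here is a minimal sketch of the simpler tree-based DOP model (often called DOP1), not the LFG extension discussed in the talk: it handles phrase-structure trees only and ignores f-structures entirely. Under that assumption, every subtree fragment of the corpus is extracted, each fragment gets its relative frequency among fragments with the same root label, and a derivation's probability is the product of its fragments' probabilities. All function names and the toy corpus are hypothetical illustrations.

```python
from collections import Counter
from itertools import product
from math import prod

# Hypothetical representation: trees are nested tuples (label, child, ...),
# words are plain strings, and a one-element tuple such as ("NP",) marks an
# open substitution site. This is tree-DOP only; no f-structures are modelled.

def fragments_at(node):
    """All fragments rooted at `node`: each nonterminal child is either cut off
    (left as a substitution site) or expanded further, recursively."""
    label, *children = node
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])  # words are always kept
        else:
            options.append([(child[0],)] + fragments_at(child))
    return [(label, *combo) for combo in product(*options)]

def all_nodes(tree):
    """Yield every nonterminal node of a corpus tree, top-down."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from all_nodes(child)

def fragment_probabilities(corpus):
    """Relative frequency of each fragment among fragments with the same root."""
    counts = Counter(f for tree in corpus for node in all_nodes(tree)
                     for f in fragments_at(node))
    root_totals = Counter()
    for frag, c in counts.items():
        root_totals[frag[0]] += c
    return {frag: c / root_totals[frag[0]] for frag, c in counts.items()}

def _substitute(node, frag):
    """Replace the leftmost substitution site in `node` with `frag`."""
    if isinstance(node, str):
        return node, False
    if len(node) == 1:
        if node[0] != frag[0]:
            raise ValueError(f"cannot substitute {frag[0]} at {node[0]}")
        return frag, True
    new_children, done = [], False
    for child in node[1:]:
        if not done:
            child, done = _substitute(child, frag)
        new_children.append(child)
    return (node[0], *new_children), done

def derive(frags, probs):
    """Compose fragments by leftmost substitution; return (tree, probability)."""
    tree = frags[0]
    for frag in frags[1:]:
        tree, done = _substitute(tree, frag)
        if not done:
            raise ValueError("no open substitution site left")
    return tree, prod(probs.get(f, 0.0) for f in frags)

corpus = [
    ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary"))),
    ("S", ("NP", "Peter"), ("VP", ("V", "hates"), ("NP", "Susan"))),
]
probs = fragment_probabilities(corpus)

# "Peter likes Mary" is not in the corpus, yet it is derivable from fragments,
# one of which (the whole VP "likes Mary") is larger than a single rule.
tree, p = derive([("S", ("NP",), ("VP",)),
                  ("NP", "Peter"),
                  ("VP", ("V", "likes"), ("NP", "Mary"))], probs)
print(tree)
print(p)
```

In this toy corpus the bare S fragment occurs 2 times among 20 S-rooted fragments, "Peter" is 1 of 4 NP-rooted fragments, and the VP "likes Mary" is 1 of 8 VP-rooted fragments, so this derivation has probability 0.1 × 0.25 × 0.125. A full DOP parser would also sum over all derivations yielding the same tree; that step is omitted here.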