Ling 203 Assignment 1

Linguistics 203: Assignments
Week 1: September 25, 2002
Due: October 2, 2002

Readings:

Muehleisen, Victoria. 1998. Why Isn't Little the Opposite of Large? Antonymy and Semantic Range. LACUS Forum 24: 216-226.
Justeson, John S. and Slava M. Katz. 1992. Redefining Antonymy: The Textual Structure of a Semantic Relation. Literary and Linguistic Computing 7: 176-184.
Justeson, John S. and Slava M. Katz. 1995. Principled Disambiguation: Discriminating Adjective Senses with Modified Nouns. Computational Linguistics 21: 1-28.

Wet, Dry and Their Near-synonyms

Near the conclusion of her paper, Muehleisen writes "Shared semantic range is likely to prove useful in understanding other cases of antonymy, (e.g., the question of why arid and parched are not antonyms of wet ...)" (1998:225). Your first assignment is to explore antonymy as it relates to the adjective wet and its near-synonym moist and the adjective dry and its near-synonym arid, by replicating Muehleisen's methodology on a smaller scale. The adjectives wet and dry constitute a pair of "good" antonyms, yet wet and arid do not form a pair of "good" antonyms, nor do dry and moist. (Both wet and dry have other near-synonyms, but you can ignore them.)

Each of you will be assigned to a group of 2-3 students that will consider one of the sets of two antonym pairs below:

A. wet, arid; wet, dry
B. wet, dry; moist, dry
C. moist, arid; moist, dry

(a) Determine the "semantic range" of each of the adjectives you are exploring. That is, set out the classes of nouns that turn up in the syntactic environment "Adjective Noun" for each one. Give each class an appropriate descriptive label and be sure to pair it with a representative list of class members. You should use classes of approximately the level of generality found in Muehleisen's paper, unless there is a good reason to use a finer or coarser grained class. You should formulate these classes of nouns on the basis of ACTUAL data on the use of these adjectives; see below for instructions on how to use the Cobuild Corpus Sampler for this purpose. You should come up with classes that cover most of your data, but don't feel obliged to cover every last noun. (Ignore the uses of dry in dry wine/vermouth and dry humor and any other uses of these adjectives that strike you as idiomatic.)

(b) Determine the overlap or lack thereof in the semantic range of the adjectives in the two antonym pairs you are examining. Do set out the overlap clearly, but you do not need to set it out using the specific graphics that Muehleisen employed. Does the degree of overlap you found accord with the intuitions about which pairs of adjectives constitute "good" or "bad" antonym pairs?
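If you would like to tabulate the overlap mechanically once your classes are set, the following is a minimal sketch of one way to do it in Python. It assumes you have already recorded each adjective's semantic range by hand as a mapping from class label to representative nouns. The class labels and nouns in RANGES are invented placeholders, not findings from the corpus, and range_overlap is a hypothetical helper name; the proportion it reports is only a rough summary and is not a measure taken from Muehleisen's paper.

```python
# Hypothetical hand-built semantic ranges: class label -> representative nouns.
# These labels and nouns are invented placeholders, NOT corpus results.
RANGES = {
    "wet": {"WEATHER": {"weather", "season"}, "CLOTHING": {"clothes", "towel"}},
    "dry": {"WEATHER": {"weather", "season"}, "CLOTHING": {"clothes"}, "LAND": {"land"}},
    "arid": {"LAND": {"land", "region"}},
}


def range_overlap(adj_a, adj_b, ranges=RANGES):
    """Compare two adjectives by the noun classes in their semantic ranges."""
    classes_a = set(ranges[adj_a])
    classes_b = set(ranges[adj_b])
    shared = classes_a & classes_b
    union = classes_a | classes_b
    return {
        "shared": sorted(shared),
        "only_" + adj_a: sorted(classes_a - classes_b),
        "only_" + adj_b: sorted(classes_b - classes_a),
        # Share of all classes used by either adjective that both use.
        "proportion_shared": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    for pair in (("wet", "dry"), ("wet", "arid")):
        print(pair, range_overlap(*pair))
```

Comparing classes rather than individual nouns keeps the result from depending on which 100 or so examples you happened to sample.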

You should be ready to discuss your results in class. Please bring hard copies or transparencies showing the semantic range overlaps of the two antonym pairs you were assigned.

How to Collect Relevant Data

As a source for data use the Cobuild Corpus web site:

http://titania.cobuild.collins.co.uk/form.html

This web site has material from a range of written and spoken sources, both English and American, that was used in the development of the Collins Cobuild dictionaries and ESL materials. It allows for some relatively sophisticated searching; click on "query syntax" for details. For the purposes of this question, you will want to use the query box in the Cobuild Corpus Sampler part of the form. If you type into the query box the sequence dry/JJ+NOUN, you will get instances of nouns that follow the adjective dry in the Cobuild corpus. (The "JJ" after dry limits the search to adjective uses of dry.) You can do comparable searches for the other adjectives.
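To keep the searches for the four adjectives consistent, the query strings can be generated rather than typed by hand. This is a small sketch only: adjective_noun_query is a hypothetical helper, and the pattern it produces is simply the dry/JJ+NOUN form described above applied to the other adjectives. The strings still have to be pasted into the query box yourself.

```python
def adjective_noun_query(adjective):
    """Build a query for adjective uses of a word followed by a noun."""
    # "/JJ" limits the match to adjective uses; "+NOUN" asks for a following noun.
    return f"{adjective}/JJ+NOUN"


ADJECTIVES = ("wet", "dry", "moist", "arid")
queries = [adjective_noun_query(adj) for adj in ADJECTIVES]

if __name__ == "__main__":
    for query in queries:
        print(query)
```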

You only get 40 examples at a time, but you can partially get around this limitation by searching only one of the subcorpora at a time. Also, if you think there are more than 40 instances of a particular adjective in a subcorpus, you can repeat your search. Since the examples are chosen randomly, you are likely to get some, or even a lot, that you haven't seen before. You only need enough data to get a sense of the distribution of the adjectives under study, probably on the order of 100 uses of each adjective. (You may also want to carry out some more specific adjective-noun combination searches to fill in gaps in the data about semantic classes relevant to each adjective.)
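Because repeated searches return random samples, some examples will come back more than once. The sketch below shows one possible way to pool several batches of results, drop the repeats, and count the nouns that follow the adjective. It assumes you have saved each batch as a list of plain-text lines; pool_batches and nouns_after are hypothetical helper names, the example lines are invented, and taking the word immediately after the adjective as the noun is a simplification that you should check by eye.

```python
import re
from collections import Counter


def pool_batches(batches):
    """Merge repeated search results, keeping each concordance line once."""
    seen = set()
    pooled = []
    for batch in batches:
        for line in batch:
            # Normalize case and spacing so trivial variants count as repeats.
            key = " ".join(line.lower().split())
            if key not in seen:
                seen.add(key)
                pooled.append(line)
    return pooled


def nouns_after(adjective, lines):
    """Count the word directly following the adjective in each line."""
    pattern = re.compile(
        r"\b" + re.escape(adjective) + r"\s+([A-Za-z-]+)", re.IGNORECASE
    )
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # Invented example lines, standing in for two saved batches of results.
    batches = [
        ["the dry season began", "a dry towel"],
        ["The dry  season began", "on dry land"],
    ]
    pooled = pool_batches(batches)
    print(len(pooled), nouns_after("dry", pooled).most_common())
```

Once the pooled list reaches roughly 100 lines for an adjective, the counts give a quick view of which nouns recur and which classes may still need a more specific search.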

There is a second, supplementary source of data that you can use if you want. Through the Cobuild Collocation Sampler part of the form (it's below the Corpus Sampler part of the form on the Cobuild web page), you can find words that occur near a given word more often than chance via the "mutual information" statistic or another statistic known as a "T-score". Muehleisen used the mutual information statistic in her study to help choose nouns frequently modified by the adjectives she was studying. The T-score statistic also finds frequently cooccurring words and has been increasingly adopted since it remedies certain shortcomings of the mutual information statistic. To use these statistics, scroll down to the Collocation Sampler and click on the appropriate one. Then type one of the adjectives in the query window; after a noticeable pause, you'll get a list of the 100 words most often occurring within four words to the left or the right of the word you asked about. These words are less carefully controlled than in Muehleisen's study: they don't necessarily occur right next to the target word, they occur on the left as well as the right of the target word, they are not limited to nouns, and if the target word belongs to more than one part of speech, they are not limited to those occurring with a particular part of speech. Nevertheless, you can use your own intuitions to choose those instances that are nouns that could occur to the right of the adjective, and this will give you a sense of some nouns that are very frequently modified by the adjective you are interested in. Due to the serious limitations inherent in this data, you should not rely solely on T-score and mutual information data in answering this question.
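For those curious about what the two statistics measure, here is a minimal sketch using the textbook formulations: mutual information as the base-2 log of observed over expected cooccurrence, and the T-score as observed minus expected divided by the square root of observed. This is an assumption about the general form only; the Cobuild site's exact calculation (for example, how it handles the four-word window) may differ, and the function names and counts below are hypothetical.

```python
import math


def expected_count(word_count, collocate_count, corpus_size):
    """Cooccurrences expected by chance if the two words were independent."""
    return word_count * collocate_count / corpus_size


def mutual_information(pair_count, word_count, collocate_count, corpus_size):
    """log2(observed / expected): high when a pair is more frequent than chance."""
    expected = expected_count(word_count, collocate_count, corpus_size)
    return math.log2(pair_count / expected)


def t_score(pair_count, word_count, collocate_count, corpus_size):
    """(observed - expected) / sqrt(observed): also rewards sheer frequency."""
    expected = expected_count(word_count, collocate_count, corpus_size)
    return (pair_count - expected) / math.sqrt(pair_count)


if __name__ == "__main__":
    # Invented counts: (pair, word, collocate, corpus size).
    frequent = (8, 100, 100, 10_000)
    rare = (1, 1, 1, 10_000)
    for label, counts in (("frequent pair", frequent), ("rare pair", rare)):
        print(label, mutual_information(*counts), t_score(*counts))
```

With these invented counts, the pair that occurs only once receives the higher mutual information value but the lower T-score, which illustrates one commonly cited shortcoming of mutual information: it can rank very rare pairings highly.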

If you are interested in finding more data, you can also look at the British National Corpus web site, http://sara.natcorp.ox.ac.uk/lookup.html. Unfortunately, targeted searches for adjectives followed by nouns are harder to carry out efficiently in this corpus. You can find information on doing more specific searches by looking in the SARA Manual (http://thetis.bl.uk/CHAP4/); this manual is intended for the full version of the BNC, but section 3 contains information on pattern searching, and some of this works on the web version.

Last modified: January 19, 2003
